Accepting Applications
Full-time
Hybrid
LinkedIn
Posted 3 weeks, 3 days ago
Job Description
Title: Site Reliability Engineer (SRE)/DevOps Engineer – AWS Serverless
Location: Greater Toronto Area (GTA), Hybrid
Experience: 8+ years
Resiliency & Operational Excellence: AWS Serverless | Dynatrace
"The client is looking for candidates with stronger development experience along with AWS expertise."
This role owns reliability, resiliency, and operational excellence for mission‑critical AWS serverless platforms, ensuring high availability, low MTTR, and strong production governance through Dynatrace‑driven observability.
- Resiliency strategy for serverless architectures (Lambda, API Gateway, async/event‑driven systems)
- SLOs / SLIs / Error Budgets for critical APIs
- Incident analysis and post‑incident reviews
- Dynatrace observability: dashboards, alert tuning, dependency mapping, RCA acceleration
- Operational excellence improvements: incident reduction, MTTR improvement, toil automation
- Reliability guardrails embedded into CI/CD and production readiness reviews
Core Responsibilities
- Design & enforce resiliency patterns: timeouts, retries, circuit breakers, throttling, graceful degradation
- Lead major incidents and drive actionable RCAs with sustained fixes
- Build signal‑driven alerts aligned to SLOs (noise reduction focus)
- Enable automation & self‑healing where feasible
Required Experience
- 5-6+ years in SRE/DevOps/Production Engineering
- Deep hands‑on with AWS serverless (Lambda, API Gateway, SQS/SNS, DynamoDB/RDS)
- Strong expertise in Dynatrace for serverless monitoring & triage
- Proven success improving availability, MTTR, and incident trends
- Solid coding/scripting (Python / Java / Node.js)