Site Reliability Engineer - Terraform, Backstage
Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision!
Canada
Accepting Applications
Full-time
On-site
Posted 1 week ago
Job Description
**Years of Experience: 6-8**
We are seeking a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of platform services. The ideal candidate will bring strong expertise in SRE practices, observability, infrastructure automation, and developer platform enablement, with exposure to modern technologies including policy-as-code and emerging GenAI-driven systems.
Required Skills
Strong experience in SRE practices and reliability engineering
Hands-on expertise with:
Monitoring/logging platforms and distributed tracing
SLO/SLI frameworks and observability design
Experience in incident management and performance engineering
Strong understanding of DORA metrics and operational excellence
Proficiency in:
Terraform (Infrastructure as Code)
Policy-as-Code (OPA/Rego, Sentinel)
Experience with:
Developer platform tools (Backstage, service catalogs)
Golden paths and platform standardization
Key Responsibilities
Implement and manage SRE practices including:
Incident management, root cause analysis, and postmortems
Reliability engineering and performance optimization
Tracking and improving DORA metrics
Define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
Build and manage monitoring, logging, and distributed tracing frameworks
Ensure platform reliability through proactive alerting, observability, and automation
Automate infrastructure and governance using:
Terraform (Infrastructure as Code)
Policy-as-Code tools (OPA/Rego, Sentinel)
Enhance developer experience and productivity by:
Designing self-service platform capabilities
Managing service catalogs and platform standards
Building reusable templates and golden paths
Work with tools like Backstage to enable internal developer platforms
Collaborate with engineering teams to improve system stability, deployment reliability, and operational efficiency
Support integration and reliability considerations for GenAI-based systems (RAG, prompt workflows, model evaluation)
Nice to Have
Exposure to GenAI platforms, RAG, and prompt engineering concepts
Experience in developer productivity measurement and platform engineering initiatives
Tools & Methodologies
Experience with Agile methodologies and supporting tools (Jira, Confluence)
Familiarity with DevOps and platform engineering practices
Soft Skills
Strong problem-solving and analytical skills
Ability to work in high-pressure production environments
Excellent communication and cross-team collaboration