Site Reliability Engineer (Canada)

Jobright.ai

Canada

Accepting Applications Full-time On-site
Posted 5 days, 8 hours ago
Job Description
Jobright is a next-generation AI job search platform built to make career navigation faster, smarter, and more personal. We are looking for a Site Reliability Engineer to keep the systems behind our AI agents fast, resilient, and ready to scale as millions of job seekers depend on them every day.

**Why Join Us**

* Own the infrastructure that keeps real-time AI agents running reliably for users making important career decisions
* Tackle problems unique to LLM-powered systems, from inference latency and cost optimization to handling unpredictable traffic spikes
* Work with engineers who treat reliability as a product feature, not a clean-up job that happens after the fact
* Join a team where automation, observability, and thoughtful on-call practices are first-class investments

**Responsibilities**

* Design, build, and maintain the cloud infrastructure that powers Jobright's AI agents, APIs, and user-facing services
* Improve system observability through metrics, logging, and tracing, making it easier for the whole team to understand what's happening in production
* Partner with product and engineering teammates to harden new features before launch, owning capacity planning, performance testing, and rollout strategies
* Lead incident response when things go wrong, run blameless post-mortems, and turn each incident into durable improvements in reliability and tooling

**Qualifications**

**Required**

* Early to mid-career engineer with 1 to 3 years of experience in site reliability, DevOps, platform, or backend engineering
* Strong communicator who can break down complex infrastructure tradeoffs for engineers, product partners, and leadership alike
* Solid grounding in cloud platforms, containerization, CI/CD pipelines, and the fundamentals of distributed systems

**Preferred**

* Prior experience supporting production AI/ML workloads or high-throughput API services at a tech or AI-focused organization
* Demonstrated comfort operating in fast-moving environments where on-call coverage, incident response, and infrastructure changes happen in parallel
* Hands-on skills in AWS or GCP, Kubernetes, Terraform, monitoring stacks like Datadog or Prometheus, and scripting in Python or Go