Jobright.ai

Site Reliability Engineer (Canada)

Jobright.ai

Canada

Accepting Applications | Full-time | On-site | LinkedIn
Posted 2 weeks, 5 days ago | 21 views | 0 applications
Job Description
Jobright is a next-generation AI job search platform built to make career navigation faster, smarter, and more personal. We are looking for a Site Reliability Engineer to keep the systems behind our AI agents fast, resilient, and ready to scale as millions of job seekers depend on them every day.

**Why Join Us**

* Own the infrastructure that keeps real-time AI agents running reliably for users making important career decisions
* Tackle problems unique to LLM-powered systems, from inference latency and cost optimization to handling unpredictable traffic spikes
* Work with engineers who treat reliability as a product feature, not a clean-up job that happens after the fact
* Join a team where automation, observability, and thoughtful on-call practices are first-class investments

**Responsibilities**

* Design, build, and maintain the cloud infrastructure that powers Jobright's AI agents, APIs, and user-facing services
* Improve system observability through metrics, logging, and tracing, making it easier for the whole team to understand what's happening in production
* Partner with product and engineering teammates to harden new features before launch, owning capacity planning, performance testing, and rollout strategies
* Lead incident response when things go wrong, run blameless post-mortems, and turn each incident into durable improvements in reliability and tooling

**Qualifications**

**Required**

* Early to mid-career engineer with 1 to 3 years of experience in site reliability, DevOps, platform, or backend engineering
* Strong communicator who can break down complex infrastructure tradeoffs for engineers, product partners, and leadership alike
* Solid grounding in cloud platforms, containerization, CI/CD pipelines, and the fundamentals of distributed systems

**Preferred**

* Prior experience supporting production AI/ML workloads or high-throughput API services at a tech or AI-focused organization
* Demonstrated comfort operating in fast-moving environments where on-call coverage, incident response, and infrastructure changes happen in parallel
* Hands-on skills in AWS or GCP, Kubernetes, Terraform, monitoring stacks like Datadog or Prometheus, and scripting in Python or Go