DevOps Engineer

Vumedi

United States

Accepting Applications Full-time Hybrid
Job Description
**About Vumedi:** Vumedi is the largest video education platform for doctors worldwide, dedicated to advancing medical education through innovative video-based learning. Our mission is to empower healthcare professionals by providing them with access to the latest clinical knowledge and surgical techniques from experts around the globe. We curate a vast library of high-quality educational content, enabling users to enhance their skills, stay informed about industry trends, and improve patient outcomes. We are headquartered in **Oakland, CA**, and have additional offices in Minneapolis, MN, and Zagreb, Croatia.

We're hiring a **Senior/Staff/Principal DevOps Engineer** to lead the development of our digital platform and products at this critical stage of Vumedi's growth.

**Why join Vumedi right now?**

* **Build technology that matters in a fast-scaling Silicon Valley digital healthcare company:** Your work directly impacts how doctors across the world learn and make decisions that save lives.
* **Grow as we grow:** Be part of a company in an accelerated growth phase, where expanding teams, products, and markets create real opportunities for ownership, leadership, and career progression.
* **Build with AI:** Work on applied LLM systems, from intelligent search to AI-driven content agents, and shape how AI transforms medical knowledge delivery.
* **Own your craft end-to-end:** Take full responsibility for building systems that scale globally and power mission-critical workflows.
* **Collaborate globally:** Join a world-class team of passionate engineers working on a modern tech stack, which will further drive your career development.
* **Have real product impact:** Influence the direction of product development by collaborating closely with product and leadership teams.

**About the role:** We are looking for a DevOps Engineer to join our engineering team and take ownership of our infrastructure, deployment processes, and overall platform reliability.
You will work closely with backend and data teams to support a growing video and data platform used by millions of healthcare professionals worldwide. In this role, you will focus on improving our CI/CD pipelines, system reliability, and developer experience, while helping scale our cloud infrastructure in a secure and cost-efficient way. You will work extensively with AWS services (compute, storage, networking, IAM, monitoring) and help ensure our systems are reliable, observable, and well-architected.

You'll also support and enable emerging AI/ML and LLM-powered systems used for large-scale medical content processing, helping build and operate the infrastructure required for these workloads. This includes improving data pipelines, optimizing resource usage, and ensuring production-grade reliability of AI-driven services.

This is a high-impact role with a broad scope, from supporting production systems and data pipelines to driving long-term improvements in how we build, deploy, and operate our platform, with strong ownership and autonomy in shaping DevOps practices.

**What you will do:**

* Own and improve our infrastructure, CI/CD pipelines, and deployment processes across multiple environments
* Work with AWS services (compute, storage, networking, IAM, monitoring) to ensure scalable, secure, and reliable systems
* Collaborate closely with backend and data teams to support production systems, data pipelines, and overall platform reliability
* Continuously improve developer experience by streamlining workflows, reducing friction, and enabling faster, safer deployments
* Contribute to improving security practices, access control, and compliance of our infrastructure
* Automate infrastructure and workflows using Python
* Improve observability by implementing and maintaining monitoring, logging, and alerting systems
* Troubleshoot production issues, participate in incident response, and implement long-term fixes to improve system stability
* Identify and drive improvements in performance, scalability, and cost efficiency across the platform
* Support and scale AI/ML and LLM-based systems, ensuring reliable infrastructure for data processing and content classification workloads

**Who you are:**

* You have 5+ years of experience in DevOps, SRE, or infrastructure engineering, with a strong focus on cloud-native environments (preferably AWS)
* You have managed cloud infrastructure (networking, IAM, compute, storage) with a strong understanding of security best practices and cost optimization
* You have experience building and maintaining CI/CD pipelines to support rapid, reliable software delivery across multiple environments
* You are comfortable writing Python for automation, scripting, and building internal tooling to improve infrastructure and developer workflows
* You have a strong understanding of monitoring, logging, and observability (e.g., Datadog, Prometheus, CloudWatch), and you proactively identify and resolve issues
* You are comfortable debugging production issues across systems and collaborating with engineering teams to resolve them
* You are proactive, take ownership, and enjoy working in environments with high autonomy and evolving processes
* You communicate clearly and collaborate effectively with engineers, product managers, and other stakeholders
* You are curious and motivated to learn, especially in areas like AI/ML infrastructure and large-scale systems

**Required Qualifications:**

* 5+ years of experience in DevOps, Site Reliability Engineering, or infrastructure-focused roles
* Proven experience designing and operating scalable, reliable, and secure cloud infrastructure (preferably AWS) in production environments
* Strong understanding of cloud security best practices (IAM, network security, secrets management), preferably within AWS
* Proficiency in Python for automation, scripting, and tooling
* Hands-on experience building and maintaining CI/CD pipelines
* Experience with monitoring, logging, and alerting tools (e.g., Datadog, CloudWatch, Prometheus)
* Experience working in a Linux-based environment
* Ability to drive infrastructure and DevOps strategy, balancing scalability, reliability, and cost
* Experience working cross-functionally and influencing engineering teams on best practices and architectural decisions
* Strong ownership mindset with the ability to operate autonomously in ambiguous environments

**Preferred Qualifications:**

* Experience supporting or scaling AI/ML or LLM-based systems in production
* You have worked with containerized applications (Docker) and are familiar with orchestration concepts (Kubernetes or ECS is a plus)
* You are familiar with Infrastructure as Code principles (e.g., Terraform) and have experience implementing Infrastructure as Code from scratch in existing environments
* You have experience working with or supporting backend systems and data platforms (e.g., Postgres, Airflow is a plus)
* Background in backend engineering or software development
* Experience working in a fast-paced startup or scale-up environment
* Experience leading and mentoring engineers, while contributing to team-wide best practices

**This is a hybrid role, working 3 days a week (Monday, Wednesday, and Friday) in our Oakland office.**