Full-time
Hybrid
Job Description
**About Vumedi:**
Vumedi is the largest video education platform for doctors worldwide, dedicated to advancing medical education through innovative video-based learning. Our mission is to empower healthcare professionals by providing them with access to the latest clinical knowledge and surgical techniques from experts around the globe. We curate a vast library of high-quality educational content, enabling users to enhance their skills, stay informed about industry trends, and improve patient outcomes. We are headquartered in **Oakland, CA**, and have additional offices in Minneapolis, MN, and Zagreb, Croatia.
We're hiring a **Senior/Staff/Principal DevOps Engineer** to lead the development of our digital platform and products at this critical stage of Vumedi's growth.
**Why join Vumedi right now?**
* **Build technology that matters in a fast-scaling Silicon Valley digital healthcare company:** Your work directly impacts how doctors across the world learn and make decisions that save lives.
* **Grow as we grow:** Be part of a company in an accelerated growth phase, where expanding teams, products, and markets create real opportunities for ownership, leadership, and career progression.
* **Build with AI:** Work on applied LLM systems, from intelligent search to AI-driven content agents, and shape how AI transforms medical knowledge delivery.
* **Own your craft end-to-end:** Take full responsibility for building systems that scale globally and power mission-critical workflows.
* **Collaborate globally:** Join a world-class team of passionate engineers on a modern tech stack that will further drive your career development.
* **Have real product impact:** Influence the direction of product development by collaborating closely with product and leadership teams.
**About the role:**
We are looking for a DevOps Engineer to join our engineering team and take ownership of our infrastructure, deployment processes, and overall platform reliability. You will work closely with backend and data teams to support a growing video and data platform used by millions of healthcare professionals worldwide.
In this role, you will focus on improving our CI/CD pipelines, system reliability, and developer experience, while helping scale our cloud infrastructure in a secure and cost-efficient way. You will work extensively with AWS services (compute, storage, networking, IAM, monitoring) and help ensure our systems are reliable, observable, and well-architected.
You'll also support and enable emerging AI/ML and LLM-powered systems used for large-scale medical content processing, helping build and operate the infrastructure required for these workloads. This includes improving data pipelines, optimizing resource usage, and ensuring production-grade reliability of AI-driven services.
This is a high-impact role with a broad scope, from supporting production systems and data pipelines to driving long-term improvements in how we build, deploy, and operate our platform, with strong ownership and autonomy in shaping DevOps practices.
**What you will do:**
* Own and improve our infrastructure, CI/CD pipelines, and deployment processes across multiple environments
* Work with AWS services (compute, storage, networking, IAM, monitoring) to ensure scalable, secure, and reliable systems
* Collaborate closely with backend and data teams to support production systems, data pipelines, and overall platform reliability
* Continuously improve developer experience by streamlining workflows, reducing friction, and enabling faster, safer deployments
* Contribute to improving security practices, access control, and compliance of our infrastructure
* Automate infrastructure and workflows using Python
* Improve observability by implementing and maintaining monitoring, logging, and alerting systems
* Troubleshoot production issues, participate in incident response, and implement long-term fixes to improve system stability
* Identify and drive improvements in performance, scalability, and cost efficiency across the platform
* Support and scale AI/ML and LLM-based systems, ensuring reliable infrastructure for data processing and content classification workloads
**Who you are:**
* You have 5+ years of experience in DevOps, SRE, or infrastructure engineering, with a strong focus on cloud-native environments (preferably AWS)
* You have managed cloud infrastructure (networking, IAM, compute, storage) with a strong understanding of security best practices and cost optimization
* You have experience building and maintaining CI/CD pipelines to support rapid, reliable software delivery across multiple environments
* You are comfortable writing Python for automation, scripting, and building internal tooling to improve infrastructure and developer workflows
* You have a strong understanding of monitoring, logging, and observability (e.g., Datadog, Prometheus, CloudWatch), and you proactively identify and resolve issues
* You are comfortable debugging production issues across systems and collaborating with engineering teams to resolve them
* You are proactive, take ownership, and enjoy working in environments with high autonomy and evolving processes
* You communicate clearly and collaborate effectively with engineers, product managers, and other stakeholders
* You are curious and motivated to learn, especially in areas like AI/ML infrastructure and large\-scale systems
**Required Qualifications:**
* 5+ years of experience in DevOps, Site Reliability Engineering, or infrastructure-focused roles
* Proven experience designing and operating scalable, reliable, and secure cloud infrastructure (preferably AWS) in production environments
* Strong understanding of cloud security best practices (IAM, network security, secrets management), preferably within AWS
* Proficiency in Python for automation, scripting, and tooling
* Hands-on experience building and maintaining CI/CD pipelines
* Experience with monitoring, logging, and alerting tools (e.g., Datadog, CloudWatch, Prometheus)
* Experience working in a Linux-based environment
* Ability to drive infrastructure and DevOps strategy, balancing scalability, reliability, and cost
* Experience working cross-functionally and influencing engineering teams on best practices and architectural decisions
* Strong ownership mindset with the ability to operate autonomously in ambiguous environments
**Preferred Qualifications:**
* Experience supporting or scaling AI/ML or LLM-based systems in production
* Experience with containerized applications (Docker) and familiarity with orchestration concepts (Kubernetes or ECS is a plus)
* Familiarity with Infrastructure as Code principles (e.g., Terraform), including experience introducing Infrastructure as Code from scratch into existing environments
* Experience working with or supporting backend systems and data platforms (e.g., Postgres; Airflow is a plus)
* Background in backend engineering or software development
* Experience working in a fast-paced startup or scale-up environment
* Experience leading and mentoring engineers, while contributing to team-wide best practices
**This is a hybrid role, working 3 days a week (Monday, Wednesday, and Friday) in our Oakland office.**