Machine Learning Engineer

Valiance Solutions

British Indian Ocean Territory

Full-time, On-site
Posted 4 days, 3 hours ago
Job Description
**About Valiance**

Valiance is a deeptech AI company building sovereign and mission-critical AI solutions for enterprises, the public sector, and government institutions. From predictive maintenance and demand planning to sovereign AI for citizen services, we design systems that thrive in high-stakes environments. Valiance has been recognized with the NASSCOM AI Game Changers Award and the Aegis Graham Bell Award and is a certified Google Cloud Partner; our 200+ engineers and data scientists are shaping the future of industries and societies through responsible AI.

**The Role**

We are looking for a senior LLMOps Engineer who has taken LLM inference optimization from idea to production — not just proof of concept. You will own the end-to-end efficiency of our LLM inference infrastructure running on H200 GPUs, driving down cost and latency while maintaining the reliability our enterprise and government clients demand. This is a high-ownership, high-impact role on a team building some of India's most consequential AI systems.

**What You Will Do**

* Design and operate production-grade LLM inference pipelines on H200 GPU clusters, optimizing for throughput, latency, and cost per token.
* Evaluate and deploy small-to-medium open-source LLMs (e.g., Mistral, Llama, Phi, Gemma) as cost-efficient alternatives to large models without sacrificing output quality.
* Tune and manage vLLM deployments — including continuous batching, paged attention, tensor parallelism, and quantization (GPTQ, AWQ, FP8) — in production environments.
* Build and maintain model-serving APIs with robust observability: latency percentiles, GPU utilization, queue depths, and cost-per-request dashboards.
* Architect Kubernetes-based autoscaling strategies for inference workloads, balancing cold-start penalties against cost at scale.
* Run structured A/B experiments comparing model variants, quantization levels, and batching strategies using production traffic — not synthetic benchmarks.
* Collaborate with applied ML engineers and solution architects to identify latency and cost bottlenecks across the model-serving stack.
* Establish and enforce SLOs for inference reliability, and build alerting and runbooks for production incidents.

**What We Are Looking For**

**Non-Negotiables**

* 3+ years of hands-on experience operating LLM inference in production — demonstrable cost and latency improvements, not POC results.
* Deep expertise with vLLM in production: batching strategies, memory management, quantization tradeoffs.
* Strong Python engineering skills — clean, testable, production-ready code.
* Proficiency with Docker and Kubernetes for deploying and scaling GPU inference workloads.
* Experience building and maintaining REST/gRPC APIs for model serving at scale.
* Hands-on experience with open-source LLMs and the ability to evaluate model-quality vs. cost tradeoffs for real use cases.

**Strong Advantages**

* Experience with GPU memory profiling and optimization (CUDA-level awareness a plus).
* Familiarity with model distillation, speculative decoding, or flash attention implementations.
* Exposure to multi-GPU and multi-node inference setups.
* Experience with inference frameworks beyond vLLM: TGI, TensorRT-LLM, Triton Inference Server.
* Familiarity with sovereign AI or air-gapped deployment constraints.

**Why Valiance**

* You will work on AI systems that are actually deployed at scale — used by government institutions and large enterprises, not just demoed.
* Direct access to H200 infrastructure with meaningful compute budgets — no GPU rationing.
* A culture that rewards engineering depth and production ownership over slide decks.
* Competitive compensation with performance-linked incentives.
* Opportunity to define how Valiance builds its AI platform as we scale.
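To give candidates a concrete flavor of the observability work described above, here is a minimal sketch of the kind of metric rollup a cost-per-request dashboard consumes: given per-request latencies and token counts for a time window, it derives latency percentiles and cost-per-token figures. The function and field names are illustrative only, not an existing internal API.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class RequestSample:
    latency_ms: float   # end-to-end request latency in milliseconds
    output_tokens: int  # tokens generated for this request


def dashboard_metrics(samples, gpu_hour_cost_usd, window_hours):
    """Summarize one window of inference traffic into dashboard metrics."""
    latencies = sorted(s.latency_ms for s in samples)
    # quantiles(n=100) returns 99 cut points; indices 49/94/98 are p50/p95/p99
    q = quantiles(latencies, n=100)
    total_tokens = sum(s.output_tokens for s in samples)
    window_cost = gpu_hour_cost_usd * window_hours
    return {
        "p50_ms": q[49],
        "p95_ms": q[94],
        "p99_ms": q[98],
        "cost_per_request_usd": window_cost / len(samples),
        "cost_per_1k_tokens_usd": 1000 * window_cost / total_tokens,
    }
```

In production these numbers would come from traces exported by the serving layer rather than in-process lists, but the rollup math is the same.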
**How to Apply**

Upload your resume and a brief note on a specific inference optimization you shipped in production — the problem, your approach, and the measurable outcome. We do not conduct screening rounds for this role. Shortlisted candidates will move directly to a technical discussion with our engineering leadership.