Machine Learning Engineer

British Indian Ocean Territory

Accepting Applications Full-time On-site

Posted 2 hours, 15 minutes ago

Job Description

**Job Profile: LLM Observability Engineer**

**Location – Bhubaneshwar**

**Job Description:**

We are looking for a skilled **LLM Observability Engineer** to join our team and ensure optimal performance, reliability, and cost-efficiency of our large language model (LLM) applications. You will be instrumental in designing performance tests, implementing observability practices, and providing insights to enhance model quality and system robustness.

**Key Responsibilities:**

**Must Have**

* Design and Execute Performance Tests: Develop and implement comprehensive test plans and scripts to evaluate the performance, scalability, and stability of LLM applications under various loads.
* Implement LLM Observability: Instrument LLM applications to capture rich telemetry data, including prompts, responses, token usage, latency, and error information, using specialized tools and frameworks like **Datadog**, **LangChain**, or OpenTelemetry.
* Monitor and Analyze Metrics: Track key performance indicators (KPIs) such as response time, throughput, cost per query, accuracy, and resource utilization using real-time dashboards and monitoring systems.
* Identify and Mitigate Bottlenecks: Analyze performance test results and production data to pinpoint performance bottlenecks, errors, and potential issues (e.g., high latency in RAG pipelines) and collaborate with development teams on optimization.
* Conduct Automated Evaluations: Implement automated quality checks and evaluations (e.g., hallucination detection, toxicity classifiers, relevance scoring) to continuously assess model output quality.
* Strong knowledge of performance testing methodologies and load testing tools such as **JMeter**, LoadRunner, or Gatling.
* Familiarity with the unique challenges of LLMs, including non-determinism, hallucinations, and prompt sensitivity.
* Experience with LLM observability platforms and tools (e.g., Datadog LLM Observability, Arize AI, **Langfuse**) is highly desirable.
* Proficiency in programming/scripting languages (e.g., **Python**, Java).

**Nice to have**

* Troubleshoot Production Issues: Utilize tracing and logging data to quickly diagnose the root cause of issues in complex LLM workflows and agentic applications.
* Ensure Security and Compliance: Monitor model behavior for potential security risks, such as prompt injections or sensitive data leaks, and ensure compliance with data protection regulations.
* Optimize Costs: Track and manage token usage and computational resource consumption to ensure cost-effectiveness and alert teams to potential budget overruns.
* Collaborate and Report: Work closely with data scientists, ML engineers, and QA teams to provide actionable insights and recommendations for model fine-tuning and system architecture improvements.
* Excellent analytical problem-solving and communication skills.

**Experience Required:**

* 4 positions (2-8 years' experience)

About Company

PwC India
