Accepting Applications
Full-time
On-site
Job Description
**Job Profile: LLM Observability Engineer**
**Location – Bhubaneshwar**
**Job Description:**
We are looking for a skilled **LLM Observability Engineer** to join our team and ensure optimal performance, reliability, and cost-efficiency of our large language model (LLM) applications. You will be instrumental in designing performance tests, implementing observability practices, and providing insights to enhance model quality and system robustness.
**Key Responsibilities:**
**Must Have**
* Design and Execute Performance Tests: Develop and implement comprehensive test plans and scripts to evaluate the performance, scalability, and stability of LLM applications under various loads.
* Implement LLM Observability: Instrument LLM applications to capture rich telemetry data, including prompts, responses, token usage, latency, and error information, using specialized tools and frameworks like **Datadog**, **LangChain**, or OpenTelemetry.
* Monitor and Analyze Metrics: Track key performance indicators (KPIs) such as response time, throughput, cost per query, accuracy, and resource utilization using real-time dashboards and monitoring systems.
* Identify and Mitigate Bottlenecks: Analyze performance test results and production data to pinpoint performance bottlenecks, errors, and potential issues (e.g., high latency in RAG pipelines) and collaborate with development teams on optimization.
* Conduct Automated Evaluations: Implement automated quality checks and evaluations (e.g., hallucination detection, toxicity classifiers, relevance scoring) to continuously assess model output quality.
* Strong knowledge of performance testing methodologies and load testing tools such as **JMeter**, LoadRunner, or Gatling.
* Familiarity with the unique challenges of LLMs, including non-determinism, hallucinations, and prompt sensitivity.
* Experience with LLM observability platforms and tools (e.g., Datadog LLM Observability, Arize AI, **Langfuse**) is highly desirable.
* Proficiency in programming/scripting languages (e.g., **Python**, Java).
**Nice to have**
* Troubleshoot Production Issues: Utilize tracing and logging data to quickly diagnose the root cause of issues in complex LLM workflows and agentic applications.
* Ensure Security and Compliance: Monitor model behavior for potential security risks, such as prompt injections or sensitive data leaks, and ensure compliance with data protection regulations.
* Optimize Costs: Track and manage token usage and computational resource consumption to ensure cost-effectiveness and alert teams to potential budget overruns.
* Collaborate and Report: Work closely with data scientists, ML engineers, and QA teams to provide actionable insights and recommendations for model fine-tuning and system architecture improvements.
* Excellent analytical, problem-solving, and communication skills.
**Experience Required:**
* 4 positions (2-8 years’ experience)