AI/ML (0-3 years)

Meditab India

British Indian Ocean Territory

Accepting Applications Full-time On-site
Posted 3 weeks, 4 days ago
Job Description
We are looking for candidates with 0–3 years of experience in an AI/ML role.

**Core Responsibilities:**

* Multimodal Solution Design: Architect end-to-end AI systems that integrate Speech-to-Text (STT), Text-to-Speech (TTS), and Computer Vision (OCR/Object Detection).
* Agentic Workflow Development: Design and implement LLM-based agents capable of tool use, reasoning, and multi-turn interaction for IVR and automated voice bots.
* Vision & Document Intelligence: Develop high-accuracy OCR pipelines and vision models that extract structured data from complex, unstructured documents or real-world imagery.
* Audio Pipeline Optimization: Build and tune low-latency audio processing pipelines (VAD, echo cancellation, diarization) for real-time IVR and voice-bot responsiveness.
* Model Fine-Tuning & RAG: Implement Retrieval-Augmented Generation (RAG) and fine-tune foundation models (LLMs, Whisper, Florence-2) for domain-specific accuracy.
* Performance Engineering: Optimize model inference for production (quantization, pruning) to meet the strict latency requirements of live voice and video streams.

**Technical Requirements:**

**1. Generative AI & NLP**

* LLM Frameworks: Expertise in LangChain, LlamaIndex, or CrewAI for building complex agentic workflows.
* Prompt Engineering: Mastery of advanced prompting techniques (Chain-of-Thought, ReAct) and evaluation frameworks (Ragas, TruLens).
* Vector Databases: Proficiency with Pinecone, Milvus, or Weaviate for efficient context retrieval.

**2. Audio & Conversational AI (IVR/Bots)**

* Speech Tech: Experience with OpenAI Whisper, ElevenLabs, or Deepgram.
* Telephony Integration: Knowledge of Twilio, Asterisk, or Vapi for deploying voice bots into IVR systems.
* Signal Processing: Familiarity with WebRTC and real-time streaming protocols.

**3. Computer Vision & OCR**

* OCR Engines: Advanced use of Tesseract, PaddleOCR, or cloud-native vision APIs (Azure Document Intelligence, AWS Textract).
* Visual LLMs: Experience with multimodal models such as GPT-4o, Claude 3.5 Sonnet, or LLaVA for "Chat-with-Image" capabilities.

**4. Engineering & Infrastructure**

* Deep Learning: Advanced PyTorch or TensorFlow; experience with the Hugging Face Transformers and Diffusers libraries.
* Deployment: Containerization (Docker/K8s) and GPU optimization (NVIDIA Triton, vLLM, or TensorRT).
* API Design: Building robust FastAPI/gRPC endpoints for high-throughput multimodal data.

**Soft Skills:**

* Analytical Mindset: Ability to debug non-deterministic AI behaviors (hallucinations in LLMs or "drift" in vision models).
* Domain Agility: Ability to pivot quickly between audio sampling rates, pixel normalization, and token limits.
* Communication: Translating complex "black-box" model behaviors into actionable business insights for stakeholders.

About Company
Meditab India