Accepting Applications
Full-time
On-site
Posted 3 weeks, 4 days ago
Job Description
We are looking for candidates who have 0–3 years of experience in an AI/ML role.
**Core Responsibilities:**
* Multimodal Solution Design: Architect end\-to\-end AI systems that integrate Speech\-to\-Text (STT), Text\-to\-Speech (TTS), and Computer Vision (OCR/Object Detection).
* Agentic Workflow Development: Design and implement LLM\-based agents capable of tool\-use, reasoning, and multi\-turn interaction for IVR and automated voice bots.
* Vision \& Document Intelligence: Develop high\-accuracy OCR pipelines and vision models to extract structured data from complex, unstructured documents or real\-world imagery.
* Audio Pipeline Optimization: Build and tune low\-latency audio processing pipelines specifically for real\-time IVR and voice\-bot responsiveness (VAD, Echo Cancellation, Diarization).
* Model Fine\-Tuning \& RAG: Implement Retrieval\-Augmented Generation (RAG) and fine\-tune foundation models (LLMs, Whisper, Florence\-2\) for domain\-specific accuracy.
* Performance Engineering: Optimize model inference for production (quantization, pruning) to meet the strict latency requirements of live voice and video streams.
**Technical Requirements:**
**1\. Generative AI \& NLP**
* LLM Frameworks: Expertise in LangChain, LlamaIndex, or CrewAI for building complex agentic workflows.
* Prompt Engineering: Mastery of advanced prompting techniques (Chain\-of\-Thought, ReAct) and evaluation frameworks (Ragas, TruLens).
* Vector Databases: Proficiency with Pinecone, Milvus, or Weaviate for efficient context retrieval.
**2\. Audio \& Conversational AI (IVR/Bots)**
* Speech Tech: Experience with OpenAI Whisper, ElevenLabs, or Deepgram.
* Telephony Integration: Knowledge of Twilio, Asterisk, or Vapi for deploying voice bots into IVR systems.
* Signal Processing: Familiarity with WebRTC and real\-time streaming protocols.
**3\. Computer Vision \& OCR**
* OCR Engines: Advanced use of Tesseract, PaddleOCR, or cloud\-native vision APIs (Azure Document Intelligence, AWS Textract).
* Visual LLMs: Experience with multimodal models like GPT\-4o, Claude 3\.5 Sonnet, or LLaVA for "Chat\-with\-Image" capabilities.
**4\. Engineering \& Infrastructure**
* Deep Learning: Advanced PyTorch or TensorFlow; experience with Hugging Face transformers and diffusers libraries.
* Deployment: Containerization (Docker/K8s) and GPU optimization (NVIDIA Triton, vLLM, or TensorRT).
* API Design: Building robust FastAPI/gRPC endpoints for high\-throughput multimodal data.
**Soft Skills:**
* Analytical Mindset: Ability to debug non\-deterministic AI behaviors (hallucinations in LLMs or "drift" in vision models).
* Domain Agility: Ability to pivot quickly between audio sampling rates, pixel normalization, and token limits.
* Communication: Translating complex "black\-box" model behaviors into actionable business insights for stakeholders.