Accepting Applications
Full-time
On-site
Posted 1 hour, 14 minutes ago
Job Description
**About the Role**
We’re hiring a **Machine Learning Engineer \- OCR Agentic Systems** with **six\+ years of experience** to build and own a next\-generation multi\-agent document processing platform.
This is not a basic OCR integration role. You’ll design and develop a multi\-layer AI\-driven pipeline that processes complex business documents (POs, invoices, RFQs, shipping notices), extracts structured data, validates it against dynamic schemas, and delivers high\-accuracy outputs to downstream systems.
A key focus is building intelligent self\-healing systems—especially a Recovery Agent that reprocesses only low\-confidence fields instead of sending entire documents for manual review.
**What You’ll Build**
🔹 **Multi\-Agent OCR Pipeline**
You’ll design and implement a five\-layer agent system:
* Format Detection Agent — identifies document type/format and routes processing
* Primary Extraction Agent — orchestrates multiple OCR engines for optimal results
* Document Classification Agent — classifies document types using AI models
* Schema Matching Agent — maps extracted data to structured schemas with confidence scoring
* Recovery Agent — re\-extracts low\-confidence fields using targeted AI processing
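To make the Recovery Agent idea concrete, here is a minimal Python sketch of field\-level recovery. It is an illustration only, not the platform's actual implementation: the `Field` type, the `re_extract` callback, and the 0.90 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical cutoff; the posting does not state a threshold.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class Field:
    value: str
    confidence: float  # expected range: 0.0 to 1.0


def recover_low_confidence(
    fields: Dict[str, Field],
    re_extract: Callable[[str], Field],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Dict[str, Field]:
    """Re-extract only the fields below the threshold.

    `re_extract` stands in for a targeted second pass (for example a
    vision model run on one field). A candidate replaces the original
    only if its confidence is higher.
    """
    recovered = dict(fields)
    for name, field in fields.items():
        if field.confidence < threshold:
            candidate = re_extract(name)
            if candidate.confidence > field.confidence:
                recovered[name] = candidate
    return recovered


def needs_manual_review(
    fields: Dict[str, Field],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> List[str]:
    """Return the names of fields still below the threshold, sorted."""
    return sorted(n for n, f in fields.items() if f.confidence < threshold)
```

In this sketch, high\-confidence fields are never reprocessed, and only the fields that remain below the threshold after recovery would be routed to a human reviewer.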
🔹 **Beyond the Pipeline**
* MCP servers for agent\-tool communication
* Confidence scoring engine with validation logic
* Active learning feedback loops
* OpenSearch\-based state tracking and observability
* AWS\-based infrastructure (Lambda, ECS, SQS, Step Functions, Terraform)
🔹 **What You’ll Do**
* Agent Architecture \& Orchestration
  * Build stateful agent workflows using LangGraph or similar
  * Design orchestration with AWS Step Functions
  * Implement multi\-engine OCR strategies (accuracy vs cost optimization)
* AI \& Backend Engineering
  * Develop LLM\-powered systems with structured outputs and tool usage
  * Build FastAPI services following clean architecture/DDD principles
  * Design efficient prompts and context flows
* OCR \& Document Processing
  * Integrate and optimize OCR engines (Textract, DeepSeek\-OCR, vision models)
  * Implement image preprocessing (deskewing, denoising, enhancement)
* Reliability \& Observability
  * Implement confidence\-based routing logic
  * Track pipeline execution using OpenSearch
  * Build monitoring dashboards and alerts
* Data \& Infrastructure
  * Design schemas in PostgreSQL (pgVector)
  * Deploy scalable services using Docker, ECS/Fargate, Terraform
  * Build robust, well\-tested APIs
🔹 **Requirements**
* Strong experience in Python and backend development
* **Six\+ years of experience**
* Hands\-on experience building AI/LLM\-powered applications
* Experience with agent frameworks (LangGraph, LangChain, etc.)
* Solid understanding of OCR/document processing systems
* Experience working with AWS services (Lambda, S3, ECS, Step Functions, etc.)
* Strong API design and system design skills
* Experience with PostgreSQL and search systems (OpenSearch/Elasticsearch)
* Good testing practices, including AI system validation
* Strong English communication skills (C1 preferred)
🔹 **Nice to Have**
* Computer vision experience (OpenCV, Pillow)
* Schema matching / fuzzy matching / semantic mapping
* MCP or tool\-based LLM integrations
* Human\-in\-the\-Loop (HITL) workflows
* Experience in logistics, manufacturing, or document\-heavy domains
* LLM observability tools (LangSmith, LangFuse, Arize Phoenix)
* Experience with self\-hosted models or GPU workloads
🔹 **Why Join Us?**
* Work on real\-world agentic AI systems (not wrappers)
* Build high\-impact systems improving accuracy from \~85% → 95%\+
* Own architecture and execution end\-to\-end
* Collaborate directly with leadership and product teams
* Solve complex, meaningful engineering challenges