Accepting Applications
Full-time
On-site
Posted 2 weeks, 1 day ago
Job Description
ML Engineer — Video \& Audio → Text Event Detection
**Location:** Remote
**Level:** Mid to Senior
**Reports to:** Engineering Lead / Head of ML
For this role, we are hiring only candidates based in Oakistan.
About the Company
We are an early\-stage company building machine learning–powered visibility for time\-sensitive, high\-stakes environments. Our platform uses video and audio from fixed cameras to detect structured workflow events, enabling real\-time coordination and insights while maintaining a strong focus on privacy and compliance.
⚠️ Important Requirement (Please Read Before Applying)
We are specifically looking for professionals who can **demonstrate and present their work**.
All candidates must be able to:
* Showcase **real\-world projects or systems** they have built or contributed to
* Clearly explain **their role, decisions, and impact**
* **Demonstrate high professional standards** in their current or previous positions
Applications without demonstrable work or the ability to present it will not be considered.
Role Summary
As an ML Engineer, you will design, build, and improve systems that detect structured events from video and audio streams in controlled environments. You will work across computer vision, speech\-to\-text pipelines, and multimodal ML systems.
Key Responsibilities
**Event Detection Pipeline**
* Build and optimize object detection systems (e.g., YOLO\-based models)
* Develop temporal models (e.g., transformer\-based) for event classification
* Optimize inference for edge (e.g., Jetson) and cloud environments
**Audio\-Based Event Detection**
* Implement speech\-to\-text pipelines (e.g., Whisper)
* Detect protocol or safety\-related events using keyword/phrase recognition
* Ensure anonymization and timestamp accuracy for downstream use
**Multimodal Fusion**
* Combine video and audio signals for improved detection accuracy
* Define fusion strategies and confidence calibration
**Training \& Evaluation**
* Design annotation strategies and leverage active learning
* Define and track key metrics (accuracy, F1, false positives, temporal precision)
**Model Lifecycle**
* Manage model versioning, training, and deployment
* Support A/B testing, monitoring, and rollback strategies
**Documentation**
* Maintain clear documentation for models, experiments, and design decisions
Required Qualifications
* Bachelor’s degree (or equivalent experience) in a relevant technical field
* 3\+ years of hands\-on experience in at least two of the following:
  * Computer vision (object detection, tracking, activity recognition)
  * Speech recognition or NLP for event detection
  * Multimodal ML systems
* Strong Python skills and experience with PyTorch (or similar frameworks)
* Experience with inference optimization (TensorRT, ONNX, quantization)
* Experience building and evaluating ML training pipelines
* Ability to work from structured requirements and iterate with stakeholders
* Strong communication skills in a collaborative, remote environment
Preferred Qualifications
* Experience in healthcare or other high\-stakes, real\-time systems
* Familiarity with edge deployment (e.g., NVIDIA Jetson) and/or cloud ML (e.g., AWS)
* Experience with privacy\-aware ML and data handling
* Knowledge of multi\-object tracking (e.g., ByteTrack, BoT\-SORT)
* Experience with Whisper\-based pipelines and voice activity detection
Nice to Have
* Exposure to clinical or regulated environments
* Experience with structured workflows and event sequencing
* Interest in explainability and confidence calibration
* Experience working in distributed, remote teams
What We Offer
* Fully remote, globally distributed team
* Opportunity to own and shape a core ML pipeline
* Work on meaningful, real\-world ML applications in high\-impact environments
* Collaborative and fast\-moving engineering culture
* Competitive compensation, benefits, and equity (based on experience)