Job Description
Job Title: Python (Machine Learning) Quality Assurance Lead
Job Type: Contract
Location: Remote
About This Role
In this hourly, remote contractor role, you will work as a Python (Machine Learning) Quality Assurance Lead, overseeing quality, consistency, and trainer performance across Python machine learning AI training projects. You will review AI-generated Python code, ML workflows, model explanations, and trainer/QA work; evaluate output quality against project guidelines; provide precise written feedback; and ensure contributors follow expected quality standards.
You will assess work for code correctness, machine learning methodology, statistical validity, reproducibility, model-evaluation quality, data leakage risks, package usage, debugging accuracy, readability, maintainability, formatting, instruction-following, and adherence to project-specific rubrics. This role requires strong Python and ML expertise, English communication skills, excellent attention to detail, and the ability to manage quality workflows across remote technical teams.
This role is with a fast-growing AI Data Services company delivering training data for many of the world’s largest AI companies and foundation-model labs. Your Python ML quality leadership will help ensure training data is accurate, executable, statistically sound, reproducible, clearly explained, and aligned with client expectations.
The selection process involves an AI interview, a domain-specific task, and an interview with a recruiter.
Important: There is no immediate project for this role; however, if qualified, you will be among the first experts we reach out to when relevant opportunities arise. Qualifying will also give you access to future projects available through our expert network.
Your Profile
- Bachelor’s, Master’s, or PhD degree in Computer Science, Machine Learning, Data Science, Statistics, Mathematics, Engineering, or a closely related quantitative field.
- Strong command of English, sufficient to follow guidelines, communicate with teams, and provide clear technical feedback.
- 3+ years of professional experience in Python development, machine learning, data science, ML engineering, model evaluation, research engineering, technical review, or ML education.
- Strong understanding of Python fundamentals such as data structures, functions, classes, iterators, comprehensions, exception handling, virtual environments, package management, testing, and debugging.
- Strong understanding of ML topics such as supervised/unsupervised learning, feature engineering, train/test splits, cross-validation, model selection, data leakage, regression, classification, clustering, metrics, bias/variance, regularization, and reproducibility.
- Ability to evaluate ML content against detailed rubrics and identify issues such as flawed methodology, wrong metrics, data leakage, non-reproducible code, invalid assumptions, hallucinated APIs, misleading conclusions, or incomplete explanations.
- Familiarity with NumPy, pandas, scikit-learn, PyTorch, TensorFlow/Keras, XGBoost/LightGBM, Jupyter, matplotlib, seaborn, MLflow, Hugging Face, SQL, GitHub, Docker, and CI/CD is preferred.
- Experience leading or supporting remote teams of trainers, annotators, reviewers, engineers, data scientists, ML researchers, coding mentors, or QAs is strongly preferred.
- Comfortable working in fast-moving remote environments using Discord, Google Sheets, Google Docs, trackers, dashboards, GitHub, and project management systems.
- Highly organized and able to maintain style guides, trackers, FAQs, onboarding materials, honeypots, calibration tasks, and quality documentation.
- Experience with AI training, data annotation, LLM evaluation, code QA, ML QA, or rubric-based technical review is a strong plus.
Key Responsibilities
- Quality monitoring: Spot-check Python ML items, identify quality issues, provide feedback through DMs, and escalate recurring or critical issues.
- Code and ML review: Evaluate AI-generated Python code, ML pipelines, data-preprocessing steps, model training workflows, evaluation logic, debugging responses, and explanations for correctness and reproducibility.
- Trainer and QA communication: Update contributors on Discord about guideline changes, workflow updates, and Python/ML-specific review standards.
- Question handling: Respond to questions around Python syntax, package usage, data leakage, model validation, metrics, statistical assumptions, reproducibility, notebooks, and rubric interpretation.
- Trainer/QA activation management: DM inactive contributors, encourage activation, track follow-ups, and flag availability issues.
- Documentation: Create and maintain Python ML style guides, trackers, FAQs, examples, honeypots, calibration tasks, and onboarding materials.
- Onboarding and training: Run onboarding/training calls for Python ML contributors.
- Risk review: Flag misleading, overconfident, statistically invalid, non-reproducible, insecure, or non-production-ready Python ML recommendations.
- Process improvement: Identify recurring quality gaps and build scalable QA processes.