AI Evaluation Engineer (Data Analysis & Multi-Agent Systems)

Gramian Consulting

Pakistan

Full-time, On-site
Job Description
**About Us**

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

**Role Overview**

We are looking for an **AI Evaluation Engineer specializing in data analysis** to design benchmark tasks that simulate real-world analytical workflows. You will create scenarios in which AI systems must analyze **large, messy, multi-source datasets**, decompose tasks across multiple agents, and produce clear, verifiable conclusions.

* **Commitment required:** 8 hours per day, with 4 hours of overlap with PST
* **Employment type:** Contractor assignment (no medical or paid leave)
* **Contract duration:** 4+ weeks
* **Location:** Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam
* **Interview:** Take-home assessment (60 min)

**Responsibilities**

* Design and develop multi-agent benchmark tasks focused on complex data analysis workflows
* Create or curate realistic datasets (CSV, JSON, logs, reports, financial or operational data)
* Build tasks that require:
  + Cross-referencing across multiple data sources
  + Anomaly detection and contradiction identification
  + Statistical analysis and interpretation
* Define task decomposition strategies across specialized sub-agents (e.g., financial, technical, and operational analysis)
* Develop verification logic that validates precise analytical outputs (not generic summaries)
* Implement evaluation pipelines using Python and SQL
* Create reproducible environments using Docker
* Analyze task performance and refine tasks for clarity, difficulty, and scoring accuracy

**Requirements**

* 5+ years of experience in data analysis or analytics-heavy roles
* Strong proficiency in Python (pandas, NumPy) and SQL
* Experience working with real-world, messy datasets (CSV, JSON, logs, reports)
* Ability to design analytical problems with clear, verifiable answers
* Solid understanding of statistics (distributions, correlations, outliers)
* Familiarity with AI benchmarks or evaluation environments (e.g., SWE-bench or similar)
* Hands-on experience with Docker (Dockerfiles, image builds, debugging)

**Nice to Have**

* Experience in financial analysis, operations analytics, or risk analysis
* Exposure to data pipelines or ETL workflows
* Experience with data quality validation or anomaly detection systems
* Familiarity with AI/ML data workflows or evaluation frameworks