A compelling remote contract role is open for seasoned researchers and technical experts ready to work at the sharp edge of AI evaluation.
This opportunity is centered on a frontier-model evaluation project focused on agentic workflows, where your expertise will help stress-test advanced STEM models by exposing reasoning flaws, execution blind spots, and problem-solving limitations in real-world scenarios.
If your background spans data science, machine learning, finance, or coding, this role offers a rare chance to contribute directly to the advancement of next-generation AI systems.
Job Overview
- Location: Nigeria
- Salary: $70 – $120 per hour
- Job Type: Contract
- Experience Level: 1 – 25 years
- Work Setup: Fully Remote
- Schedule: Flexible / Self-paced
About the Role
This contract position is built for high-caliber professionals who can create demanding, practical benchmark tasks that challenge sophisticated AI systems operating in STEM-heavy environments.
Rather than routine annotation or surface-level review, this role involves architecting complex evaluation scenarios that mirror authentic professional workflows. The goal is to uncover how advanced models and AI agents behave when confronted with nuanced, multi-step tasks requiring judgment, precision, and technical rigor.
You will help shape evaluation frameworks that reveal where these systems excel — and where they quietly break.
What You’ll Be Doing
As part of this frontier-model evaluation initiative, your responsibilities will include:
- Designing challenging, real-world STEM problems that test deep reasoning and applied problem-solving
- Building benchmark tasks across domains such as data science, machine learning, finance, and software development
- Implementing each task inside an agentic development environment using Python
- Creating executable validations and test structures to assess model performance accurately
- Reviewing model or agent behavior to identify reasoning gaps, failure patterns, and execution weaknesses
- Helping define robust evaluation standards for advanced AI systems operating in technical domains
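To illustrate the kind of executable validation mentioned above, here is a minimal hypothetical sketch. All names, fields, and thresholds (`validate_submission`, `test_accuracy`, the 0.85 cutoff) are illustrative assumptions, not the project's actual harness:

```python
# Hypothetical sketch of an executable validation for one benchmark task.
# The structure and thresholds below are assumptions for illustration only.

def validate_submission(result: dict) -> list[str]:
    """Check an agent's output against the task's success criteria.

    Returns a list of failure messages; an empty list means the task passed.
    """
    failures = []

    # Structural check: the agent must return every expected field.
    for field in ("model_path", "test_accuracy"):
        if field not in result:
            failures.append(f"missing required field: {field}")

    # Quantitative check: performance must clear the task's threshold.
    accuracy = result.get("test_accuracy")
    if accuracy is not None and accuracy < 0.85:
        failures.append(f"test_accuracy {accuracy:.3f} below 0.85 threshold")

    return failures


if __name__ == "__main__":
    submission = {"model_path": "model.pkl", "test_accuracy": 0.91}
    print(validate_submission(submission))  # empty list -> task passed
```

In practice, validations like this run automatically after an agent completes a task, so graders can score many model runs consistently without manual review of each one.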
Contract & Payment Terms
Here’s what candidates should know about the engagement structure:
- You will be hired as an independent contractor
- This is a fully remote role, allowing you to work from anywhere in Nigeria
- Work can be completed on your own schedule, offering strong flexibility
- Projects may be extended, shortened, or ended early depending on performance and business needs
- Payments are made weekly
- Payments are issued via Stripe or Wise for services rendered
Who This Role Suits Best
This opportunity is especially attractive for professionals with solid hands-on experience in:
- Data Science
- Machine Learning
- Quantitative or Technical Finance
- Software Engineering / Coding
- AI Evaluation or Benchmarking
- Research-driven technical problem design
If you enjoy building rigorous tasks, thinking adversarially about model weaknesses, and working on the frontier of AI capability testing, this contract could be an exceptional fit.
How to Apply
If you are interested in this remote role, apply now.
Click here to APPLY NOW
Why This Opportunity Stands Out
This is not a standard freelance coding assignment.
It places you in a high-impact position where your technical judgment directly influences how advanced AI systems are tested, measured, and improved. The work is intellectually demanding, highly flexible, and closely tied to the evolving future of intelligent agents.
For experts who prefer meaningful technical work over repetitive gig tasks, this role offers sharper, more consequential work.