High-Level Overview
Datacurve is a specialized data factory that produces high-quality coding datasets for training and evaluating large language models (LLMs) on software development tasks. It serves AI companies and research labs by identifying weaknesses in their models through private benchmarks, then orchestrating targeted data collection projects via a gamified bounty platform where over 14,000 vetted software engineers compete to produce complex coding data. This approach addresses the growing need for expert-level, domain-specific data beyond what generic labeling services can supply, improving model performance on coding tasks such as algorithm challenges, debugging, and multimodal UI understanding. Datacurve’s business model is B2B: revenue comes from custom dataset contracts tailored to specific model weaknesses, making the company a provider of critical infrastructure for advanced model training and evaluation in the AI startup ecosystem[1][2][3].
Origin Story
Datacurve was co-founded by Serena Ge and Charley Lee, and recently raised a seed round followed by a $15 million Series A led by Chemistry, with notable angel investors from DeepMind, Anthropic, and OpenAI. The founders recognized that AI training data needs were becoming increasingly complex, especially for software engineering tasks that demand deep expertise. They developed a unique “bounty hunter” system that attracts skilled engineers by gamifying data creation, emphasizing user experience rather than financial incentives alone. This model emerged from the observation that, as AI models mature, the remaining data gaps are highly specialized and require expert contributions that traditional crowdsourcing cannot efficiently fill. Early traction includes distributing over $1 million in bounties and building a platform that integrates with major ML training pipelines[1][3].
Core Differentiators
- Expert-driven data creation: Unlike generic labeling, Datacurve uses vetted software engineers to produce complex, high-quality coding datasets.
- Gamified bounty platform: Engages and retains top engineering talent through competition and rewards, enhancing data quality and diversity.
- Targeted data production: Uses private benchmarks to identify model weaknesses and converts them into precise data collection quests.
- Integration-ready datasets: Data conforms to standard LLM training formats and supports reinforcement learning environments with dockerized repos and pytest harnesses.
- Specialty datasets: Includes algorithmic puzzles, debugging scenarios, private codebase tasks, and multimodal UI challenges combining code with screenshots or recordings.
- Strong technical team: Engineers with research backgrounds enable fast iteration and close collaboration with AI research teams[1][2][3].
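To make the dataset-format and pytest-harness points above concrete, the sketch below shows two common patterns: a training record in the widely used "messages" JSONL format, and a grading function that runs a repo's pytest suite and returns a pass/fail reward, as an RL environment might. The schema, field names, function name, and binary reward scheme are illustrative assumptions; Datacurve's actual interfaces are not public.

```python
import json
import subprocess
import sys
import tempfile

# Hypothetical single fine-tuning record in the common "messages" JSONL
# format. Field names and metadata keys are illustrative, not Datacurve's
# actual schema.
record = {
    "messages": [
        {"role": "user", "content": "Fix the off-by-one bug in windowed_sum."},
        {
            "role": "assistant",
            "content": (
                "def windowed_sum(xs, k):\n"
                "    return [sum(xs[i:i + k]) for i in range(len(xs) - k + 1)]"
            ),
        },
    ],
    "metadata": {"task_type": "debugging", "language": "python"},
}

line = json.dumps(record)          # one JSON object per line of a .jsonl file
assert json.loads(line) == record  # the record round-trips cleanly


def grade_submission(repo_dir: str) -> float:
    """Run a repo's pytest suite and return a binary reward.

    Sketches the harness pattern described above: each task ships as a
    repo with tests, and an RL loop scores a model's patch by whether the
    suite passes. The function name and 0/1 reward are assumptions.
    """
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-q"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    # pytest exits 0 only when tests were collected and all passed.
    return 1.0 if result.returncode == 0 else 0.0
```

In practice each task repo would be dockerized so the harness runs in a reproducible environment; the subprocess call above stands in for that container boundary.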
Role in the Broader Tech Landscape
Datacurve rides the trend of increasing specialization and sophistication in AI training data, particularly for coding and software development models. As LLMs evolve, simple datasets no longer suffice; complex reinforcement learning environments and domain-specific data are essential. The timing is critical because the AI industry is shifting from broad pretraining to targeted post-training data collection to address nuanced model failures. Datacurve’s approach influences the ecosystem by setting new standards for data quality and developer engagement, potentially expanding beyond software engineering into other expert domains like finance or medicine. Its platform also exemplifies how gamification and expert networks can solve the challenge of sourcing high-quality, specialized training data at scale[1][3].
Quick Take & Future Outlook
Datacurve is positioned to become a key infrastructure provider for next-generation AI coding models by scaling its expert-driven data factory and expanding its bounty platform. Future trends shaping its journey include the growing demand for reinforcement learning from human feedback (RLHF) data, multimodal AI capabilities, and the need for proprietary, realistic codebases in training. As AI models become more agentic and interactive, Datacurve’s ability to produce complex, scenario-based datasets will be increasingly valuable. Its influence may grow by extending its model to other specialized fields and by deepening integration with AI research workflows, potentially becoming a cornerstone of the AI data supply chain, consistent with its stated mission of scaling AI coding abilities[1][2][3].