High-Level Overview
The LLM Data Company (TLDC) is a research-driven startup specializing in tooling for evaluating large language models (LLMs) and defining reward functions for reinforcement learning (RL) in AI systems. Its core product, *doteval*, is an AI-assisted collaborative workspace that enables technical and non-technical teams to create, version, and manage high-signal evaluation workflows for LLMs and intelligent agents. This platform helps organizations systematically test, refine, and align model behaviors, improving the reliability and robustness of AI applications. TLDC primarily serves AI research teams, machine learning engineers, and product managers working on LLM or agent-based applications, addressing a critical need for structured, measurable model evaluation in the generative AI ecosystem[1][2][5].
Origin Story
Founded in 2025 and based in San Francisco, TLDC emerged from the recognition that evaluating LLM performance and defining RL rewards are complex, often opaque challenges undermining AI progress. The founding team includes Gavin Bains, Joseph Besgen, and Daanish Khazi, who brought together expertise in AI research and software tooling. The company was part of Y Combinator’s Spring 2025 batch, signaling strong early investor confidence. Since inception, TLDC has focused on building infrastructure-first, developer-friendly tools that bring rigor and automation to AI evaluation workflows, evolving from simple evals to broader data tooling for agents and model monitoring[2][3][5].
Core Differentiators
- Integrated Workspace: Combines creation, versioning, and execution of evaluation tasks in one platform, reducing fragmentation.
- Cross-Functional Collaboration: Designed for both technical and non-technical users, enabling alignment across teams.
- High-Signal Evaluations: Supports fine-grained rubrics, aligned graders, and AI-generated diffs to increase evaluation quality.
- RL Reward Definition: Facilitates precise reward function specification for reinforcement learning, accelerating model improvement.
- Version Control & Automation: Treats evaluation as code, enabling reproducibility and fast iteration (see the illustrative sketch after this list).
- Focus on Frontier AI Teams: Tailored to the needs of cutting-edge AI labs working on complex, unstructured tasks[1][2][5].
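The bullets above describe these capabilities only at a high level, and public materials do not document doteval's actual API. As a purely illustrative sketch, the Python below shows what "evaluation as code" can look like in general: a versioned rubric with fine-grained, weighted criteria, a simple grader, and a scalar reward derived from the rubric for RL-style post-training. All names (`Criterion`, `Rubric`, `grade`, `reward`) are hypothetical and are not TLDC's interfaces.

```python
# Illustrative sketch only: generic "evaluation as code" with a fine-grained
# rubric, a grader, and a rubric-derived reward. These classes and names are
# hypothetical and do not represent the doteval API.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    check: Callable[[str], bool]  # True if the model output satisfies this criterion


@dataclass(frozen=True)
class Rubric:
    version: str              # versioned like code, so eval runs are reproducible
    criteria: list[Criterion]

    def grade(self, output: str) -> dict[str, float]:
        """Score each criterion separately (0.0 or its weight) for auditable results."""
        return {c.name: (c.weight if c.check(output) else 0.0) for c in self.criteria}

    def reward(self, output: str) -> float:
        """Collapse rubric scores into a scalar in [0, 1], usable as an RL reward signal."""
        total = sum(c.weight for c in self.criteria)
        return sum(self.grade(output).values()) / total


# Toy rubric for a summarization-style task.
rubric = Rubric(
    version="summarize-v2",
    criteria=[
        Criterion("mentions_key_entity", 0.5, lambda o: "TLDC" in o),
        Criterion("within_length_budget", 0.3, lambda o: len(o.split()) <= 50),
        Criterion("avoids_filler_phrase", 0.2, lambda o: "as an AI" not in o),
    ],
)

output = "TLDC builds evaluation tooling for LLM teams."
print(rubric.grade(output))   # per-criterion scores
print(rubric.reward(output))  # scalar reward, here 1.0
```

Keeping the rubric itself under version control is what makes evaluation reproducible: two runs against `summarize-v2` grade outputs identically, and a change to the criteria produces a new, diffable version rather than silently shifting the scores.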
Role in the Broader Tech Landscape
TLDC sits on a critical trend: AI development is moving beyond sheer scale and data volume toward *measurable, rigorous evaluation* of model performance. As generative AI models grow more complex, structured, automated, and reproducible evaluation workflows become essential to ensure models are not only powerful but also reliable, ethical, and aligned with user needs. The company's tooling targets a major bottleneck in AI development, namely opaque and inconsistent evaluation, enabling faster and more confident deployment of advanced LLMs. This positions TLDC as a foundational player in the emerging AI infrastructure ecosystem, shaping how AI labs and applied AI companies benchmark and improve their models[2][5].
Quick Take & Future Outlook
Looking ahead, The LLM Data Company is poised to expand its platform beyond evaluation into broader post-training data tooling, including model monitoring and automated dataset curation. As AI models continue to evolve rapidly, TLDC’s infrastructure-first approach and deep understanding of evaluation workflows will likely make it a key enabler for teams seeking to operationalize and scale trustworthy AI. Trends such as reinforcement learning from human feedback (RLHF) and the increasing complexity of AI agents will further drive demand for TLDC’s solutions. Its influence is expected to grow as evaluation becomes a core software discipline, integral to AI development cycles, ensuring that future AI systems are not only more capable but also more aligned and dependable[2][5].