Protégé
Protégé is a technology company.
Financial History
Protégé has raised $9.0M across 1 funding round.
Frequently Asked Questions
How much funding has Protégé raised?
Protégé has raised $9.0M in total across 1 funding round.
Protégé's investors include Arrive, Atlantic Bridge University Fund, Bennu, Brand Foundry Ventures, Browder Capital, Dreamers VC, Eclipse Ventures, Flex Capital, Founder Collective, G20 Ventures, General Catalyst, Gigascale Capital.
# High-Level Overview
Protege is an AI training data platform that connects data holders with AI developers, enabling secure and compliant access to proprietary datasets for model training[1][2]. Founded in 2024, the company has rapidly become the leading data exchange for artificial intelligence development by solving a critical bottleneck: access to high-quality, diverse training data[2][4].
The platform operates a two-sided marketplace where data owners, primarily in healthcare and media, can monetize their proprietary information while maintaining control and compliance, and AI builders gain access to the curated datasets they need for model development[2][3]. Protege has grown quickly: it raised a $10 million seed round in 2024 and a $25 million Series A in August 2025, and it reports over 100 data partners and more than 20x growth in gross merchandise value (GMV) in 2025[2][4].
# Origin Story
Protege was founded in 2024 by Bobby Samuels (CEO and Co-Founder), Travis May (co-founder and former CEO of LiveRamp and Datavant), Engy Ziedan (Chief Scientific Officer), and Richard Ho (CTO)[2]. The founding team brought deep expertise in data infrastructure and healthcare data—Samuels and May had previously worked together at Datavant, a healthcare data company, giving them firsthand understanding of data fragmentation challenges and the regulatory landscape[4].
The company emerged from a core belief that "the next generation of AI breakthroughs will be powered by enabling data holders to safely allow controlled access to their data," according to Samuels[2]. Within its first year, Protege moved from concept to market leadership: it launched with healthcare and media verticals, then expanded in August 2025 to audio and speech data and to motion capture data[2]. The company has already partnered with leading foundation model companies and generated tens of millions in revenue for its data partners[2].
# Role in the Broader Tech Landscape
Protege sits at the intersection of two defining trends in AI development: the data scarcity bottleneck and the shift toward real-world, proprietary datasets. As foundation models mature, the limiting factor for AI progress has shifted from compute to access to high-quality, diverse training data—particularly in specialized domains like healthcare[2][4].
The company is riding the wave of AI companies recognizing that proprietary, real-world data is a competitive moat. Rather than relying solely on public internet data, leading AI developers increasingly need access to domain-specific datasets (clinical records, video content, motion capture) that are fragmented across organizations[4]. Protege's infrastructure removes friction from this exchange, creating a marketplace where data holders can participate in AI development while maintaining compliance with regulations like HIPAA[2].
The timing is critical: as AI applications move from research into production, the demand for specialized training data will only intensify. Protege's early dominance in this space—with over 100 partners and most major foundation model companies as customers—positions it as essential infrastructure in the AI development stack[4].
# Quick Take & Future Outlook
Protege has established itself as the critical infrastructure layer for AI training data in less than two years, which reflects both the urgency of the data bottleneck and the team's execution. The 20x GMV growth in 2025 and rapid vertical expansion suggest the company is still in the early innings of market penetration[4].
Looking ahead, Protege's influence will likely deepen as AI development becomes increasingly specialized and regulated. The company's ability to expand into new verticals—potentially financial services, legal, or scientific research—while maintaining compliance and data quality will determine its long-term impact. The founding team's track record suggests they will continue moving at the pace that has defined them: quickly iterating on product, deepening partnerships with major AI companies, and building the data infrastructure that powers the next generation of AI breakthroughs[4].
Protégé's most recent round was a $9.0M Seed in March 2022.
| Date | Round | Investors |
|---|---|---|
| Mar 1, 2022 | $9.0M Seed | Arrive, Atlantic Bridge University Fund, Bennu, Brand Foundry Ventures, Browder Capital, Dreamers VC, Eclipse Ventures, Flex Capital, Founder Collective, G20 Ventures, General Catalyst, Gigascale Capital, Greylock, Horizon 3 Venture Studio, Khosla Ventures, Mischief Venture Capital, Operator Ventures, Otherwise Fund, Pillar VC, QueensBridge Venture Partners, Quiet Capital, Renegade Partners, Ribbit Capital, SciFi VC, Silicon Ventures, Starting Line, TenOneTen Ventures, Thirty Five Ventures, Todd and Rahul's Angel Fund, Trammell Venture Partners, Aaron Levie, Alex Rigopulos, Amy Chang, Curtis Lee, Dylan Field, Elliot Shmukler, Eric Wei, Eric Wu, Gina Bianchini, Gokul Rajaram, Harris Barton, Hunter Horsley, Joshua Reeves, Karim Atiyeh, Lionel Richie, Mario Götze, Mark Pincus, Matias Woloski, Mike Krieger, Scott Kleper, Sebastien Borget, Tony Xu, Venus Williams |