Baseten has raised $285.0M in total across 5 funding rounds, most recently a $150.0M Series D in September 2025.
Baseten's investors include 01 Advisors, Gautam Gupta, Kevin Hartz, Accel, Acrew Capital, Alt Capital, Andreessen Horowitz, Bain Capital Ventures, Bond, C2 Investment, Conviction Partners, and DTCP, among others.
Baseten is a San Francisco-based AI infrastructure company founded in 2019 that provides a comprehensive MLOps platform for deploying, serving, fine-tuning, and scaling machine learning models, particularly large language models (LLMs) and generative AI applications.[1][2][3] It serves engineering and ML teams at organizations such as Patreon, Stability AI, Writer, Pipe, and Motive, helping them move models from development to production with minimal backend expertise while delivering high performance, low latency, cost efficiency, and scalability across clouds like AWS and Google Cloud.[1][2][4][5] Baseten's early growth was backed by $20 million in funding from investors such as Lachy Groom, Greylock Partners, and AI Fund, positioning it as a key enabler of production-grade AI with features like fast inference, GPU acceleration, and open-source tooling such as Truss.[2][3]
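To make the deployment workflow concrete, below is a minimal sketch of a Truss model package. The `Model` class with `load()` and `predict()` methods follows Truss's documented packaging convention (the file typically lives at model/model.py next to a config.yaml, and `truss push` uploads it); the Hugging Face `pipeline` used as the underlying model here is an illustrative assumption, not an example taken from Baseten's materials.

```python
# model/model.py -- minimal Truss model sketch (illustrative, not Baseten's own example).
# The load()/predict() structure follows Truss's documented Model class convention.
from transformers import pipeline


class Model:
    def __init__(self, **kwargs):
        # Truss passes configuration and secrets via kwargs; unused in this sketch.
        self._pipeline = None

    def load(self):
        # Called once at startup, before the model serves traffic.
        # Assumption: a default text-classification pipeline stands in for a real model.
        self._pipeline = pipeline("text-classification")

    def predict(self, model_input):
        # model_input is the deserialized JSON request body.
        return self._pipeline(model_input["text"])
```

Once packaged this way, the same directory can be run locally or deployed to managed infrastructure without rewriting serving code, which is the core of the "minimal backend expertise" claim above.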
Baseten was founded in 2019 by Amir Haghighat (former Engineering Manager at Clover Health), Pankaj Gupta (former Software Engineer at Uber), Philip Howes, and Tuhin Srivastava (the latter two former co-founders of Shape).[2] The idea emerged from the high barriers to productizing machine learning: the founders set out to build the "fastest way to build applications powered by machine learning" by lowering the MLOps knowledge and engineering resources required.[1][6] Early traction came from alpha users, including a fintech team's customer-scoring model and a digital therapeutics startup's workflow app built on Baseten's model zoo, demonstrating diverse use cases during the nascent phase of ML adoption.[6] That foundation evolved into partnerships with AWS, NVIDIA, and Google Cloud focused on inference optimization for LLMs.[3][5]
Baseten rides the explosive growth of generative AI and LLMs, where inference (the runtime execution of models) has become the primary bottleneck due to escalating compute demands for multi-step reasoning, real-time applications, and enterprise-scale deployments.[3][5] Its timing aligns with the post-2022 AI boom, as hardware advances like NVIDIA GPUs meet software optimizations and enable new use cases in financial agentic workflows, real-time media generation, healthcare document processing, and voice agents, all areas previously hindered by latency and cost.[4][5] Market forces favoring Baseten include cloud hyperscalers' AI Hypercomputers (e.g., Google Cloud), open-source inference stacks (vLLM, SGLang), and the industry shift from training to efficient serving, which lowers the barrier for non-expert teams.[3][5] By powering custom and open models like Llama and Gemma at scale, Baseten influences the ecosystem, accelerating AI adoption for startups and enterprises while contributing tools that democratize production ML.[1][6]
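For context on what "serving open models at scale" looks like from the caller's side, open-source inference stacks such as vLLM and SGLang expose OpenAI-compatible HTTP endpoints. The sketch below uses the standard `openai` Python client against such an endpoint; the base URL, API key, and model identifier are placeholders for illustration, not verified Baseten endpoints.

```python
# Client sketch for an OpenAI-compatible inference endpoint, as exposed by
# stacks like vLLM or SGLang. base_url, api_key, and model are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical self-hosted vLLM/SGLang server
    api_key="EMPTY",                       # placeholder; local servers often ignore it
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example open-model identifier
    messages=[{"role": "user", "content": "Classify this support ticket by urgency."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```

Because the interface is OpenAI-compatible, teams can swap a managed inference provider for a self-hosted stack (or vice versa) by changing only the base URL and credentials.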
Baseten is poised to expand as a Series D-stage company, leveraging NVIDIA B200 deployments, deeper cloud integrations, and custom model support to capture more enterprise inference workloads amid rising demand for cost-efficient, low-latency AI.[2][5] Trends like agentic AI, multimodal models, and edge inference will shape its path, potentially driving further funding and acquisitions as compute costs stabilize and open-source tooling matures.[3][5] Its influence may evolve from infrastructure enabler to full-stack AI platform, broadening ML productization and solidifying its role in the performant AI stack that began with simplifying deployment for early adopters.[1][6]