§ Private Profile · San Francisco, CA, USA
AI/ML startup developing generalist speech models for all audio tasks and duplex conversational AI, focused on real-time voice agents.
Key people at Kalpa Labs.
Kalpa Labs was founded in 2025 by Prashant Shishodia (Founder) and Gautam Jha (Founder).
Kalpa Labs, based in San Francisco, California, develops generalist speech models designed to handle all audio tasks, including voice cloning, speech generation, dubbing, and audio understanding, through natural-language instructions and in-context learning. The company focuses on duplex conversational AI and real-time voice agents, aiming to replace multiple specialized models with a single system that keeps latency low and stays coherent over long calls. Kalpa Labs has raised $500K in seed funding, has 2 employees, and was accepted into the Y Combinator Fall 2025 batch, where it works with YC Partner Pete Koomen. Its technology targets real-time voice applications in conversational AI and speech recognition, including contact centers and voice assistants, with multilingual support. Co-founder Prashant Shishodia was formerly at Google, and co-founder Gautam Jha was previously at QRT/Squarepoint.
Kalpa Labs builds generalist speech models designed to handle every audio-related task (speech-to-text, text-to-speech, voice cloning, dubbing, and editing, among others) within a single unified system. Its technology lets users direct complex audio tasks with natural-language instructions, much like instructing a sound engineer, in contrast to today's fragmented speech AI, where each task requires its own specialized model. The company serves businesses and developers who want advanced voice agents and audio production tools, and it targets two weaknesses of existing speech AI stacks: brittle workflows and poor context carryover. Kalpa Labs is scaling its models to billions of parameters and training them on millions of hours of audio, aiming to match the flexibility and scale of large language models (LLMs)[1][2][3][4].
Kalpa Labs was founded in 2025 by Prashant Shishodia (formerly at Google) and Gautam Jha (ex-QRT, Squarepoint), and it draws on their experience in scaling machine learning systems and building low-latency software. The idea arose from the inefficiency and fragmentation they saw in current speech AI, where separate models handle different audio tasks. Inspired by the success of LLMs in text, they set out to create a universal speech model that can perform many tasks from natural-language prompts. Early traction includes participation in Y Combinator’s Fall 2025 batch and rapid development of models whose emergent capabilities have been shown in demos[3][4][5].
Kalpa Labs is part of the broader move toward generalist AI models that unify fragmented, task-specific systems into single scalable architectures, similar to the shift seen in natural language processing with GPT-3 and ChatGPT. The timing matters because speech AI is poised to move from narrow, specialized models to versatile, instruction-driven systems that can handle complex, multimodal audio tasks. Market forces such as growing demand for voice interfaces, multilingual content, and real-time adaptive voice agents favor Kalpa’s approach. By enabling seamless workflows and richer context understanding, Kalpa Labs aims to set new standards for speech AI capabilities and integration[4].
Kalpa Labs is positioned to compete in the next wave of speech AI by scaling generalist models intended to rival the flexibility and power of LLMs. Trends likely to shape its path include the growing adoption of voice interfaces, demand for multilingual and emotionally intelligent AI, and the push for unified AI systems across modalities. Its influence is likely to expand as it refines its models, grows its developer ecosystem, and enables new applications in conversational AI, audio production, and beyond. The company’s vision of replacing fragmented speech tools with a single, scalable model could redefine how audio AI is built and deployed, echoing the impact of large language models in text[3][4].