Information Technology

🇺🇸

United States

Together AI Integrates Cartesia Sonic TTS to Deliver Sub-90ms Voice Synthesis Across 300K+ Developers

Together AI, an AI infrastructure company, expanded developer capabilities by integrating Cartesia's Sonic text-to-speech model to deliver ultra-low latency voice synthesis at scale.

Value Results Summary

Sub-90ms streaming latency (roughly half the duration of a human blink)

Support for 15 languages with complex input handling (phone numbers, technical terminology)

Access to 300,000+ developers via Together AI's unified platform

Enterprise-scalable, production-ready infrastructure reducing deployment time and operational costs

Together AI, a research-driven artificial intelligence company providing cloud infrastructure for developers, sought to enhance its platform with cutting-edge voice AI capabilities. The organization identified a need to offer developers fast, affordable, and production-ready models for building voice applications. Together AI partnered with Cartesia to integrate Sonic, an advanced text-to-speech (TTS) model built on a State Space Model (SSM) architecture, directly into its AI acceleration cloud platform. This integration enables Together AI's 300,000+ developer community to access enterprise-grade voice synthesis without additional infrastructure investment.

Cartesia's Sonic TTS delivers measurable performance advantages that differentiate the platform. The model achieves sub-90ms streaming latency, about half the duration of a human blink, and is positioned as the fastest option for end-to-end voice applications that chain speech-to-text, a large language model, and text-to-speech synthesis. Sonic handles complex inputs such as phone numbers and technical terminology with high accuracy and supports 15 languages, so developers can build multilingual voice agents and content applications. Its State Space Model architecture is credited with ongoing performance gains and lower operational overhead, which lets Together AI offer enterprise-scalable deployments without compromising development speed or cost.
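For readers who want to check a latency figure like this themselves, the sketch below shows one way to measure it. It assumes that "streaming latency" means the time from issuing a request to receiving the first non-empty audio chunk, which the case study does not define. The `fake_tts_stream` generator is a hypothetical stand-in for a real streaming response and does not call Together AI's or Cartesia's API.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_chunk(
    stream: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, bytes]:
    """Return (latency in milliseconds, first non-empty chunk) for a stream."""
    start = clock()
    for chunk in stream:
        if chunk:  # skip empty keep-alive chunks
            return (clock() - start) * 1000.0, chunk
    raise ValueError("stream ended without producing any audio")


def fake_tts_stream(delay_s: float = 0.05, chunks: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in for a TTS stream: sleeps, then yields silence."""
    for _ in range(chunks):
        time.sleep(delay_s)  # simulated synthesis and network delay
        yield b"\x00" * 320  # placeholder bytes, not real audio


if __name__ == "__main__":
    latency_ms, first = time_to_first_chunk(fake_tts_stream())
    print(f"time to first chunk: {latency_ms:.1f} ms ({len(first)} bytes)")
```

To measure a real service, replace `fake_tts_stream()` with the byte iterator returned by the client in use, and create the stream as close as possible to the call so that request setup is included in the timing.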

The partnership enables developers to build AI-powered customer support systems, audio content generators, and voice assistants on production-ready infrastructure. Together AI's unified platform simplifies integration, allowing developers to combine Sonic with LLMs and other AI models through a single API. By bringing Cartesia's voice technology to hundreds of thousands of developers, the partnership accelerates adoption of voice AI across industries including customer service, content creation, localization, recruiting, and finance. Developers can access Sonic directly through Together AI's platform or the Cartesia Playground to begin building voice applications.
