Text to Speech

Generate natural speech with 24+ AI models. Free tier available — no account required.

Credit Costs

Tier	Credits per 1,000 characters	Models
Standard	2	OpenAI TTS, Google Cloud, Azure, Polly Neural, ElevenLabs Flash/Turbo, Cartesia Turbo
Premium	5	ElevenLabs v2, OpenAI HD/GPT-4o, Google Studio, Azure HD, Cartesia Sonic 2/3, Deepgram
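As a rough illustration of the rates above, the following sketch estimates the credit cost of a single request. It assumes cost scales linearly with character count and is not rounded; the actual billing and rounding rules are not stated on this page, and the function and dictionary names are hypothetical.

```python
# Rates copied from the Credit Costs table (credits per 1,000 characters).
CREDITS_PER_1K = {"standard": 2, "premium": 5}


def estimate_credits(text: str, tier: str) -> float:
    """Estimate credits for one request.

    Assumption: cost is pro-rated by character count with no rounding.
    """
    rate = CREDITS_PER_1K[tier]
    return len(text) * rate / 1000


if __name__ == "__main__":
    sample = "Hello, world. " * 100  # 1,400 characters
    print(estimate_credits(sample, "standard"))
    print(estimate_credits(sample, "premium"))
```

Under these assumptions, a 2,500-character text on a premium model would come to 12.5 credits.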

Free Tier

No account required
Trial with Google Cloud Standard and Amazon Polly
Up to 500 characters per request
3 generations per hour
Sign up for 50 free credits
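A client that calls the free tier repeatedly may want to check these limits locally before sending a request. The sketch below is one way to do that, assuming a sliding one-hour window; how the service actually counts the hour is not documented here, and the class name `FreeTierGuard` is hypothetical.

```python
import time
from collections import deque

MAX_CHARS = 500        # per-request limit from the Free Tier list
MAX_PER_HOUR = 3       # generations per hour from the Free Tier list
WINDOW_SECONDS = 3600  # assumption: a sliding one-hour window


class FreeTierGuard:
    """Client-side pre-check for the free-tier limits (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._stamps = deque()  # times of requests accepted within the window

    def allow(self, text: str) -> bool:
        now = self._clock()
        # Forget requests that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= WINDOW_SECONDS:
            self._stamps.popleft()
        if len(text) > MAX_CHARS or len(self._stamps) >= MAX_PER_HOUR:
            return False
        self._stamps.append(now)
        return True
```

The server remains the source of truth; a guard like this only avoids sending requests that would likely be rejected.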

Output Formats

MP3 — Best for web playback
WAV — Uncompressed, ideal for editing
OGG — Open format, good compression
FLAC — Lossless audio quality

Frequently Asked Questions

What is text to speech (TTS)?

Text to speech (TTS) is an AI technology that converts written text into natural-sounding spoken audio. Modern TTS APIs from providers like ElevenLabs, OpenAI, Google Cloud, and Azure use deep learning to produce speech that sounds remarkably human, with natural prosody, emotion, and rhythm.

Which TTS model should I choose?

It depends on your needs. For fast, affordable generation, try OpenAI TTS or Google Cloud Neural2 (standard tier). For the highest quality and voice cloning, use ElevenLabs Multilingual v2 or Cartesia Sonic (premium). For ultra-low latency, try Deepgram Aura-2 or ElevenLabs Flash. Each provider has different strengths, so experiment to find the best fit.

Is there a free trial?

Yes! You can try text-to-speech without an account using our trial models (up to 300 characters, 3 generations per hour). Sign up for a free account to get starter credits and access all 20+ models from every provider.

What languages are supported?

Our TTS providers collectively support 140+ languages. Azure Neural covers 140+ languages, Google Cloud covers 50+, ElevenLabs covers 29, and OpenAI covers 57. Language availability varies by model.

Can I use the generated audio commercially?

Yes. All providers we offer, including ElevenLabs, OpenAI, Google Cloud, Azure, and Amazon Polly, permit commercial use of generated audio under their respective terms of service. The specific terms differ by provider.

What audio formats are supported?

WhatIsTTS supports MP3, WAV, OGG, and FLAC output formats. MP3 is the default for web playback. WAV is recommended for further audio processing. You can convert between formats using our Audio Converter tool.

How does voice cloning work?

Voice cloning uses AI to replicate a specific voice from a short audio sample. Upload a clear recording, and providers like ElevenLabs (instant cloning from 1 minute of audio), Cartesia Sonic, or Azure Custom Neural Voice will generate new speech in that voice.

What is the maximum text length?

Trial users can generate up to 300 characters per request. Registered users get up to 5,000 characters per request. For longer texts, the audio is generated in chunks and stitched together automatically. API users can process up to 10,000 characters per request.
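The chunking step for longer texts can be pictured with a small sketch. This is not the service's actual splitter, which is not documented here; it is a minimal illustration that assumes chunks should break at sentence boundaries where possible and fall back to a hard split for a single oversized sentence. The function name `split_into_chunks` is hypothetical.

```python
import re


def split_into_chunks(text: str, limit: int = 5000) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    preferring breaks after sentence-ending punctuation."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the resulting audio segments concatenated in order.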

How much does text to speech cost?

WhatIsTTS uses a credit system. Free-tier models (Piper, VITS, MeloTTS) cost 0 credits. Standard models (Kokoro, Bark, CosyVoice) cost 2 credits per 1,000 characters. Premium models (ElevenLabs, OpenAI TTS, Cartesia Sonic) cost 4 credits per 1,000 characters. New accounts receive free starter credits to try all models.

Can I access text to speech via API?

Yes. WhatIsTTS provides a REST API at /api/v1/tts/ that supports all models and voices. Generate an API key (prefixed sk-tts-) from your account page and pass it as a Bearer token. The API supports streaming, SSML input, and all output formats. See our API documentation for code examples in Python, JavaScript, Go, and cURL.
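A request to this endpoint might be assembled roughly as follows. Only the path /api/v1/tts/, the sk-tts- key prefix, and the Bearer scheme come from the answer above; the host name, the use of POST with a JSON body, and the field names (`text`, `model`, `voice`, `format`) are assumptions made for illustration, so check the API documentation for the real schema.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # hypothetical host, replace with the real one
TTS_PATH = "/api/v1/tts/"             # path given in the FAQ answer


def build_tts_request(api_key: str, text: str, model: str, voice: str,
                      fmt: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a TTS request with Bearer authentication."""
    if not api_key.startswith("sk-tts-"):
        raise ValueError("expected an API key prefixed with sk-tts-")
    # Hypothetical field names; the real request schema may differ.
    payload = {"text": text, "model": model, "voice": voice, "format": fmt}
    return urllib.request.Request(
        BASE_URL + TTS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be a matter of passing the returned object to `urllib.request.urlopen` and writing the response bytes to a file, assuming the endpoint returns raw audio.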

How does WhatIsTTS compare to using providers directly?

WhatIsTTS gives you access to 20+ models from ElevenLabs, OpenAI, Google Cloud, Azure, Amazon Polly, Deepgram, and Cartesia through a single account and unified API. You avoid managing multiple provider accounts, separate billing, and different API formats. Credits work across all providers, so you can switch models without friction.

Do you support SSML or pronunciation controls?

Yes, several providers support SSML (Speech Synthesis Markup Language) for fine-tuning pronunciation, pauses, emphasis, and speed. Azure Neural and Amazon Polly have full SSML support. Google Cloud supports a subset. ElevenLabs and OpenAI use plain text but offer speed and stability controls through their own parameters.
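For readers new to SSML, here is a minimal sketch of wrapping plain text in standard SSML elements (`speak`, `prosody`, `break`). The helper name `build_ssml` is hypothetical, and some providers require additional attributes or a voice element on top of this, so treat the output as a starting point rather than a provider-ready document.

```python
from xml.sax.saxutils import escape


def build_ssml(text: str, rate: str = "medium", pause_ms: int = 0) -> str:
    """Wrap text in a basic SSML document with a speaking rate
    and an optional trailing pause."""
    body = escape(text)  # escape &, <, > so the markup stays well-formed
    pause = f'<break time="{pause_ms}ms"/>' if pause_ms > 0 else ""
    return f'<speak><prosody rate="{rate}">{body}</prosody>{pause}</speak>'


if __name__ == "__main__":
    print(build_ssml("Tom & Jerry", rate="slow", pause_ms=500))
```

Escaping the input matters because characters such as `&` or `<` in user text would otherwise produce invalid markup.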

Need a specific TTS workflow?

Compare providers, test voices, then run it through one brokered API.
