WhatIsTTS

Text to Speech

Generate natural speech with 24+ AI models. Free tier available — no account required.

Credit Costs

  • Standard (2 credits per 1,000 characters): OpenAI TTS, Google Cloud, Azure, Polly Neural, ElevenLabs Flash/Turbo, PlayHT 3-mini
  • Premium (5 credits per 1,000 characters): ElevenLabs v2, OpenAI HD/GPT-4o, Google Studio, Azure HD, Cartesia Sonic, Deepgram, PlayHT Dialog

Free Tier

  • No account required
  • Trial with Google Cloud Standard and Amazon Polly
  • Up to 500 characters per request
  • 3 generations per hour
  • Sign up for 50 free credits

Output Formats

  • MP3 — Best for web playback
  • WAV — Uncompressed, ideal for editing
  • OGG — Open format, good compression
  • FLAC — Lossless audio quality

Frequently Asked Questions

Text to speech (TTS) is an AI technology that converts written text into natural-sounding spoken audio. Modern TTS APIs from providers like ElevenLabs, OpenAI, Google Cloud, and Azure use deep learning to produce speech that sounds remarkably human, with natural prosody, emotion, and rhythm.

The best TTS model depends on your needs. For fast, affordable generation, try OpenAI TTS or Google Cloud Neural2 (standard tier). For the highest quality and voice cloning, use ElevenLabs Multilingual v2 or Cartesia Sonic (premium). For ultra-low latency, try Deepgram Aura-2 or ElevenLabs Flash. Each provider has different strengths, so experiment to find the best fit.

You can try text-to-speech for free without an account using our trial models (up to 300 characters, 3 generations per hour). Sign up for a free account to get starter credits and access all 20+ models from every provider.

Our TTS providers collectively support 140+ languages. Azure Neural covers 140+ languages, Google Cloud covers 50+, ElevenLabs covers 29, and OpenAI covers 57. Language availability varies by model.

All providers we offer, including ElevenLabs, OpenAI, Google Cloud, Azure, and Amazon Polly, permit commercial use of generated audio, subject to each provider's own terms of service.

WhatIsTTS supports MP3, WAV, OGG, and FLAC output formats. MP3 is the default for web playback. WAV is recommended for further audio processing. You can convert between formats using our Audio Converter tool.

Voice cloning uses AI to replicate a specific voice from a short audio sample. Upload a clear recording, and providers like ElevenLabs (instant cloning from 1 minute of audio), PlayHT, Azure Custom Neural Voice, or Cartesia Sonic will generate new speech in that voice.

Trial users can generate up to 300 characters per request. Registered users get up to 5,000 characters per request. For longer texts, the audio is generated in chunks and stitched together automatically. API users can process up to 10,000 characters per request.
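As a rough illustration of the chunking step, the sketch below splits long text into pieces that fit under a per-request limit, preferring sentence boundaries. The function name, the sentence-splitting rule, and the hard-split fallback are assumptions made for this example; the page does not describe how WhatIsTTS actually chunks text.

```python
import re


def split_into_chunks(text: str, max_chars: int = 5000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends.

    Hypothetical helper: the real service's chunking rules are not documented here.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("One. Two. Three.", max_chars=10):
        print(repr(chunk))
```

Each chunk could then be sent as its own request and the resulting audio segments joined in order.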

WhatIsTTS uses a credit system. Free-tier models (Piper, VITS, MeloTTS) cost 0 credits. Standard models (Kokoro, Bark, CosyVoice) cost 2 credits per 1,000 characters. Premium models (ElevenLabs, OpenAI TTS, Cartesia Sonic) cost 4 credits per 1,000 characters. New accounts receive free starter credits to try all models.
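To make the per-1,000-character pricing concrete, here is a minimal sketch of a cost estimate. The function name is hypothetical, and rounding partial credits up to a whole credit is an assumption; the page does not say how partial thousands are billed.

```python
def estimate_credits(text: str, credits_per_1k: int) -> int:
    """Estimate the credits needed to synthesize text at a given rate.

    Assumption: cost is proportional to character count and rounded up
    to the next whole credit.
    """
    chars = len(text)
    # Ceiling division using integer arithmetic only.
    return -(-chars * credits_per_1k // 1000)


if __name__ == "__main__":
    sample = "x" * 2500
    # 2,500 characters at the standard rate of 2 credits per 1,000 characters.
    print(estimate_credits(sample, credits_per_1k=2))
```

Passing the rate as a parameter keeps the estimate independent of any particular model tier.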

WhatIsTTS provides a REST API at /api/v1/tts/ that supports all models and voices. Generate an API key (prefixed sk-tts-) from your account page and pass it as a Bearer token. The API supports streaming, SSML input, and all output formats. See our API documentation for code examples in Python, JavaScript, Go, and cURL.
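A request to this endpoint might be assembled as in the sketch below. Only the /api/v1/tts/ path, the sk-tts- key prefix, and Bearer authentication come from the description above; the host name, the use of POST with a JSON body, and the field names (text, model, voice, format) are placeholders assumed for illustration, so check the API documentation for the real schema.

```python
import json
import urllib.request

# Hypothetical host; substitute the real base URL from the API documentation.
BASE_URL = "https://example.com"


def build_tts_request(
    api_key: str,
    text: str,
    model: str,
    voice: str,
    audio_format: str = "mp3",
) -> urllib.request.Request:
    """Build (but do not send) an authenticated TTS request."""
    if not api_key.startswith("sk-tts-"):
        raise ValueError("expected an API key with the sk-tts- prefix")
    # Assumed JSON field names; the actual request schema may differ.
    payload = {
        "text": text,
        "model": model,
        "voice": voice,
        "format": audio_format,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/tts/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("sk-tts-demo", "Hello", "some-model", "some-voice")
    print(req.get_method(), req.full_url)
```

The request object can be passed to urllib.request.urlopen to send it; how the audio is returned (raw bytes, a URL, or a stream) is not specified here.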

WhatIsTTS gives you access to 20+ models from ElevenLabs, OpenAI, Google Cloud, Azure, Amazon Polly, PlayHT, Deepgram, and Cartesia through a single account and unified API. You avoid managing multiple provider accounts, separate billing, and different API formats. Credits work across all providers, so you can switch models without friction.

Several providers support SSML (Speech Synthesis Markup Language) for fine-tuning pronunciation, pauses, emphasis, and speed. Azure Neural and Amazon Polly have full SSML support. Google Cloud supports a subset. ElevenLabs and OpenAI use plain text but offer speed and stability controls through their own parameters.
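For readers new to SSML, the sketch below builds a small document using the standard speak, prosody, and break elements. The helper name and its defaults are assumptions for this example, and some providers require extra attributes or a voice element on top of this minimal form.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(sentences: list[str], pause_ms: int = 300, rate: str = "medium") -> str:
    """Join sentences with fixed pauses and wrap them in a speaking-rate setting.

    Hypothetical helper producing minimal SSML; provider-specific
    requirements are not handled.
    """
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, <, and > so user text cannot break the markup.
    body = pause.join(escape(sentence) for sentence in sentences)
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"


if __name__ == "__main__":
    print(build_ssml(["Hi & bye.", "Next."]))
```

Escaping the input text matters because characters such as & and < are common in ordinary prose and would otherwise produce invalid XML.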

Need a specific TTS workflow?

Compare providers, test voices, then run it through one brokered API.