- What is AI text-to-speech?
- AI text-to-speech uses a neural network (a deep learning model) to generate human-like speech from written text. Unlike traditional TTS, which concatenates pre-recorded sound fragments, AI TTS generates entirely new audio waveforms that capture natural pacing, intonation, and emotion. This tool uses Kokoro 82M, an 82-million-parameter model that runs directly in your browser.
- Is this text-to-speech tool free?
- Yes, completely free. There is no signup, no account creation, no ads, and no daily usage cap. Both AI voices and browser voices are free. Each session converts up to 5,000 characters, and you can run as many sessions as you like.
- What AI model powers the voices?
- The AI voices are powered by Kokoro 82M, an open-source neural TTS model with 82 million parameters, developed by hexgrad and distributed through Hugging Face. It runs in your browser via ONNX Runtime (WebAssembly). The model uses a quantized (q8) format of approximately 92 MB, which downloads once and is cached for future visits.
- What languages are supported?
- AI voices (Kokoro) currently support English with American and British accents — 28 voices total. Browser voices (fallback) support 50+ languages depending on your browser and OS, including Spanish, French, German, Chinese, Japanese, Korean, Arabic, Hindi, Portuguese, Russian, and many more.
- Can I download the generated audio?
- Yes! When using AI voices (Kokoro), click the 'Download' button after generating speech to save the audio as a WAV file. You can use this audio in videos, presentations, podcasts, or any project. Browser voices do not support download due to Web Speech API limitations.
- Why does the AI model need to download first?
- The Kokoro AI model is approximately 92 MB and must download to your browser on first use, because the neural network runs entirely on your device with no server processing. After that first download, the model is cached in your browser and loads from the cache on future visits, so no re-download is needed.
- Is my text private and secure?
- Yes. When using AI voices, your text is processed entirely in your browser via WebAssembly. It is never sent to our servers or any third party, and the generated audio exists only in your browser's memory. Browser voices that run offline on your device are equally private. Some cloud-based browser voices (for example, in Chrome) may send your text to Google's speech service.
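
The Web Speech API marks each voice with a `localService` flag, which is one way to tell on-device voices from cloud-based ones. The sketch below is illustrative and is not this tool's actual code: `VoiceInfo` and `localVoicesFor` are hypothetical names, and the flag is only as reliable as what the browser reports.

```typescript
// Structural subset of the browser's SpeechSynthesisVoice used by this sketch.
interface VoiceInfo {
  name: string;
  lang: string;          // language tag such as "en-US"
  localService: boolean; // true when the browser reports on-device synthesis
}

// Hypothetical helper: keep only on-device voices whose language tag
// starts with the given prefix (case-insensitive).
function localVoicesFor<T extends VoiceInfo>(voices: T[], langPrefix: string): T[] {
  const prefix = langPrefix.toLowerCase();
  return voices.filter(
    (v) => v.localService && v.lang.toLowerCase().indexOf(prefix) === 0
  );
}
```

In a browser, this could be called as `localVoicesFor(speechSynthesis.getVoices(), "en")` to list English voices that the browser reports as running on the device.
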
- What is the difference between AI voices and browser voices?
- AI voices (Kokoro) use an 82M-parameter neural network to generate human-like speech with natural intonation and emotion. Browser voices use the Web Speech API built into your browser: they work instantly with no download but sound more robotic. AI voices currently support English (28 voices, American and British accents); browser voices support 50+ languages.
- How do I get the best voice quality?
- Use the AI Voices tab with Kokoro for the best quality. Recommended voices for the most natural sound: 'Heart' or 'Bella' (American female) and 'Alice' or 'Daniel' (British). Keep text segments under 500 words for the fastest generation. Set the speed to 1.0x for the most natural pacing.
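
Longer text can be split into segments under the 500-word guideline before generating. The sketch below shows one possible approach, assuming sentences end at a word that finishes with `.`, `!`, or `?`; `chunkText` is a hypothetical helper, not part of this tool. It joins words with single spaces, so original line breaks are not preserved.

```typescript
// Hypothetical helper: split text into chunks of at most maxWords words,
// breaking at sentence ends where possible.
function chunkText(text: string, maxWords: number = 500): string[] {
  if (maxWords < 1) {
    throw new RangeError("maxWords must be at least 1");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);

  // Group words into sentences: a sentence ends at a word that finishes
  // with . ! or ? (optionally followed by a closing quote or bracket).
  const sentences: string[][] = [];
  let sentence: string[] = [];
  for (const word of words) {
    sentence.push(word);
    if (/[.!?]["')\]]*$/.test(word)) {
      sentences.push(sentence);
      sentence = [];
    }
  }
  if (sentence.length > 0) sentences.push(sentence);

  // Greedily pack whole sentences into chunks of at most maxWords words.
  const chunks: string[] = [];
  let current: string[] = [];
  for (const s of sentences) {
    if (current.length > 0 && current.length + s.length > maxWords) {
      chunks.push(current.join(" "));
      current = [];
    }
    current = current.concat(s);
    // A single sentence longer than the limit is cut at word boundaries.
    while (current.length > maxWords) {
      chunks.push(current.slice(0, maxWords).join(" "));
      current = current.slice(maxWords);
    }
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

Each returned chunk can then be generated separately. Breaking at sentence ends avoids cutting a phrase in the middle, which would otherwise produce an unnatural pause.
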
- Can I use the audio commercially?
- Yes. The Kokoro model is released under the Apache-2.0 license, which permits commercial use. The audio you generate is yours to use in YouTube videos, podcasts, ads, e-learning courses, or any commercial project. There are no royalty fees or attribution requirements for the generated audio.
- Does this work on mobile?
- Yes, both engines work on mobile browsers. AI voices (Kokoro via WebAssembly) work on modern mobile browsers, though generation may be slower on older devices. Browser voices work on all mobile browsers: iOS Safari uses Apple voices, and Android Chrome uses Google voices. Because of autoplay restrictions, some mobile browsers play audio only after you tap the Speak button.
- How is this different from ElevenLabs, Speechify, or TTSMaker?
- Most competitors process your text on their servers and charge for premium voices. This tool runs the AI model entirely in your browser: your text never leaves your device, there are no daily usage caps or subscription fees, and the audio is downloadable as WAV for free. The trade-off is that browser-based AI has fewer voices (28 vs hundreds) and requires a one-time ~92 MB model download.
- Can I use this for language learning?
- Yes. For English learners, AI voices offer natural American and British pronunciation at adjustable speeds; slow playback to 0.5x to hear every word clearly. For other languages, switch to the Browser Voices tab, which supports 50+ languages including Spanish, French, German, Chinese, Japanese, Korean, Hindi, and more.
- What audio format is the download?
- AI-generated audio is downloaded as a WAV file (uncompressed, lossless audio). WAV is universally supported by video editors (Premiere Pro, Final Cut, DaVinci Resolve), audio editors (Audacity, GarageBand), and all media players. You can convert WAV to MP3 or other formats using any free audio converter.
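
For readers curious about what a WAV file contains, the sketch below writes a 44-byte RIFF header followed by raw samples. It is a minimal illustration, not this tool's implementation: it assumes mono audio, 16-bit PCM, and float input samples in the range -1 to 1, and `encodeWav` is a hypothetical name.

```typescript
// Hypothetical helper: encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit PCM (assumed)
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  // RIFF container header (all multi-byte fields are little-endian).
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");

  // "fmt " chunk describing the audio format.
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size for PCM
  view.setUint16(20, 1, true);                           // format tag 1 = PCM
  view.setUint16(22, 1, true);                           // channel count (mono)
  view.setUint32(24, sampleRate, true);                  // samples per second
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true);              // bytes per sample frame
  view.setUint16(34, 16, true);                          // bits per sample

  // "data" chunk holding the samples.
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  });

  return buffer;
}
```

In a browser, the result could be wrapped as `new Blob([encodeWav(samples, 24000)], { type: "audio/wav" })` and offered as a download link. The 24000 Hz sample rate here is an assumption for illustration.
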