TTS Glossary
The rabbit-hole engine. Every technical term used on this site has a full definition with examples, related concepts, and links to the relevant reviews and guides.
Voice / Audio Technology
Neural TTS
Neural TTS uses deep learning models to synthesize human-like speech from text. It replaced the older concatenative approach, which stitched together pre-recorded speech samples, and made possible the realistic AI voices of 2026.
Prosody
Prosody is the rhythm, stress, intonation, and timing of speech. It is what separates a robotic-sounding voice from a human-sounding one, and it is the main dimension on which modern TTS tools still differ.
SSML
Speech Synthesis Markup Language (SSML) is a W3C-standard, XML-based markup language that controls how TTS engines pronounce text, insert pauses, adjust pitch, and apply emphasis.
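As an illustration, here is a minimal sketch of generating SSML in code. It assumes Python's standard-library xml.etree.ElementTree and SSML 1.1 element names (speak, break, prosody, emphasis). The build_ssml function and its parameters are hypothetical, and the exact tags and attribute values an engine honors vary by vendor.

```python
import xml.etree.ElementTree as ET

# SSML 1.1 namespace as defined by the W3C specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(greeting: str, pause_ms: int, highlight: str) -> str:
    """Hypothetical helper: a greeting, a timed pause, then emphasized, higher-pitched text.

    Assumes the target engine supports <break>, <prosody>, and <emphasis>;
    check your vendor's SSML documentation for the subset it actually honors.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")

    # Root element; xml:lang tells the engine which language rules to apply.
    speak = ET.Element("speak", {
        "version": "1.1",
        "xmlns": SSML_NS,
        "xml:lang": "en-US",
    })
    speak.text = greeting

    # <break> inserts a silence of a fixed duration.
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})

    # <prosody> raises pitch by 10% and slows the speaking rate for the wrapped text.
    prosody = ET.SubElement(speak, "prosody", {"pitch": "+10%", "rate": "slow"})

    # <emphasis> asks the engine to stress the enclosed words.
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = highlight

    # Serializing through ElementTree escapes characters such as & and < in the text.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Welcome back.", 500, "Your order has shipped."))
```

Building the markup with an XML library rather than by string concatenation means user-supplied text containing characters like & or < still yields well-formed SSML, which most engines require before they will speak anything.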
Voice Cloning
Voice cloning uses AI to replicate a specific person's voice from sample audio. It requires the explicit consent of the person whose voice is being cloned; cloning a voice without consent is unlawful in many jurisdictions.
Voice Model
A voice model is the trained neural network that defines how a TTS system sounds. Each vendor's flagship 'voice' is a specific model version with its own tradeoffs in quality, latency, and language coverage.