Versions

Overview of available model variants

Model	Release	Strengths	Weaknesses	Status
Eleven Music v2 (recommended)	May 2026	Generative AI music with studio-quality vocals, instrumentation and arrangement; section-level inpainting (selectively regenerate bridge, verse or chorus); mid-track genre switching with coherent long-form composition; trained exclusively on licensed data and cleared for commercial use	Cloud API only (no self-hosting); verify usage rights and stems for advertising in the enterprise contract	Current
Eleven v3 Conversational (recommended)	May 2026	Low latency for live dialogue and voice agents; Expressive Mode with emotion detection via Scribe v2 Realtime; audio tags and natural turn-taking, including interruption handling; sweet spot between Flash latency and v3 expressiveness	Slightly lower audio fidelity than Eleven v3 (studio); Cloud API only (no self-hosting)	Current
Eleven v3 (GA) (recommended)	March 2026 (GA)	70+ languages, broadest TTS language coverage on the market; audio tags to control emotion and action ([excited], [whispers], [sighs]); Text-to-Dialogue API for multi-speaker scenarios; high emotional range and naturalness, 68% fewer errors on complex text vs. alpha	Not real-time capable (use v3 Conversational or Flash v2.5 for live agents); Cloud API only (no self-hosting); evaluate pricing for high-volume workloads	Current
Eleven Multilingual v2	2024	Proven high quality for multilingual narration; optimised for long-form audiobooks	Fewer languages than v3; no audio tags	Current
Eleven Flash v2.5	2024	Lowest latency for real-time agents; ideal for phone / voice bots	Lower emotional range	Current

Use Cases

Typical applications for this model

Voice synthesis for marketing and ad videos

Audiobook production

Voice bots / conversational agents

E-learning and training videos

Dubbing and localisation

Accessibility (screen readers, inclusion)

Voice cloning for brand / personal-brand voices

Multi-speaker dialogues (podcasts, audio drama)

Technical Details

API, features and capabilities

API & Availability

Availability: Public (API + Web UI)

Latency (TTFT): ~75 ms (Flash v2.5), ~1 s (v3)

Features & Capabilities

File Upload, Realtime API

Training & Knowledge

Knowledge Cutoff: Not publicly documented

Fine-Tuning: Available (Instant and Professional Voice Cloning, Voice Library customisation)

Language Support

Best Quality: English, German, French, Spanish, Italian, Polish, Dutch, Japanese, Mandarin, Arabic

Supported: 70+ languages (Eleven v3)

One of the broadest language ranges on the TTS market

Hosting & Compliance

GDPR-compliant hosting options and licensing

GDPR-Compliant Hosting Options

License & Hosting

License: Proprietary (commercial ToS)

Security Filters: Voice cloning verification (KYC for Professional Voice Cloning)

Enterprise Support: Yes

SLA Available: Yes

Hosting: Cloud only (no self-hosting)

innFactory AI Consulting, based in Rosenheim, Germany, advises enterprises in the DACH region on GDPR- and AI-Act-compliant deployment of voice and music AI. In spring 2026 ElevenLabs expanded its portfolio in quick succession: Eleven v3 went GA in March, followed in May by Eleven v3 Conversational for voice agents and Music v2 for commercially cleared AI music.

What is Eleven v3?

Eleven v3 is ElevenLabs’ current flagship model. It generates natural-sounding speech with high emotional range and contextual understanding in more than 70 languages.

Key innovations

Audio tags

For the first time, emotional and acoustic cues can be steered directly in the text – via tags in square brackets:

[excited] We are thrilled, [whispers] that you're here today.
[sighs] After a long day…
[clapping] Well done!

The model interprets tags such as [excited], [whispers], [sighs], [gunshot], [clapping] or [explosion] and adapts tone and audio accordingly.
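Because the tags are plain text inside the script, they can be inspected or removed before a request is sent. The following is a minimal sketch, not part of any ElevenLabs SDK: the regular expression and both helper names are assumptions made for illustration, and the pattern only covers simple lowercase tags like the ones shown above.

```python
import re

# Matches simple inline audio tags such as [excited] or [whispers].
# Assumption: tags are lowercase words (optionally with spaces) in square brackets.
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


def extract_audio_tags(text: str) -> list[str]:
    """Return the audio tags found in the text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def strip_audio_tags(text: str) -> str:
    """Return the text without tags, e.g. for subtitles or a model without tag support."""
    without_tags = TAG_PATTERN.sub("", text)
    # Collapse the whitespace left behind by removed tags.
    return re.sub(r"\s+", " ", without_tags).strip()


script = "[excited] We are thrilled, [whispers] that you're here today."
print(extract_audio_tags(script))
print(strip_audio_tags(script))
```

A helper like `strip_audio_tags` is useful when the same script also feeds a model without tag support, such as Multilingual v2, where the bracketed words should not reach the synthesiser.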

Text-to-Dialogue API

The Text-to-Dialogue API lets you generate multi-speaker scenarios (podcasts, audio drama, training videos) in a single call – with natural dialogue dynamics between multiple voices.
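As a rough sketch of what "a single call" means in practice, the snippet below assembles one request body that carries every speaker's lines. The field names (`inputs`, `voice_id`, `text`, `model_id`), the model identifier `eleven_v3` and the voice IDs are assumptions for illustration and should be checked against the current ElevenLabs API reference. No request is sent.

```python
import json
from dataclasses import dataclass


@dataclass
class Line:
    voice_id: str  # placeholder identifier, not a real voice
    text: str      # may contain audio tags such as [sighs]


def build_dialogue_payload(lines: list[Line], model_id: str = "eleven_v3") -> dict:
    """Assemble one request body covering all speakers (assumed field names)."""
    if not lines:
        raise ValueError("a dialogue needs at least one line")
    return {
        "model_id": model_id,
        "inputs": [{"voice_id": line.voice_id, "text": line.text} for line in lines],
    }


payload = build_dialogue_payload([
    Line("VOICE_ID_HOST", "[excited] Welcome back to the show!"),
    Line("VOICE_ID_GUEST", "[sighs] It has been a long week."),
])
print(json.dumps(payload, indent=2))
```

Keeping the speaker order in one list lets the model see the whole exchange at once, which is what allows the natural dialogue dynamics described above.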

Broadest language coverage

Eleven v3 supports 70+ languages at high quality, covering all major European languages as well as many smaller languages that competing TTS services do not offer.

New in May 2026

Eleven v3 Conversational

With Eleven v3 Conversational, ElevenLabs closes the gap between studio quality (Eleven v3) and real-time latency (Flash v2.5). The variant inherits audio tags and multi-speaker logic from v3 but is optimised for live dialogue with voice agents. An Expressive Mode combines the model with Scribe v2 Realtime, allowing agents to infer emotion from the user’s voice and react accordingly. Turn-taking including interruption handling runs natively in the model.

Music v2

Music v2 generates complete songs including vocals and arrangement in studio quality. New features include section-level inpainting (selectively regenerate verse, bridge or chorus) and mid-track genre switching. Critical for enterprise and marketing use cases: Music v2 was trained exclusively on licensed data and is cleared for commercial use – a clear advantage over competitors such as Suno or Udio, both facing lawsuits from major labels. Pricing for Music v1 and v2 was cut by up to 50% at the same time.

Model selection by use case

Use case	Recommended model	Why
Marketing videos / ads	Eleven v3	Highest quality, audio tags
Audiobooks / narration	Multilingual v2	Stable for long-form
Voice agents / telephony	Eleven v3 Conversational	Latency + audio tags + Expressive Mode
Ultra-low-latency bots	Flash v2.5	Lowest latency
Multi-speaker podcasts	Eleven v3 (Text-to-Dialogue)	Multi-voice dialogue
Ad music / jingles / sound branding	Music v2	Commercially cleared, section editing
Accessibility / screen readers	Multilingual v2 or Flash v2.5	Stability over expressiveness
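In an application that routes different jobs to different models, the table above can be encoded as a simple lookup. This is a minimal sketch: the dictionary keys are hypothetical labels chosen for illustration, and the values are the display names from the table, not API model identifiers.

```python
# Recommendations copied from the table above.
# Keys are hypothetical use-case labels; values are display names, not API model IDs.
RECOMMENDED_MODEL = {
    "marketing_video": "Eleven v3",
    "audiobook": "Multilingual v2",
    "voice_agent": "Eleven v3 Conversational",
    "ultra_low_latency_bot": "Flash v2.5",
    "multi_speaker_podcast": "Eleven v3 (Text-to-Dialogue)",
    "ad_music": "Music v2",
    "accessibility": "Multilingual v2 or Flash v2.5",
}


def recommend_model(use_case: str) -> str:
    """Return the recommended model for a use case, or raise with the known options."""
    try:
        return RECOMMENDED_MODEL[use_case]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_MODEL))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}") from None


print(recommend_model("voice_agent"))
```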

GDPR and AI-Act compliance

Data residency

ElevenLabs offers EU data residency and DPA contracts for Enterprise customers. The Standard and Free tiers should not be used for sensitive content. Clarify the current contract scope with the ElevenLabs Enterprise team.

EU AI Act and AI-generated speech

From August 2026, synthetic speech output requires labelling under the EU AI Act
ElevenLabs supports AI speech disclosure via metadata
To limit deepfake risk, Professional Voice Cloning requires KYC verification
Recommendation: internal policy for labelling ElevenLabs audio in customer and employee communications
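One way to back such an internal policy with tooling is a disclosure record stored next to every generated audio file. The sketch below is an assumption-laden illustration: the AI Act does not prescribe this format, the field names are hypothetical, and it does not use or replace ElevenLabs' own metadata mechanism.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_disclosure(audio: bytes, model: str, generated_at: datetime) -> dict:
    """Build a sidecar record that marks an audio file as AI-generated (hypothetical schema)."""
    return {
        "ai_generated": True,
        "generator": "ElevenLabs",
        "model": model,
        "generated_at": generated_at.astimezone(timezone.utc).isoformat(),
        # The hash ties the record to one specific audio file.
        "sha256": hashlib.sha256(audio).hexdigest(),
    }


audio_bytes = b"placeholder audio bytes"  # stands in for the content of a real MP3/WAV file
record = build_disclosure(
    audio_bytes,
    model="Eleven v3",
    generated_at=datetime(2026, 8, 2, 9, 0, tzinfo=timezone.utc),
)
print(json.dumps(record, indent=2))
```

Storing the record as a `.json` file alongside the audio keeps the label auditable even if embedded metadata is lost when the file is re-encoded.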

Copyright and personality rights

Voice cloning only with documented consent of the cloned person
For brand voices: contractual agreements with voice talent are mandatory
Licence durability: When a relationship with a voice talent ends, clarify whether existing cloned voices may continue to be used

Integration into enterprise workflows

REST API with comprehensive streaming options
WebSocket streams for real-time conversations (Flash v2.5)
SDKs: Python, Node.js, multiple community SDKs
Conversational AI agents: Native integration with OpenAI, Anthropic and Gemini as LLM backbone
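For orientation, a streaming text-to-speech request over the REST API can be prepared with the Python standard library alone. This is a sketch under stated assumptions: the base URL, the `/text-to-speech/{voice_id}/stream` path, the `xi-api-key` header and the model identifier `eleven_flash_v2_5` reflect the public API as we understand it and should be verified against the current reference; the API key and voice ID are placeholders. The request is only built, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"  # assumed base URL; confirm in the API reference


def build_tts_request(api_key: str, voice_id: str, text: str,
                      model_id: str = "eleven_flash_v2_5") -> urllib.request.Request:
    """Prepare (but do not send) a streaming text-to-speech request."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}/stream",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_tts_request("YOUR_API_KEY", "VOICE_ID_PLACEHOLDER", "Hello from the voice bot.")
print(req.get_method(), req.full_url)

# Sending is left out on purpose. With real credentials it could look like this:
# with urllib.request.urlopen(req) as resp:
#     for chunk in iter(lambda: resp.read(4096), b""):
#         ...  # forward each audio chunk to the player or telephony stack
```

Reading the response in chunks is what makes the streaming option worthwhile: playback can begin before the full audio has been generated.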

Our recommendation

For high-quality speech output, Eleven v3 is the leading choice in 2026. For voice agents we recommend Eleven v3 Conversational. For commercially usable AI music, Music v2 is currently the lowest-risk option thanks to its licensed training data. For GDPR-critical applications we recommend the Enterprise tier with DPA and EU data residency, combined with a clear internal policy for audio labelling.

As alternatives we evaluate OpenAI gpt-4o-mini-tts (well integrated with the OpenAI stack) or Cartesia Sonic (very low latency, state-space models). Contact us for advice on the right audio model strategy.

Cost estimation for this model

For up-to-date token pricing, model variants and EU availability, see our sister project ai-prices.eu. It helps you compare and estimate the operational cost of leading AI models for your specific use case.


ai-prices.eu is a project by innFactory AI Consulting GmbH and provides transparent cost estimates for leading AI models.
