AI-powered Text-To-Speech (TTS) technology is a big step forward for Chinese learners all over the world. In this blog post, I set out to find the best AI-voice generator for the Chinese language currently available.
For me, it is difficult to imagine learning Chinese without some kind of text-to-speech technology, whether it is to listen to a dictionary entry or to have sentences read out loud. Even primitive options like Google Translate’s audio can be a great help, yet for longer texts I definitely prefer AI-voices. They can’t beat human, native audio in most cases, but they do offer lifelike pronunciation, (almost) natural-sounding rhythm and tones, and (to some extent) emotional inflection.
Text-to-speech is a technology that becomes a medium for communication to facilitate language learning. Text-to-speech also enhances learning effectiveness and student participation rate. On the other hand, text-to-speech has also slight disadvantages, such as the lack of naturalness, pleasantness, expressiveness, intonation, eye contact, and real-time class interaction. (A. Widyana et al (2022). The Application of Text-to-Speech Technology in Language Learning: Proceedings of the Sixth International Conference on Language, Literature, Culture, and Education (ICOLLITE 2022).)
Focus on free tools
Although I explored plenty of premium options, I want to focus on tools that can be used for free and without registering, because these are ideal for most Chinese learners who want to instantly generate high-quality audio for listening practice. I was also keen on trying Chinese AI-tools, but unfortunately I couldn’t access them without registering with Chinese credentials, so I didn’t include them here.
Special criteria for Chinese
I’m by no means an expert on Chinese pronunciation, but I’m aware of these challenges for TTS in Chinese:
Tones: Chinese is a tonal language and tones interact with each other. Google Translate’s pre-AI audio doesn’t seem to be aware of that, as the example below shows. Here 不, normally a falling tone (bù), should switch to a rising tone (bú) because it’s followed by a verb that also has a falling tone, but it doesn’t, so the audio fragment is wrong. AI has been trained on authentic speech, so it should do much better, but this is tricky stuff.
Segmentation: If you feed AI a string of Chinese characters, does it ‘understand’ the context well enough to separate it into the right parts?
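To make these two challenges a bit more concrete, here is a minimal Python sketch. It is only an illustration and not how any of the tools in this post work internally. The function names, the pinyin-with-tone-number notation (bu4, shi4) and the tiny vocabularies are my own assumptions. The first function applies the tone change for 不 described above; the second one splits a string with a naive "longest match first" strategy, which shows how easily segmentation goes wrong without context.

```python
def apply_bu_sandhi(syllables):
    """Return a copy where 'bu4' (不) becomes 'bu2' before a fourth-tone syllable.

    Syllables are written as pinyin plus tone number, e.g. 'shi4'.
    """
    result = list(syllables)
    for i in range(len(syllables) - 1):
        if syllables[i] == "bu4" and syllables[i + 1].endswith("4"):
            result[i] = "bu2"
    return result


def segment(text, vocabulary, max_len=4):
    """Greedy forward maximum matching with a toy vocabulary.

    At each position the longest known word wins; unknown characters
    are emitted one by one.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# 不是: 不 is followed by a falling tone, so it becomes rising (bu2).
print(apply_bu_sandhi(["bu4", "shi4"]))
# 不好: 好 is third tone, so 不 keeps its falling tone.
print(apply_bu_sandhi(["bu4", "hao3"]))

# Works fine for a simple sentence ...
print(segment("我不是学生", {"不是", "学生"}))
# ... but greedy matching reads 研究生命 as 研究生 + 命 instead of 研究 + 生命.
print(segment("研究生命", {"研究", "研究生", "生命"}))
```

The last line is the interesting one: both 研究生 (graduate student) and 生命 (life) are real words, so a tool needs context to pick the right split before it can even decide how to pronounce the sentence.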
1. Dubverse
Dubverse is an Indian startup that wants to democratize video production and remove language barriers by making content multilingual. This advanced AI-tool allows you to record entire podcasts and audiobooks in Chinese or Cantonese with different AI-voices. I was impressed by the quality of the voices. It might be more than the average Chinese learner needs, though, and you do have to sign up.
Pros:
20+ AI-voices for Chinese
Different voice ‘modes’ available (cheerful, angry, serious, sad etc.)
Supports Mandarin Chinese + Cantonese
Interface allows you to edit text and refresh audio
Cons:
Slow processing for non-premium users
10 downloads per week for non-premium
2. Narakeet
Narakeet is an online text-to-speech video maker designed for a global audience. Supporting 100 languages and 700 voices, it uses AI to create realistic narration from presentation notes or markdown scripts. Users can easily edit videos like text, saving time on recording, syncing audio, and generating subtitles. Surprisingly, Narakeet allows you to use a limited number of Chinese accents and even children’s voices. A plus for Chinese learners: the TTS for Chinese can be used without signing up, albeit for short texts only.
Pros:
20+ Chinese AI-voices to choose from (including Henan, Shandong accents + children’s voices)
Audio can be downloaded
You can create 20 audio fragments for free
Audio quality is pretty good & no need to register
Cons:
Speed can’t be adjusted
3. MicMonster
MicMonster is a text-to-speech tool that creates natural, emotive AI voiceovers for projects like YouTube videos, e-learning and tutorials. You don’t need to register if you merely want to generate audio for short texts. The AI-voices sound natural, but it’s a pity you can’t adjust the speed without signing up.
Pros:
About 30 AI-voices available
No need to register to generate short audio fragments
Supports MP3 downloads for texts of up to 300 characters
Cons:
To unlock all features you need a premium account
4. Listnr.ai
Listnr’s AI text-to-speech editor generates human-like voiceovers for ads, e-learning, product demos, presentations, audiobooks, and YouTube videos, helping streamline content creation with high-quality results. The good news for Chinese learners is that you don’t have to sign up to generate AI-audio. The voices I tested did sound relatively mechanical compared to other AI-tools, and there’s not that much choice for Chinese.
Pros:
8 different AI-voices
Instant availability, no registering required
Cons:
Voices sound surprisingly mechanical
For very short texts only
5. Murf.ai
Murf’s AI voice generator has been trained on diverse speech datasets and replicates various languages, accents, and styles, creating accurate voiceovers for all kinds of applications. Once you sign in with your Google account, you enter a professional TTS studio that offers 5 AI-voices for Chinese. Not too many, but they are more developed than others and customizable in terms of role, mood, pitch and speed. With a free account you can only create 10 minutes of audio. Murf.ai is the kind of tool that would allow you to create lifelike Chinese dialogues, podcasts and audiobooks, but that might be overkill for the majority of Chinese learners.
Pros:
5 AI-voices available
Very customizable in terms of role, mood, pitch and speed
Cons:
No downloads with a free account
Free accounts are limited to 10 minutes of audio
6. PlayHT
PlayHT is a startup providing high-quality text-to-speech and audio accessibility solutions using realistic AI voices in nearly every language. It began in 2016 as a Chrome extension for listening to Medium articles. By 2017, they expanded to help individuals and businesses create realistic audio content with tools like a Text-to-Audio editor. Although their quality differs, PlayHT’s AI-voices for Chinese sound authentic and support Mainland Chinese as well as Cantonese and Taiwanese Chinese. You don’t have to register and are allowed three free downloads.
Pros:
20+ different AI-voices available
Mainland Chinese, Cantonese and Taiwanese Chinese
12500 characters max.
Three free downloads
Cons:
Requires a subscription for extensive usage
7. Speechify
Speechify was founded by Cliff Weitzman, a dyslexic college student at Brown University who built the first version of the tool himself to help him keep up with his class readings. Once you register, you benefit from advanced options for generating Chinese audio, as the above dashboard shows. Impressive, but probably more than you asked for, even without going premium.
Pros:
20+ Chinese AI-voices to choose from (including Cantonese, Taiwanese and some dialects)
Adjust speed, emotions, speaking tone (newscast, chat, assistant…), pitch and volume
Natural and authentic audio
Cons:
No downloads for free users
8. Synthesis
I couldn’t find many details about this company. Like the other TTS-tools discussed here, Synthesis is an advanced TTS-studio with plenty of options. Credits are spent rather quickly, though, so it’s not a smooth experience for most learners, yet it’s useful for more advanced purposes.
Pros:
30+ AI-voices
Mainland Chinese, Taiwanese and Cantonese
Filter options for voice search
Cons:
No different moods or voice modes
Credits are used rather quickly
9. ElevenLabs
ElevenLabs develops AI audio models for realistic, versatile speech and sound across 32 languages. Their technology powers audiobooks, video games, media localization, social media content, and accessibility tools. They were founded in 2022 and are headquartered in New York. High quality AI-voices, but they are definitely the pro-option and might be overkill for most of us Chinese learners.
Pros:
Variety of AI-voices for Chinese
Options for customizing AI-voices
High quality and authenticity
Cons:
Not all voices can be used by free users
All the different options and functions can be distracting (Voice changer, dubbing studio, sound effects etc.)
10. Luvvoice
Luvvoice is a free online text-to-speech tool that converts text into natural-sounding speech. With a variety of AI voices, you can input text, select a voice, and either download the MP3 or listen directly. Ideal for content creators, students, and anyone who needs text read aloud. I couldn’t find out where this company is based, but I like it a lot, since you can generate audio for longer texts (up to 5000 characters) and adjust the speed, which is very helpful for Chinese learners. The biggest plus is that you don’t need to log in, so it’s the perfect tool for Chinese learners who want instant availability and not too many advanced options.
Pros:
8 AI-voices for Chinese
No login required
5000 characters max.
Options for customizing AI-voices: speed, pitch and volume
Free downloads
Cons:
Ads and captcha (maybe the ads explain why their premium version is rather cheap in comparison)
Ideas for using TTS-tools in Chinese Learning
Repeat after listening: You can use the TTS-output to mimic tones and pronunciation (although you might still prefer human audio).
Experiment with accents: Choose regional accents to broaden your listening comprehension. Switching between Mainland Chinese and Taiwanese can be interesting for example.
Combine with other tools: Pair TTS with apps like Pleco for a complete learning experience. Alternatively, you can use ChatGPT as a kind of AI-tutor and generate content on whatever you’re learning for additional reading and listening practice.
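One practical hurdle when you generate your own listening material: several of the tools above cap the input length (300 characters for MicMonster downloads, 5000 for Luvvoice, 12500 for PlayHT). Here is a rough Python sketch of how a longer Chinese text could be cut into pieces that fit such a limit, breaking after sentence-ending punctuation where possible. The function name split_for_tts and the choice of punctuation marks are my own assumptions, and the sketch needs Python 3.7 or newer.

```python
import re


def split_for_tts(text, max_chars):
    """Split text into chunks of at most max_chars characters.

    Chunks end after sentence-ending punctuation where possible; a single
    sentence longer than the limit is cut into fixed-size slices.
    """
    # Zero-width split: the punctuation mark stays attached to its sentence.
    sentences = [s for s in re.split(r"(?<=[。！？!?])", text) if s]
    chunks = []
    current = ""
    for sentence in sentences:
        # Oversized sentence: flush what we have, then slice it hard.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks


story = "你好。我是学生！你呢？"
for chunk in split_for_tts(story, 8):
    print(len(chunk), chunk)
```

With a real text you would pass the limit of the tool you use, for example split_for_tts(long_text, 5000), and paste the chunks in one by one.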
Conclusion: What’s the best pick for Chinese learners?
I found hundreds of AI voice generators but struggled to tell what makes each of them unique or which models they use. For Chinese learners, it’s important to differentiate between advanced uses, like creating podcasts and audiobooks, which are probably not that relevant, and simpler needs, like quickly generating natural and authentic Chinese speech from text.
The quality of Chinese TTS-output ranges from robotic and unpleasant to highly authentic and human-like. However, the most advanced platforms tend to offer quite similar features and quality, with interfaces that look almost identical. For basic purposes, it often feels like once you’ve tried one or two platforms, you’ve tried them all.
Chinese AI voices still lag behind those for English or Spanish, at least on non-Chinese platforms that I tested. For example, you can’t yet select voice clones of Chinese celebrities. At best, some platforms offer variations in mood and style. That said, this will likely improve in the future.
So in a nutshell:
Fast and simple: Luvvoice, Listnr.ai and PlayHT allow you to generate Chinese AI-voices instantly by copy-pasting your text. No need to log in. This is ideal for most learners. Among these three, Luvvoice offers the best features.
For advanced usage: Options like Speechify, ElevenLabs, Synthesis, Murf.ai and others are full-blown AI-audio platforms that allow you to create AI-voices for all kinds of purposes. Cool stuff, but probably not what the average learner needs.
Ultimately, I expect AI-powered TTS to be integrated in more and more Chinese learning apps. Feel free to share your thoughts about this development in the comments!
Affiliate links
Disclosure: These are affiliate links. They help me to support this blog, meaning, at no additional cost to you, I will earn a small commission if you click through and make a purchase.