What's New On The Net

New text-to-speech model: Fast & robust





A team of researchers from Microsoft and Zhejiang University has proposed FastSpeech, a feed-forward network that, the team says, makes text-to-speech (TTS) synthesis faster and more robust.

Microsoft announced the work on its official blog, explaining that FastSpeech generates mel-spectrograms quickly and robustly, with controllable output and high quality.

So why are mel-spectrograms so important, and what role do they play?

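Neural network-based TTS models usually work in two stages: they first generate a mel-scale spectrogram (or mel-spectrogram) from the input text autoregressively, one frame at a time, and then synthesize speech from the mel-spectrogram using a vocoder. The mel scale is a perceptual scale of pitch: it maps frequency in hertz onto units called mels, so that equal steps in mels sound roughly equally far apart to a human listener. FastSpeech replaces the autoregressive step with a feed-forward network that generates the mel-spectrogram frames in parallel.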
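To make the hertz-to-mel mapping concrete, here is a minimal Python sketch using one common definition of the mel scale, m = 2595 * log10(1 + f / 700). Other variants exist, and the article does not say which one FastSpeech uses. The helper names `hz_to_mel`, `mel_to_hz`, and `mel_band_centers` are illustrative, not taken from any library. The last function computes the centre frequencies of bands spaced evenly in mels, which is how the frequency axis of a mel-spectrogram is typically laid out.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Convert hertz to mels using the common 2595 * log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Centre frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    # Split [m_min, m_max] into n_mels + 1 equal mel steps; the interior
    # points are the band centres (the endpoints act as outer band edges).
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    print(round(hz_to_mel(1000.0)))  # 1000 Hz is close to 1000 mels
    centres = mel_band_centers(8, 0.0, 8000.0)
    print([round(c) for c in centres])
```

The band centres returned by `mel_band_centers(8, 0.0, 8000.0)` get farther apart as frequency rises, reflecting the finer low-frequency resolution that the mel scale is meant to capture.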

The work is explained in detail in a research paper, "FastSpeech: Fast, Robust and Controllable Text to Speech," which has been accepted at the Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019).

Microsoft claims that FastSpeech's architecture improves on other TTS models in several areas.

According to Microsoft, experiments on the LJ Speech dataset, as well as on other voices and languages, show that FastSpeech is fast, robust, and controllable, and that it produces high-quality speech.
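The article does not describe FastSpeech's architecture in detail. One component the NeurIPS paper describes is a length regulator: a duration predictor estimates how many mel frames each phoneme should last, and each phoneme's hidden vector is repeated that many times. The full output length is therefore known up front, so the decoder can produce every frame in parallel. A speed factor (called α in the paper) scales the durations, which is one way FastSpeech's output is controllable. The minimal sketch below assumes the durations have already been predicted; plain Python lists stand in for tensors, and the function name `length_regulator` is hypothetical, not the authors' code.

```python
import math
from typing import List, Sequence


def length_regulator(
    hidden: Sequence[Sequence[float]],
    durations: Sequence[float],
    alpha: float = 1.0,
) -> List[List[float]]:
    """Hypothetical sketch of a FastSpeech-style length regulator.

    hidden:    one encoder hidden vector per phoneme
    durations: number of mel frames per phoneme (assumed already predicted)
    alpha:     speed factor; > 1 lengthens durations (slower speech),
               < 1 shortens them (faster speech)
    """
    if len(hidden) != len(durations):
        raise ValueError("expected one duration per phoneme")
    frames: List[List[float]] = []
    for vec, d in zip(hidden, durations):
        # Scale the duration and round half up to a whole number of frames.
        n_frames = max(0, math.floor(d * alpha + 0.5))
        # Repeat this phoneme's vector once per frame it occupies.
        frames.extend(list(vec) for _ in range(n_frames))
    return frames


if __name__ == "__main__":
    phonemes = [[1.0], [2.0], [3.0], [4.0]]  # toy 1-D "hidden vectors"
    durs = [2, 2, 3, 1]
    for a in (1.0, 0.5, 1.3):
        print(a, len(length_regulator(phonemes, durs, alpha=a)))
```

With durations `[2, 2, 3, 1]`, `alpha=1.0` yields 8 frames, `alpha=0.5` yields 5 (faster speech), and `alpha=1.3` yields 11 (slower speech). Rounding half up is a choice made for this sketch; a real implementation would operate on batched tensors.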


