Microsoft's new "VALL-E" can mimic anyone's voice

Microsoft researchers have recently unveiled “VALL-E”, a new text-to-speech (TTS) artificial intelligence (AI) model.

Given a 3-second audio sample, VALL-E can closely mimic a person’s voice. Once it has learned a particular voice, it can synthesize audio of that person saying anything, while attempting to preserve the speaker’s emotional tone.

According to its developers, VALL-E could also be combined with other generative AI models, such as GPT-3, to create audio content. It could be used for high-quality text-to-speech applications and for speech editing, in which a recording of a person’s voice is altered by editing its text transcript (making them say something they did not originally say).

According to Microsoft, experimental results show that VALL-E significantly “outperformed” the state-of-the-art zero-shot TTS system in speech naturalness and speaker similarity. It also preserved the speaker’s emotion and the acoustic environment of the acoustic prompt (the short voice sample) in the synthesized speech.

Image credit: GitHub


Artificial Intelligence / Cognitive computing / Internet related News · 2023-01-10
