Microsoft VALL-E Clones Your Voice in 3 Seconds

Kaustubh Katdare

@thebigk Oct 27, 2024

AI's turned into a mimicry artist. Microsoft's VALL-E, a neural codec language model, can clone a voice from just 3 seconds of audio. The model uses a short audio clip of the target speaker as a prompt and synthesizes high-quality, personalised speech in that voice.

Microsoft engineers have trained VALL-E on 60K hours of speech data, about 100x more than the training data used by existing text-to-speech (TTS) systems.

The research paper indicates that VALL-E can preserve the target speaker's emotion and acoustic environment while keeping the synthesized speech natural.
