Microsoft VALL-E Clones Your Voice in 3 Seconds

AI has turned mimicry artist. Microsoft's VALL-E, a neural codec language model, can imitate a voice from a recording just 3 seconds long. The model takes a short audio clip of the target speaker as a prompt and synthesizes high-quality, personalised speech in that voice, without being retrained on that speaker.
Microsoft engineers trained VALL-E on 60K hours of English speech, hundreds of times more data than existing text-to-speech (TTS) systems are trained on.
The research paper reports that VALL-E produces natural-sounding speech and can preserve the speaker's emotion and the acoustic environment of the prompt recording.