Microsoft AI voice clones create flawless fake speech in seconds


Straight out of sci-fi, a new Microsoft AI program can clone a person’s voice from just three seconds of audio. À la Mission: Impossible or Predator, Microsoft’s AI voice clones are an impressive feat of software, but are they dangerous?

In recent years, AI technology has been used to create deepfakes, which plaster one person’s face onto the body of another. While used for entertainment in media such as The Mandalorian, this technology has also been used for political subterfuge and even for adult content made without the permission of the person whose likeness is used.


Microsoft’s AI voice clones may result in the same problems. Dubbed VALL-E, the new technology can convincingly replicate a speaker’s voice from just a three-second sample, and then make that voice say anything the user wants.

In a detailed report by Ars Technica, Microsoft’s AI is said to be built on Meta’s EnCodec technology. This means that the audio AI can create speech from text prompts based on an analysis of how a person actually speaks. The model was trained on LibriLight, an audio library assembled by Meta, to create realistic speech.

The quality of Microsoft’s AI voice clones is reportedly very high. More than a simple synthesis, the clones can replicate the emotional tone and timbre of their analysed source. Furthermore, the AI can even replicate acoustic environments, such as talking on the phone or yelling in an empty tunnel.

Microsoft is seemingly aware of the dangers its artificial intelligence software poses. As such, the company is not releasing VALL-E to the public in its current form for the foreseeable future.


“Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker,” reads the study paper. “To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E. We will also put Microsoft AI Principles into practice when further developing the models.”

Microsoft is not the only company working on AI voice cloning. In fact, there are multiple competing AIs out there that claim to offer a similar level of quality. However, considering Microsoft’s unwillingness to release its program, it seems the tech giant may be multiple steps ahead.