Nvidia's AI Model Turns Text and Audio into Music Magic

Nvidia's AI Model Fugatto
Credit: Nvidia | Free use for promotional purposes

Nvidia has unveiled Fugatto, a generative AI model for audio that can generate and transform music, voices, and sounds from text and audio prompts. The name stands for Foundational Generative Audio Transformer Opus 1, and Nvidia presents the model as a significant step forward in audio AI, pairing human-like sound comprehension with broad creative flexibility.

Fugatto audio AI
Credit: Nvidia | Free use for promotional purposes

Fugatto offers features that go well beyond traditional audio tools. It can create music snippets from text descriptions, edit existing tracks by adding or removing instruments, and change the accent or emotion of a voice. It can also produce sounds that have never existed before, such as a trumpet that barks or a saxophone that meows, opening up new creative options in music production, advertising, gaming, and education.

According to Rafael Valle, a manager of applied audio research at Nvidia, Fugatto is a step toward unsupervised multitask learning in audio synthesis and transformation. The model can complete complex tasks by drawing on abilities it has already learned. It relies on ComposableART, a technique that gives precise control over combined attributes, for example generating speech that has both a French accent and a sorrowful tone.

Nvidia Fugatto
Credit: Nvidia | Free use for promotional purposes

This adaptability makes Fugatto useful across several sectors. Music makers can use it to create and edit compositions in multiple styles or to refine audio with added effects. Advertisers can tailor campaigns with regional accents and emotional nuances, while video game makers can create dynamic soundscapes that respond to gameplay. In education, Fugatto can customize language learning tools by mimicking recognizable voices, making the experience more relatable.

Fugatto's technical foundation is just as notable. The model has 2.5 billion parameters and was trained on Nvidia DGX systems with 32 H100 GPUs, using a dataset of millions of audio samples. This training allows it to create new sounds, such as a rainstorm that gradually gives way to birdsong.

Audio Fugatto
Credit: Nvidia | Free use for promotional purposes

The model can also create sounds that change over time, such as a thunderstorm that grows in intensity. Users can combine these features in creative ways and control how much emphasis is placed on each sound quality. Nvidia AI researcher Rohan Badlani describes the experience of using Fugatto as an artistic one, despite his background in computer science.

Fugatto's debut is a significant turning point in the history of music and sound technology. As multi-platinum producer Ido Zmishlany notes, "With AI, we're writing the next chapter of music. This is a new instrument for creating and reimagining sound, and it's fascinating."