NVIDIA Unveils BigVGAN v2: Pioneering Zero-Shot Waveform Audio Generation

Zach Anderson
Sep 06, 2024 11:03

NVIDIA’s BigVGAN v2 sets a new standard in zero-shot waveform audio generation, achieving state-of-the-art quality with up to 3x faster synthesis speed.

NVIDIA has announced the release of BigVGAN v2, a generative AI model for zero-shot waveform audio generation, according to the NVIDIA Technical Blog. The new model delivers significant improvements in speed and quality, positioning itself as a state-of-the-art solution in the field of audio generative AI.

BigVGAN: A Universal Neural Vocoder

BigVGAN is a universal neural vocoder designed to synthesize audio waveforms from Mel spectrograms. The model employs a fully convolutional architecture with several upsampling blocks and residual dilated convolution layers. A key feature is the anti-aliased multiperiodicity composition (AMP) module, which is optimized for generating high-frequency and periodic sound waves while reducing artifacts in the process.
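The periodic behavior of the AMP module comes from a periodic activation function. As a minimal sketch, assuming the Snake activation described in the BigVGAN paper, snake(x) = x + sin^2(alpha * x) / alpha, the snippet below applies it to plain Python floats. The function names and the default alpha are illustrative assumptions, not taken from NVIDIA's code, and the anti-aliasing steps that surround the activation in the real module are omitted.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake periodic activation: x + sin^2(alpha * x) / alpha.

    alpha controls the frequency of the periodic component. In a trained
    model it would be a learned parameter; here it is a plain argument.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return x + math.sin(alpha * x) ** 2 / alpha


def snake_layer(samples, alpha: float = 1.0):
    """Apply the activation elementwise to a sequence of sample values."""
    return [snake(s, alpha) for s in samples]


if __name__ == "__main__":
    # Hypothetical input values, chosen only to show the shape of the function.
    for value in (0.0, math.pi / 2, math.pi):
        print(f"snake({value:.4f}) = {snake(value):.4f}")
```

The identity term x keeps the function monotonic, while the sin^2 term adds a periodic bias, which is why an activation of this kind suits signals with strong periodic structure such as pitched audio.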

Improvements in BigVGAN v2

BigVGAN v2 introduces several improvements over its predecessor:

State-of-the-art audio quality across various metrics and audio types.
Up to 3x faster synthesis speed through optimized CUDA kernels.
Pretrained checkpoints for various audio configurations.
Support for a sampling rate of up to 44 kHz, covering the highest frequencies audible to humans.

Generating Every Sound in the World

Waveform audio generation is crucial for virtual worlds and has been a significant focus of research. BigVGAN v2 addresses previous limitations by delivering high-quality audio with enhanced fine details. Trained using NVIDIA A100 Tensor Core GPUs and a dataset over 100 times larger than its predecessor, BigVGAN v2 can generate high-quality sound waves from various domains, including speech, environmental sounds, and music.

Reaching the Highest Frequency Sound the Human Ear Can Detect

Previous models were limited to sampling rates between 22 kHz and 24 kHz. BigVGAN v2 extends this range to 44 kHz, capturing the entire human auditory spectrum. This allows the model to reproduce full soundscapes, from robust drums to crisp cymbals in music.
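The link between sampling rate and audible range follows from the Nyquist limit: a signal sampled at a given rate can represent frequencies only up to half that rate. The sketch below works through the arithmetic. It assumes the article's rounded figures correspond to the standard rates of 22,050 Hz, 24,000 Hz, and 44,100 Hz, and it uses the commonly cited 20 kHz upper limit of human hearing; neither figure is stated in the article.

```python
# Commonly cited upper limit of human hearing (an assumption, not from the article).
HUMAN_HEARING_LIMIT_HZ = 20_000


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def covers_human_hearing(sample_rate_hz: float,
                         limit_hz: float = HUMAN_HEARING_LIMIT_HZ) -> bool:
    """True if the Nyquist frequency reaches the assumed hearing limit."""
    return nyquist_hz(sample_rate_hz) >= limit_hz


if __name__ == "__main__":
    # Assumed exact rates behind the article's "22 kHz", "24 kHz", and "44 kHz".
    for rate in (22_050, 24_000, 44_100):
        print(rate, nyquist_hz(rate), covers_human_hearing(rate))
```

Under these assumptions, the older 22 kHz and 24 kHz rates top out at roughly 11 kHz and 12 kHz, while 44.1 kHz reaches 22.05 kHz, just above the assumed hearing limit.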

Faster Synthesis with Custom CUDA Kernels

BigVGAN v2 also features accelerated synthesis speed, using custom CUDA kernels to achieve up to 3x faster inference than the original BigVGAN. These kernels enable the generation of audio waveforms up to 240 times faster than real-time on a single NVIDIA A100 GPU.
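To put the real-time figure in concrete terms, the sketch below converts a speedup over real-time into wall-clock synthesis time. It treats the reported 240x as a best-case upper bound on a single A100, as the article states; the helper name and the example clip length are illustrative assumptions.

```python
def synthesis_seconds(audio_seconds: float, speedup_over_realtime: float) -> float:
    """Wall-clock time needed to generate `audio_seconds` of audio.

    A speedup of 1.0 means generation takes exactly as long as playback.
    """
    if speedup_over_realtime <= 0:
        raise ValueError("speedup_over_realtime must be positive")
    return audio_seconds / speedup_over_realtime


if __name__ == "__main__":
    clip_seconds = 180.0     # hypothetical three-minute clip
    reported_speedup = 240.0  # "up to" figure reported for one A100 GPU
    print(synthesis_seconds(clip_seconds, reported_speedup))
```

At the reported peak rate, a three-minute clip would take about 0.75 seconds to synthesize. Actual throughput depends on hardware, batch size, and the checkpoint used.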

Audio Quality Results

BigVGAN v2 shows superior audio quality for speech and general audio compared to its predecessor, as well as results comparable to the Descript Audio Codec at a 44 kHz sampling rate. This demonstrates the model’s capability to produce high-quality waveforms across various audio types.

Conclusion

NVIDIA’s BigVGAN v2 sets a new benchmark in audio synthesis, achieving state-of-the-art quality across all audio types and covering the full range of human hearing. The model’s synthesis speed is now up to 3x faster, making it highly efficient for various audio configurations.

For more information, users are encouraged to review the BigVGAN v2 model card on GitHub.

Image source: Shutterstock
