StreamLineCrypto.com
NVIDIA Unveils BigVGAN v2: Pioneering Zero-Shot Waveform Audio Generation

September 6, 2024 (Updated: September 12, 2024)
Zach Anderson
Sep 06, 2024 11:03

NVIDIA’s BigVGAN v2 sets a new standard in zero-shot waveform audio generation, achieving state-of-the-art quality with up to 3x faster synthesis speed.





NVIDIA has announced the release of BigVGAN v2, a generative AI model for zero-shot waveform audio generation, according to the NVIDIA Technical Blog. The new model delivers significant improvements in speed and quality, positioning itself as a state-of-the-art solution in the field of audio generative AI.

BigVGAN: A Universal Neural Vocoder

BigVGAN is a universal neural vocoder designed to synthesize audio waveforms from Mel spectrograms. The model employs a fully convolutional architecture with several upsampling blocks and residual dilated convolution layers. A key feature is the anti-aliased multiperiodicity composition (AMP) module, which is optimized for generating high-frequency and periodic sound waves, reducing artifacts in the process.
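
To make the role of the upsampling blocks concrete, here is a minimal sketch of how a stack of upsampling stages turns a number of Mel-spectrogram frames into a number of waveform samples. The upsampling factors and the function name are illustrative assumptions, not values taken from the article; the only point is that the product of the per-block factors determines how many audio samples each Mel frame expands into.

```python
from math import prod

# Hypothetical per-block upsampling factors (illustrative only; the article
# does not list the actual values used by BigVGAN).
UPSAMPLE_FACTORS = [4, 4, 2, 2, 2, 2]


def output_samples(num_mel_frames: int, factors: list[int]) -> int:
    """Return the waveform length produced from `num_mel_frames` frames.

    Each upsampling block multiplies the temporal resolution by its factor,
    so the whole stack expands one frame into prod(factors) samples.
    """
    if num_mel_frames < 0 or any(f < 1 for f in factors):
        raise ValueError("frame count must be >= 0 and factors must be >= 1")
    return num_mel_frames * prod(factors)


samples_per_frame = prod(UPSAMPLE_FACTORS)      # 4*4*2*2*2*2 = 256
total = output_samples(100, UPSAMPLE_FACTORS)   # 100 frames -> 25600 samples
print(samples_per_frame, total)
```

Under these assumed factors, a vocoder would emit 256 samples per input frame, so the frame rate of the spectrogram and the target sampling rate have to be chosen to match.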

Improvements in BigVGAN v2

BigVGAN v2 introduces several improvements over its predecessor:

  • State-of-the-art audio quality across various metrics and audio types.
  • Up to 3x faster synthesis speed through optimized CUDA kernels.
  • Pretrained checkpoints for various audio configurations.
  • Support for sampling rates up to 44 kHz, covering the highest frequencies audible to humans.

Generating Every Sound in the World

Waveform audio generation is crucial for virtual worlds and has been a significant focus of research. BigVGAN v2 addresses previous limitations by delivering high-quality audio with enhanced fine details. Trained using NVIDIA A100 Tensor Core GPUs and a dataset over 100 times larger than its predecessor, BigVGAN v2 can generate high-quality sound waves from various domains, including speech, environmental sounds, and music.

Reaching the Highest Frequency Sound the Human Ear Can Detect

Earlier models were limited to sampling rates between 22 kHz and 24 kHz. BigVGAN v2 extends this range to 44 kHz, capturing the entire human auditory spectrum. This allows the model to reproduce comprehensive soundscapes, from robust drums to crisp cymbals in music.
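
The link between sampling rate and audible bandwidth follows from the Nyquist limit: a signal sampled at a given rate can only represent frequencies up to half that rate. The sketch below applies this to the sampling rates mentioned above. The 20 kHz figure for the upper limit of human hearing is a commonly cited approximation and an assumption here, not a number from the article; the function names are illustrative.

```python
# Commonly cited approximate upper bound of human hearing (assumption).
HUMAN_HEARING_LIMIT_HZ = 20_000


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


def covers_human_hearing(sample_rate_hz: float,
                         limit_hz: float = HUMAN_HEARING_LIMIT_HZ) -> bool:
    """True if the Nyquist frequency reaches the assumed hearing limit."""
    return nyquist_hz(sample_rate_hz) >= limit_hz


for rate in (22_000, 24_000, 44_000):
    # 22 kHz -> 11 kHz, 24 kHz -> 12 kHz, 44 kHz -> 22 kHz of bandwidth
    print(rate, nyquist_hz(rate), covers_human_hearing(rate))
```

At 22 kHz or 24 kHz the representable bandwidth stops at 11 kHz or 12 kHz, which is why high-frequency content such as cymbals is lost; at 44 kHz it extends to 22 kHz.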

Faster Synthesis with Custom CUDA Kernels

BigVGAN v2 also features accelerated synthesis speed, using custom CUDA kernels to achieve up to 3x faster inference than the original BigVGAN. These kernels enable the generation of audio waveforms up to 240 times faster than real-time on a single NVIDIA A100 GPU.
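
A "faster than real-time" figure is the ratio of audio duration to the wall-clock time needed to synthesize it. As a back-of-the-envelope sketch (the function names are illustrative, and the 240x figure is the peak value quoted above rather than a guaranteed rate), the arithmetic looks like this:

```python
def synthesis_time_s(audio_seconds: float, speedup_over_realtime: float) -> float:
    """Wall-clock seconds needed to generate `audio_seconds` of audio."""
    if speedup_over_realtime <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup_over_realtime


def realtime_speedup(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Speedup over real-time, given a measured synthesis time."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return audio_seconds / wall_clock_seconds


one_second_ms = synthesis_time_s(1.0, 240) * 1000    # about 4.17 ms
three_minute_clip_s = synthesis_time_s(180.0, 240)   # 0.75 s
print(round(one_second_ms, 2), three_minute_clip_s)
```

At the quoted peak rate, one second of audio would take roughly 4.2 ms to synthesize and a three-minute clip well under a second.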

Audio Quality Results

BigVGAN v2 shows superior audio quality for speech and general audio compared to its predecessor, as well as results comparable to the Descript Audio Codec at a 44 kHz sampling rate. This demonstrates the model’s ability to produce high-quality waveforms across various audio types.

Conclusion

NVIDIA’s BigVGAN v2 sets a new benchmark in audio synthesis, achieving state-of-the-art quality across all audio types and covering the full range of human hearing. The model’s synthesis speed is now up to 3x faster, making it highly efficient for various audio configurations.

For more information, users are encouraged to review the BigVGAN v2 model card on GitHub.

Picture supply: Shutterstock

