StreamLineCrypto.com

NVIDIA Unveils BigVGAN v2: Pioneering Zero-Shot Waveform Audio Generation

September 6, 2024 (Updated: September 12, 2024)

Zach Anderson
Sep 06, 2024 11:03

NVIDIA's BigVGAN v2 sets a new standard in zero-shot waveform audio generation, achieving state-of-the-art quality with up to 3x faster synthesis speed.

NVIDIA has announced the release of BigVGAN v2, a generative AI model for zero-shot waveform audio generation, according to the NVIDIA Technical Blog. The new model delivers significant improvements in speed and quality, positioning itself as a state-of-the-art solution in the field of audio generative AI.

BigVGAN: A Universal Neural Vocoder

BigVGAN is a universal neural vocoder designed to synthesize audio waveforms from Mel spectrograms. The model employs a fully convolutional architecture with several upsampling blocks and residual dilated convolution layers. A key feature is the anti-aliased multiperiodicity composition (AMP) module, which is optimized for generating high-frequency and periodic sound waves, reducing artifacts in the process.
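
As a rough illustration of the architecture described above, the sketch below shows two ideas in plain Python. First, in a fully convolutional vocoder the output waveform length is the number of Mel frames multiplied by the product of the upsampling ratios. Second, the periodic nonlinearity used inside the AMP module in the original BigVGAN paper is the Snake activation, x + (1/alpha) * sin^2(alpha * x). The upsampling ratios and the fixed alpha value here are illustrative assumptions, not values taken from this article, and the anti-aliasing low-pass filtering that surrounds the activation in the real module is omitted.

```python
import math


def output_samples(num_frames, upsample_rates):
    """Waveform length produced from num_frames Mel frames.

    Each upsampling block stretches the time axis by its ratio, so the
    total stretch is the product of all ratios.
    """
    return num_frames * math.prod(upsample_rates)


def snake(x, alpha=1.0):
    """Snake activation: x + (1/alpha) * sin^2(alpha * x).

    The linear term keeps the function monotonic, while the sin^2 term
    adds a periodic component with period pi / alpha. In the real model
    alpha is learned per channel; here it is a fixed, assumed value.
    """
    return x + (math.sin(alpha * x) ** 2) / alpha


# Hypothetical configuration: six upsampling blocks, 512x in total.
rates = [8, 4, 2, 2, 2, 2]
print("samples from 100 frames:", output_samples(100, rates))
print("snake(0.0) =", snake(0.0))
print("snake(1.0) =", snake(1.0))
```

With these assumed ratios, every Mel frame expands to 512 waveform samples. The periodic term is what is meant to help the generator represent pitched, harmonic sounds.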

Improvements in BigVGAN v2

BigVGAN v2 introduces several improvements over its predecessor:

  • State-of-the-art audio quality across various metrics and audio types.
  • Up to 3x faster synthesis speed through optimized CUDA kernels.
  • Pretrained checkpoints for various audio configurations.
  • Support for a sampling rate up to 44 kHz, covering the highest frequencies audible to humans.

Generating Every Sound in the World

Waveform audio generation is crucial for virtual worlds and has been a significant focus of research. BigVGAN v2 addresses previous limitations by delivering high-quality audio with enhanced fine details. Trained using NVIDIA A100 Tensor Core GPUs and a dataset over 100 times larger than its predecessor, BigVGAN v2 can generate high-quality sound waves from various domains, including speech, environmental sounds, and music.

Reaching the Highest Frequency Sound the Human Ear Can Detect

Previous models were limited to sampling rates between 22 kHz and 24 kHz. BigVGAN v2 extends this range to 44 kHz, capturing the entire human auditory spectrum. This allows the model to reproduce comprehensive soundscapes, from robust drums to crisp cymbals in music.
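
The link between sampling rate and audible range follows from the Nyquist limit: a signal sampled at a given rate can only represent frequencies up to half that rate. The short sketch below works through the arithmetic for the sampling rates mentioned above. The 20 kHz hearing limit is a commonly cited approximate figure and is an assumption here, not a number from the article.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


# Commonly cited approximate upper limit of human hearing (assumption).
HUMAN_HEARING_LIMIT_HZ = 20_000

for rate in (22_000, 24_000, 44_000):
    limit = nyquist_hz(rate)
    covers = limit >= HUMAN_HEARING_LIMIT_HZ
    print(f"{rate} Hz sampling -> up to {limit:.0f} Hz, covers hearing range: {covers}")
```

Under that assumption, only the 44 kHz rate reaches past the upper edge of human hearing, which is why the higher rate matters for content such as cymbals.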

Faster Synthesis with Custom CUDA Kernels

BigVGAN v2 also features accelerated synthesis speed, using custom CUDA kernels to achieve up to 3x faster inference than the original BigVGAN. These kernels enable the generation of audio waveforms up to 240 times faster than real-time on a single NVIDIA A100 GPU.
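
To put the "240 times faster than real-time" figure in concrete terms, the sketch below converts a speedup factor into wall-clock synthesis time and sample throughput. It is plain arithmetic on the best-case numbers quoted above, with helper names chosen for illustration. It is not a benchmark, and actual speed would depend on the checkpoint, batch size, and hardware.

```python
def synthesis_seconds(audio_seconds, speedup):
    """Wall-clock time to synthesize audio at a given real-time speedup."""
    return audio_seconds / speedup


def samples_per_second(sample_rate_hz, speedup):
    """Waveform samples generated per wall-clock second."""
    return sample_rate_hz * speedup


SPEEDUP = 240          # "up to" figure quoted for a single A100
SAMPLE_RATE_HZ = 44_000

print("one minute of audio takes", synthesis_seconds(60, SPEEDUP), "seconds")
print("throughput:", samples_per_second(SAMPLE_RATE_HZ, SPEEDUP), "samples/s")
```

At the quoted best case, a minute of audio would take a quarter of a second to generate.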

Audio Quality Results

BigVGAN v2 shows superior audio quality for speech and general audio compared to its predecessor, as well as comparable results to the Descript Audio Codec at a 44 kHz sampling rate. This demonstrates the model's capability to produce high-quality waveforms across various audio types.

Conclusion

NVIDIA's BigVGAN v2 sets a new benchmark in audio synthesis, achieving state-of-the-art quality across all audio types and covering the full range of human hearing. The model's synthesis speed is now up to 3x faster, making it highly efficient for various audio configurations.

For more information, users are encouraged to review the BigVGAN v2 model card on GitHub.

Image source: Shutterstock

