bigvgan_v2_22khz_80band_256x is an open-weight audio-to-audio checkpoint (a BigVGAN v2 neural vocoder that converts mel spectrograms to waveforms), distributed on the HuggingFace Hub. Its MIT license permits commercial reuse, provided the copyright and license notice are retained. Treat its published metrics as a starting point and validate them against your own workload.
1,334,929 downloads · 29 likes
bigvgan_v2_44khz_128band_512x is an open-weight audio-to-audio checkpoint (a BigVGAN v2 neural vocoder that converts mel spectrograms to waveforms), distributed on the HuggingFace Hub. Its MIT license permits commercial reuse, provided the copyright and license notice are retained. Evaluate it on your own data before trusting it in production.
486,565 downloads · 75 likes
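The two BigVGAN checkpoint names appear to encode their audio configuration. The sketch below assumes that "22khz"/"44khz" mean 22,050 Hz and 44,100 Hz, that "80band"/"128band" is the mel-band count, and that "256x"/"512x" is the upsampling ratio (one mel frame per hop of that many samples). `VocoderSpec` and `parse_checkpoint_name` are hypothetical helpers, not part of the BigVGAN package; confirm the real values in each checkpoint's config before relying on them.

```python
import re
from dataclasses import dataclass

# Assumption: the "Nkhz" token maps to these exact sample rates.
_RATE_MAP = {22: 22_050, 24: 24_000, 44: 44_100}


@dataclass(frozen=True)
class VocoderSpec:
    sample_rate_hz: int
    n_mels: int
    hop_length: int  # assumed equal to the "Nx" upsampling ratio

    @property
    def mel_frames_per_second(self) -> float:
        # Each mel frame is upsampled into hop_length waveform samples.
        return self.sample_rate_hz / self.hop_length

    def samples_for_frames(self, n_frames: int) -> int:
        # Waveform length the vocoder would emit for n_frames mel frames.
        return n_frames * self.hop_length


def parse_checkpoint_name(name: str) -> VocoderSpec:
    """Extract (sample rate, mel bands, hop length) from a name like *_22khz_80band_256x."""
    m = re.search(r"(\d+)khz_(\d+)band_(\d+)x", name)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    khz, bands, ratio = (int(g) for g in m.groups())
    return VocoderSpec(_RATE_MAP.get(khz, khz * 1000), bands, ratio)


if __name__ == "__main__":
    for ckpt in ("bigvgan_v2_22khz_80band_256x", "bigvgan_v2_44khz_128band_512x"):
        spec = parse_checkpoint_name(ckpt)
        print(ckpt, spec, f"{spec.mel_frames_per_second:.2f} frames/s")
```

Under these assumptions both checkpoints run at about 86.13 mel frames per second: the 44 kHz model doubles the hop length to offset its doubled sample rate, so 100 mel frames yield 25,600 samples at 22 kHz and 51,200 samples at 44 kHz.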
PersonaPlex-7B is NVIDIA's speech-to-speech model, based on the Moshi architecture, supporting real-time audio-to-audio dialog with persona conditioning. The 7B-parameter model handles real-time voice conversation, listening and speaking simultaneously. Its license is listed as 'other', so check NVIDIA's specific terms before use.
375,413 downloads · 2,577 likes
NeuCodec is Neuphonic's neural audio codec, designed as a speech tokenizer for TTS and voice-generation pipelines. It encodes speech into discrete tokens for use with language-model-based TTS architectures.
309,423 downloads · 108 likes
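When a codec's tokens feed a language-model-based TTS system, two numbers matter for planning: how many tokens a clip becomes (the LM's sequence length) and the bitrate those tokens represent. The sketch below computes both from a token rate and codebook size. The example values (50 tokens per second, a 65,536-entry codebook) are illustrative placeholders, not confirmed NeuCodec specifications; check the model card for the real figures. The function names are hypothetical.

```python
import math


def codec_bitrate_bps(tokens_per_second: float, codebook_size: int, n_codebooks: int = 1) -> float:
    """Nominal bitrate: each token indexes a codebook, carrying log2(codebook_size) bits."""
    if codebook_size < 2:
        raise ValueError("codebook_size must be at least 2")
    return tokens_per_second * n_codebooks * math.log2(codebook_size)


def token_count(duration_s: float, tokens_per_second: float, n_codebooks: int = 1) -> int:
    """Number of discrete tokens an LM must model for a clip (partial frames round up)."""
    return math.ceil(duration_s * tokens_per_second) * n_codebooks


if __name__ == "__main__":
    # Placeholder configuration, not NeuCodec's published spec.
    rate, codebook = 50, 65_536
    print(f"{codec_bitrate_bps(rate, codebook):.0f} bps")
    print(token_count(10.0, rate), "tokens for a 10 s clip")
```

With the placeholder values, each token carries 16 bits, giving 800 bps and 500 tokens for a 10-second clip. A single-codebook design keeps the LM's sequence flat, whereas multi-codebook codecs multiply the token count by `n_codebooks`.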