audio to audio models

4 models · ranked by HuggingFace downloads

bigvgan_v2_22khz_80band_256x

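bigvgan_v2_22khz_80band_256x is an open-weight BigVGAN v2 vocoder checkpoint from NVIDIA, listed under audio to audio and distributed on the HuggingFace Hub. The name encodes its configuration: 22 kHz audio, 80 mel bands, and 256x upsampling from mel-spectrogram frames to waveform samples. It is released under the MIT license, which permits commercial reuse as long as the copyright notice is retained. Treat the published metrics as a starting point and validate against your own workload.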
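As a rough illustration of what the checkpoint name implies, the sketch below parses names of this form and derives the frame rate. It assumes the `<rate>khz_<bands>band_<factor>x` pattern means sample rate, mel bands, and upsampling factor, and that the nominal 22 and 44 kHz labels stand for 22,050 Hz and 44,100 Hz. The helper names are hypothetical and not part of any BigVGAN API.

```python
import re

# Assumption: nominal kHz labels in the name map to these exact sample rates.
NOMINAL_TO_HZ = {22: 22050, 24: 24000, 44: 44100}


def parse_checkpoint_name(name):
    """Split a BigVGAN v2 style checkpoint name into its numeric parts."""
    match = re.fullmatch(r"bigvgan_v2_(\d+)khz_(\d+)band_(\d+)x", name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    khz, bands, upsample = (int(group) for group in match.groups())
    return {
        "sample_rate": NOMINAL_TO_HZ.get(khz, khz * 1000),
        "mel_bands": bands,
        "upsample": upsample,
    }


def samples_for_frames(info, n_frames):
    """Each mel frame is expanded into `upsample` waveform samples."""
    return n_frames * info["upsample"]


info = parse_checkpoint_name("bigvgan_v2_22khz_80band_256x")
frames_per_second = info["sample_rate"] / info["upsample"]

print(info)
print(samples_for_frames(info, 100), "samples from 100 mel frames")
print(round(frames_per_second, 2), "mel frames per second")
```

Under these assumptions, the mel spectrogram fed to the checkpoint must be computed with a matching sample rate, band count, and hop size, or the output will be pitched or timed incorrectly.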

1,334,929 ↓ · 29 ♡

bigvgan_v2_44khz_128band_512x

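bigvgan_v2_44khz_128band_512x is an open-weight BigVGAN v2 vocoder checkpoint from NVIDIA, listed under audio to audio and distributed on the HuggingFace Hub. The name encodes its configuration: 44 kHz audio, 128 mel bands, and 512x upsampling from mel-spectrogram frames to waveform samples. It is released under the MIT license, which permits commercial reuse as long as the copyright notice is retained. Evaluate it on your own data before trusting it in production.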

486,565 ↓ · 75 ♡

personaplex-7b-v1

PersonaPlex-7B is NVIDIA's speech-to-speech model based on the Moshi architecture, supporting real-time audio-to-audio dialog with persona conditioning. At 7B parameters, it handles real-time voice conversation in which it listens and speaks simultaneously. The license is listed as 'other'; check NVIDIA's specific terms before use.

375,413 ↓ · 2,577 ♡

neucodec

NeuCodec is Neuphonic's neural audio codec, designed as a speech tokenizer for TTS and voice generation pipelines. It encodes speech into discrete tokens that language-model-based TTS architectures can consume.
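To make "encodes speech into discrete tokens" concrete, here is a toy sketch of the general idea: each feature frame is replaced by the index of its nearest entry in a small codebook. This is a generic nearest-neighbor quantizer for illustration only. It is not NeuCodec's actual quantizer or API, and the codebook, frames, and function names are invented for the example.

```python
def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to `frame`
    (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(frame, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def tokenize(frames, codebook):
    """Map a sequence of feature frames to a sequence of integer token ids."""
    return [nearest_code(frame, codebook) for frame in frames]


def detokenize(tokens, codebook):
    """Look the token ids back up; the result is a lossy approximation."""
    return [codebook[token] for token in tokens]


# Hypothetical 2-dimensional codebook and feature frames.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
frames = [(0.1, -0.2), (0.9, 0.8), (0.2, 1.1), (0.7, 0.1)]

tokens = tokenize(frames, codebook)
print(tokens)
print(detokenize(tokens, codebook))
```

A language model can then be trained on token sequences like `tokens` instead of raw audio, with a decoder turning predicted tokens back into a waveform.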

309,423 ↓ · 108 ♡