Speech and Audio

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
·2399 words·12 mins
AI Generated 🤗 Daily Papers Speech and Audio Music Generation 🏢 Beihang University
SongGen: Single-stage autoregressive transformer for controllable text-to-song generation, simplifying the process and improving control.
FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
·3169 words·15 mins
AI Generated 🤗 Daily Papers Speech and Audio Speech Coding 🏢 Concordia University
FocalCodec: a single codebook, low-bitrate speech codec using focal modulation, achieves competitive performance in speech resynthesis and voice conversion.
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
·2407 words·12 mins
AI Generated 🤗 Daily Papers Speech and Audio Text-to-Speech 🏢 Chinese University of Hong Kong, Shenzhen
Emilia-Pipe and its resulting datasets, Emilia and Emilia-Large, offer the largest open-source, multilingual speech corpus, enabling more natural and spontaneous AI speech generation.
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution
·1883 words·9 mins
AI Generated 🤗 Daily Papers Speech and Audio Audio Generation 🏢 Alibaba Group
HiFi-SR: A unified generative network achieves high-fidelity speech super-resolution, outperforming existing methods by seamlessly integrating transformer and convolutional components for end-to-end a…
XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework
·3087 words·15 mins
AI Generated 🤗 Daily Papers Speech and Audio Music Generation 🏢 Tencent AI Lab
XMusic: A new framework generates high-quality, emotionally controllable symbolic music from various prompts (images, videos, text, tags, humming).
Whisper-GPT: A Hybrid Representation Audio Large Language Model
·1640 words·8 mins
AI Generated 🤗 Daily Papers Speech and Audio Audio Generation 🏢 Stanford University
Whisper-GPT, a hybrid audio LLM, improves music and speech generation by combining continuous audio representations with discrete tokens.