Speech and Audio

Structured Multi-Track Accompaniment Arrangement via Style Prior Modelling

26 September 2024·3908 words·19 mins

AI Generated · Speech and Audio · Music Generation · 🏢 Institute of Data Science, NUS

This AI system generates high-quality multi-track music arrangements from simple lead sheets using a novel style prior modeling approach, significantly improving both efficiency and musical coherence.

Spike-based Neuromorphic Model for Sound Source Localization

26 September 2024·2030 words·10 mins

Speech and Audio · Sound Classification · 🏢 University of Electronic Science and Technology of China

An energy-efficient neuromorphic sound source localization (SSL) model achieves state-of-the-art accuracy and robustness using Resonate-and-Fire neurons and a novel multi-auditory attention module.

SongCreator: Lyrics-based Universal Song Generation

26 September 2024·3881 words·19 mins

AI Generated · Speech and Audio · Music Generation · 🏢 Shenzhen International Graduate School, Tsinghua University

SongCreator is a novel AI system that generates complete, high-quality songs from lyrics, surpassing existing methods in lyrics-to-song and lyrics-to-vocals generation.

SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection

26 September 2024·1964 words·10 mins

Speech and Audio · Speaker Recognition · 🏢 Reality Defender Inc.

SLIM is a novel audio deepfake detection model that leverages style-linguistics mismatch for better generalization and explainability.

Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation

26 September 2024·3807 words·18 mins

AI Generated · Speech and Audio · Speech Recognition · 🏢 Sogang University

SepReformer is an asymmetric encoder-decoder model for efficient speech separation that achieves state-of-the-art performance with less computation.

SCOREQ: Speech Quality Assessment with Contrastive Regression

26 September 2024·2555 words·12 mins

Speech and Audio · Speech Quality Assessment · 🏢 University College Dublin

SCOREQ is a novel contrastive regression approach based on a triplet loss for speech quality prediction; it addresses generalization issues in no-reference metrics.

FINALLY: fast and universal speech enhancement with studio-like quality

26 September 2024·2546 words·12 mins

Speech and Audio · Audio Enhancement · 🏢 Samsung Research

FINALLY delivers fast speech enhancement with studio-like quality using a novel GAN-based approach with a WavLM-integrated perceptual loss, outperforming existing diffusion models.

CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

26 September 2024·2545 words·12 mins

Speech and Audio · Speech Recognition · 🏢 Johns Hopkins University

CA-SSLR is a novel self-supervised learning model that dynamically adapts to various speech tasks by integrating language and speaker embeddings, improving performance and reducing reliance on audio features…

Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing

26 September 2024·2129 words·10 mins

Speech and Audio · Speaker Recognition · 🏢 Telecom Paris

Annealed Multiple Choice Learning (aMCL) overcomes limitations of Winner-takes-all in multiple choice learning by using annealing, improving robustness and performance.

Acoustic Volume Rendering for Neural Impulse Response Fields

26 September 2024·2052 words·10 mins

Speech and Audio · Acoustic Scene Analysis · 🏢 University of Pennsylvania

Acoustic Volume Rendering (AVR) adapts volume rendering to model acoustic impulse responses for realistic audio synthesis, achieving state-of-the-art performance in novel pose synthesi…