Speech and Audio
Structured Multi-Track Accompaniment Arrangement via Style Prior Modelling
·3908 words·19 mins·
AI Generated
Speech and Audio
Music Generation
🏢 Institute of Data Science, NUS
This AI system generates high-quality multi-track music arrangements from simple lead sheets using a novel style prior modeling approach, significantly improving both efficiency and musical coherence.
Spike-based Neuromorphic Model for Sound Source Localization
·2030 words·10 mins·
Speech and Audio
Sound Classification
🏢 University of Electronic Science and Technology of China
An energy-efficient neuromorphic sound source localization (SSL) model achieves state-of-the-art accuracy and robustness using Resonate-and-Fire neurons and a novel multi-auditory attention module.
SongCreator: Lyrics-based Universal Song Generation
·3881 words·19 mins·
AI Generated
Speech and Audio
Music Generation
🏢 Shenzhen International Graduate School, Tsinghua University
SongCreator: a novel AI system that generates complete, high-quality songs from lyrics, surpassing existing methods in lyrics-to-song and lyrics-to-vocals generation.
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection
·1964 words·10 mins·
Speech and Audio
Speaker Recognition
🏢 Reality Defender Inc.
SLIM: a novel audio deepfake detection model that leverages style-linguistics mismatch for superior generalization and explainability.
Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation
·3807 words·18 mins·
AI Generated
Speech and Audio
Speech Recognition
🏢 Sogang University
SepReformer: Asymmetric encoder-decoder model for efficient speech separation, achieving state-of-the-art performance with less computation.
SCOREQ: Speech Quality Assessment with Contrastive Regression
·2555 words·12 mins·
Speech and Audio
Speech Quality Assessment
🏢 University College Dublin
SCOREQ: a novel triplet loss contrastive regression approach for superior speech quality prediction, addressing generalization issues in no-reference metrics.
FINALLY: fast and universal speech enhancement with studio-like quality
·2546 words·12 mins·
Speech and Audio
Audio Enhancement
🏢 Samsung Research
FINALLY delivers fast speech enhancement with studio-like quality using a novel GAN-based approach with a WavLM-integrated perceptual loss, outperforming existing diffusion models.
CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing
·2545 words·12 mins·
Speech and Audio
Speech Recognition
🏢 Johns Hopkins University
CA-SSLR: a novel self-supervised learning model that dynamically adapts to various speech tasks by integrating language and speaker embeddings, improving performance and reducing reliance on audio features…
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing
·2129 words·10 mins·
Speech and Audio
Speaker Recognition
🏢 Telecom Paris
Annealed Multiple Choice Learning (aMCL) overcomes limitations of Winner-takes-all in multiple choice learning by using annealing, improving robustness and performance.
Acoustic Volume Rendering for Neural Impulse Response Fields
·2052 words·10 mins·
Speech and Audio
Acoustic Scene Analysis
🏢 University of Pennsylvania
Acoustic Volume Rendering (AVR) revolutionizes realistic audio synthesis by adapting volume rendering to model acoustic impulse responses, achieving state-of-the-art performance in novel pose synthesi…