Audio-Visual Learning
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
·5958 words·28 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Audio-Visual Learning
🏢 Grad. School of AI, POSTECH
New metrics and representation enhance 3D talking head realism by focusing on perceptual lip synchronization.
Long-Video Audio Synthesis with Multi-Agent Collaboration
·2152 words·11 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Audio-Visual Learning
🏢 Hong Kong University of Science and Technology
LVAS-Agent: A multi-agent system that tackles long-video audio synthesis through collaborative dubbing, scripting, design, and more.
$^R$FLAV: Rolling Flow matching for infinite Audio Video generation
·2128 words·10 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Audio-Visual Learning
🏢 University of Parma
RFLAV: A novel rolling flow matching model for infinite audio-video generation with high quality, synchronization, and temporal coherence.
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
·2695 words·13 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Audio-Visual Learning
🏢 Imperial College London
Llama-MTSK: AVSR via Matryoshka LLMs, adapting to computational limits without sacrificing accuracy!