🏢 Imperial College
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
2851 words · 14 mins
Multimodal Learning
Audio-Visual Learning
One model to rule them all! This paper introduces Unified Speech Recognition (USR), a single model trained for auditory, visual, and audiovisual speech recognition, achieving state-of-the-art results …