Skip to main content

Multimodal Understanding

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
·5843 words·28 mins· loading · loading
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Understanding 🏢 CUHK MMLab
AV-Odyssey Bench reveals that current multimodal LLMs struggle with basic audio-visual understanding, prompting the development of a comprehensive benchmark for more effective evaluation.