🏢 CUHK MMLab
Video-R1: Reinforcing Video Reasoning in MLLMs
·1632 words·8 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 CUHK MMLab
Video-R1: the first work to explore rule-based RL for video reasoning in MLLMs, improving performance on key benchmarks.
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
·2532 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 CUHK MMLab
GoT: Reasoning guides vivid image generation and editing!
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
·3185 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 CUHK MMLab
EasyRef uses multimodal LLMs to generate images from multiple reference images, overcoming limitations of prior methods by capturing consistent visual elements and offering improved zero-shot generalization…
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
·5843 words·28 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Understanding
🏢 CUHK MMLab
AV-Odyssey Bench reveals that current multimodal LLMs struggle with basic audio-visual understanding, prompting the development of a comprehensive benchmark for more effective evaluation.