🏢 CUHK MMLab

Video-R1: Reinforcing Video Reasoning in MLLMs
·1632 words·8 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 CUHK MMLab
Video-R1: First to explore rule-based RL for video reasoning in MLLMs, enhancing performance on key benchmarks.
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
·2532 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 CUHK MMLab
GoT: Reasoning guides vivid image generation and editing!
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
·3185 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 CUHK MMLab
EasyRef uses a multimodal LLM to generate images from multiple reference images, overcoming limitations of prior methods by capturing consistent visual elements across references and offering improved zero-shot generalization.
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
·5843 words·28 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Understanding 🏢 CUHK MMLab
AV-Odyssey Bench reveals that current multimodal LLMs struggle with even basic audio-visual understanding, motivating a comprehensive benchmark for more rigorous evaluation.