
Multimodal Understanding

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
·441 words·3 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Understanding 🏢 UNC-Chapel Hill
MDocAgent: A multi-agent framework for document understanding that integrates text and images for better accuracy.
PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models
·4158 words·20 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Understanding 🏢 HIT
PEBench: A new benchmark for machine unlearning in multimodal language models, enhancing secure multimodal model development.
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
·2951 words·14 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Understanding 🏢 Huazhong University of Science & Technology
OmniMamba: Efficient multimodal understanding and generation via SSMs, trained on 2M image-text pairs.
ARMOR v0.1: Empowering Autoregressive Multimodal Understanding Model with Interleaved Multimodal Generation via Asymmetric Synergy
·2871 words·14 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Understanding 🏢 Nankai University
ARMOR: Empowers MLLMs with interleaved multimodal generation via asymmetric synergy, using limited resources.
Unified Reward Model for Multimodal Understanding and Generation
·368 words·2 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Understanding 🏢 Fudan University
UNIFIEDREWARD: A unified reward model that enhances multimodal understanding and generation.
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
·5843 words·28 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Understanding 🏢 CUHK MMLab
AV-Odyssey Bench: A comprehensive benchmark revealing that current multimodal LLMs struggle with basic audio-visual understanding.