Multimodal Reasoning
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
·4398 words·21 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Tsinghua University
video-SALMONN-o1: An open-source audio-visual LLM that enhances video understanding with a novel reasoning-intensive dataset and the pDPO method, achieving significant accuracy gains.
InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
·1563 words·8 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Reallm Labs
InfiR: Efficient, small AI models rival larger ones in reasoning, slashing costs and boosting privacy for wider AI use.
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
·3464 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Hong Kong University of Science and Technology
ThinkDiff empowers text-to-image diffusion models with multimodal reasoning by aligning vision-language models to an LLM decoder, achieving state-of-the-art results on in-context reasoning benchmarks.
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
·3250 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Singapore University of Technology and Design
GPT models’ multimodal reasoning abilities are tracked over time on challenging visual puzzles, revealing surprisingly steady improvement and cost trade-offs.
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
·2983 words·15 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Virgo: A new multimodal slow-thinking system that significantly improves MLLM reasoning by fine-tuning on text-based long-form thought data, reaching performance comparable to commercial systems.
Diving into Self-Evolving Training for Multimodal Reasoning
·3292 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Hong Kong University of Science and Technology
M-STAR: A novel self-evolving training framework that significantly boosts multimodal reasoning in large models without human annotation, achieving state-of-the-art results.
Progressive Multimodal Reasoning via Active Retrieval
·3576 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
AR-MCTS: a novel framework boosting multimodal large language model reasoning by actively retrieving key supporting evidence and using Monte Carlo Tree Search for improved path selection and verificat…