Paper Reviews by AI
2025
XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework
·3087 words·15 mins·
AI Generated
🤗 Daily Papers
Speech and Audio
Music Generation
🏢 Tencent AI Lab
XMusic: A new framework generates high-quality, emotionally controllable symbolic music from various prompts (images, videos, text, tags, humming).
Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography
·1464 words·7 mins·
AI Generated
🤗 Daily Papers
AI Theory
Privacy
🏢 Google DeepMind
Machine learning models can enable secure computations previously impossible with cryptography, achieving privacy and efficiency in Trusted Capable Model Environments (TCMEs).
RepVideo: Rethinking Cross-Layer Representation for Video Generation
·2785 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Nanyang Technological University
RepVideo enhances text-to-video generation by enriching feature representations, resulting in significantly improved temporal coherence and spatial detail.
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
·2366 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 University of Rochester
Ouroboros-Diffusion: A novel tuning-free long video generation framework achieving unprecedented content consistency by cleverly integrating information across frames via latent sampling, cross-frame…
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
·3561 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong Polytechnic University
Multimodal LLMs can now evaluate art aesthetics with human-level accuracy using a novel dataset (MM-StyleBench) and prompting method (ArtCoT), significantly improving AI alignment in artistic evaluation.
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents
·1663 words·8 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Cross-Modal Retrieval
🏢 Noah's Ark Lab, Huawei
MMDocIR, a new benchmark dataset, enables better evaluation of multi-modal document retrieval systems by providing page-level and layout-level annotations for diverse long documents and questions.
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities
·3972 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tencent AI Lab
CityDreamer4D generates realistic, unbounded 4D city models by cleverly separating dynamic objects (like vehicles) from static elements (buildings, roads), using multiple neural fields for enhanced re…
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
·4505 words·22 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
Parameter-Inverted Image Pyramid Networks (PIIP) drastically cut visual model computing costs without sacrificing accuracy by using smaller models for higher-resolution images and larger models for lower-resolution images.
The GAN is dead; long live the GAN! A Modern GAN Baseline
·2531 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Brown University
R3GAN: A modernized GAN baseline achieves state-of-the-art results with a simple, stable loss function and modern architecture, debunking the myth that GANs are hard to train.
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
·22812 words·108 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 University of Würzburg
Centurio: a 100-language LVLM achieves state-of-the-art multilingual performance by strategically incorporating non-English data in training, proving that multilingualism doesn’t hinder English proficiency.
An Empirical Study of Autoregressive Pre-training from Videos
·5733 words·27 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 UC Berkeley
Toto, a new autoregressive video model, achieves competitive performance across various benchmarks by pre-training on over 1 trillion visual tokens, demonstrating the effectiveness of scaling video models.
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
·5517 words·26 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
URSA-7B: A new multimodal model significantly improves chain-of-thought reasoning in mathematics!
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
·2783 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Stability AI
SPAR3D: Fast, accurate single-image 3D reconstruction via a novel two-stage approach using point clouds for high-fidelity mesh generation.
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
·3910 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
Small language models can master complex math reasoning using self-evolved deep thinking via Monte Carlo Tree Search, surpassing larger models in performance.
On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis
·285 words·2 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tsinghua University
This paper unveils critical thresholds for efficient visual autoregressive model computation, proving sub-quartic time is impossible beyond a certain input matrix norm while establishing efficient app…
LLM4SR: A Survey on Large Language Models for Scientific Research
·2870 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Texas at Dallas
LLMs revolutionize scientific research! This survey reveals their transformative potential across hypothesis discovery, experiment planning, writing, and peer review, guiding future research.
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
·2599 words·13 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Zhejiang University
InfiGUIAgent, a novel multimodal GUI agent, leverages a two-stage training pipeline to achieve advanced reasoning and GUI interaction capabilities, outperforming existing models in benchmarks.
EpiCoder: Encompassing Diversity and Complexity in Code Generation
·5051 words·24 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
EpiCoder revolutionizes code generation by using feature trees to create diverse and complex training data, resulting in state-of-the-art performance on various benchmarks.
Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models
·3036 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Named Entity Recognition
🏢 Boğaziçi University
First-ever resources (NER dataset, dependency treebank, and corpus) and models for historical Turkish NLP are introduced, significantly advancing research capabilities in this underexplored field.
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
·4541 words·22 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Peking University
Sa2VA marries SAM2 and LLaVA for dense grounded image and video understanding, achieving state-of-the-art results on multiple benchmarks.