Paper Reviews by AI

2025

DreamRelation: Relation-Centric Video Customization
·2731 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Fudan University
DreamRelation: Personalize videos by customizing relationships between subjects, generalizing to new domains.
WildIFEval: Instruction Following in the Wild
·2601 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 Hebrew University of Jerusalem
WILDIFEVAL: Instruction Following in the Wild.
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering
·2597 words·13 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Zhongguancun Laboratory
VisualSimpleQA: A new benchmark for fine-grained evaluation of visual and linguistic modules in fact-seeking LVLMs.
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
·2686 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 CUHK
Seg-Zero: Cognitive Reinforcement for Reasoning-Chain Guided Segmentation!
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
·2549 words·12 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 WHU
ProJudge: A benchmark for MLLM-based process judges in scientific reasoning, plus an instruction-tuning dataset to boost their performance.
Learning Few-Step Diffusion Models by Trajectory Distribution Matching
·4283 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology
TDM: a new diffusion distillation paradigm unifying trajectory distillation and distribution matching, surpassing teachers in a data-free manner with state-of-the-art performance and low training cost…
DiffCLIP: Differential Attention Meets CLIP
·2247 words·11 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 KAUST
DiffCLIP: Enhancing CLIP models by integrating differential attention, achieving superior performance with minimal overhead.
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
·6125 words·29 mins
AI Generated 🤗 Daily Papers Natural Language Processing Machine Translation 🏢 NLP Lab, Northeastern University, Shenyang, China
LLMs as MT encoders enhance efficiency & generalization!
ARMOR v0.1: Empowering Autoregressive Multimodal Understanding Model with Interleaved Multimodal Generation via Asymmetric Synergy
·2871 words·14 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Understanding 🏢 Nankai University
ARMOR: Empowers MLLMs with interleaved multimodal generation via asymmetric synergy, using limited resources.
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
·2695 words·13 mins
AI Generated 🤗 Daily Papers Multimodal Learning Audio-Visual Learning 🏢 Imperial College London
Llama-MTSK: AVSR via Matryoshka LLMs, adapting to computational limits without sacrificing accuracy!
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
·4887 words·23 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 IEIT System Co., Ltd.
DropletVideo: A dataset and approach to explore integral spatio-temporal consistent video generation.
BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling
·1373 words·7 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Not Available
Rimer: RWKV-7 empowers superior time series modeling, offering a simple yet effective alternative to Transformers with fewer parameters.
WritingBench: A Comprehensive Benchmark for Generative Writing
·4038 words·19 mins
AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 Alibaba Group
WritingBench: A new benchmark for generative writing evaluation, enhancing LLMs across diverse domains.
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control
·3223 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Chinese University of Hong Kong
VideoPainter: Edit any video, any length, with user-guided instructions!
Unified Reward Model for Multimodal Understanding and Generation
·368 words·2 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Understanding 🏢 Fudan University
UNIFIEDREWARD: A unified reward model that enhances multimodal understanding and generation!
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
·2590 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Chinese University of Hong Kong
TrajectoryCrafter: Precisely control camera movement in monocular videos with a novel diffusion model for coherent 4D content generation.
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
·1708 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 KAIST
Sketch-of-Thought (SoT) reduces LLM token usage by up to 76% while maintaining (or improving) accuracy via cognitive-inspired sketching.
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
·3585 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Renmin University of China
R1-Searcher: RL enhances LLMs by incentivizing autonomous search, outperforming RAG methods and even GPT-4o-mini!
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
·1187 words·6 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 Alibaba Group
R1-Omni: RLVR enhances multimodal emotion recognition, boosting reasoning and generalization.
Multi Agent based Medical Assistant for Edge Devices
·2191 words·11 mins
AI Generated 🤗 Daily Papers AI Applications Healthcare 🏢 Samsung Research
On-device multi-agent system overcomes privacy/latency issues in healthcare, enabling personalized, scalable AI assistance.