Paper Reviews by AI

2025

MaRI: Material Retrieval Integration across Domains
·2119 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Electronic Science and Technology of China
MaRI: Accurately retrieves textures from images by bridging the gap between visual representations and material properties across diverse domains.
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization
·2300 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology
LightGen: Efficient image generation via knowledge distillation and direct preference optimization.
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
·2477 words·12 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
GTR: Prevents thought collapse in RL-based VLM agents by process guidance, enhancing performance in complex visual reasoning tasks.
BiasEdit: Debiasing Stereotyped Language Models via Model Editing
·2942 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of California, San Diego
BiasEdit: Efficiently debiases language models via lightweight network edits.
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models
·2590 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 KAIST, Visual Media Lab
AnyMoLe: Generates in-between motion frames for diverse characters using video diffusion models, without external data. Code is available on the project page.
AI-native Memory 2.0: Second Me
·1327 words·7 mins
AI Generated 🤗 Daily Papers AI Applications Human-AI Interaction 🏢 Mindverse.ai
AI-native Memory 2.0 presents Second Me, an AI system for personal knowledge management.
$^R$FLAV: Rolling Flow matching for infinite Audio Video generation
·2128 words·10 mins
AI Generated 🤗 Daily Papers Multimodal Learning Audio-Visual Learning 🏢 University of Parma
RFLAV: A novel rolling flow matching model for infinite audio-video generation with high quality, synchronization, and temporal coherence.
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
·3702 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
WISE: Evaluates world knowledge in text-to-image generation.
Video Action Differencing
·3793 words·18 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Stanford
VidDiff: Identifies subtle action differences between videos for coaching and skill learning.
Should VLMs be Pre-trained with Image Data?
·3469 words·17 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Toyota Research Institute
Image data during pre-training can boost Vision-Language Model (VLM) performance, especially when introduced later in the process.
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
·3772 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance
Seedream 2.0: A native Chinese-English bilingual image generation model that understands cultural nuances and excels in text rendering.
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
·3962 words·19 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Renmin University of China
SEAP: Training-free sparse expert activation pruning that boosts LLM efficiency while keeping accuracy.
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories
·2040 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance Inc.
RayFlow: Accelerating diffusion with instance-aware adaptive flow, boosting speed & quality!
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
·4256 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Samsung Research
PLADIS: Sparsity boosts attention for diffusion models, enhancing text-to-image generation at inference time!
PE3R: Perception-Efficient 3D Reconstruction
·2061 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 National University of Singapore
PE3R: Achieves fast and accurate 3D scene reconstruction from 2D images through improved perception and efficiency.
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
·4375 words·21 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University
Meta reinforcement fine-tuning optimizes test-time compute so that LLMs reason more efficiently.
Motion Anything: Any to Motion Generation
·7987 words·38 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Generation 🏢 ANU
Motion Anything: Controls human motion generation with multimodal conditions such as text and music.
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
·2900 words·14 mins
AI Generated 🤗 Daily Papers AI Applications Healthcare 🏢 Yale University
MedAgentsBench: A new benchmark for assessing complex medical reasoning in LLMs, revealing performance gaps and cost-effective strategies.
Effective and Efficient Masked Image Generation Models
·4167 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Renmin University of China
eMIGM: A unified, efficient masked image generation model achieving state-of-the-art performance with fewer resources.
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
·2653 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tiamat AI
EasyControl: Efficient & flexible control for Diffusion Transformers, enabling sophisticated image generation.