Paper Reviews by AI
2025
MaRI: Material Retrieval Integration across Domains
·2119 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Electronic Science and Technology of China
MaRI: Accurately retrieves textures from images by bridging the gap between visual representations and material properties across diverse domains.
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization
·2300 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
LightGen: Efficient image generation via knowledge distillation and direct preference optimization.
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
·2477 words·12 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
GTR: Prevents thought collapse in RL-based VLM agents through process guidance, improving performance on complex visual reasoning tasks.
BiasEdit: Debiasing Stereotyped Language Models via Model Editing
·2942 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of California, San Diego
BIASEDIT: Efficiently debiasing language models via lightweight network edits!
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models
·2590 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 KAIST, Visual Media Lab
AnyMoLe: Generates in-between motion frames for diverse characters using video diffusion models, without external data.
AI-native Memory 2.0: Second Me
·1327 words·7 mins·
AI Generated
🤗 Daily Papers
AI Applications
Human-AI Interaction
🏢 Mindverse.ai
AI-native Memory 2.0 presents Second Me, an AI system for personal knowledge management.
$^R$FLAV: Rolling Flow matching for infinite Audio Video generation
·2128 words·10 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Audio-Visual Learning
🏢 University of Parma
RFLAV: A novel rolling flow matching model for infinite audio-video generation with high quality, synchronization, and temporal coherence.
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
·3702 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
WISE: Evaluates world knowledge in text-to-image generation.
Video Action Differencing
·3793 words·18 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Stanford
VidDiff: Identify subtle action differences in videos for coaching and skill learning.
Should VLMs be Pre-trained with Image Data?
·3469 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Toyota Research Institute
Image data during pre-training can boost Vision-Language Model (VLM) performance, especially when introduced later in the process.
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
·3772 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance
Seedream 2.0: A native Chinese-English bilingual image generation model that understands cultural nuances and excels in text rendering.
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
·3962 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Renmin University of China
SEAP: Training-free sparse expert activation pruning that improves LLM efficiency while preserving accuracy.
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories
·2040 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance Inc.
RayFlow: Accelerating diffusion with instance-aware adaptive flow, boosting speed & quality!
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
·4256 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Samsung Research
PLADIS: Sparsity boosts attention for diffusion models, enhancing text-to-image generation at inference time!
PE3R: Perception-Efficient 3D Reconstruction
·2061 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 National University of Singapore
PE3R: Achieves fast and accurate 3D scene reconstruction from 2D images through enhanced perception and efficiency.
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
·4375 words·21 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Carnegie Mellon University
Meta reinforcement fine-tuning helps LLMs use test-time compute more efficiently when reasoning.
Motion Anything: Any to Motion Generation
·7987 words·38 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Generation
🏢 ANU
Motion Anything: control human motion generation with multimodal conditions like text and music.
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
·2900 words·14 mins·
AI Generated
🤗 Daily Papers
AI Applications
Healthcare
🏢 Yale University
MEDAGENTSBENCH: a new benchmark for assessing complex medical reasoning in LLMs, revealing performance gaps and cost-effective strategies.
Effective and Efficient Masked Image Generation Models
·4167 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Renmin University of China
eMIGM: A unified, efficient masked image generation model achieving state-of-the-art performance with fewer resources.
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
·2653 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tiamat AI
EasyControl: Efficient & flexible control for Diffusion Transformers, enabling sophisticated image generation.