Paper Reviews by AI

Verbal Process Supervision Elicits Better Coding Agents

24 March 2025·1306 words·7 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Mindify AI, United States

CURA: Verbal process supervision improves coding agents.

Training-free Diffusion Acceleration with Bottleneck Sampling

24 March 2025·3305 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University

Bottleneck Sampling: Accelerate diffusion models without retraining by using low-resolution priors for efficient inference!

MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse

24 March 2025·1703 words·8 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Northwestern University

MetaSpatial: Reinforcement learning for 3D spatial reasoning in VLMs.

LookAhead Tuning: Safer Language Models via Partial Answer Previews

24 March 2025·2175 words·11 mins

AI Generated 🤗 Daily Papers AI Theory Safety 🏢 Zhejiang University

LookAhead Tuning: Safer LLMs via partial answer previews that preserve the model's initial token distributions.

Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models

24 March 2025·1777 words·9 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Yonsei University

LSRNA: Super-resolution in latent space enhances image generation with diffusion models, achieving faster speeds and improved detail.

I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

24 March 2025·3290 words·16 mins

AI Generated 🤗 Daily Papers AI Theory Interpretability 🏢 AIRI

Sparse autoencoders decode reasoning in LLMs, revealing key features that enhance performance when steered. First mechanistic account of reasoning in LLMs!

FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images

24 March 2025·3848 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Australian National University

FRESA: Fast feedforward creation of personalized 3D avatars from few images.

Frequency Dynamic Convolution for Dense Image Prediction

24 March 2025·1612 words·8 mins

AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Beijing Institute of Technology

FDConv: Adaptive convolution via frequency-domain learning, improving performance without a heavy parameter cost.

FFN Fusion: Rethinking Sequential Computation in Large Language Models

24 March 2025·3776 words·18 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA

FFN Fusion: Parallelizing sequential computation in large language models for significant speedups!

Equivariant Image Modeling

24 March 2025·3413 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Science and Technology of China

Aligning image generation subtasks: Equivariant modeling improves efficiency and generalization by leveraging the invariance of natural visual signals.

Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

24 March 2025·3661 words·18 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Beihang University

Diffusion-4K: Synthesizing ultra-high-resolution images with a new benchmark dataset and wavelet-based fine-tuning that makes 4K image creation more detailed and accessible!

CoMP: Continual Multimodal Pre-training for Vision Foundation Models

24 March 2025·3612 words·17 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University

CoMP: Continual pre-training of vision foundation models for better vision-language alignment and support for arbitrary-size inputs.

CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models

24 March 2025·3380 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 S-Lab, Nanyang Technological University

CFG-Zero*: An improved classifier-free guidance that boosts image quality and text alignment in flow matching models.

AMD-Hummingbird: Towards an Efficient Text-to-Video Model

24 March 2025·739 words·4 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Advanced Micro Devices, Inc.

Hummingbird: An efficient text-to-video model that balances quality and computational efficiency via pruning and visual feedback learning.

AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning

24 March 2025·327 words·2 mins

AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Menlo Research

AlphaSpace enables robotic actions via semantic tokenization and symbolic reasoning, enhancing spatial intelligence in LLMs.

Aether: Geometric-Aware Unified World Modeling

24 March 2025·2472 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Shanghai AI Laboratory

AETHER: A unified framework for geometry-aware reasoning in world models that achieves zero-shot generalization from synthetic to real-world data.

Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning

23 March 2025·3123 words·15 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences

Vision-R1 improves LVLMs via vision-guided reinforcement learning, eliminating the need for human feedback and specialized reward models.

PathoHR: Breast Cancer Survival Prediction on High-Resolution Pathological Images

23 March 2025·1466 words·7 mins

AI Generated 🤗 Daily Papers AI Applications Healthcare 🏢 XJTLU

PathoHR: Boosting breast cancer survival prediction with high-resolution pathology images!

OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models

23 March 2025·2382 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 OriginAI, Tel-Aviv, Israel

OmnimatteZero: Real-time omnimatte using pre-trained video diffusion, no training needed!

Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

23 March 2025·3575 words·17 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 155mv Research Lab

LLMs falter on culturally adapted math problems, revealing a critical cultural bias.