Paper Reviews by AI

2025

Video-T1: Test-Time Scaling for Video Generation
·3231 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tsinghua University
Video-T1 enhances video generation through test-time scaling, improving quality and consistency by viewing generation as a search for optimal video trajectories.
Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models
·4635 words·22 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 MBZUAI
Video SimpleQA: A New Benchmark for Factuality Evaluation in Large Video Language Models.
Verbal Process Supervision Elicits Better Coding Agents
·1306 words·7 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Mindify AI, United States
CURA: Verbal process supervision improves coding agents.
Training-free Diffusion Acceleration with Bottleneck Sampling
·3305 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
Bottleneck Sampling: Accelerate diffusion models without retraining by cleverly using low-resolution priors for efficient inference!
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
·1703 words·8 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Northwestern University
MetaSpatial: RL for 3D Spatial Reasoning in VLMs
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
·3290 words·16 mins
AI Generated 🤗 Daily Papers AI Theory Interpretability 🏢 AIRI
LLMs’ reasoning is decoded via sparse autoencoders, revealing key features that, when steered, enhance performance. First mechanistic account of reasoning in LLMs!
FFN Fusion: Rethinking Sequential Computation in Large Language Models
·3776 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA
FFN Fusion: Parallelizing sequential computation in large language models for significant speedups!
Equivariant Image Modeling
·3413 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Science and Technology of China
Aligning image generation subtasks: Equivariant modeling boosts efficiency and generalization by leveraging natural visual signal invariance.
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
·3661 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Beihang University
Diffusion-4K: Synthesizes ultra-high-resolution images using a new benchmark dataset and wavelet-based fine-tuning, making 4K image creation more detailed and accessible!
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models
·3380 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 S-Lab, Nanyang Technological University
CFG-Zero*: An improved classifier-free guidance method that boosts image quality and text alignment in flow matching models.
AMD-Hummingbird: Towards an Efficient Text-to-Video Model
·739 words·4 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Advanced Micro Devices, Inc.
Hummingbird: An efficient text-to-video model that balances quality and computational efficiency via pruning and visual feedback learning.
AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning
·327 words·2 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Menlo Research
AlphaSpace enables robotic actions via semantic tokenization and symbolic reasoning, enhancing spatial intelligence in LLMs.
Aether: Geometric-Aware Unified World Modeling
·2472 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Shanghai AI Laboratory
AETHER: a unified framework enabling geometry-aware reasoning in world models, achieving zero-shot generalization from synthetic to real-world data.
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
·3123 words·15 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences
Vision-R1: Improves LVLMs via vision-guided reinforcement learning, eliminating the need for human feedback and specialized reward models.
OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models
·2382 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 OriginAI, Tel-Aviv, Israel
OmnimatteZero: Real-time omnimatte using pre-trained video diffusion, no training needed!
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?
·3575 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 155mv Research Lab
LLMs falter on culturally adapted math problems, revealing a critical cultural bias.
AgentRxiv: Towards Collaborative Autonomous Research
·1858 words·9 mins
AI Generated 🤗 Daily Papers AI Applications Healthcare 🏢 Johns Hopkins University
AgentRxiv enables collaborative autonomous research via LLM agent preprint sharing, boosting performance and discovery.
RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation
·3176 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Pattern Recognition Center, WeChat AI, Tencent
RDTF: Efficient animated sticker generation via dual-mask training, outperforming parameter-efficient tuning under constrained resources.
When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO
·1831 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Fudan University
Aligns diffusion models with diverging human preferences via Minority-Aware Adaptive DPO.
V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms
·1371 words·7 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Politecnico of Turin
V-SEEK accelerates LLM reasoning on open-hardware RISC-V platforms, achieving up to 3.0x speedup through optimized kernels and memory management.