Paper Reviews by AI

2025

Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
·3836 words·19 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Department of Biomedical Engineering, Duke University
Gumbel-Softmax Flow Matching enables controllable biological sequence generation with straight-through guidance, scaling efficiently to high-dimensional simplices.
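For readers new to the two ingredients named above, here is a generic NumPy sketch of drawing a Gumbel-Softmax sample on the probability simplex and taking its straight-through (hard one-hot) forward value. It shows only the standard relaxation, not the paper's flow-matching or guidance procedure. The function names, the temperature, and the 3-token toy distribution are illustrative assumptions.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    # Gumbel(0, 1) noise via the inverse CDF: g = -log(-log(u)), u ~ Uniform(0, 1).
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

def straight_through_forward(soft):
    """Hard one-hot vector used in the forward pass.

    With autodiff, the straight-through estimator is usually written as
    hard + soft - stop_gradient(soft), so gradients flow through `soft`.
    NumPy has no autodiff, so only the forward value is computed here.
    """
    return np.eye(soft.shape[-1])[soft.argmax(axis=-1)]

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.2, 0.7]))  # toy 3-token distribution
soft = gumbel_softmax(logits, tau=0.5, rng=rng)
hard = straight_through_forward(soft)
```

Lower `tau` pushes `soft` toward a vertex of the simplex, while `hard` always sits exactly on one.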
FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields
·2851 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 KAIST, Visual Media Lab
FFaceNeRF: Enables few-shot face editing in NeRFs via geometry adapter & latent mixing, enhancing control & quality with limited training data.
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
·3338 words·16 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Renmin University of China
ETVA evaluates text-to-video alignment via fine-grained question generation and answering.
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
·2987 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University
Zero-1-to-A: Animatable avatars from a single image using video diffusion, robust to spatial & temporal inconsistencies!
XAttention: Block Sparse Attention with Antidiagonal Scoring
·2960 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
XAttention: Antidiagonal scoring unlocks block-sparse attention, slashing compute costs in long-context Transformers without sacrificing accuracy.
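To make "antidiagonal scoring" concrete, the toy sketch below scores each block of a precomputed attention-score matrix by summing entries on every `stride`-th antidiagonal. It then keeps the top `keep` key blocks per query-block row. This is our own simplified reading: the selection rule, block size, and stride are hypothetical stand-ins rather than the paper's settings. A real implementation would also avoid materializing the full score matrix in the first place.

```python
import numpy as np

def antidiagonal_block_scores(scores, block, stride):
    """Score each (block x block) tile by summing its entries on strided
    antidiagonals, i.e. positions where (i + j) % stride == stride - 1."""
    n = scores.shape[0]
    assert scores.shape == (n, n) and n % block == 0 and block % stride == 0
    i, j = np.indices((block, block))
    mask = ((i + j) % stride) == (stride - 1)  # assumed antidiagonal selection rule
    nb = n // block
    # tiles[a, b] is the tile covering query rows a*block.. and key columns b*block..
    tiles = scores.reshape(nb, block, nb, block).transpose(0, 2, 1, 3)
    return (tiles * mask).sum(axis=(-2, -1))

def select_blocks(block_scores, keep):
    """Hypothetical rule: keep the `keep` highest-scoring key blocks per query-block row."""
    order = np.argsort(-block_scores, axis=-1)[:, :keep]
    sel = np.zeros(block_scores.shape, dtype=bool)
    np.put_along_axis(sel, order, True, axis=-1)
    return sel

rng = np.random.default_rng(0)
scores = rng.random((16, 16))  # toy attention scores, no causal mask
bs = antidiagonal_block_scores(scores, block=4, stride=2)
sel = select_blocks(bs, keep=2)
```

Each antidiagonal crosses every row and every column of a tile at most once. A strided set of antidiagonals therefore samples every row and column, so both vertical and diagonal ("slash") attention patterns contribute to a block's score.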
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
·2005 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 AIRI
Efficient image representation via adaptive token reduction.
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
·1204 words·6 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 EverEx
VideoRFSplat: Direct text-to-3D Gaussian Splatting with flexible pose and multi-view joint modeling, bypassing SDS refinement!
Unleashing Vecset Diffusion Model for Fast Shape Generation
·3881 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 MMLab, CUHK
FlashVDM enables fast 3D shape generation by accelerating both VAE decoding and diffusion sampling.
Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens
·3099 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 DP Technology
Uni-3DAR: An autoregressive framework that unifies 3D generation and understanding by compressing spatial tokens, making it faster and more versatile.
Ultra-Resolution Adaptation with Ease
·2457 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 National University of Singapore
URA: Ultra-resolution adaptation made easy! Uses synthetic data & minor weight tuning for efficient, high-res text-to-image diffusion models.
Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering
·1842 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Pohang University of Science and Technology
Typed-RAG enhances non-factoid QA by type-aware decomposition, refining retrieval and generation for nuanced, user-aligned answers.
Tokenize Image as a Set
·3037 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Science and Technology of China
TokenSet: Tokenizing images as unordered sets for dynamic capacity allocation and robust generation, breaking from fixed-position latent codes.
Survey on Evaluation of LLM-based Agents
·396 words·2 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Hebrew University of Jerusalem
A comprehensive survey on evaluation methodologies for LLM-based agents, analyzing benchmarks and frameworks across key dimensions like capabilities, applications, and generalist performance.
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
·3774 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Rice University
A survey of model-, output-, and prompt-based strategies for efficient LLM reasoning that mitigate ‘overthinking’, making reasoning faster and cheaper for real-world applications.
Sonata: Self-Supervised Learning of Reliable Point Representations
·2429 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Hong Kong
Sonata: Reliable 3D point cloud self-supervised learning through self-distillation, achieving SOTA with less data.
Scale-wise Distillation of Diffusion Models
·3863 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Yandex Research
SWD: Scale-wise distillation of diffusion models speeds up image generation by progressively increasing resolution during denoising, outperforming counterparts at similar computational cost.
SALT: Singular Value Adaptation with Low-Rank Transformation
·1957 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Mohamed Bin Zayed University of Artificial Intelligence
SALT: Fine-tuning SAM for medical images using Singular Value Adaptation with Low-Rank Transformation for efficient, robust segmentation.
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
·1719 words·9 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 VNU University of Science, Vietnam
RL fine-tuning enhances reasoning in small LLMs, achieving competitive performance with limited resources, despite optimization & length challenges.
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
·3300 words·16 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Westlake University
VidKV: Achieves sub-2-bit (1.x-bit) KV cache quantization for VideoLLMs, maintaining performance without retraining.
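As we read it, "1.x-bit" means an average precision between 1 and 2 bits per cached element. The generic sketch below is not VidKV's scheme: its channel-selection rule and quantizers are not reproduced here. It quantizes a toy key cache per channel, using 2 bits for an arbitrarily chosen half of the channels and 1 bit for the rest, which averages 1.5 bits. It also applies 1.58-bit (ternary, log2(3) bits) quantization to a toy value cache. Shapes, the channel split, and the function names are assumptions.

```python
import numpy as np

def quantize_asym(x, bits, axis=0):
    """Asymmetric uniform quantization along `axis`; returns dequantized values.
    With x shaped (tokens, channels) and axis=0, each channel gets its own range."""
    levels = 2 ** bits - 1
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)  # avoid division by zero
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo

def quantize_ternary(x, axis=0):
    """1.58-bit quantization to {-s, 0, +s}, with s = mean |x| per channel (assumed)."""
    s = np.abs(x).mean(axis=axis, keepdims=True)
    s = np.where(s > 0, s, 1.0)
    return np.clip(np.round(x / s), -1, 1) * s

rng = np.random.default_rng(0)
k = rng.normal(size=(128, 64))  # toy key cache: (tokens, channels)
v = rng.normal(size=(128, 64))  # toy value cache

# Hypothetical split: first 32 channels at 2 bits, remaining 32 at 1 bit.
k_mixed = np.concatenate(
    [quantize_asym(k[:, :32], bits=2), quantize_asym(k[:, 32:], bits=1)], axis=1
)
avg_bits = (32 * 2 + 32 * 1) / 64  # 1.5 bits/element, ignoring scale/zero-point storage
v_tern = quantize_ternary(v)
```

Per-channel ranges are a common choice for key caches because outliers tend to concentrate in a few channels. Storing the scales and offsets adds a small overhead on top of the nominal bit width.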
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes
·4268 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Simon Fraser University
NuiScene: Enables efficient & unbounded outdoor scene generation by encoding scene chunks as uniform vector sets and outpainting.