Paper Reviews by AI
2025
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
·4689 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 California Institute of Technology
HEADINFER achieves memory-efficient LLM inference by cleverly offloading the key-value cache to the CPU head by head, enabling 4-million-token inference on a single consumer GPU.
Eager Updates For Overlapped Communication and Computation in DiLoCo
·3815 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Federated Learning
🏢 Google DeepMind
Eager updates drastically speed up training massive language models by cleverly overlapping communication and computation in DiLoCo, achieving near-optimal performance even with low bandwidth.
Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge
·3819 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 City University of Hong Kong
Crowd-based comparative evaluation significantly boosts LLM-as-a-judge accuracy by using crowd responses to expose deeper details, resulting in more reliable and efficient auto-evaluation.
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
·2814 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 AIRI
LLMs can losslessly compress 1568 tokens into a single vector, surpassing prior methods by two orders of magnitude.
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
·4398 words·21 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Tsinghua University
video-SALMONN-o1: An open-source audio-visual LLM enhances video understanding with a novel reasoning-intensive dataset and the pDPO method, achieving significant accuracy gains.
Thinking Preference Optimization
·5794 words·28 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Case.edu
ThinkPO improves LLM reasoning by preferring longer chain-of-thought (CoT) responses, boosting performance without new data.
System Message Generation for User Preferences using Open-Source Models
·3777 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Upstage AI
SYSGEN: A novel pipeline generates effective system messages for LLMs using open-source models, improving model responses and addressing data scarcity in supervised fine-tuning.
Small Models Struggle to Learn from Strong Reasoners
·4149 words·20 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Washington
Small language models struggle to learn complex reasoning from large models, but a novel ‘Mix Distillation’ method balances complexity for effective capability transfer.
SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL
·3833 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Department of Artificial Intelligence, Chung-Ang University
SAFE-SQL boosts Text-to-SQL accuracy by intelligently generating and filtering self-augmented examples for in-context learning, surpassing existing methods in challenging scenarios.
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
·2710 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 School of Computer Science, Fudan University
Contrary to popular belief, longer reasoning chains don’t always boost Large Language Model (LLM) accuracy; this research reveals that parallel scaling with shorter solutions outperforms sequential scaling.
Presumed Cultural Identity: How Names Shape LLM Responses
·2724 words·13 mins·
AI Generated
🤗 Daily Papers
AI Theory
Fairness
🏢 University of Copenhagen
LLMs personalize based on user names, but this study reveals that cultural presumptions in LLM responses risk reinforcing stereotypes.
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
·2524 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Xi'an Jiaotong University
PhysReason benchmark evaluates physics-based reasoning in LLMs, revealing critical limitations and guiding future improvements.
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
·221 words·2 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 SenseTime Research
MaskGWM: Improves driving world models by using video mask reconstruction for better generalization.
MagicArticulate: Make Your 3D Models Articulation-Ready
·4321 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Nanyang Technological University
MagicArticulate automates 3D model animation preparation by generating skeletons and skinning weights, overcoming the limitations of prior manual methods, and introduces Articulation-XL, a large-scale benchmark.
Learning Getting-Up Policies for Real-World Humanoid Robots
·4423 words·21 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 University of Illinois Urbana-Champaign
HUMANUP: A novel two-stage reinforcement learning framework enables real-world humanoid robots to autonomously recover from falls on various terrains.
Large Language Models and Mathematical Reasoning Failures
·397 words·2 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 KTH Royal Institute of Technology
Large language models struggle with mathematical word problems, demonstrating flaws in reasoning despite achieving high accuracy; a new study highlights these persistent gaps in generalization abilities.
Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance
·1604 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 KTH Royal Institute of Technology
LLMs’ performance on language complexity tasks (LIX & ADD) reveals a strong correlation with general capabilities, suggesting complexity metrics as noisy zero-shot proxies for model evaluation.
Intuitive physics understanding emerges from self-supervised pretraining on natural videos
·4400 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Meta AI
AI models learn intuitive physics from self-supervised video pretraining.
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
·1563 words·8 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Reallm Labs
InfiR: Efficient, small AI models rival larger ones in reasoning, slashing costs and boosting privacy for wider AI use.
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
·2102 words·10 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Peking University
HermesFlow seamlessly bridges the understanding-generation gap in MLLMs using a novel Pair-DPO framework and self-play optimization on homologous data, achieving significant performance improvements.