Machine Learning
Language Models can Self-Improve at State-Value Estimation for Better Search
·2765 words·13 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Georgia Institute of Technology
Self-Taught Lookahead improves LLM search via self-supervision, matching costly methods at a fraction of the compute!
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
·2938 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Microsoft GenAI
KodCode: A new synthetic coding dataset with verified solutions and tests, enabling state-of-the-art performance for coding LLMs.
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
·3403 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Illinois Urbana-Champaign
MultiAgentBench: A benchmark for evaluating collaboration and competition in LLM agents across diverse, interactive scenarios with novel metrics and protocols.
Identifying Sensitive Weights via Post-quantization Integral
·2603 words·13 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Tsinghua University
PQI: Accurately identifies sensitive weights via a post-quantization integral, improving LLM compression and performance!
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
·2117 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Nanjing University of Aeronautics and Astronautics
SoS1: Shows that O1- and R1-like reasoning LLMs can serve as sum-of-squares solvers.
Self-rewarding correction for mathematical reasoning
·3488 words·17 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Illinois Urbana-Champaign
LLMs can now reason and self-correct using self-generated data, achieving performance on par with approaches that rely on external reward models!
OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment
·1937 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Recommender Systems
🏢 KuaiShou Inc.
OneRec: A unified generative model that replaces the traditional retrieve-and-rank strategy, significantly improving recommendation quality in real-world scenarios.
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
·2799 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Exeter
Stable-SPAM stabilizes 4-bit LLM training, outperforming 16-bit Adam.
Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
·3383 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Microsoft
DVPO: A lean RLHF framework that decouples value & policy optimization with global value guidance, cutting GPU use by 40% and training time by 35%.
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
·1243 words·6 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Noah's Ark Lab, Huawei Technologies France
TAG: A decentralized framework for scalable multi-agent hierarchical reinforcement learning.
One-step Diffusion Models with $f$-Divergence Distribution Matching
·6126 words·29 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 NVIDIA
f-distill: One-step diffusion models through f-divergence minimization, outperforming reverse-KL with better mode coverage and lower variance.
MONSTER: Monash Scalable Time Series Evaluation Repository
·4728 words·23 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Monash University
MONSTER: A repository of large benchmark datasets for time series classification!
UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
·3455 words·17 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Unsupervised Learning
🏢 UNC-Chapel Hill
UPCORE reduces unintended unlearning effects via coreset selection, balancing knowledge removal and utility preservation.
S*: Test Time Scaling for Code Generation
·2539 words·12 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 UC Berkeley
S*: Hybrid test-time scaling for code generation, boosting both coverage and selection accuracy.
ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
·4128 words·20 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
ReQFlow: Efficiently generate high-quality protein backbones with rectified quaternion flow, outperforming existing methods in speed and designability.
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
·1911 words·9 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 UC Santa Barbara
MLGym: A new framework and benchmark for advancing AI research agents.
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
·3688 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Microsoft Research Asia
Logic-RL unlocks LLM reasoning via rule-based reinforcement learning, generalizing to math problems after training on logic puzzles.
LLM-based User Profile Management for Recommender System
·2332 words·11 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Recommender Systems
🏢 Ulsan National Institute of Science and Technology
PURE: LLM-driven user profile management boosts recommendations by harnessing user reviews for personalized insights while tackling token limits.
Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective
·6916 words·33 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Transfer Learning
🏢 Beijing Teleinfo Technology Company Ltd., China Academy of Information and Communications Technology
Unveiling the surprising potential of noise: transferable knowledge in semi-supervised heterogeneous domain adaptation (SHDA).
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
·4758 words·23 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Nanjing University
AdaptiveStep: Automatically divides reasoning into steps based on model confidence, enhancing PRM training and performance.