Machine Learning
Language Models can Self-Improve at State-Value Estimation for Better Search
·2765 words·13 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Georgia Institute of Technology
Self-Taught Lookahead improves LLM search via self-supervision, matching costly methods at a fraction of the compute!
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
·2938 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Microsoft GenAI
KodCode: A new synthetic coding dataset with verified solutions and tests, enabling state-of-the-art performance for coding LLMs.
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
·3403 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Illinois Urbana-Champaign
MultiAgentBench: A benchmark for evaluating collaboration and competition in LLM agents across diverse, interactive scenarios with novel metrics and protocols.
Identifying Sensitive Weights via Post-quantization Integral
·2603 words·13 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Tsinghua University
PQI: Accurately identifies sensitive weights via a post-quantization integral, improving LLM compression and performance!
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
·2117 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Nanjing University of Aeronautics and Astronautics
SoS1: Shows that O1- and R1-like reasoning LLMs can serve as sum-of-squares solvers.
Self-rewarding correction for mathematical reasoning
·3488 words·17 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Illinois Urbana-Champaign
LLMs can now reason and self-correct using self-generated data, achieving performance on par with approaches that rely on external reward models!
OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment
·1937 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Recommender Systems
🏢 KuaiShou Inc.
OneRec: A unified generative model that replaces the traditional retrieve-and-rank strategy, significantly improving recommendation quality in real-world scenarios.
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
·2799 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Exeter
Stable-SPAM stabilizes 4-bit LLM training, outperforming 16-bit Adam.
Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
·3383 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Microsoft
DVPO: A lean RLHF framework that decouples value & policy optimization with global value guidance, cutting GPU use by 40% and training time by 35%.
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
·1243 words·6 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Noah's Ark Lab, Huawei Technologies France
TAG: A decentralized framework for scalable multi-agent hierarchical reinforcement learning.
One-step Diffusion Models with $f$-Divergence Distribution Matching
·6126 words·29 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 NVIDIA
f-distill: One-step diffusion models through f-divergence minimization, outperforming reverse-KL with better mode coverage and lower variance.
MONSTER: Monash Scalable Time Series Evaluation Repository
·4728 words·23 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Monash University
MONSTER: A repository of large benchmark datasets for time series classification!
UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
·3455 words·17 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Unsupervised Learning
🏢 UNC-Chapel Hill
UPCORE reduces unintended unlearning effects via coreset selection, balancing knowledge removal and utility preservation.
S*: Test Time Scaling for Code Generation
·2539 words·12 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 UC Berkeley
S*: Hybrid test-time scaling for code generation, boosting both coverage and selection accuracy.
ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
·4128 words·20 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
ReQFlow: Efficiently generate high-quality protein backbones with rectified quaternion flow, outperforming existing methods in speed and designability.
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
·1911 words·9 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 UC Santa Barbara
MLGym: A new framework and benchmark for advancing AI research agents.
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
·3688 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Microsoft Research Asia
Logic-RL unlocks LLM reasoning via rule-based reinforcement learning, generalizing to math problems after training on logic puzzles.
LLM-based User Profile Management for Recommender System
·2332 words·11 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Recommender Systems
🏢 Ulsan National Institute of Science and Technology
PURE: LLM-driven user profile management boosts recommendations by harnessing user reviews for personalized insights while tackling token limits.
Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective
·6916 words·33 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Transfer Learning
🏢 Beijing Teleinfo Technology Company Ltd., China Academy of Information and Communications Technology
Unveiling the surprising potential of noise: transferable knowledge in semi-supervised heterogeneous domain adaptation (SHDA).
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
·4758 words·23 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Nanjing University
AdaptiveStep: Automatically divides reasoning into steps based on model confidence, enhancing PRM training and performance.