
Machine Learning

S*: Test Time Scaling for Code Generation
2539 words · 12 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 UC Berkeley
S*: Hybrid test-time scaling for code generation, boosting both coverage and selection accuracy.
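The tagline pairs broad candidate coverage with grounded selection. As a rough, hypothetical illustration of that flavor of hybrid scaling (not the paper's actual algorithm), a best-of-N loop can pick among sampled programs by executing public tests; `sample_fn` below stands in for any code-generating LLM call:

```python
import subprocess
import tempfile

def passes_tests(code: str, tests: list[tuple[str, str]]) -> int:
    """Count how many (stdin, expected_stdout) pairs the candidate passes."""
    # Write the candidate program once, then run it against each test.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    passed = 0
    for stdin, expected in tests:
        try:
            result = subprocess.run(
                ["python", path], input=stdin,
                capture_output=True, text=True, timeout=5,
            )
            passed += result.stdout.strip() == expected.strip()
        except subprocess.TimeoutExpired:
            pass  # a hanging candidate simply fails this test
    return passed

def best_of_n(prompt: str, tests: list[tuple[str, str]], sample_fn, n: int = 8) -> str:
    # Coverage axis: draw n independent candidate programs.
    candidates = [sample_fn(prompt) for _ in range(n)]
    # Selection axis: keep the candidate that passes the most public tests.
    return max(candidates, key=lambda c: passes_tests(c, tests))
```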
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
1911 words · 9 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 UC Santa Barbara
MLGym: A new framework & benchmark for advancing AI research agents.
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
3688 words · 18 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Microsoft Research Asia
Logic-RL unlocks LLM reasoning via rule-based reinforcement learning, generalizing to math problems after training on logic puzzles.
LLM-based User Profile Management for Recommender System
2332 words · 11 mins
AI Generated 🤗 Daily Papers Machine Learning Recommender Systems 🏢 Ulsan National Institute of Science and Technology
PURE: LLM-driven user profile management that harnesses user reviews for personalized insights while tackling token limits, boosting recommendation quality.
Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective
6916 words · 33 mins
AI Generated 🤗 Daily Papers Machine Learning Transfer Learning 🏢 Beijing Teleinfo Technology Company Ltd., China Academy of Information and Communications Technology
Unveiling the surprising potential of noise: transferable knowledge in semi-supervised heterogeneous domain adaptation (SHDA).
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
4758 words · 23 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Nanjing University
AdaptiveStep automatically divides reasoning into steps based on model confidence, enhancing PRM training & performance.
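As a loose sketch of the idea in the tagline (the paper's exact criterion may differ), step boundaries can be placed wherever the model's top-token probability dips below a threshold; the tokens and probabilities are assumed to come from the decoding loop:

```python
def split_by_confidence(tokens: list[str], top_probs: list[float],
                        threshold: float = 0.8) -> list[list[str]]:
    """Cut a reasoning trace into steps at low-confidence decision points."""
    steps, current = [], []
    for tok, p in zip(tokens, top_probs):
        current.append(tok)
        if p < threshold:  # low top-token probability -> likely decision point
            steps.append(current)
            current = []
    if current:
        steps.append(current)
    return steps

# Boundaries fall where the model was least sure of its next token.
print(split_by_confidence(["2", "+", "2", "=", "4"],
                          [0.99, 0.95, 0.60, 0.97, 0.99]))
# -> [['2', '+', '2'], ['=', '4']]
```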
S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
3894 words · 19 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Tencent
S²R: Teaches LLMs to self-verify and self-correct, boosting reasoning with efficient reinforcement learning.
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
6586 words · 31 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 National University of Singapore
NExT-Mol: Combines 1D language models with 3D diffusion for molecule generation, achieving state-of-the-art performance and validity.
Eager Updates For Overlapped Communication and Computation in DiLoCo
3815 words · 18 mins
AI Generated 🤗 Daily Papers Machine Learning Federated Learning 🏢 Google DeepMind
Eager updates drastically speed up the training of massive language models by overlapping communication and computation in DiLoCo, achieving near-optimal performance even with low bandwidth.
Thinking Preference Optimization
5794 words · 28 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Case Western Reserve University
ThinkPO improves LLM reasoning by preferring longer CoT, boosting performance without new data.
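A minimal sketch of the data side of this idea, assuming DPO-style preference pairs where response length serves as the preference signal (the helper below is illustrative, not the paper's pipeline):

```python
def build_thinking_pair(prompt: str, responses: list[str]) -> dict:
    """Form one DPO-style preference pair, preferring the longer CoT."""
    ranked = sorted(responses, key=len)      # length as the preference proxy
    return {
        "prompt": prompt,
        "chosen": ranked[-1],                # longest reasoning trace wins
        "rejected": ranked[0],               # shortest trace is dispreferred
    }
```

Such pairs would then feed a standard preference-optimization trainer; since both responses already exist, no new prompts or answers are required, consistent with the "without new data" claim.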
Small Models Struggle to Learn from Strong Reasoners
4149 words · 20 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 University of Washington
Small language models struggle to learn complex reasoning from large models, but a novel ‘Mix Distillation’ method balances complexity for effective capability transfer.
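One plausible reading of "Mix Distillation" is blending long chain-of-thought traces from a strong teacher with shorter traces from a weaker one; the sketch below assumes that reading, and the mixing ratio is an invented hyperparameter, not a value from the paper:

```python
import random

def mix_distillation_data(long_cot: list[dict], short_cot: list[dict],
                          long_ratio: float = 0.25, size: int = 10_000,
                          seed: int = 0) -> list[dict]:
    """Blend strong-teacher (long CoT) and weak-teacher (short CoT) traces."""
    rng = random.Random(seed)
    n_long = int(size * long_ratio)
    mixed = (rng.choices(long_cot, k=n_long)
             + rng.choices(short_cot, k=size - n_long))
    rng.shuffle(mixed)  # interleave so batches see both difficulty levels
    return mixed
```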
Towards Data-Efficient Pretraining for Atomic Property Prediction
3694 words · 18 mins
AI Generated 🤗 Daily Papers Machine Learning Transfer Learning 🏢 King Abdullah University of Science and Technology
High-quality, task-relevant pretraining data surpasses large-scale pretraining in atomic property prediction, achieving comparable performance at 1/24th the computational cost.
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
4399 words · 21 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 AIRI
MIKASA, a new benchmark for memory-intensive reinforcement learning, provides a unified framework for evaluating memory capabilities in diverse scenarios, including complex robotic manipulation tasks.
AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting
3650 words · 18 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Huawei Noah's Ark Lab, Paris, France
AdaPTS effectively adapts pre-trained univariate time series models to probabilistic multivariate forecasting, improving accuracy and uncertainty quantification.
Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights
3096 words · 15 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 School of Computer Science and Engineering
ProbeLog: Zero-shot model search directly from weights, boosting efficiency and accuracy!
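If the search descriptors are built by probing model responses (one plausible reading of the approach, not a confirmed detail of the paper), a functional signature for zero-shot search might look like this sketch:

```python
import torch

@torch.no_grad()
def probe_descriptor(model, probes: torch.Tensor) -> torch.Tensor:
    """Stack a model's responses to fixed probe inputs into one vector."""
    out = model(probes)                              # (num_probes, num_outputs)
    return torch.nn.functional.normalize(out.flatten(), dim=0)

def similarity(desc_a: torch.Tensor, desc_b: torch.Tensor) -> float:
    return float(desc_a @ desc_b)                    # cosine, since normalized
```

Models with similar behavior on the shared probes get similar descriptors, so a query ("recognizes dogs") can be matched against a model zoo without metadata.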
Agency Is Frame-Dependent
400 words · 2 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Google DeepMind
Agency, a key concept in AI, is shown to be relative to the observer’s perspective (frame-dependent), challenging traditional binary definitions and necessitating a more nuanced approach for AI systems.
Improving Transformer World Models for Data-Efficient RL
2775 words · 14 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Google DeepMind
AI agents now master complex tasks with improved Transformer World Models, achieving a new state-of-the-art in data-efficient reinforcement learning.
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
3269 words · 16 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 University of Waterloo
AceCoder uses automated test-case synthesis to create a large-scale dataset for training reward models, enabling effective reinforcement learning that significantly boosts code generation model performance.
Weak-to-Strong Diffusion with Reflection
4655 words · 22 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Hong Kong University of Science and Technology
W2SD: A novel framework that boosts diffusion model quality by using the difference between weak and strong models to refine sampling trajectories, achieving state-of-the-art performance.
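A minimal sketch of the weak-to-strong idea at a single denoising step, assuming both models predict noise and using an illustrative guidance scale (the update rule is a plausible form, not the paper's exact one):

```python
import torch

def w2s_denoise_step(x_t: torch.Tensor, t: torch.Tensor,
                     strong, weak, scale: float = 1.0) -> torch.Tensor:
    """Extrapolate the strong model's prediction along (strong - weak)."""
    eps_strong = strong(x_t, t)   # strong model's noise prediction
    eps_weak = weak(x_t, t)       # weak model's noise prediction
    # Push along the direction in which the strong model improves on the weak,
    # nudging the sampling trajectory further toward "strong" behavior.
    return eps_strong + scale * (eps_strong - eps_weak)
```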
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
5509 words · 26 mins
AI Generated 🤗 Daily Papers Machine Learning Federated Learning 🏢 Google DeepMind
Streaming DiLoCo achieves two orders of magnitude bandwidth reduction in billion-scale parameter LLM training by synchronizing parameter subsets sequentially, overlapping communication with computation.
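As an illustrative sketch (names and scheduling are assumptions, not the paper's implementation), fragment-wise synchronization can cycle through disjoint parameter subsets, putting only one subset on the wire per outer step while computation on the rest proceeds:

```python
def streaming_sync(params: dict, outer_step: int,
                   num_fragments: int, all_reduce) -> dict:
    """Synchronize one parameter fragment per outer step."""
    names = sorted(params)                         # stable fragment assignment
    fragment = outer_step % num_fragments          # which subset's turn it is
    for i, name in enumerate(names):
        if i % num_fragments == fragment:
            params[name] = all_reduce(params[name])  # only this subset on the wire
    return params
```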