Machine Learning

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
·3072 words·15 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 StepFun
Open-Reasoner-Zero pioneers scalable, accessible RL training for reasoning in LLMs, achieving superior performance with a minimalist approach.
Expanding RL with Verifiable Rewards Across Diverse Domains
·3127 words·15 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Tencent AI Lab
RL with Verifiable Rewards is now expanding to diverse domains like medicine!
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
·3963 words·19 mins
AI Generated 🤗 Daily Papers Machine Learning Recommender Systems 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
ReaRec: Unleashing latent reasoning power for sequential recommendation through inference-time multi-step reasoning.
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
·3814 words·18 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 ByteDance Seed
This paper enhances Reinforcement Learning from Human Feedback (RLHF) by tackling reward hacking and response diversity issues through improved data construction methods.
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation
·3935 words·19 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 National University of Singapore
LogQuant: 2-bit quantization for KV cache, superior accuracy!
Verbal Process Supervision Elicits Better Coding Agents
·1306 words·7 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Mindify AI, United States
CURA: Verbal process supervision improves coding agents.
Decoupling Angles and Strength in Low-rank Adaptation
·3846 words·19 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 University of Tübingen
DeLoRA: Decoupling angles and strength in low-rank adaptation for robust & efficient finetuning of large models!
Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
·3836 words·19 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Department of Biomedical Engineering, Duke University
Gumbel-Softmax Flow Matching enables controllable biological sequence generation with straight-through guidance, scaling efficiently to high-dimensional simplices.
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
·1719 words·9 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 VNU University of Science, Vietnam
RL fine-tuning enhances reasoning in small LLMs, achieving competitive performance with limited resources, despite optimization & length challenges.
Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling
·2283 words·11 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 University of Science and Technology of China
UAE-3D: A unified latent space approach for efficient & high-quality 3D molecular generation, outperforming existing methods in accuracy and speed.
Frac-Connections: Fractional Extension of Hyper-Connections
·1945 words·10 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 ByteDance Seed
Frac-Connections: An efficient alternative to Hyper-Connections that divides hidden states into fractions.
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
·3349 words·16 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Tsinghua University
DAPO: Open-sources an LLM reinforcement learning system that achieves SOTA AIME scores, fostering reproducible research at scale.
Transformers without Normalization
·4050 words·20 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 FAIR, Meta
Transformers can achieve state-of-the-art performance without normalization layers via Dynamic Tanh (DyT), offering a simpler and more efficient alternative.
Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?
·3607 words·17 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 University of Central Florida
KArAt: Can Learnable Attention Beat Standard Attention in Vision Transformers?
Charting and Navigating Hugging Face's Model Atlas
·3697 words·18 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 School of Computer Science and Engineering
Navigating millions of models is hard. This paper charts Hugging Face, revealing model relationships and attribute predictions.
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
·4375 words·21 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University
Meta Reinforcement Fine-Tuning helps LLMs use test-time compute more efficiently when reasoning.
BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling
·1373 words·7 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Not Available
Rimer: RWKV-7 empowers superior time series modeling, offering a simple yet effective alternative to Transformers with fewer parameters.
LoRACode: LoRA Adapters for Code Embeddings
·1678 words·8 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Max Planck Institute for Software Systems
LoRACode enhances code embeddings using LoRA, achieving SOTA in code retrieval with minimal computational cost.
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
·3624 words·18 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Delft University of Technology
This paper reviews AI4SE benchmarks, introduces BenchScout for benchmark discovery, and proposes BenchFrame for benchmark enhancement, demonstrated via HumanEvalNext.
Learning from Failures in Multi-Attempt Reinforcement Learning
·1948 words·10 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 University of Cambridge
Multi-attempt RL refines LLMs, significantly boosting accuracy on math tasks by enabling them to learn from failures through user feedback.