Machine Learning

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

31 March 2025·3072 words·15 mins

AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 StepFun

Open-Reasoner-Zero pioneers scalable, accessible RL training for reasoning in LLMs, achieving superior performance with a minimalist approach.

Expanding RL with Verifiable Rewards Across Diverse Domains

31 March 2025·3127 words·15 mins

AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Tencent AI Lab

RL with Verifiable Rewards is now expanding to diverse domains like medicine!

Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation

28 March 2025·3963 words·19 mins

AI Generated 🤗 Daily Papers Machine Learning Recommender Systems 🏢 Gaoling School of Artificial Intelligence, Renmin University of China

ReaRec: Unleashing latent reasoning power for sequential recommendation through inference-time multi-step reasoning.

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

28 March 2025·3814 words·18 mins

AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 ByteDance Seed

This paper enhances Reinforcement Learning from Human Feedback (RLHF) by tackling reward hacking and response diversity issues through improved data construction methods.

LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation

25 March 2025·3935 words·19 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 National University of Singapore

LogQuant: log-distributed 2-bit quantization of the KV cache with superior accuracy preservation.

Verbal Process Supervision Elicits Better Coding Agents

24 March 2025·1306 words·7 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Mindify AI, United States

CURA: Verbal process supervision improves coding agents.

Decoupling Angles and Strength in Low-rank Adaptation

23 March 2025·3846 words·19 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 University of Tübingen

DeLoRA: Decoupling angles and strength in low-rank adaptation for robust & efficient finetuning of large models!

Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation

21 March 2025·3836 words·19 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Department of Biomedical Engineering, Duke University

Gumbel-Softmax Flow Matching enables controllable biological sequence generation with straight-through guidance, scaling efficiently to high-dimensional simplices.

Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

20 March 2025·1719 words·9 mins

AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 VNU University of Science, Vietnam

RL fine-tuning enhances reasoning in small LLMs, achieving competitive performance with limited resources, despite optimization & length challenges.

Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling

19 March 2025·2283 words·11 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 University of Science and Technology of China

UAE-3D: A unified latent space approach for efficient & high-quality 3D molecular generation, outperforming existing methods in accuracy and speed.

Frac-Connections: Fractional Extension of Hyper-Connections

18 March 2025·1945 words·10 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 ByteDance Seed

Frac-Connections: An efficient alternative to Hyper-Connections that divides hidden states into fractions.

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

18 March 2025·3349 words·16 mins

AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Tsinghua University

DAPO: An open-source LLM reinforcement learning system that achieves SOTA AIME scores, fostering reproducible research at scale.

Transformers without Normalization

13 March 2025·4050 words·20 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 FAIR, Meta

Transformers can achieve state-of-the-art performance without normalization layers via Dynamic Tanh (DyT), offering a simpler and more efficient alternative.

Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?

13 March 2025·3607 words·17 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 University of Central Florida

KArAt: Can Learnable Attention Beat Standard Attention in Vision Transformers?

Charting and Navigating Hugging Face's Model Atlas

13 March 2025·3697 words·18 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 School of Computer Science and Engineering

Navigating millions of models is hard. This paper charts Hugging Face, revealing model relationships and attribute predictions.

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

10 March 2025·4375 words·21 mins

AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Carnegie Mellon University

Meta Reinforcement Fine-Tuning optimizes test-time compute, helping LLMs reason more efficiently.

BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling

8 March 2025·1373 words·7 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Not Available

Rimer: RWKV-7 empowers superior time series modeling, offering a simple yet effective alternative to Transformers with fewer parameters.

LoRACode: LoRA Adapters for Code Embeddings

7 March 2025·1678 words·8 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Max Planck Institute for Software Systems

LoRACode enhances code embeddings using LoRA, achieving SOTA in code retrieval with minimal computational cost.

Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol

7 March 2025·3624 words·18 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 Delft University of Technology

This paper reviews AI4SE benchmarks, introduces BenchScout for benchmark discovery, and proposes BenchFrame for benchmark enhancement, demonstrated via HumanEvalNext.

Learning from Failures in Multi-Attempt Reinforcement Learning

4 March 2025·1948 words·10 mins

AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 University of Cambridge

Multi-attempt RL refines LLMs, significantly boosting accuracy on math tasks by enabling them to learn from failures through user feedback.