Machine Learning
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
·3072 words·15 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 StepFun
Open-Reasoner-Zero pioneers scalable, accessible RL training for reasoning in LLMs, achieving superior performance with a minimalist approach.
Expanding RL with Verifiable Rewards Across Diverse Domains
·3127 words·15 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Tencent AI Lab
RL with Verifiable Rewards is now expanding to diverse domains like medicine!
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
·3963 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Recommender Systems
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
ReaRec: Unleashing latent reasoning power for sequential recommendation through inference-time multi-step reasoning.
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
·3814 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 ByteDance Seed
This paper enhances Reinforcement Learning from Human Feedback (RLHF) by tackling reward hacking and response diversity issues through improved data construction methods.
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation
·3935 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 National University of Singapore
LogQuant: 2-bit quantization for KV cache, superior accuracy!
Verbal Process Supervision Elicits Better Coding Agents
·1306 words·7 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Mindify AI, United States
CURA: Verbal process supervision improves coding agents.
Decoupling Angles and Strength in Low-rank Adaptation
·3846 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Tübingen
DeLoRA: Decoupling angles and strength in low-rank adaptation for robust & efficient finetuning of large models!
Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
·3836 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Department of Biomedical Engineering, Duke University
Gumbel-Softmax Flow Matching enables controllable biological sequence generation with straight-through guidance, scaling efficiently to high-dimensional simplices.
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
·1719 words·9 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 VNU University of Science, Vietnam
RL fine-tuning enhances reasoning in small LLMs, achieving competitive performance with limited resources, despite optimization & length challenges.
Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling
·2283 words·11 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Science and Technology of China
UAE-3D: A unified latent space approach for efficient & high-quality 3D molecular generation, outperforming existing methods in accuracy and speed.
Frac-Connections: Fractional Extension of Hyper-Connections
·1945 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 ByteDance Seed
Frac-Connections: An efficient alternative to Hyper-Connections that divides hidden states into fractions.
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
·3349 words·16 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Tsinghua University
DAPO: An open-source LLM reinforcement learning system that achieves SOTA AIME scores, fostering reproducible research at scale.
Transformers without Normalization
·4050 words·20 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 FAIR, Meta
Transformers can achieve state-of-the-art performance without normalization layers via Dynamic Tanh (DyT), offering a simpler and more efficient alternative.
Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?
·3607 words·17 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Central Florida
KArAt: Can Learnable Attention Beat Standard Attention in Vision Transformers?
Charting and Navigating Hugging Face's Model Atlas
·3697 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 School of Computer Science and Engineering
Navigating millions of models is hard. This paper charts Hugging Face, revealing model relationships and attribute predictions.
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
·4375 words·21 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Carnegie Mellon University
Meta reinforcement fine-tuning optimizes test-time compute, helping LLMs reason more efficiently.
BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling
·1373 words·7 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Not Available
Rimer: RWKV-7 empowers superior time series modeling, offering a simple yet effective alternative to Transformers with fewer parameters.
LoRACode: LoRA Adapters for Code Embeddings
·1678 words·8 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Max Planck Institute for Software Systems
LoRACode enhances code embeddings using LoRA, achieving SOTA in code retrieval with minimal computational cost.
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
·3624 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Delft University of Technology
This paper reviews AI4SE benchmarks, introduces BenchScout for benchmark discovery, and proposes BenchFrame for benchmark enhancement, demonstrated via HumanEvalNext.
Learning from Failures in Multi-Attempt Reinforcement Learning
·1948 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 University of Cambridge
Multi-attempt RL refines LLMs, significantly boosting accuracy on math tasks by enabling them to learn from failures through user feedback.