Deep Learning
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation
·3935 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 National University of Singapore
LogQuant: Log-distributed 2-bit KV cache quantization with superior accuracy preservation.
Verbal Process Supervision Elicits Better Coding Agents
·1306 words·7 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Mindify AI, United States
CURA: Verbal process supervision improves coding agents.
Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
·3836 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Department of Biomedical Engineering, Duke University
Gumbel-Softmax Flow Matching enables controllable biological sequence generation with straight-through guidance, scaling efficiently to high-dimensional simplices.
Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling
·2283 words·11 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Science and Technology of China
UAE-3D: A unified latent space for efficient, high-quality 3D molecular generation, outperforming existing methods in both accuracy and speed.
Frac-Connections: Fractional Extension of Hyper-Connections
·1945 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 ByteDance Seed
Frac-Connections: An efficient alternative to Hyper-Connections that divides hidden states into fractions.
Transformers without Normalization
·4050 words·20 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 FAIR, Meta
Transformers can achieve state-of-the-art performance without normalization layers via Dynamic Tanh (DyT), offering a simpler and more efficient alternative.
Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?
·3607 words·17 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Central Florida
KArAt: Can Learnable Attention Beat Standard Attention in Vision Transformers?
Charting and Navigating Hugging Face's Model Atlas
·3697 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 School of Computer Science and Engineering
Navigating millions of models is hard. This paper charts Hugging Face's model atlas, revealing relationships between models and enabling attribute prediction.
BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling
·1373 words·7 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Not Available
Rimer: RWKV-7 empowers superior time series modeling, offering a simple yet effective alternative to Transformers with fewer parameters.
LoRACode: LoRA Adapters for Code Embeddings
·1678 words·8 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Max Planck Institute for Software Systems
LoRACode enhances code embeddings with LoRA adapters, achieving state-of-the-art code retrieval at minimal computational cost.
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
·3624 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Delft University of Technology
This paper reviews AI4SE benchmarks, introduces BenchScout for benchmark discovery, and proposes BenchFrame for benchmark enhancement, demonstrated via HumanEvalNext.
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
·2938 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Microsoft GenAI
KodCode: A new synthetic coding dataset with verified solutions and tests, enabling state-of-the-art performance for coding LLMs.
Identifying Sensitive Weights via Post-quantization Integral
·2603 words·13 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Tsinghua University
PQI: Post-quantization Integral accurately identifies sensitive weights, improving LLM compression and performance.
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
·2117 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Nanjing University of Aeronautics and Astronautics
SoS1: Reasoning LLMs like O1 and R1 can serve as sum-of-squares solvers.
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
·2799 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Exeter
Stable-SPAM stabilizes 4-bit LLM training, outperforming 16-bit Adam.
One-step Diffusion Models with $f$-Divergence Distribution Matching
·6126 words·29 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 NVIDIA
f-distill: One-step diffusion models through f-divergence minimization, outperforming reverse-KL with better mode coverage and lower variance.
MONSTER: Monash Scalable Time Series Evaluation Repository
·4728 words·23 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Monash University
MONSTER: Large-scale datasets for benchmarking time series classification.
S*: Test Time Scaling for Code Generation
·2539 words·12 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 UC Berkeley
S*: Hybrid test-time scaling for code generation, boosting both coverage and selection accuracy.
ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
·4128 words·20 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
ReQFlow: Efficiently generate high-quality protein backbones with rectified quaternion flow, outperforming existing methods in speed and designability.
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
·6586 words·31 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 National University of Singapore
NExT-Mol: Combining 1D language modeling with 3D diffusion for molecule generation, achieving state-of-the-art performance and validity.