Deep Learning
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation
·3935 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 National University of Singapore
LogQuant: 2-bit quantization for KV cache, superior accuracy!
Verbal Process Supervision Elicits Better Coding Agents
·1306 words·7 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Mindify AI, United States
CURA: Verbal process supervision improves coding agents.
Decoupling Angles and Strength in Low-rank Adaptation
·3846 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of TΓΌbingen
DeLoRA: Decoupling angles and strength in low-rank adaptation for robust & efficient finetuning of large models!
Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
·3836 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Department of Biomedical Engineering, Duke University
Gumbel-Softmax Flow Matching enables controllable biological sequence generation with straight-through guidance, scaling efficiently to high-dimensional simplices.
Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling
·2283 words·11 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Science and Technology of China
UAE-3D: A unified latent space approach for efficient & high-quality 3D molecular generation, outperforming existing methods in accuracy and speed.
Frac-Connections: Fractional Extension of Hyper-Connections
·1945 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 ByteDance Seed
Frac-Connections: An efficient alternative to Hyper-Connections that divides hidden states into fractions.
Transformers without Normalization
·4050 words·20 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 FAIR, Meta
Transformers can achieve state-of-the-art performance without normalization layers via Dynamic Tanh (DyT), offering a simpler and more efficient alternative.
Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?
·3607 words·17 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Central Florida
KArAt: Can Learnable Attention Beat Standard Attention in Vision Transformers?
Charting and Navigating Hugging Face's Model Atlas
·3697 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 School of Computer Science and Engineering
Navigating millions of models is hard. This paper charts Hugging Face's model atlas, mapping relationships between models and enabling prediction of model attributes.
BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling
·1373 words·7 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Not Available
Rimer: RWKV-7 empowers superior time series modeling, offering a simple yet effective alternative to Transformers with fewer parameters.
LoRACode: LoRA Adapters for Code Embeddings
·1678 words·8 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Max Planck Institute for Software Systems
LoRACode enhances code embeddings using LoRA, achieving SOTA in code retrieval with minimal computational cost.
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
·3624 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Delft University of Technology
This paper reviews AI4SE benchmarks, introduces BenchScout for benchmark discovery, and proposes BenchFrame for benchmark enhancement, demonstrated via HumanEvalNext.
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
·2938 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Microsoft GenAI
KODCODE: A new synthetic coding dataset with verified solutions and tests, enabling state-of-the-art performance for coding LLMs.
Identifying Sensitive Weights via Post-quantization Integral
·2603 words·13 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Tsinghua University
PQI: Accurately identify sensitive weights in post-quantization to enhance LLM compression & performance!
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
·2117 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Nanjing University of Aeronautics and Astronautics
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers.
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
·2799 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Exeter
Stable-SPAM stabilizes 4-bit LLM training, outperforming Adam.
One-step Diffusion Models with $f$-Divergence Distribution Matching
·6126 words·29 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 NVIDIA
f-distill: One-step diffusion models through f-divergence minimization, outperforming reverse-KL with better mode coverage and lower variance.
MONSTER: Monash Scalable Time Series Evaluation Repository
·4728 words·23 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Monash University
MONSTER: Large datasets for time series classification!
S*: Test Time Scaling for Code Generation
·2539 words·12 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 UC Berkeley
S*: Hybrid test-time scaling for code generation, boosting both coverage and selection accuracy.
ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
·4128 words·20 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
ReQFlow: Efficiently generate high-quality protein backbones with rectified quaternion flow, outperforming existing methods in speed and designability.