Deep Learning
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation
·3935 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 National University of Singapore
LogQuant: Log-distributed 2-bit KV cache quantization with superior accuracy preservation.
Verbal Process Supervision Elicits Better Coding Agents
·1306 words·7 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Mindify AI, United States
CURA: Verbal process supervision improves coding agents.
Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
·3836 words·19 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Department of Biomedical Engineering, Duke University
Gumbel-Softmax Flow Matching enables controllable biological sequence generation with straight-through guidance, scaling efficiently to high-dimensional simplices.
Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling
·2283 words·11 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Science and Technology of China
UAE-3D: A unified latent space for efficient, high-quality 3D molecular generation, outperforming existing methods in both accuracy and speed.
Frac-Connections: Fractional Extension of Hyper-Connections
·1945 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 ByteDance Seed
Frac-Connections: An efficient alternative to Hyper-Connections that divides hidden states into fractions.
Transformers without Normalization
·4050 words·20 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 FAIR, Meta
Transformers can achieve state-of-the-art performance without normalization layers via Dynamic Tanh (DyT), offering a simpler and more efficient alternative.
Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?
·3607 words·17 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Central Florida
KArAt: Can Learnable Attention Beat Standard Attention in Vision Transformers?
Charting and Navigating Hugging Face's Model Atlas
·3697 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 School of Computer Science and Engineering
Navigating millions of models is hard. This paper charts Hugging Face's model atlas, revealing relationships between models and enabling attribute prediction.
BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling
·1373 words·7 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Not Available
Rimer: RWKV-7 empowers superior time series modeling, offering a simple yet effective alternative to Transformers with fewer parameters.
LoRACode: LoRA Adapters for Code Embeddings
·1678 words·8 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Max Planck Institute for Software Systems
LoRACode enhances code embeddings with LoRA adapters, achieving state-of-the-art code retrieval at minimal computational cost.
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
·3624 words·18 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Delft University of Technology
This paper reviews AI4SE benchmarks, introduces BenchScout for benchmark discovery, and proposes BenchFrame for benchmark enhancement, demonstrated via HumanEvalNext.
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
·2938 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Microsoft GenAI
KodCode: A new synthetic coding dataset with verified solutions and tests, enabling state-of-the-art performance for coding LLMs.
Identifying Sensitive Weights via Post-quantization Integral
·2603 words·13 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Tsinghua University
PQI: Post-quantization Integral accurately identifies sensitive weights, improving LLM compression and performance.
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
·2117 words·10 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Nanjing University of Aeronautics and Astronautics
SoS1: Reasoning LLMs like O1 and R1 can serve as sum-of-squares solvers.
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
·2799 words·14 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 University of Exeter
Stable-SPAM stabilizes 4-bit LLM training, outperforming 16-bit Adam.
One-step Diffusion Models with $f$-Divergence Distribution Matching
·6126 words·29 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 NVIDIA
f-distill: One-step diffusion models through f-divergence minimization, outperforming reverse-KL with better mode coverage and lower variance.
MONSTER: Monash Scalable Time Series Evaluation Repository
·4728 words·23 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Monash University
MONSTER: Large-scale datasets for benchmarking time series classification.
S*: Test Time Scaling for Code Generation
·2539 words·12 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 UC Berkeley
S*: Hybrid test-time scaling for code generation, boosting both coverage and selection accuracy.
ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
·4128 words·20 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
ReQFlow: Efficiently generate high-quality protein backbones with rectified quaternion flow, outperforming existing methods in speed and designability.
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
·6586 words·31 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 National University of Singapore
NExT-Mol: Combining 1D language modeling with 3D diffusion for molecule generation, achieving state-of-the-art performance and validity.