🏢 Rice University
SS1: Accelerating Inference with Fast and Expressive Sketch Structured Transform
·2142 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Rice University
SS1: A novel GPU-friendly operator accelerates deep learning inference by leveraging structured parameter sharing, achieving superior quality-efficiency tradeoffs compared to existing methods.
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
·1675 words·8 mins·
Natural Language Processing
Large Language Models
🏢 Rice University
SpaceByte: A novel byte-level decoder architecture achieving near-tokenized-model performance without tokenization!
Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation
·2063 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Rice University
LoPA: a novel parameter-efficient fine-tuning method that matches state-of-the-art performance while requiring no server-side adapters, improving on traditional prompt tuning.
Optimal Hypothesis Selection in (Almost) Linear Time
·1628 words·8 mins·
AI Theory
Optimization
🏢 Rice University
This paper presents the first almost linear-time algorithm achieving the optimal accuracy parameter for hypothesis selection, solving a decades-long open problem.
Optimal Algorithms for Augmented Testing of Discrete Distributions
·1848 words·9 mins·
AI Theory
Optimization
🏢 Rice University
Leveraging predictions, this research presents novel algorithms for uniformity, identity, and closeness testing of discrete distributions, achieving information-theoretically optimal sample complexity…
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
·2513 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Rice University
NoMAD-Attention achieves up to 2x speedup in 4-bit quantized LLaMA inference on CPUs by replacing computationally expensive multiply-add operations with ultra-low-latency in-register lookups.
Learning Transferable Features for Implicit Neural Representations
·4038 words·19 mins·
AI Generated
Computer Vision
Image Generation
🏢 Rice University
STRAINER: A new framework that significantly boosts INR performance, enabling faster, higher-quality INR fitting by leveraging transferable features across similar signals.
Fair GLASSO: Estimating Fair Graphical Models with Unbiased Statistical Behavior
·1979 words·10 mins·
AI Theory
Fairness
🏢 Rice University
Fair GLASSO ensures fair Gaussian graphical models by introducing novel bias metrics and a penalized maximum likelihood estimator to mitigate group biases in data.