🏢 Rice University
SS1: Accelerating Inference with Fast and Expressive Sketch Structured Transform
·2142 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Rice University
SS1: A novel GPU-friendly operator accelerates deep learning inference by leveraging structured parameter sharing, achieving superior quality-efficiency tradeoffs compared to existing methods.
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
·1675 words·8 mins·
Natural Language Processing
Large Language Models
🏢 Rice University
SpaceByte: A novel byte-level decoder architecture achieving near-tokenized-model performance without tokenization!
Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation
·2063 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Rice University
LoPA: a novel parameter-efficient fine-tuning method that matches state-of-the-art performance while requiring no server-side adapters, improving on traditional prompt tuning.
Optimal Hypothesis Selection in (Almost) Linear Time
·1628 words·8 mins·
AI Theory
Optimization
🏢 Rice University
This paper presents the first almost linear-time algorithm achieving the optimal accuracy parameter for hypothesis selection, solving a decades-long open problem.
Optimal Algorithms for Augmented Testing of Discrete Distributions
·1848 words·9 mins·
AI Theory
Optimization
🏢 Rice University
Leveraging predictions, this research presents novel algorithms for uniformity, identity, and closeness testing of discrete distributions, achieving information-theoretically optimal sample complexity…
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
·2513 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Rice University
NoMAD-Attention achieves up to 2x speedup in 4-bit quantized LLaMA inference on CPUs by replacing computationally expensive multiply-add operations with ultra-low-latency in-register lookups.
Learning Transferable Features for Implicit Neural Representations
·4038 words·19 mins·
AI Generated
Computer Vision
Image Generation
🏢 Rice University
STRAINER: A new framework that significantly boosts INR performance, enabling faster, higher-quality INR fitting by leveraging transferable features across similar signals.
Fair GLASSO: Estimating Fair Graphical Models with Unbiased Statistical Behavior
·1979 words·10 mins·
AI Theory
Fairness
🏢 Rice University
Fair GLASSO ensures fair Gaussian graphical models by introducing novel bias metrics and a penalized maximum likelihood estimator to mitigate group biases in data.