Large Language Models
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
·2517 words·12 mins·
Large Language Models
🏢 Colfax Research
FlashAttention-3: Achieves 1.5-2x faster attention on H100 GPUs using asynchrony and low-precision, reaching close to 1.2 PFLOPs/s with FP8.
FLAME : Factuality-Aware Alignment for Large Language Models
·2851 words·14 mins·
Natural Language Processing
Large Language Models
🏢 University of Waterloo
FLAME: A novel alignment method enhances large language model factuality by addressing hallucination in supervised fine-tuning and reinforcement learning, resulting in more accurate and helpful AI ass…
Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
·1351 words·7 mins·
Natural Language Processing
Large Language Models
🏢 University of Michigan
Researchers crack the code of in-context learning in Transformers, revealing how architecture, low-rank parameters, and data correlations influence model optimization and generalization.
Fight Back Against Jailbreaking via Prompt Adversarial Tuning
·2100 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Peking University
Prompt Adversarial Tuning (PAT) defends against LLM jailbreaking by training a protective prompt prefix. PAT uses adversarial and benign prompts to optimize this prefix, significantly reducing succes…
Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources
·3653 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Alibaba Group
FlexLoRA: Efficient Federated Fine-tuning of LLMs for Heterogeneous Tasks and Resources.
Fast Best-of-N Decoding via Speculative Rejection
·1456 words·7 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Speculative Rejection: A novel algorithm that speeds up inference-time alignment of Large Language Models (LLMs) by 16-32x!
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
·2598 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 SenseTime Research
LLM-Infused Diffuser boosts text-to-image generation by smartly integrating LLMs, surpassing existing models in prompt understanding and image quality.
Exploring Context Window of Large Language Models via Decomposed Positional Vectors
·3403 words·16 mins·
Large Language Models
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Researchers extend large language models’ context windows with training-free methods, analyzing and manipulating positional vectors to improve long-text processing.
Exploiting LLM Quantization
·1836 words·9 mins·
Natural Language Processing
Large Language Models
🏢 ETH Zurich
LLM quantization, while improving efficiency, creates a security risk: attackers can craft seemingly benign models that exhibit malicious behavior only when quantized.
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
·2629 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Warsaw University of Technology
D2DMoE boosts Transformer efficiency by up to 60% via smart activation sparsity and dynamic expert selection, outperforming existing methods.
Explaining Datasets in Words: Statistical Models with Natural Language Parameters
·2281 words·11 mins·
Natural Language Processing
Large Language Models
🏢 UC Berkeley
This paper introduces a model-agnostic algorithm that uses natural language predicates to make statistical model parameters directly interpretable, significantly improving explainability.
Evaluating the World Model Implicit in a Generative Model
·4059 words·20 mins·
Large Language Models
🏢 Harvard University
New metrics reveal that generative models often possess surprisingly incoherent world models, despite seemingly accurate next-token predictions. This incoherence leads to fragility in solving related …
Estimating the Hallucination Rate of Generative AI
·3412 words·17 mins·
Natural Language Processing
Large Language Models
🏢 Department of Statistics, Columbia University
New method estimates hallucination rates in generative AI’s in-context learning, improving model reliability.
ESPACE: Dimensionality Reduction of Activations for Model Compression
·2254 words·11 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 NVIDIA Research
ESPACE: A novel LLM compression technique achieving 50% model size reduction with minimal accuracy loss by cleverly projecting activations onto principal components.
Entity Alignment with Noisy Annotations from Large Language Models
·1820 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Hong Kong Polytechnic University
LLM4EA: A novel framework efficiently merges knowledge graphs using LLMs, overcoming noisy annotations and high costs via active learning and unsupervised label refinement, boosting accuracy and effic…
Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration
·2339 words·11 mins·
Large Language Models
🏢 Harbin Institute of Technology
DEEPEN: a training-free LLM ensemble framework fusing probability distributions in a relative space to overcome vocabulary misalignment, improving performance consistently across benchmarks.
Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus
·3384 words·16 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Advanced AI Innovation Center, Hitachi
Boosting AI reasoning! New research enhances LLMs’ logical abilities via a principled synthetic logic corpus, achieving substantial improvements across logic, math, and coding benchmarks.
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
·3239 words·16 mins·
Natural Language Processing
Large Language Models
🏢 Zhejiang University
Boosting LLM trustworthiness, researchers introduce Sparse Activation Control, a training-free method that concurrently enhances safety, factuality, and bias mitigation by selectively controlling atte…
Enhancing LLM’s Cognition via Structurization
·3694 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Zhejiang University
LLMs struggle with complex, long-form text. This paper introduces ‘context structurization,’ transforming unstructured text into a structured format to enhance LLM comprehension. Experiments across …
Enhancing Large Language Models through Adaptive Tokenizers
·1963 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Huawei Noah's Ark Lab
Adaptive tokenizers enhance LLMs by dynamically optimizing vocabulary during training, improving accuracy without increasing vocabulary size.