Large Language Models
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
·2517 words·12 mins·
Large Language Models
🏢 Colfax Research
FlashAttention-3: Achieves 1.5-2x faster attention on H100 GPUs using asynchrony and low-precision, reaching close to 1.2 PFLOPs/s with FP8.
FLAME : Factuality-Aware Alignment for Large Language Models
·2851 words·14 mins·
Natural Language Processing
Large Language Models
🏢 University of Waterloo
FLAME: A novel alignment method enhances large language model factuality by addressing hallucination in supervised fine-tuning and reinforcement learning, resulting in more accurate and helpful AI ass…
Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
·1351 words·7 mins·
Natural Language Processing
Large Language Models
🏢 University of Michigan
Researchers crack the code of in-context learning in Transformers, revealing how architecture, low-rank parameters, and data correlations influence model optimization and generalization.
Fight Back Against Jailbreaking via Prompt Adversarial Tuning
·2100 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Peking University
Prompt Adversarial Tuning (PAT) defends against LLM jailbreaking by training a protective prompt prefix. PAT uses adversarial and benign prompts to optimize this prefix, significantly reducing succes…
Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources
·3653 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Alibaba Group
FlexLoRA: Efficient Federated Fine-tuning of LLMs for Heterogeneous Tasks and Resources.
Fast Best-of-N Decoding via Speculative Rejection
·1456 words·7 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Speculative Rejection: A novel algorithm that speeds up inference-time alignment of Large Language Models (LLMs) by 16-32x!
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
·2598 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 SenseTime Research
LLM-Infused Diffuser boosts text-to-image generation by smartly integrating LLMs, surpassing existing models in prompt understanding and image quality.
Exploring Context Window of Large Language Models via Decomposed Positional Vectors
·3403 words·16 mins·
Large Language Models
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Researchers extend large language models’ context windows with training-free methods, analyzing and manipulating positional vectors to improve long-text processing.
Exploiting LLM Quantization
·1836 words·9 mins·
Natural Language Processing
Large Language Models
🏢 ETH Zurich
LLM quantization, while improving efficiency, creates a security risk: attackers can craft seemingly benign models that exhibit malicious behavior only when quantized.
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
·2629 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Warsaw University of Technology
D2DMoE boosts Transformer efficiency by up to 60% via smart activation sparsity and dynamic expert selection, outperforming existing methods.
Explaining Datasets in Words: Statistical Models with Natural Language Parameters
·2281 words·11 mins·
Natural Language Processing
Large Language Models
🏢 UC Berkeley
This paper introduces a model-agnostic algorithm that uses natural language predicates to make statistical model parameters directly interpretable, significantly improving explainability.
Evaluating the World Model Implicit in a Generative Model
·4059 words·20 mins·
Large Language Models
🏢 Harvard University
New metrics reveal that generative models often possess surprisingly incoherent world models, despite seemingly accurate next-token predictions. This incoherence leads to fragility in solving related …
Estimating the Hallucination Rate of Generative AI
·3412 words·17 mins·
Natural Language Processing
Large Language Models
🏢 Department of Statistics, Columbia University
New method estimates hallucination rates in generative AI’s in-context learning, improving model reliability.
ESPACE: Dimensionality Reduction of Activations for Model Compression
·2254 words·11 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 NVIDIA Research
ESPACE: A novel LLM compression technique achieving 50% model size reduction with minimal accuracy loss by cleverly projecting activations onto principal components.
Entity Alignment with Noisy Annotations from Large Language Models
·1820 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Hong Kong Polytechnic University
LLM4EA: A novel framework efficiently merges knowledge graphs using LLMs, overcoming noisy annotations and high costs via active learning and unsupervised label refinement, boosting accuracy and effic…
Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration
·2339 words·11 mins·
Large Language Models
🏢 Harbin Institute of Technology
DEEPEN: a training-free LLM ensemble framework fusing probability distributions in a relative space to overcome vocabulary misalignment, improving performance consistently across benchmarks.
Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus
·3384 words·16 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Advanced AI Innovation Center, Hitachi
Boosting AI reasoning! New research enhances LLMs’ logical abilities via a principled synthetic logic corpus, achieving substantial improvements across logic, math, and coding benchmarks.
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
·3239 words·16 mins·
Natural Language Processing
Large Language Models
🏢 Zhejiang University
Boosting LLM trustworthiness, researchers introduce Sparse Activation Control, a training-free method that concurrently enhances safety, factuality, and bias mitigation by selectively controlling atte…
Enhancing LLM’s Cognition via Structurization
·3694 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Zhejiang University
LLMs struggle with complex, long-form text. This paper introduces ‘context structurization,’ transforming unstructured text into a structured format to enhance LLM comprehension. Experiments across …
Enhancing Large Language Models through Adaptive Tokenizers
·1963 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Huawei Noah's Ark Lab
Adaptive tokenizers enhance LLMs by dynamically optimizing vocabulary during training, improving accuracy without increasing vocabulary size.