
Large Language Models

Graph-based Uncertainty Metrics for Long-form Language Model Generations
·2055 words·10 mins
Large Language Models 🏢 Stanford University
Graph Uncertainty boosts LLM factuality by 6.8% using graph centrality to estimate claim-level uncertainty and a novel uncertainty-aware decoding process.
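For the Graph Uncertainty entry above, here is a minimal, self-contained sketch of the centrality idea: sample several responses, decompose each into atomic claims, connect claims to the samples that support them, and score each claim by its centrality in the resulting bipartite graph. The bipartite construction, the choice of degree centrality, and all names below are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch (assumed details): estimate claim-level uncertainty by
# building a bipartite graph between sampled responses and the atomic
# claims they entail, then scoring claims by graph centrality.
import networkx as nx

def claim_confidence(claims_per_response: list[list[str]]) -> dict[str, float]:
    """claims_per_response[i] holds the atomic claims extracted from sample i."""
    G = nx.Graph()
    for i, claims in enumerate(claims_per_response):
        resp = f"response_{i}"
        G.add_node(resp, kind="response")
        for c in claims:
            G.add_node(c, kind="claim")
            G.add_edge(resp, c)  # edge = "this sample supports this claim"
    # Degree centrality: claims supported by more independent samples
    # get higher scores, i.e. lower uncertainty.
    cent = nx.degree_centrality(G)
    return {n: cent[n] for n, d in G.nodes(data=True) if d["kind"] == "claim"}

# A claim appearing in all three samples scores higher than one seen only once.
scores = claim_confidence([
    ["Paris is the capital of France", "France uses the euro"],
    ["Paris is the capital of France"],
    ["Paris is the capital of France", "France borders Spain"],
])
```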
Graph Convolutions Enrich the Self-Attention in Transformers!
·4545 words·22 mins
Natural Language Processing Large Language Models 🏢 Yonsei University
Graph Filter-based Self-Attention (GFSA) enhances Transformers by addressing oversmoothing, boosting performance across various tasks with minimal added parameters.
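A simplified sketch of the graph-filter view of self-attention mentioned above: the row-stochastic attention matrix is treated as a graph adjacency, and values are mixed with a small polynomial filter (identity, attention, and a higher-order power) rather than the attention matrix alone, which counteracts oversmoothing. The specific weights, the order K, and the helper below are illustrative assumptions, not GFSA's exact parameterization.

```python
# Hedged sketch: mix values with a polynomial graph filter
# w0*I + w1*A + w2*A^K instead of the attention matrix A alone.
import torch

def graph_filter_attention(attn: torch.Tensor, v: torch.Tensor,
                           w0: float = 0.3, w1: float = 1.0, w2: float = -0.3,
                           K: int = 3) -> torch.Tensor:
    """attn: (batch, heads, seq, seq) row-stochastic attention; v: (batch, heads, seq, dim)."""
    eye = torch.eye(attn.size(-1), device=attn.device).expand_as(attn)
    attn_k = torch.linalg.matrix_power(attn, K)   # higher-order propagation A^K
    filt = w0 * eye + w1 * attn + w2 * attn_k     # polynomial graph filter
    return filt @ v
```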
Grammar-Aligned Decoding
·7195 words·34 mins
Natural Language Processing Large Language Models 🏢 University of Wisconsin-Madison
Adaptive Sampling with Approximate Expected Futures (ASAp) ensures LLMs generate grammatically correct outputs that closely match the model’s original probability distribution.
Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
·2616 words·13 mins
Natural Language Processing Large Language Models 🏢 Chinese University of Hong Kong
Gradient Cuff: a novel defense against LLM jailbreaks that leverages refusal loss landscapes to better reject malicious queries without harming model performance on benign inputs.
Gorilla: Large Language Model Connected with Massive APIs
·2454 words·12 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
Gorilla: a fine-tuned LLaMA model surpasses GPT-4 in generating accurate API calls by using Retriever-Aware Training (RAT) to adapt to changing APIs and reduce hallucinations.
Global Lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers
·2454 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Meta AI
AI-powered sequence-to-sequence transformers surpass human and algorithmic abilities in discovering Lyapunov functions for dynamical systems, solving a long-standing open problem in mathematics.
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
·1742 words·9 mins
Natural Language Processing Large Language Models 🏢 University of Minnesota
Reward learning from human demonstrations enhances supervised fine-tuning (SFT) for better LLM alignment.
Geometric-Averaged Preference Optimization for Soft Preference Labels
·2987 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Tokyo
To improve LLM alignment, this paper introduces soft preference labels and geometric averaging into Direct Preference Optimization, consistently improving performance on standard benchmarks.
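One hedged way to picture how soft preference labels can enter a DPO-style objective, as in the entry above: the usual log-likelihood-ratio margin is scaled by the annotators' preference strength, which corresponds to geometrically averaging the two responses' likelihoods under the label. The loss below is a minimal sketch with assumed notation, not the paper's exact objective.

```python
# Hedged sketch: DPO-style loss where a soft preference label in [0, 1]
# shrinks the margin toward zero when annotators are ambivalent.
import torch
import torch.nn.functional as F

def soft_label_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                        soft_label, beta: float = 0.1):
    """logp_* : summed log-probs of the 'preferred' / 'dispreferred' responses
    under the policy; ref_logp_* : same under the frozen reference model;
    soft_label in [0, 1] is the annotators' preference strength."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # Binary DPO uses the full margin; a soft label of 0.5 zeroes it out,
    # while a confident label of 1.0 recovers the standard DPO loss.
    scaled = (2.0 * soft_label - 1.0) * margin
    return -F.logsigmoid(beta * scaled).mean()
```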
Generative Hierarchical Materials Search
·1856 words·9 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
Generative Hierarchical Materials Search (GenMS) uses AI to design novel crystal structures from natural language descriptions, outperforming prior methods in both fulfilling user requests and finding…
Generated and Pseudo Content guided Prototype Refinement for Few-shot Point Cloud Segmentation
·1934 words·10 mins
Large Language Models 🏢 Beijing Jiaotong University
LLM-powered prototype refinement boosts few-shot 3D point cloud segmentation accuracy.
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
·2081 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Soochow University
Gated Slot Attention (GSA) enhances linear Transformers for efficient, real-time sequence modeling. GSA uses a two-layer gated linear attention structure linked by softmax, enabling improved memory ca…
Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models
·4898 words·23 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Texas at Austin
This paper introduces a rate-distortion framework for prompt compression in LLMs, bridging the gap between existing methods and optimal performance. By formulating prompt compression as a linear progr…
From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When
·1923 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Michigan
LLMs’ in-context learning surprisingly arises from simple co-occurrence patterns in unstructured data, but positional information is key for complex tasks; ICL fails when patterns are unseen or fixed.
From Instance Training to Instruction Learning: Task Adapters Generation from Instructions
·2311 words·11 mins
Natural Language Processing Large Language Models 🏢 Tencent AI Lab
TAGI, a novel method, generates task-specific adapters from instructions, enhancing LLM cross-task generalization by using knowledge distillation and a two-stage hypernetwork training process.
Fractal Patterns May Illuminate the Success of Next-Token Prediction
·2223 words·11 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs’ success is explained by the self-similar, long-range dependent fractal structure of language; small-scale patterns reflect larger ones.
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
·2248 words·11 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Ms-PoE, a simple plug-and-play positional encoding, significantly improves LLMs’ ability to utilize long contexts by mitigating the ’lost-in-the-middle’ problem and enhancing the capacity to capture i…
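A minimal sketch of the plug-and-play flavor of Ms-PoE described above: token positions are rescaled with a different ratio per attention head before rotary embeddings are applied, so some heads view the context at a compressed scale and attend more evenly to middle positions. The ratio range and the helper below are illustrative assumptions, not the paper's exact settings.

```python
# Hedged sketch: per-head position rescaling to feed into rotary embeddings.
import torch

def per_head_scaled_positions(seq_len: int, num_heads: int,
                              min_ratio: float = 1.2, max_ratio: float = 1.8):
    """Return a (num_heads, seq_len) tensor of rescaled positions; each head
    applies its own compression ratio to the token indices."""
    positions = torch.arange(seq_len, dtype=torch.float32)
    ratios = torch.linspace(min_ratio, max_ratio, num_heads)  # one ratio per head
    return positions.unsqueeze(0) / ratios.unsqueeze(1)       # compress positions head-wise
```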
fMRI predictors based on language models of increasing complexity recover brain left lateralization
·2912 words·14 mins
Natural Language Processing Large Language Models 🏢 CNRS, EHESS
Larger language models better predict brain activity in fMRI studies, with left-hemisphere prediction significantly increasing as model complexity scales up, reconciling classic aphasia findings with …
FM-Delta: Lossless Compression for Storing Massive Fine-tuned Foundation Models
·3523 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 Beijing University of Posts and Telecommunications
FM-Delta: Lossless compression halves cloud storage for massive fine-tuned language models, saving costs without sacrificing accuracy.
FlowLLM: Flow Matching for Material Generation with Large Language Models as Base Distributions
·2004 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Meta AI
FlowLLM revolutionizes material design by cleverly merging large language models and Riemannian flow matching, yielding a 300% boost in stable material generation!
FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low-Rank Adaptations
·1833 words·9 mins
Natural Language Processing Large Language Models 🏢 University of Maryland
FLoRA enables efficient and private federated fine-tuning of LLMs via a novel stacking-based heterogeneous low-rank adaptation scheme, surpassing existing methods.
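A small sketch of the stacking idea behind FLoRA's aggregation, under the assumption that the server concatenates clients' low-rank factors along the rank dimension: the stacked product then equals the exact sum of per-client updates even when ranks differ, avoiding the noise introduced by averaging the A and B factors separately. Client weighting and redistribution details are omitted.

```python
# Hedged sketch: stacking-based aggregation of heterogeneous LoRA updates.
import torch

def stack_lora_updates(client_As, client_Bs):
    """client_As[i]: (r_i, d_in), client_Bs[i]: (d_out, r_i); ranks r_i may differ."""
    A_stacked = torch.cat(client_As, dim=0)   # (sum_i r_i, d_in)
    B_stacked = torch.cat(client_Bs, dim=1)   # (d_out, sum_i r_i)
    return B_stacked @ A_stacked              # == sum_i B_i @ A_i

# Sanity check with two clients of rank 2 and 4:
A1, B1 = torch.randn(2, 8), torch.randn(8, 2)
A2, B2 = torch.randn(4, 8), torch.randn(8, 4)
delta = stack_lora_updates([A1, A2], [B1, B2])
assert torch.allclose(delta, B1 @ A1 + B2 @ A2, atol=1e-6)
```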