
Large Language Models

Graph-based Uncertainty Metrics for Long-form Language Model Generations
·2055 words·10 mins
Large Language Models 🏢 Stanford University
Graph Uncertainty boosts LLM factuality by 6.8% using graph centrality to estimate claim-level uncertainty and a novel uncertainty-aware decoding process.
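For the Graph Uncertainty entry above, here is a minimal, self-contained sketch of the centrality idea: sample several responses, decompose each into atomic claims, connect claims to the samples that support them, and score each claim by its centrality in the resulting bipartite graph. The bipartite construction, the choice of degree centrality, and all names below are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch (assumed details): estimate claim-level uncertainty by
# building a bipartite graph between sampled responses and the atomic
# claims they entail, then scoring claims by graph centrality.
import networkx as nx

def claim_confidence(claims_per_response: list[list[str]]) -> dict[str, float]:
    """claims_per_response[i] holds the atomic claims extracted from sample i."""
    G = nx.Graph()
    for i, claims in enumerate(claims_per_response):
        resp = f"response_{i}"
        G.add_node(resp, kind="response")
        for c in claims:
            G.add_node(c, kind="claim")
            G.add_edge(resp, c)  # edge = "this sample supports this claim"
    # Degree centrality: claims supported by more independent samples
    # get higher scores, i.e. lower uncertainty.
    cent = nx.degree_centrality(G)
    return {n: cent[n] for n, d in G.nodes(data=True) if d["kind"] == "claim"}

# A claim appearing in all three samples scores higher than one seen only once.
scores = claim_confidence([
    ["Paris is the capital of France", "France uses the euro"],
    ["Paris is the capital of France"],
    ["Paris is the capital of France", "France borders Spain"],
])
```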
Graph Convolutions Enrich the Self-Attention in Transformers!
·4545 words·22 mins
Natural Language Processing Large Language Models 🏢 Yonsei University
Graph Filter-based Self-Attention (GFSA) enhances Transformers by addressing oversmoothing, boosting performance across various tasks with minimal added parameters.
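A simplified sketch of the graph-filter view of self-attention mentioned above: the row-stochastic attention matrix is treated as a graph adjacency, and values are mixed with a small polynomial filter (identity, attention, and a higher-order power) rather than the attention matrix alone, which counteracts oversmoothing. The specific weights, the order K, and the helper below are illustrative assumptions, not GFSA's exact parameterization.

```python
# Hedged sketch: mix values with a polynomial graph filter
# w0*I + w1*A + w2*A^K instead of the attention matrix A alone.
import torch

def graph_filter_attention(attn: torch.Tensor, v: torch.Tensor,
                           w0: float = 0.3, w1: float = 1.0, w2: float = -0.3,
                           K: int = 3) -> torch.Tensor:
    """attn: (batch, heads, seq, seq) row-stochastic attention; v: (batch, heads, seq, dim)."""
    eye = torch.eye(attn.size(-1), device=attn.device).expand_as(attn)
    attn_k = torch.linalg.matrix_power(attn, K)   # higher-order propagation A^K
    filt = w0 * eye + w1 * attn + w2 * attn_k     # polynomial graph filter
    return filt @ v
```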
Grammar-Aligned Decoding
·7195 words·34 mins
Natural Language Processing Large Language Models 🏢 University of Wisconsin-Madison
Adaptive Sampling with Approximate Expected Futures (ASAp) ensures LLMs generate grammatically correct outputs that closely match the model’s original probability distribution.
Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
·2616 words·13 mins
Natural Language Processing Large Language Models 🏢 Chinese University of Hong Kong
Gradient Cuff: a novel defense against LLM jailbreaks that leverages refusal loss landscapes to better reject malicious queries without harming model performance on benign inputs.
Gorilla: Large Language Model Connected with Massive APIs
·2454 words·12 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
Gorilla: a fine-tuned LLaMA model surpasses GPT-4 in generating accurate API calls by using Retriever-Aware Training (RAT) to adapt to changing APIs and reduce hallucinations.
Global Lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers
·2454 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Meta AI
AI-powered sequence-to-sequence transformers surpass human and algorithmic abilities in discovering Lyapunov functions for dynamical systems, solving a long-standing open problem in mathematics.
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
·1742 words·9 mins
Natural Language Processing Large Language Models 🏢 University of Minnesota
Reward learning from human demonstrations enhances supervised fine-tuning (SFT) for better LLM alignment.
Geometric-Averaged Preference Optimization for Soft Preference Labels
·2987 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Tokyo
To improve LLM alignment, this paper introduces soft preference labels and geometric averaging into Direct Preference Optimization, consistently improving performance on standard benchmarks.
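One hedged way to picture how soft preference labels can enter a DPO-style objective, as in the entry above: the usual log-likelihood-ratio margin is scaled by the annotators' preference strength, which corresponds to geometrically averaging the two responses' likelihoods under the label. The loss below is a minimal sketch with assumed notation, not the paper's exact objective.

```python
# Hedged sketch: DPO-style loss where a soft preference label in [0, 1]
# shrinks the margin toward zero when annotators are ambivalent.
import torch
import torch.nn.functional as F

def soft_label_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                        soft_label, beta: float = 0.1):
    """logp_* : summed log-probs of the 'preferred' / 'dispreferred' responses
    under the policy; ref_logp_* : same under the frozen reference model;
    soft_label in [0, 1] is the annotators' preference strength."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # Binary DPO uses the full margin; a soft label of 0.5 zeroes it out,
    # while a confident label of 1.0 recovers the standard DPO loss.
    scaled = (2.0 * soft_label - 1.0) * margin
    return -F.logsigmoid(beta * scaled).mean()
```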
Generative Hierarchical Materials Search
·1856 words·9 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
Generative Hierarchical Materials Search (GenMS) uses AI to design novel crystal structures from natural language descriptions, outperforming prior methods in both fulfilling user requests and finding…
Generated and Pseudo Content guided Prototype Refinement for Few-shot Point Cloud Segmentation
·1934 words·10 mins
Large Language Models 🏢 Beijing Jiaotong University
LLM-powered prototype refinement boosts few-shot 3D point cloud segmentation accuracy.
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
·2081 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Soochow University
Gated Slot Attention (GSA) enhances linear Transformers for efficient, real-time sequence modeling. GSA uses a two-layer gated linear attention structure linked by softmax, enabling improved memory ca…
Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models
·4898 words·23 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Texas at Austin
This paper introduces a rate-distortion framework for prompt compression in LLMs, bridging the gap between existing methods and optimal performance. By formulating prompt compression as a linear progr…
From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When
·1923 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Michigan
LLMs’ in-context learning surprisingly arises from simple co-occurrence patterns in unstructured data, but positional information is key for complex tasks; ICL fails when patterns are unseen or fixed.
From Instance Training to Instruction Learning: Task Adapters Generation from Instructions
·2311 words·11 mins
Natural Language Processing Large Language Models 🏢 Tencent AI Lab
TAGI, a novel method, generates task-specific adapters from instructions, enhancing LLM cross-task generalization by using knowledge distillation and a two-stage hypernetwork training process.
Fractal Patterns May Illuminate the Success of Next-Token Prediction
·2223 words·11 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs’ success is explained by the self-similar, long-range dependent fractal structure of language; small-scale patterns reflect larger ones.
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
·2248 words·11 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Ms-PoE, a simple plug-and-play positional encoding, significantly improves LLMs’ ability to utilize long contexts by mitigating the ’lost-in-the-middle’ problem and enhancing the capacity to capture i…
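A minimal sketch of the plug-and-play flavor of Ms-PoE described above: token positions are rescaled with a different ratio per attention head before rotary embeddings are applied, so some heads view the context at a compressed scale and attend more evenly to middle positions. The ratio range and the helper below are illustrative assumptions, not the paper's exact settings.

```python
# Hedged sketch: per-head position rescaling to feed into rotary embeddings.
import torch

def per_head_scaled_positions(seq_len: int, num_heads: int,
                              min_ratio: float = 1.2, max_ratio: float = 1.8):
    """Return a (num_heads, seq_len) tensor of rescaled positions; each head
    applies its own compression ratio to the token indices."""
    positions = torch.arange(seq_len, dtype=torch.float32)
    ratios = torch.linspace(min_ratio, max_ratio, num_heads)  # one ratio per head
    return positions.unsqueeze(0) / ratios.unsqueeze(1)       # compress positions head-wise
```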
fMRI predictors based on language models of increasing complexity recover brain left lateralization
·2912 words·14 mins
Natural Language Processing Large Language Models 🏢 CNRS, EHESS
Larger language models better predict brain activity in fMRI studies, with left-hemisphere prediction significantly increasing as model complexity scales up, reconciling classic aphasia findings with …
FM-Delta: Lossless Compression for Storing Massive Fine-tuned Foundation Models
·3523 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 Beijing University of Posts and Telecommunications
FM-Delta: Lossless compression halves cloud storage for massive fine-tuned language models, saving costs without sacrificing accuracy.
FlowLLM: Flow Matching for Material Generation with Large Language Models as Base Distributions
·2004 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Meta AI
FlowLLM revolutionizes material design by cleverly merging large language models and Riemannian flow matching, yielding a 300% boost in stable material generation!
FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low-Rank Adaptations
·1833 words·9 mins
Natural Language Processing Large Language Models 🏢 University of Maryland
FLoRA enables efficient and private federated fine-tuning of LLMs via a novel stacking-based heterogeneous low-rank adaptation scheme, surpassing existing methods.
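A small sketch of the stacking idea behind FLoRA's aggregation, under the assumption that the server concatenates clients' low-rank factors along the rank dimension: the stacked product then equals the exact sum of per-client updates even when ranks differ, avoiding the noise introduced by averaging the A and B factors separately. Client weighting and redistribution details are omitted.

```python
# Hedged sketch: stacking-based aggregation of heterogeneous LoRA updates.
import torch

def stack_lora_updates(client_As, client_Bs):
    """client_As[i]: (r_i, d_in), client_Bs[i]: (d_out, r_i); ranks r_i may differ."""
    A_stacked = torch.cat(client_As, dim=0)   # (sum_i r_i, d_in)
    B_stacked = torch.cat(client_Bs, dim=1)   # (d_out, sum_i r_i)
    return B_stacked @ A_stacked              # == sum_i B_i @ A_i

# Sanity check with two clients of rank 2 and 4:
A1, B1 = torch.randn(2, 8), torch.randn(8, 2)
A2, B2 = torch.randn(4, 8), torch.randn(8, 4)
delta = stack_lora_updates([A1, A2], [B1, B2])
assert torch.allclose(delta, B1 @ A1 + B2 @ A2, atol=1e-6)
```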