Natural Language Processing

Group Robust Preference Optimization in Reward-free RLHF
·2045 words·10 mins
Natural Language Processing Large Language Models 🏢 University College London (UCL)
Group Robust Preference Optimization (GRPO) enhances reward-free RLHF by aligning LLMs to diverse group preferences, maximizing worst-case performance, and significantly improving fairness.
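A minimal sketch of the worst-case objective this summary describes, assuming the per-group loss is the standard DPO log-sigmoid term; the notation is illustrative rather than the paper's exact formulation:

```latex
\max_{\theta}\; \min_{g \in \mathcal{G}}\;
\mathbb{E}_{(x,\,y_w,\,y_l) \sim \mathcal{D}_g}
\left[ \log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
```

Maximizing the minimum over groups is what drives the worst-case and fairness gains: whichever group is currently served worst dominates the gradient.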
Grokking of Implicit Reasoning in Transformers: A Mechanistic Journey to the Edge of Generalization
·2486 words·12 mins
Natural Language Processing Large Language Models 🏢 The Ohio State University
Transformers can learn implicit reasoning through ‘grokking’, achieving high accuracy in composition and comparison tasks; however, generalization varies across reasoning types.
GraphVis: Boosting LLMs with Visual Knowledge Graph Integration
·2376 words·12 mins
Natural Language Processing Large Language Models 🏢 UCLA
GraphVis boosts LLMs by visualizing knowledge graphs, improving accuracy in textual and visual question answering.
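A toy sketch of the preprocessing step the summary describes: render a retrieved knowledge-graph neighborhood as an image so a vision-language model can consume it alongside the question. The triples, layout, and filename are our illustrative choices, not the paper's pipeline:

```python
import networkx as nx
import matplotlib.pyplot as plt

# Build a small knowledge-graph neighborhood (illustrative triples).
G = nx.DiGraph()
G.add_edges_from([
    ("aspirin", "salicylate", {"label": "is_a"}),
    ("aspirin", "pain", {"label": "treats"}),
    ("aspirin", "bleeding", {"label": "side_effect"}),
])

# Render it as an image the vision-language model can read.
pos = nx.spring_layout(G, seed=0)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=1800)
nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, "label"))
plt.savefig("kg_subgraph.png")  # attach alongside the question in the VLM prompt
```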
Graph Convolutions Enrich the Self-Attention in Transformers!
·4545 words·22 mins
Natural Language Processing Large Language Models 🏢 Yonsei University
Graph Filter-based Self-Attention (GFSA) enhances Transformers by addressing oversmoothing, boosting performance across various tasks with minimal added parameters.
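A rough sketch of the graph-filter view, assuming the usual row-stochastic attention matrix: instead of the plain low-pass step `attn @ v`, apply a small matrix polynomial of the attention matrix, which preserves higher-frequency token differences. The filter weights and order below are illustrative, not the paper's learned values:

```python
import torch

def graph_filter_attention(attn, v, w0=0.4, w1=0.5, wK=0.1, K=3):
    """Graph-filter view of self-attention (sketch).

    attn: (batch, heads, seq, seq) row-stochastic attention matrix.
    v:    (batch, heads, seq, dim) value vectors.
    A polynomial of the attention matrix acts as a graph filter that
    keeps high-frequency components, mitigating oversmoothing.
    """
    eye = torch.eye(attn.size(-1), device=attn.device)
    attn_K = torch.linalg.matrix_power(attn, K)    # K-hop propagation
    filt = w0 * eye + w1 * attn + wK * attn_K      # polynomial graph filter
    return filt @ v

# usage: drop-in replacement for the `attn @ v` step of standard attention
attn = torch.softmax(torch.randn(2, 4, 16, 16), dim=-1)
v = torch.randn(2, 4, 16, 64)
print(graph_filter_attention(attn, v).shape)  # torch.Size([2, 4, 16, 64])
```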
Grammar-Aligned Decoding
·7195 words·34 mins
Natural Language Processing Large Language Models 🏢 University of Wisconsin-Madison
Adaptive Sampling with Approximate Expected Futures (ASAp) ensures LLMs generate grammatically correct outputs that closely match the model’s original probability distribution.
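A simplified sketch of the idea as we read it: plain grammar-constrained decoding masks ungrammatical tokens and renormalizes, which skews samples away from the LM's own distribution; ASAp instead weights each candidate by an estimate of the grammatical probability mass reachable below it, refined over repeated samples. `lm_next_probs`, `is_valid_prefix`, and `"<eos>"` are hypothetical stand-ins, and the real algorithm also propagates updated estimates up the prefix tree rather than only marking dead ends:

```python
import random
from collections import defaultdict

future_mass = defaultdict(lambda: 1.0)  # optimistic per-prefix estimates

def sample_grammatical(lm_next_probs, is_valid_prefix, max_len=50):
    prefix = ()
    for _ in range(max_len):
        probs = lm_next_probs(prefix)  # dict: token -> LM probability
        weights = {t: p * future_mass[prefix + (t,)]
                   for t, p in probs.items()
                   if is_valid_prefix(prefix + (t,))}
        total = sum(weights.values())
        if total == 0:                 # dead end discovered:
            future_mass[prefix] = 0.0  # remember it for later samples
            return None
        r, acc = random.uniform(0, total), 0.0
        for t, w in weights.items():   # sample proportionally to the weights
            acc += w
            if acc >= r:
                prefix += (t,)
                break
        if prefix and prefix[-1] == "<eos>":
            return prefix
    return None
```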
Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
·2616 words·13 mins
Natural Language Processing Large Language Models 🏢 Chinese University of Hong Kong
Gradient Cuff: A novel defense mechanism against LLM jailbreaks, leveraging refusal loss landscapes for improved malicious query rejection without harming model performance on benign inputs.
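A schematic two-stage check in the spirit of this summary, assuming a hypothetical `refusal_loss(emb)` that returns one minus the model's refusal probability for a prompt embedding (the paper estimates this quantity by sampling responses). Jailbroken prompts tend to sit in sharp regions of this loss landscape, so a large zeroth-order gradient-norm estimate is treated as a red flag:

```python
import numpy as np

def zeroth_order_grad_norm(refusal_loss, emb, n_dirs=8, mu=0.02):
    """Estimate the gradient norm of the refusal loss at emb
    from finite differences along random unit directions."""
    base = refusal_loss(emb)
    est = np.zeros_like(emb)
    for _ in range(n_dirs):
        u = np.random.randn(*emb.shape)
        u /= np.linalg.norm(u)
        est += (refusal_loss(emb + mu * u) - base) / mu * u  # directional slope
    return np.linalg.norm(est / n_dirs)

def looks_like_jailbreak(refusal_loss, emb, loss_thresh=0.5, grad_thresh=1.0):
    if refusal_loss(emb) < loss_thresh:   # stage 1: the model already refuses
        return True
    return zeroth_order_grad_norm(refusal_loss, emb) > grad_thresh  # stage 2
```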
Gorilla: Large Language Model Connected with Massive APIs
·2454 words·12 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
Gorilla: a fine-tuned LLaMA model surpasses GPT-4 in generating accurate API calls by using Retriever-Aware Training (RAT) to adapt to changing APIs and reduce hallucinations.
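A minimal sketch of what retriever-aware training means in practice: retrieved API documentation is appended to the instruction during both fine-tuning and inference, so the model learns to ground its call in the docs rather than in stale memorized signatures. The retriever and the document below are stand-ins:

```python
def build_rat_prompt(instruction: str, retrieved_doc: str) -> str:
    """Pair the user instruction with the retrieved API documentation,
    roughly the format the Gorilla paper describes."""
    return (
        f"{instruction}\n"
        f"Use this API documentation for reference: {retrieved_doc}"
    )

prompt = build_rat_prompt(
    "Translate this sentence to German with a pretrained model.",
    "transformers.pipeline(task='translation_en_to_de', model=...)",
)
```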
Global Lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers
·2454 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Meta AI
AI-powered sequence-to-sequence transformers surpass human and algorithmic abilities in discovering Lyapunov functions for dynamical systems, solving a long-standing open problem in mathematics.
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
·1742 words·9 mins
Natural Language Processing Large Language Models 🏢 University of Minnesota
Reward learning from human demonstrations enhances supervised fine-tuning (SFT) for better LLM alignment.
Geometric-Averaged Preference Optimization for Soft Preference Labels
·2987 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Tokyo
This paper introduces soft preference labels and geometric averaging into Direct Preference Optimization, consistently improving LLM alignment performance on standard benchmarks.
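One way to write the geometric-averaging idea, under our reading: with a soft label $\hat{p} \in [0.5, 1]$ for $y_1$ over $y_2$, geometrically averaging the two responses' likelihoods collapses into scaling the DPO margin by $2\hat{p}-1$, so near-tied pairs contribute vanishing gradient. Treat this as a sketch, not the paper's exact loss:

```latex
\mathcal{L}(\theta) = -\,\mathbb{E}\!\left[
\log \sigma\!\Big( \beta\,(2\hat{p} - 1)\,
\big( h_\theta(y_1) - h_\theta(y_2) \big) \Big) \right],
\qquad
h_\theta(y) = \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
```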
Generative Hierarchical Materials Search
·1856 words·9 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
Generative Hierarchical Materials Search (GenMS) uses AI to design novel crystal structures from natural language descriptions, outperforming prior methods in both fulfilling user requests and finding…
General Detection-based Text Line Recognition
·2137 words·11 mins
AI Generated Natural Language Processing Text Recognition 🏢 LIGM, École des Ponts
A novel detection-based approach (DTLR) achieves state-of-the-art text line recognition across diverse scripts (Latin, Chinese, ciphers), overcoming challenges of character-level annotation and comple…
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
·2081 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Soochow University
Gated Slot Attention (GSA) enhances linear Transformers for efficient, real-time sequence modeling. GSA uses a two-layer gated linear attention structure linked by softmax, enabling improved memory ca…
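A minimal single-head recurrence consistent with the summary's "two-layer gated linear attention structure linked by softmax"; the NumPy shapes and the gate parameterization are our illustrative choices:

```python
import numpy as np

def gated_slot_attention(q, k, v, alpha, m_slots):
    """Recurrent sketch of Gated Slot Attention (single head).

    q, k: (T, d_k); v: (T, d_v); alpha: (T, m) per-slot forget gates in (0, 1).
    Two gated linear-attention memories (slot keys and slot values) are
    linked by a softmax read, giving linear time and constant memory.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    K = np.zeros((m_slots, d_k))   # slot-key memory
    V = np.zeros((m_slots, d_v))   # slot-value memory
    out = np.zeros((T, d_v))
    for t in range(T):
        a = alpha[t][:, None]                 # (m, 1) forget gate
        K = a * K + (1 - a) * k[t][None, :]   # gated write of current key
        V = a * V + (1 - a) * v[t][None, :]   # gated write of current value
        scores = K @ q[t]                     # (m,) slot relevance
        w = np.exp(scores - scores.max())
        w /= w.sum()                          # softmax read over slots
        out[t] = V.T @ w
    return out

rng = np.random.default_rng(0)
T, dk, dv, m = 12, 16, 32, 4
out = gated_slot_attention(rng.normal(size=(T, dk)), rng.normal(size=(T, dk)),
                           rng.normal(size=(T, dv)), rng.uniform(0.8, 0.99, (T, m)), m)
print(out.shape)  # (12, 32)
```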
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
·3454 words·17 mins
AI Generated Natural Language Processing Question Answering 🏢 National University of Singapore
G-Retriever: a novel RAG approach enables conversational interaction with textual graphs, improving graph understanding and question answering efficiency while mitigating hallucination.
Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models
·4898 words·23 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Texas at Austin
This paper introduces a rate-distortion framework for prompt compression in LLMs, bridging the gap between existing methods and optimal performance. By formulating prompt compression as a linear progr…
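Schematically, the framing is a classic rate-distortion trade-off; here $x$ is the original prompt, $m$ a compressed prompt, $\ell(m)$ its token length, and $d$ a distortion between the black-box model's outputs on $x$ and on $m$ (notation ours, not the paper's). Because both the objective and the constraint are linear in the variables $p(m \mid x)$, the optimal frontier can be traced with a linear program:

```latex
R(D) \;=\; \min_{p(m \mid x)} \;\; \mathbb{E}\big[\ell(m)\big]
\quad \text{subject to} \quad
\mathbb{E}_{x,\; m \sim p(\cdot \mid x)}\big[d(x, m)\big] \;\le\; D
```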
From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When
·1923 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Michigan
LLMs’ in-context learning surprisingly arises from simple co-occurrence patterns in unstructured data, but positional information is key for complex tasks; ICL fails when patterns are unseen or fixed.
From Instance Training to Instruction Learning: Task Adapters Generation from Instructions
·2311 words·11 mins
Natural Language Processing Large Language Models 🏢 Tencent AI Lab
TAGI, a novel method, generates task-specific adapters from instructions, enhancing LLM cross-task generalization by using knowledge distillation and a two-stage hypernetwork training process.
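A toy sketch of the hypernetwork step: an instruction embedding is mapped directly to low-rank adapter weights, so a new task gets its adapter from its instruction alone, with no per-task gradient steps. Sizes, names, and the LoRA-style parameterization are illustrative; the distillation from instance-trained adapters is omitted:

```python
import torch
import torch.nn as nn

d_model, r, d_instr = 768, 8, 1024  # illustrative sizes

class AdapterHypernet(nn.Module):
    """Map an instruction embedding to low-rank adapter weights."""
    def __init__(self):
        super().__init__()
        self.to_A = nn.Linear(d_instr, d_model * r)
        self.to_B = nn.Linear(d_instr, r * d_model)

    def forward(self, instr_emb):                  # (d_instr,)
        A = self.to_A(instr_emb).view(d_model, r)  # down-projection
        B = self.to_B(instr_emb).view(r, d_model)  # up-projection
        return A, B                                # delta_W = A @ B

hyper = AdapterHypernet()
A, B = hyper(torch.randn(d_instr))  # adapter for an unseen task's instruction
```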
Fractal Patterns May Illuminate the Success of Next-Token Prediction
·2223 words·11 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs’ success may be explained by the self-similar, long-range-dependent fractal structure of language: small-scale patterns reflect larger ones.
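For intuition, long-range dependence of this kind is commonly quantified with a Hurst exponent, e.g. via rescaled-range (R/S) analysis; applying it to a per-token log-likelihood series is our illustrative choice, not the paper's exact estimator:

```python
import numpy as np

def hurst_rs(series, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate the Hurst exponent by rescaled-range analysis.
    H ~ 0.5 indicates no long-range dependence; H > 0.5 indicates
    persistent, self-similar structure."""
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(series) - n + 1, n):
            w = series[start:start + n]
            dev = np.cumsum(w - w.mean())   # cumulative deviations
            r = dev.max() - dev.min()       # range
            s = w.std()
            if s > 0:
                rs_vals.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_vals)))
    return np.polyfit(log_n, log_rs, 1)[0]  # slope of log R/S vs log n

print(hurst_rs(np.random.randn(2048)))  # roughly 0.5 for white noise
```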
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
·2248 words·11 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Ms-PoE, a simple plug-and-play positional encoding, significantly improves LLMs’ ability to utilize long contexts by mitigating the ’lost-in-the-middle’ problem and enhancing the capacity to capture i…
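A sketch of the core trick as we understand it: give each attention head its own position-compression ratio, so at least some heads keep middle-of-context tokens within a well-trained position range. The linear ratio schedule below is illustrative; the paper assigns ratios per head based on a position-awareness score:

```python
import torch

def multiscale_position_ids(seq_len, n_heads, min_ratio=1.2, max_ratio=1.8):
    """Return per-head rescaled position indices for rotary embeddings."""
    ratios = torch.linspace(min_ratio, max_ratio, n_heads)  # one scale per head
    base = torch.arange(seq_len, dtype=torch.float32)
    return base[None, :] / ratios[:, None]  # (n_heads, seq_len)

pos = multiscale_position_ids(seq_len=8192, n_heads=32)
# feed pos[h] into head h's rotary embedding instead of the shared 0..8191
```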
fMRI predictors based on language models of increasing complexity recover brain left lateralization
·2912 words·14 mins
Natural Language Processing Large Language Models 🏢 CNRS, EHESS
Larger language models better predict brain activity in fMRI studies, with left-hemisphere prediction significantly increasing as model complexity scales up, reconciling classic aphasia findings with …