Natural Language Processing

Group Robust Preference Optimization in Reward-free RLHF
·2045 words·10 mins
Natural Language Processing Large Language Models 🏢 University College London (UCL)
Group Robust Preference Optimization (GRPO) enhances reward-free RLHF by aligning LLMs to diverse group preferences, maximizing worst-case performance, and significantly improving fairness.
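A minimal sketch of the worst-case objective this summary describes, assuming the per-group loss is the standard DPO log-sigmoid term; the notation is illustrative rather than the paper's exact formulation:

```latex
\max_{\theta}\; \min_{g \in \mathcal{G}}\;
\mathbb{E}_{(x,\,y_w,\,y_l) \sim \mathcal{D}_g}
\left[ \log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
```

Maximizing the minimum over groups is what drives the worst-case and fairness gains: whichever group is currently served worst dominates the gradient.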
Grokking of Implicit Reasoning in Transformers: A Mechanistic Journey to the Edge of Generalization
·2486 words·12 mins
Natural Language Processing Large Language Models 🏢 The Ohio State University
Transformers can learn implicit reasoning through ‘grokking’, achieving high accuracy in composition and comparison tasks; however, generalization varies across reasoning types.
GraphVis: Boosting LLMs with Visual Knowledge Graph Integration
·2376 words·12 mins
Natural Language Processing Large Language Models 🏢 UCLA
GraphVis boosts LLMs by visualizing knowledge graphs, improving accuracy in textual and visual question answering.
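A toy sketch of the preprocessing step the summary describes: render a retrieved knowledge-graph neighborhood as an image so a vision-language model can consume it alongside the question. The triples, layout, and filename are our illustrative choices, not the paper's pipeline:

```python
import networkx as nx
import matplotlib.pyplot as plt

# Build a small knowledge-graph neighborhood (illustrative triples).
G = nx.DiGraph()
G.add_edges_from([
    ("aspirin", "salicylate", {"label": "is_a"}),
    ("aspirin", "pain", {"label": "treats"}),
    ("aspirin", "bleeding", {"label": "side_effect"}),
])

# Render it as an image the vision-language model can read.
pos = nx.spring_layout(G, seed=0)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=1800)
nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, "label"))
plt.savefig("kg_subgraph.png")  # attach alongside the question in the VLM prompt
```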
Graph Convolutions Enrich the Self-Attention in Transformers!
·4545 words·22 mins
Natural Language Processing Large Language Models 🏢 Yonsei University
Graph Filter-based Self-Attention (GFSA) enhances Transformers by addressing oversmoothing, boosting performance across various tasks with minimal added parameters.
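A rough sketch of the graph-filter view, assuming the usual row-stochastic attention matrix: instead of the plain low-pass step `attn @ v`, apply a small matrix polynomial of the attention matrix, which preserves higher-frequency token differences. The filter weights and order below are illustrative, not the paper's learned values:

```python
import torch

def graph_filter_attention(attn, v, w0=0.4, w1=0.5, wK=0.1, K=3):
    """Graph-filter view of self-attention (sketch).

    attn: (batch, heads, seq, seq) row-stochastic attention matrix.
    v:    (batch, heads, seq, dim) value vectors.
    A polynomial of the attention matrix acts as a graph filter that
    keeps high-frequency components, mitigating oversmoothing.
    """
    eye = torch.eye(attn.size(-1), device=attn.device)
    attn_K = torch.linalg.matrix_power(attn, K)    # K-hop propagation
    filt = w0 * eye + w1 * attn + wK * attn_K      # polynomial graph filter
    return filt @ v

# usage: drop-in replacement for the `attn @ v` step of standard attention
attn = torch.softmax(torch.randn(2, 4, 16, 16), dim=-1)
v = torch.randn(2, 4, 16, 64)
print(graph_filter_attention(attn, v).shape)  # torch.Size([2, 4, 16, 64])
```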
Grammar-Aligned Decoding
·7195 words·34 mins
Natural Language Processing Large Language Models 🏢 University of Wisconsin-Madison
Adaptive Sampling with Approximate Expected Futures (ASAp) ensures LLMs generate grammatically correct outputs that closely match the model’s original probability distribution.
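A simplified sketch of the idea as we read it: plain grammar-constrained decoding masks ungrammatical tokens and renormalizes, which skews samples away from the LM's own distribution; ASAp instead weights each candidate by an estimate of the grammatical probability mass reachable below it, refined over repeated samples. `lm_next_probs`, `is_valid_prefix`, and `"<eos>"` are hypothetical stand-ins, and the real algorithm also propagates updated estimates up the prefix tree rather than only marking dead ends:

```python
import random
from collections import defaultdict

future_mass = defaultdict(lambda: 1.0)  # optimistic per-prefix estimates

def sample_grammatical(lm_next_probs, is_valid_prefix, max_len=50):
    prefix = ()
    for _ in range(max_len):
        probs = lm_next_probs(prefix)  # dict: token -> LM probability
        weights = {t: p * future_mass[prefix + (t,)]
                   for t, p in probs.items()
                   if is_valid_prefix(prefix + (t,))}
        total = sum(weights.values())
        if total == 0:                 # dead end discovered:
            future_mass[prefix] = 0.0  # remember it for later samples
            return None
        r, acc = random.uniform(0, total), 0.0
        for t, w in weights.items():   # sample proportionally to the weights
            acc += w
            if acc >= r:
                prefix += (t,)
                break
        if prefix and prefix[-1] == "<eos>":
            return prefix
    return None
```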
Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
·2616 words·13 mins
Natural Language Processing Large Language Models 🏢 Chinese University of Hong Kong
Gradient Cuff: A novel defense mechanism against LLM jailbreaks, leveraging refusal loss landscapes for improved malicious query rejection without harming model performance on benign inputs.
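A schematic two-stage check in the spirit of this summary, assuming a hypothetical `refusal_loss(emb)` that returns one minus the model's refusal probability for a prompt embedding (the paper estimates this quantity by sampling responses). Jailbroken prompts tend to sit in sharp regions of this loss landscape, so a large zeroth-order gradient-norm estimate is treated as a red flag:

```python
import numpy as np

def zeroth_order_grad_norm(refusal_loss, emb, n_dirs=8, mu=0.02):
    """Estimate the gradient norm of the refusal loss at emb
    from finite differences along random unit directions."""
    base = refusal_loss(emb)
    est = np.zeros_like(emb)
    for _ in range(n_dirs):
        u = np.random.randn(*emb.shape)
        u /= np.linalg.norm(u)
        est += (refusal_loss(emb + mu * u) - base) / mu * u  # directional slope
    return np.linalg.norm(est / n_dirs)

def looks_like_jailbreak(refusal_loss, emb, loss_thresh=0.5, grad_thresh=1.0):
    if refusal_loss(emb) < loss_thresh:   # stage 1: the model already refuses
        return True
    return zeroth_order_grad_norm(refusal_loss, emb) > grad_thresh  # stage 2
```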
Gorilla: Large Language Model Connected with Massive APIs
·2454 words·12 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
Gorilla: a fine-tuned LLaMA model surpasses GPT-4 in generating accurate API calls by using Retriever-Aware Training (RAT) to adapt to changing APIs and reduce hallucinations.
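A minimal sketch of what retriever-aware training means in practice: retrieved API documentation is appended to the instruction during both fine-tuning and inference, so the model learns to ground its call in the docs rather than in stale memorized signatures. The retriever and the document below are stand-ins:

```python
def build_rat_prompt(instruction: str, retrieved_doc: str) -> str:
    """Pair the user instruction with the retrieved API documentation,
    roughly the format the Gorilla paper describes."""
    return (
        f"{instruction}\n"
        f"Use this API documentation for reference: {retrieved_doc}"
    )

prompt = build_rat_prompt(
    "Translate this sentence to German with a pretrained model.",
    "transformers.pipeline(task='translation_en_to_de', model=...)",
)
```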
Global Lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers
·2454 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Meta AI
AI-powered sequence-to-sequence transformers surpass human and algorithmic abilities in discovering Lyapunov functions for dynamical systems, solving a long-standing open problem in mathematics.
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
·1742 words·9 mins
Natural Language Processing Large Language Models 🏢 University of Minnesota
Reward learning from human demonstrations enhances supervised fine-tuning (SFT) for better LLM alignment.
Geometric-Averaged Preference Optimization for Soft Preference Labels
·2987 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Tokyo
This paper introduces soft preference labels and geometric averaging into Direct Preference Optimization, consistently improving LLM alignment performance on standard benchmarks.
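One way to write the geometric-averaging idea, under our reading: with a soft label $\hat{p} \in [0.5, 1]$ for $y_1$ over $y_2$, geometrically averaging the two responses' likelihoods collapses into scaling the DPO margin by $2\hat{p}-1$, so near-tied pairs contribute vanishing gradient. Treat this as a sketch, not the paper's exact loss:

```latex
\mathcal{L}(\theta) = -\,\mathbb{E}\!\left[
\log \sigma\!\Big( \beta\,(2\hat{p} - 1)\,
\big( h_\theta(y_1) - h_\theta(y_2) \big) \Big) \right],
\qquad
h_\theta(y) = \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
```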
Generative Hierarchical Materials Search
·1856 words·9 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
Generative Hierarchical Materials Search (GenMS) uses AI to design novel crystal structures from natural language descriptions, outperforming prior methods in both fulfilling user requests and finding…
General Detection-based Text Line Recognition
·2137 words·11 mins
AI Generated Natural Language Processing Text Recognition 🏢 LIGM, École des Ponts
A novel detection-based approach (DTLR) achieves state-of-the-art text line recognition across diverse scripts (Latin, Chinese, ciphers), overcoming challenges of character-level annotation and comple…
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
·2081 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Soochow University
Gated Slot Attention (GSA) enhances linear Transformers for efficient, real-time sequence modeling. GSA uses a two-layer gated linear attention structure linked by softmax, enabling improved memory ca…
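A minimal single-head recurrence consistent with the summary's "two-layer gated linear attention structure linked by softmax"; the NumPy shapes and the gate parameterization are our illustrative choices:

```python
import numpy as np

def gated_slot_attention(q, k, v, alpha, m_slots):
    """Recurrent sketch of Gated Slot Attention (single head).

    q, k: (T, d_k); v: (T, d_v); alpha: (T, m) per-slot forget gates in (0, 1).
    Two gated linear-attention memories (slot keys and slot values) are
    linked by a softmax read, giving linear time and constant memory.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    K = np.zeros((m_slots, d_k))   # slot-key memory
    V = np.zeros((m_slots, d_v))   # slot-value memory
    out = np.zeros((T, d_v))
    for t in range(T):
        a = alpha[t][:, None]                 # (m, 1) forget gate
        K = a * K + (1 - a) * k[t][None, :]   # gated write of current key
        V = a * V + (1 - a) * v[t][None, :]   # gated write of current value
        scores = K @ q[t]                     # (m,) slot relevance
        w = np.exp(scores - scores.max())
        w /= w.sum()                          # softmax read over slots
        out[t] = V.T @ w
    return out

rng = np.random.default_rng(0)
T, dk, dv, m = 12, 16, 32, 4
out = gated_slot_attention(rng.normal(size=(T, dk)), rng.normal(size=(T, dk)),
                           rng.normal(size=(T, dv)), rng.uniform(0.8, 0.99, (T, m)), m)
print(out.shape)  # (12, 32)
```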
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
·3454 words·17 mins
AI Generated Natural Language Processing Question Answering 🏢 National University of Singapore
G-Retriever: a novel RAG approach enables conversational interaction with textual graphs, improving graph understanding and question answering efficiency while mitigating hallucination.
Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models
·4898 words·23 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Texas at Austin
This paper introduces a rate-distortion framework for prompt compression in LLMs, bridging the gap between existing methods and optimal performance. By formulating prompt compression as a linear progr…
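Schematically, the framing is a classic rate-distortion trade-off; here $x$ is the original prompt, $m$ a compressed prompt, $\ell(m)$ its token length, and $d$ a distortion between the black-box model's outputs on $x$ and on $m$ (notation ours, not the paper's). Because both the objective and the constraint are linear in the variables $p(m \mid x)$, the optimal frontier can be traced with a linear program:

```latex
R(D) \;=\; \min_{p(m \mid x)} \;\; \mathbb{E}\big[\ell(m)\big]
\quad \text{subject to} \quad
\mathbb{E}_{x,\; m \sim p(\cdot \mid x)}\big[d(x, m)\big] \;\le\; D
```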
From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When
·1923 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Michigan
LLMs’ in-context learning surprisingly arises from simple co-occurrence patterns in unstructured data, but positional information is key for complex tasks; ICL fails when patterns are unseen or fixed.
From Instance Training to Instruction Learning: Task Adapters Generation from Instructions
·2311 words·11 mins
Natural Language Processing Large Language Models 🏢 Tencent AI Lab
TAGI, a novel method, generates task-specific adapters from instructions, enhancing LLM cross-task generalization by using knowledge distillation and a two-stage hypernetwork training process.
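A toy sketch of the hypernetwork step: an instruction embedding is mapped directly to low-rank adapter weights, so a new task gets its adapter from its instruction alone, with no per-task gradient steps. Sizes, names, and the LoRA-style parameterization are illustrative; the distillation from instance-trained adapters is omitted:

```python
import torch
import torch.nn as nn

d_model, r, d_instr = 768, 8, 1024  # illustrative sizes

class AdapterHypernet(nn.Module):
    """Map an instruction embedding to low-rank adapter weights."""
    def __init__(self):
        super().__init__()
        self.to_A = nn.Linear(d_instr, d_model * r)
        self.to_B = nn.Linear(d_instr, r * d_model)

    def forward(self, instr_emb):                  # (d_instr,)
        A = self.to_A(instr_emb).view(d_model, r)  # down-projection
        B = self.to_B(instr_emb).view(r, d_model)  # up-projection
        return A, B                                # delta_W = A @ B

hyper = AdapterHypernet()
A, B = hyper(torch.randn(d_instr))  # adapter for an unseen task's instruction
```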
Fractal Patterns May Illuminate the Success of Next-Token Prediction
·2223 words·11 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs’ success may be explained by the self-similar, long-range-dependent fractal structure of language: small-scale patterns reflect larger ones.
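For intuition, long-range dependence of this kind is commonly quantified with a Hurst exponent, e.g. via rescaled-range (R/S) analysis; applying it to a per-token log-likelihood series is our illustrative choice, not the paper's exact estimator:

```python
import numpy as np

def hurst_rs(series, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate the Hurst exponent by rescaled-range analysis.
    H ~ 0.5 indicates no long-range dependence; H > 0.5 indicates
    persistent, self-similar structure."""
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(series) - n + 1, n):
            w = series[start:start + n]
            dev = np.cumsum(w - w.mean())   # cumulative deviations
            r = dev.max() - dev.min()       # range
            s = w.std()
            if s > 0:
                rs_vals.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_vals)))
    return np.polyfit(log_n, log_rs, 1)[0]  # slope of log R/S vs log n

print(hurst_rs(np.random.randn(2048)))  # roughly 0.5 for white noise
```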
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
·2248 words·11 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Ms-PoE, a simple plug-and-play positional encoding, significantly improves LLMs’ ability to utilize long contexts by mitigating the ’lost-in-the-middle’ problem and enhancing the capacity to capture i…
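A sketch of the core trick as we understand it: give each attention head its own position-compression ratio, so at least some heads keep middle-of-context tokens within a well-trained position range. The linear ratio schedule below is illustrative; the paper assigns ratios per head based on a position-awareness score:

```python
import torch

def multiscale_position_ids(seq_len, n_heads, min_ratio=1.2, max_ratio=1.8):
    """Return per-head rescaled position indices for rotary embeddings."""
    ratios = torch.linspace(min_ratio, max_ratio, n_heads)  # one scale per head
    base = torch.arange(seq_len, dtype=torch.float32)
    return base[None, :] / ratios[:, None]  # (n_heads, seq_len)

pos = multiscale_position_ids(seq_len=8192, n_heads=32)
# feed pos[h] into head h's rotary embedding instead of the shared 0..8191
```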
fMRI predictors based on language models of increasing complexity recover brain left lateralization
·2912 words·14 mins
Natural Language Processing Large Language Models 🏢 CNRS, EHESS
Larger language models better predict brain activity in fMRI studies, with left-hemisphere prediction significantly increasing as model complexity scales up, reconciling classic aphasia findings with …