Natural Language Processing
LoQT: Low-Rank Adapters for Quantized Pretraining
·2483 words·12 mins·
Natural Language Processing
Large Language Models
🏢 University of Copenhagen
LoQT enables efficient large language model training on consumer hardware via quantized weights and low-rank weight updates, overcoming memory limitations.
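As a hedged sketch of the pattern the summary names (a frozen quantized base weight plus trainable low-rank factors), not LoQT's actual implementation, the following PyTorch module illustrates the idea; the rank, bit width, and symmetric fake quantization are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class QuantizedLowRankLinear(nn.Module):
    """Frozen fake-quantized base weight W_q plus trainable low-rank factors
    A, B; the effective weight is W_q + B @ A. Illustrative assumption, not
    LoQT's code: real implementations store W_q in packed integer format."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, bits: int = 4):
        super().__init__()
        w = torch.randn(out_features, in_features) * 0.02
        # Symmetric fake quantization to 2^bits levels; kept frozen (no grad).
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max() / qmax
        self.register_buffer("weight_q", torch.round(w / scale).clamp(-qmax - 1, qmax) * scale)
        # Only the low-rank factors are trained, so optimizer state stays small.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.weight_q + self.B @ self.A).t()
```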
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
·2344 words·12 mins·
Natural Language Processing
Question Answering
🏢 Xi'an Jiaotong University
New dataset MUSIC-AVQA-R and a multi-faceted cycle collaborative debiasing strategy significantly improve audio-visual question answering robustness.
Long-form factuality in large language models
·4779 words·23 mins·
Natural Language Processing
Large Language Models
🏢 Google DeepMind
LLMs often generate factually inaccurate long-form text. This work introduces LongFact, a new benchmark dataset of 2280 fact-seeking prompts, and SAFE, a novel automated evaluation method that outperf…
Loki: Low-rank Keys for Efficient Sparse Attention
·3255 words·16 mins·
Natural Language Processing
Large Language Models
🏢 University of Maryland
Loki accelerates attention in LLMs by exploiting the low-dimensionality of key vectors. It dynamically selects key tokens based on approximate…
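A hedged sketch of the general recipe the summary describes: score tokens in a low-dimensional key subspace, keep the approximate top-k, then attend only over those. The plain truncated SVD and all parameter names below are illustrative assumptions, not the paper's exact method.

```python
import torch

def low_rank_topk_attention(q, k, v, d_proj=16, top_k=64):
    """Approximate sparse attention via a low-rank key projection.
    Shapes: q (1, d); k and v (n, d). Illustrative only."""
    n, d = k.shape
    # Low-rank basis for the keys via truncated SVD (PCA without centering).
    _, _, vh = torch.linalg.svd(k, full_matrices=False)
    basis = vh[:d_proj].t()                          # (d, d_proj)
    approx_scores = (q @ basis) @ (k @ basis).t()    # cheap approximate logits, (1, n)
    idx = approx_scores.topk(min(top_k, n), dim=-1).indices.squeeze(0)
    k_sel, v_sel = k[idx], v[idx]
    # Exact attention restricted to the selected tokens.
    attn = torch.softmax((q @ k_sel.t()) / d ** 0.5, dim=-1)
    return attn @ v_sel
```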
LoFiT: Localized Fine-tuning on LLM Representations
·4045 words·19 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
LOFIT: Localized fine-tuning boosts LLMs’ performance by selectively training only a small subset of attention heads, achieving comparable accuracy to other methods while using significantly fewer par…
Local to Global: Learning Dynamics and Effect of Initialization for Transformers
·2433 words·12 mins·
AI Generated
Natural Language Processing
Text Generation
🏢 EPFL
Transformers’ learning dynamics depend heavily on initialization and the Markovian properties of the data, converging to either global or local minima; this paper proves this dichotomy, offers initialization guidelines, and …
LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings
·1927 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Beihang University
TEA-GLM leverages LLMs for zero-shot graph learning by aligning GNN representations with LLM token embeddings, achieving state-of-the-art performance on unseen datasets and tasks.
LLMDFA: Analyzing Dataflow in Code with Large Language Models
·3865 words·19 mins·
Natural Language Processing
Large Language Models
🏢 Purdue University
LLMDFA: A novel LLM-powered framework performs compilation-free and customizable dataflow analysis, achieving high accuracy in bug detection by decomposing the task into sub-problems and mitigating L…
LLM-Check: Investigating Detection of Hallucinations in Large Language Models
·2270 words·11 mins·
Natural Language Processing
Large Language Models
🏢 University of Maryland, College Park
LLM-Check efficiently detects hallucinations within a single LLM response by analyzing the model’s internal states, enabling real-time applications.
LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language
·5678 words·27 mins·
Natural Language Processing
Large Language Models
🏢 University of Toronto
LLM Processes leverage LLMs to create probabilistic regression models guided by natural language, enabling seamless integration of expert knowledge and improving prediction accuracy.
LLM Dataset Inference: Did you train on my dataset?
·4983 words·24 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
LLM dataset inference reliably detects if a dataset was used in training, overcoming limitations of existing membership inference attacks.
LLM Circuit Analyses Are Consistent Across Training and Scale
·2075 words·10 mins·
Natural Language Processing
Large Language Models
🏢 EleutherAI
LLM circuit analyses remain consistent across model scales and extensive training, enabling more efficient interpretability research.
LLaNA: Large Language and NeRF Assistant
·4250 words·20 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Bologna
LLaNA: A novel Multimodal Large Language Model directly processes NeRF weights to enable NeRF captioning and Q&A, outperforming traditional 2D/3D-based methods.
LLaMo: Large Language Model-based Molecular Graph Assistant
·2751 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Korea University
LLaMo, a novel Large Language Model-based Molecular Graph Assistant, uses multi-level graph projection and instruction tuning to achieve superior performance on diverse molecular tasks.
LIVE: Learnable In-Context Vector for Visual Question Answering
·3429 words·17 mins·
Natural Language Processing
Question Answering
🏢 Southeast University
LIVE, a novel learnable in-context vector, significantly improves visual question answering by reducing computational costs and enhancing accuracy compared to traditional ICL methods.
Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
·2933 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Georgia Institute of Technology
Lisa: a novel lazy safety alignment method safeguards LLMs against harmful fine-tuning attacks by introducing a proximal term to constrain model drift, significantly improving alignment performance.
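A hedged sketch of the proximal-term idea named in the summary: add a penalty that keeps fine-tuned weights close to the safety-aligned anchor. The plain L2 form and the name `rho` are illustrative assumptions, not Lisa's exact objective.

```python
import torch

def proximal_alignment_loss(task_loss, params, anchor_params, rho=1.0):
    """total = task_loss + (rho/2) * ||theta - theta_aligned||^2, so fine-tuning
    cannot drift far from the aligned anchor weights. Illustrative only."""
    drift = sum(((p - a.detach()) ** 2).sum() for p, a in zip(params, anchor_params))
    return task_loss + 0.5 * rho * drift
```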
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
·3222 words·16 mins·
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
LISA, a layerwise importance sampling method, dramatically improves memory-efficient large language model fine-tuning, outperforming existing methods while using less GPU memory.
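A hedged sketch of the layerwise-sampling idea: periodically unfreeze only a small subset of layers so gradients and optimizer state exist for few parameters at a time. Uniform sampling and the function name are illustrative assumptions; the paper's sampling scheme and schedule may differ.

```python
import random
import torch.nn as nn

def resample_active_layers(layers: nn.ModuleList, n_active: int = 2) -> None:
    """Freeze all layers except a freshly sampled subset of size n_active."""
    active = set(random.sample(range(len(layers)), min(n_active, len(layers))))
    for i, layer in enumerate(layers):
        for p in layer.parameters():
            p.requires_grad = i in active
```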
Linking In-context Learning in Transformers to Human Episodic Memory
·3883 words·19 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 UC San Diego
Transformers’ in-context learning mirrors human episodic memory, with specific attention heads acting like the brain’s contextual maintenance and retrieval system.
Linguistic Collapse: Neural Collapse in (Large) Language Models
·6528 words·31 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Toronto
Scaling causal language models reveals a connection between neural collapse properties, model size, and improved generalization, highlighting NC’s broader relevance to LLMs.
Limits of Transformer Language Models on Learning to Compose Algorithms
·2755 words·13 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
Large Language Models struggle with compositional tasks, requiring exponentially more data to learn a composition than to learn its sub-tasks individually. This paper reveals surprising sample …