Natural Language Processing
LoQT: Low-Rank Adapters for Quantized Pretraining
·2483 words·12 mins·
Natural Language Processing
Large Language Models
🏢 University of Copenhagen
LoQT enables efficient large language model training on consumer hardware via quantized weights and low-rank weight updates, overcoming memory limitations.
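As a hedged sketch of the pattern the summary names (a frozen quantized base weight plus trainable low-rank factors), not LoQT's actual implementation, the following PyTorch module illustrates the idea; the rank, bit width, and symmetric fake quantization are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class QuantizedLowRankLinear(nn.Module):
    """Frozen fake-quantized base weight W_q plus trainable low-rank factors
    A, B; the effective weight is W_q + B @ A. Illustrative assumption, not
    LoQT's code: real implementations store W_q in packed integer format."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, bits: int = 4):
        super().__init__()
        w = torch.randn(out_features, in_features) * 0.02
        # Symmetric fake quantization to 2^bits levels; kept frozen (no grad).
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max() / qmax
        self.register_buffer("weight_q", torch.round(w / scale).clamp(-qmax - 1, qmax) * scale)
        # Only the low-rank factors are trained, so optimizer state stays small.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.weight_q + self.B @ self.A).t()
```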
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
·2344 words·12 mins·
Natural Language Processing
Question Answering
🏢 Xi'an Jiaotong University
New dataset MUSIC-AVQA-R and a multi-faceted cycle collaborative debiasing strategy significantly improve audio-visual question answering robustness.
Long-form factuality in large language models
·4779 words·23 mins·
Natural Language Processing
Large Language Models
🏢 Google DeepMind
LLMs often generate factually inaccurate long-form text. This work introduces LongFact, a new benchmark dataset of 2280 fact-seeking prompts, and SAFE, a novel automated evaluation method that outperf…
Loki: Low-rank Keys for Efficient Sparse Attention
·3255 words·16 mins·
Natural Language Processing
Large Language Models
🏢 University of Maryland
Loki accelerates attention in LLMs by exploiting the low-dimensionality of key vectors. It dynamically selects key tokens based on approximate…
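A hedged sketch of the general recipe the summary describes: score tokens in a low-dimensional key subspace, keep the approximate top-k, then attend only over those. The plain truncated SVD and all parameter names below are illustrative assumptions, not the paper's exact method.

```python
import torch

def low_rank_topk_attention(q, k, v, d_proj=16, top_k=64):
    """Approximate sparse attention via a low-rank key projection.
    Shapes: q (1, d); k and v (n, d). Illustrative only."""
    n, d = k.shape
    # Low-rank basis for the keys via truncated SVD (PCA without centering).
    _, _, vh = torch.linalg.svd(k, full_matrices=False)
    basis = vh[:d_proj].t()                          # (d, d_proj)
    approx_scores = (q @ basis) @ (k @ basis).t()    # cheap approximate logits, (1, n)
    idx = approx_scores.topk(min(top_k, n), dim=-1).indices.squeeze(0)
    k_sel, v_sel = k[idx], v[idx]
    # Exact attention restricted to the selected tokens.
    attn = torch.softmax((q @ k_sel.t()) / d ** 0.5, dim=-1)
    return attn @ v_sel
```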
LoFiT: Localized Fine-tuning on LLM Representations
·4045 words·19 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
LOFIT: Localized fine-tuning boosts LLMs’ performance by selectively training only a small subset of attention heads, achieving comparable accuracy to other methods while using significantly fewer par…
Local to Global: Learning Dynamics and Effect of Initialization for Transformers
·2433 words·12 mins·
AI Generated
Natural Language Processing
Text Generation
🏢 EPFL
Transformers’ learning dynamics depend heavily on initialization and the Markovian properties of the data, converging to either global or local minima; this paper proves this dichotomy, offers initialization guidelines, and …
LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings
·1927 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Beihang University
TEA-GLM leverages LLMs for zero-shot graph learning by aligning GNN representations with LLM token embeddings, achieving state-of-the-art performance on unseen datasets and tasks.
LLMDFA: Analyzing Dataflow in Code with Large Language Models
·3865 words·19 mins·
Natural Language Processing
Large Language Models
🏢 Purdue University
LLMDFA: A novel LLM-powered framework performs compilation-free and customizable dataflow analysis, achieving high accuracy in bug detection by decomposing the task into sub-problems and mitigating L…
LLM-Check: Investigating Detection of Hallucinations in Large Language Models
·2270 words·11 mins·
Natural Language Processing
Large Language Models
🏢 University of Maryland, College Park
LLM-Check efficiently detects hallucinations within a single LLM response by analyzing the model’s internal states, enabling real-time applications.
LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language
·5678 words·27 mins·
Natural Language Processing
Large Language Models
🏢 University of Toronto
LLM Processes leverage LLMs to create probabilistic regression models guided by natural language, enabling seamless integration of expert knowledge and improving prediction accuracy.
LLM Dataset Inference: Did you train on my dataset?
·4983 words·24 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
LLM dataset inference reliably detects if a dataset was used in training, overcoming limitations of existing membership inference attacks.
LLM Circuit Analyses Are Consistent Across Training and Scale
·2075 words·10 mins·
Natural Language Processing
Large Language Models
🏢 EleutherAI
LLM circuit analyses remain consistent across model scales and extensive training, enabling more efficient interpretability research.
LLaNA: Large Language and NeRF Assistant
·4250 words·20 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Bologna
LLaNA: A novel Multimodal Large Language Model directly processes NeRF weights to enable NeRF captioning and Q&A, outperforming traditional 2D/3D-based methods.
LLaMo: Large Language Model-based Molecular Graph Assistant
·2751 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Korea University
LLaMo, a novel Large Language Model-based Molecular Graph Assistant, uses multi-level graph projection and instruction tuning to achieve superior performance on diverse molecular tasks.
LIVE: Learnable In-Context Vector for Visual Question Answering
·3429 words·17 mins·
Natural Language Processing
Question Answering
🏢 Southeast University
LIVE, a novel learnable in-context vector, significantly improves visual question answering by reducing computational costs and enhancing accuracy compared to traditional ICL methods.
Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
·2933 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Georgia Institute of Technology
Lisa: a novel lazy safety alignment method safeguards LLMs against harmful fine-tuning attacks by introducing a proximal term to constrain model drift, significantly improving alignment performance.
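A hedged sketch of the proximal-term idea named in the summary: add a penalty that keeps fine-tuned weights close to the safety-aligned anchor. The plain L2 form and the name `rho` are illustrative assumptions, not Lisa's exact objective.

```python
import torch

def proximal_alignment_loss(task_loss, params, anchor_params, rho=1.0):
    """total = task_loss + (rho/2) * ||theta - theta_aligned||^2, so fine-tuning
    cannot drift far from the aligned anchor weights. Illustrative only."""
    drift = sum(((p - a.detach()) ** 2).sum() for p, a in zip(params, anchor_params))
    return task_loss + 0.5 * rho * drift
```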
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
·3222 words·16 mins·
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
LISA, a layerwise importance sampling method, dramatically improves memory-efficient large language model fine-tuning, outperforming existing methods while using less GPU memory.
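A hedged sketch of the layerwise-sampling idea: periodically unfreeze only a small subset of layers so gradients and optimizer state exist for few parameters at a time. Uniform sampling and the function name are illustrative assumptions; the paper's sampling scheme and schedule may differ.

```python
import random
import torch.nn as nn

def resample_active_layers(layers: nn.ModuleList, n_active: int = 2) -> None:
    """Freeze all layers except a freshly sampled subset of size n_active."""
    active = set(random.sample(range(len(layers)), min(n_active, len(layers))))
    for i, layer in enumerate(layers):
        for p in layer.parameters():
            p.requires_grad = i in active
```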
Linking In-context Learning in Transformers to Human Episodic Memory
·3883 words·19 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 UC San Diego
Transformers’ in-context learning mirrors human episodic memory, with specific attention heads acting like the brain’s contextual maintenance and retrieval system.
Linguistic Collapse: Neural Collapse in (Large) Language Models
·6528 words·31 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Toronto
Scaling causal language models reveals a connection between neural collapse properties, model size, and improved generalization, highlighting NC’s broader relevance to LLMs.
Limits of Transformer Language Models on Learning to Compose Algorithms
·2755 words·13 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
Large Language Models struggle with compositional tasks, requiring exponentially more data to learn a composition than to learn its sub-tasks individually. This paper reveals surprising sample …