Large Language Models

VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks
·2846 words·14 mins
Natural Language Processing Large Language Models 🏢 Georgia State University
VB-LoRA achieves extreme parameter efficiency in fine-tuning LLMs by sharing parameters globally via a vector bank, outperforming state-of-the-art PEFT methods while maintaining comparable or better performance.
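As a rough illustration of the vector-bank idea, the sketch below (PyTorch, with hypothetical class and parameter names, not the authors' code) composes the low-rank factors of a LoRA-style update from softmax mixtures over a single globally shared bank, so only the bank and small mixture logits are trained.

```python
# Hedged sketch of a vector-bank-parameterized low-rank update (illustrative only).
import torch
import torch.nn as nn

class VectorBankLowRank(nn.Module):
    """A LoRA-style update whose factor rows are mixtures of vectors from one shared bank."""
    def __init__(self, bank: nn.Parameter, in_features: int, out_features: int, rank: int = 4):
        super().__init__()
        self.bank = bank                          # (num_vectors, vector_dim), shared across layers
        num_vectors, vector_dim = bank.shape
        assert in_features % vector_dim == 0 and out_features % vector_dim == 0
        # One logit row per sub-vector slot; softmax over the bank selects the mixture.
        self.logits_A = nn.Parameter(torch.zeros(rank * in_features // vector_dim, num_vectors))
        self.logits_B = nn.Parameter(torch.zeros(rank * out_features // vector_dim, num_vectors))
        self.rank, self.in_features, self.out_features = rank, in_features, out_features

    def _compose(self, logits, rows, cols):
        # Mix bank vectors per slot, then concatenate the slots into a low-rank factor.
        mix = torch.softmax(logits, dim=-1) @ self.bank   # (slots, vector_dim)
        return mix.reshape(rows, cols)

    def forward(self, x):
        A = self._compose(self.logits_A, self.rank, self.in_features)    # (r, d_in)
        B = self._compose(self.logits_B, self.out_features, self.rank)   # (d_out, r)
        return x @ A.t() @ B.t()                          # low-rank update B·A·x

# One bank is shared by every adapted layer, so trainable parameters are dominated
# by the bank and the small logit tensors rather than per-layer matrices.
bank = nn.Parameter(torch.randn(64, 16) * 0.02)
layer = VectorBankLowRank(bank, in_features=32, out_features=32, rank=4)
print(layer(torch.randn(2, 32)).shape)  # torch.Size([2, 32])
```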
Vaccine: Perturbation-aware Alignment for Large Language Models against Harmful Fine-tuning Attack
·2382 words·12 mins
Natural Language Processing Large Language Models 🏢 Georgia Institute of Technology
Vaccine, a novel perturbation-aware alignment technique, safeguards LLMs against harmful fine-tuning attacks by producing perturbation-invariant hidden embeddings.
UQE: A Query Engine for Unstructured Databases
·1692 words·8 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
UQE: A novel query engine uses LLMs for efficient and accurate unstructured data analytics, surpassing existing methods.
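The general pattern of LLM-backed querying over unstructured rows can be sketched as below; the `llm_filter` helper and stub LLM callable are hypothetical, and the actual engine compiles a SQL-like dialect and uses sampling for efficiency, which this toy skips.

```python
# Illustrative pattern: an LLM evaluating a natural-language predicate per row.
from typing import Callable, Iterable

def llm_filter(rows: Iterable[str], predicate: str, ask_llm: Callable[[str], str]) -> list[str]:
    """Keep rows for which the LLM answers 'yes' to the natural-language predicate."""
    kept = []
    for row in rows:
        answer = ask_llm(f"Document: {row}\nQuestion: {predicate}\nAnswer yes or no.")
        if answer.strip().lower().startswith("yes"):
            kept.append(row)
    return kept

# Stub LLM for demonstration; swap in a real client call.
reviews = ["Great battery life", "Screen cracked after a week"]
print(llm_filter(reviews, "Does this review mention a defect?",
                 lambda prompt: "yes" if "cracked" in prompt else "no"))
```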
Unveiling LoRA Intrinsic Ranks via Salience Analysis
·1998 words·10 mins
Natural Language Processing Large Language Models 🏢 Southeast University
SalientLoRA unveils optimal LoRA ranks by analyzing rank salience via time-series analysis, significantly improving fine-tuning efficiency and performance.
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
·2178 words·11 mins
Natural Language Processing Large Language Models 🏢 Yale University
Transformers learn complex tasks surprisingly well through in-context learning, but the mechanism remains unclear. This paper proves that a two-layer transformer trained on n-gram Markov chain data converges to an induction-head mechanism.
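For intuition, the induction-head behavior analyzed here can be mimicked in a few lines (a toy illustration, not the paper's construction): predict the token that followed the most recent earlier occurrence of the current token.

```python
# Toy induction-head rule: copy what followed the last earlier occurrence of the current token.
def induction_predict(tokens):
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

print(induction_predict(list("abcab")))  # 'c' — copies what followed the earlier "ab"
```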
Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?
·2435 words·12 mins
Natural Language Processing Large Language Models 🏢 National University of Defense Technology
LLMs struggle with genuine causal reasoning; the new CausalProbe-2024 benchmark reveals these limitations, and the G2-Reasoner method improves causal reasoning by integrating general knowledge and goal-oriented prompts.
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
·2380 words·12 mins
Natural Language Processing Large Language Models 🏢 University of Washington
This study disentangles best practices for learning from preference feedback in LLMs, revealing that data quality, algorithm choice, and reward model significantly impact performance.
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
·2088 words·10 mins
Large Language Models 🏢 New York University
Unlocking tight generalization bounds for massive LLMs using a novel token-level approach.
Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought
·2755 words·13 mins
Large Language Models 🏢 Chinese University of Hong Kong
Reasoning Boundary Framework (RBF) quantitatively assesses and optimizes chain-of-thought (CoT) in LLMs, offering novel metrics and optimization strategies validated across various models and tasks.
Unleashing Region Understanding in Intermediate Layers for MLLM-based Referring Expression Generation
·2269 words·11 mins
Natural Language Processing Large Language Models 🏢 Tsinghua Shenzhen International Graduate School
Unlocking intermediate layers in MLLMs improves referring expression generation by enhancing accuracy and detail while reducing hallucinations.
Universal In-Context Approximation By Prompting Fully Recurrent Models
·3295 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Oxford
Fully recurrent neural networks can be universal in-context approximators, achieving the same capabilities as transformer models by cleverly using prompts.
UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation
·2138 words·11 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
UniBias unveils and mitigates LLM bias by identifying and eliminating biased internal components (FFN vectors and attention heads), significantly improving in-context learning performance and robustness.
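A minimal sketch of the head-elimination side of this idea, assuming the biased heads have already been identified (the `mask_heads` helper is hypothetical; the paper's identification procedure and FFN-vector manipulation are not shown):

```python
# Illustrative only: zero out the contribution of attention heads flagged as biased.
import torch

def mask_heads(attn_output: torch.Tensor, num_heads: int, biased_heads: list[int]) -> torch.Tensor:
    """attn_output: (batch, seq, num_heads * head_dim), concatenated head outputs
    before the output projection; selected heads are zeroed."""
    b, s, d = attn_output.shape
    head_dim = d // num_heads
    out = attn_output.view(b, s, num_heads, head_dim).clone()
    out[:, :, biased_heads, :] = 0.0
    return out.view(b, s, d)

x = torch.randn(1, 5, 12 * 64)  # e.g. 12 heads of dimension 64
print(mask_heads(x, num_heads=12, biased_heads=[3, 7]).shape)
```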
Understanding Transformers via N-Gram Statistics
·3310 words·16 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs’ inner workings remain elusive. This study uses N-gram statistics to approximate transformer predictions, revealing how LLMs learn from simple to complex statistical rules, and how model variance…
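A toy version of the n-gram rules used in such comparisons (illustrative counts only, not the paper's rule sets): fit context-to-next-token counts and normalize them into a predictive distribution that can be compared against a transformer's predictions.

```python
# Minimal n-gram next-token predictor for comparison against model predictions.
from collections import Counter, defaultdict

def fit_ngram(tokens, n=2):
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n):
        context = tuple(tokens[i:i + n])
        counts[context][tokens[i + n]] += 1
    return counts

def predict(counts, context):
    c = counts.get(tuple(context))
    if not c:
        return {}
    total = sum(c.values())
    return {tok: k / total for tok, k in c.items()}

tokens = "the cat sat on the mat the cat sat on the rug".split()
model = fit_ngram(tokens, n=2)
print(predict(model, ["the", "cat"]))  # {'sat': 1.0}
```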
Understanding the Differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
·1735 words·9 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
Unifying framework reveals hidden connections between attention, recurrent, and state-space models, boosting foundation model efficiency.
Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data
·1955 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Georgia Institute of Technology
Deep learning scaling laws are explained by novel approximation and estimation theories for transformers on low-dimensional data, resolving discrepancies between theory and practice.
Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective
·3071 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Tokyo
Linear probing then fine-tuning (LP-FT) significantly improves language model fine-tuning; this paper uses Neural Tangent Kernel (NTK) theory to explain why.
Understanding Information Storage and Transfer in Multi-Modal Large Language Models
·2906 words·14 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
Researchers unveil how multi-modal LLMs process information, revealing that early layers are key for storage, and introduce MULTEDIT, a model-editing algorithm for correcting errors and inserting new information.
Understanding Emergent Abilities of Language Models from the Loss Perspective
·1924 words·10 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Language model emergent abilities aren’t exclusive to large models; they emerge when pre-training loss falls below a threshold, irrespective of model or data size.
Understanding and Minimising Outlier Features in Transformer Training
·5007 words·24 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
New methods minimize outlier features in transformer training, improving quantization and efficiency without sacrificing convergence speed.
Uncovering Safety Risks of Large Language Models through Concept Activation Vector
·4605 words·22 mins
AI Generated Natural Language Processing Large Language Models 🏢 Renmin University of China
Researchers developed SCAV, a novel framework to effectively reveal safety risks in LLMs by accurately interpreting their safety mechanisms. SCAV-guided attacks significantly improve attack success rates.
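One common way to build a concept direction from hidden activations is the difference of class means, sketched below as a hedged stand-in (SCAV's actual construction and attack guidance may differ; function names here are hypothetical).

```python
# Hedged sketch: a concept direction as the normalized difference of class-mean activations.
import numpy as np

def concept_vector(acts_pos: np.ndarray, acts_neg: np.ndarray) -> np.ndarray:
    """acts_pos/acts_neg: (num_prompts, hidden_dim) activations for prompts
    with and without the safety-relevant concept."""
    v = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    return v / np.linalg.norm(v)

def concept_score(activation: np.ndarray, v: np.ndarray) -> float:
    """Projection onto the concept direction, usable as a simple probe."""
    return float(activation @ v)

rng = np.random.default_rng(0)
v = concept_vector(rng.normal(1.0, 1.0, (32, 128)), rng.normal(0.0, 1.0, (32, 128)))
print(concept_score(rng.normal(size=128), v))
```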