Large Language Models

VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks
·2846 words·14 mins
Natural Language Processing Large Language Models 🏢 Georgia State University
VB-LoRA achieves extreme parameter efficiency in fine-tuning LLMs by sharing parameters globally via a vector bank, outperforming state-of-the-art PEFT methods while maintaining comparable or better performance.
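As a rough illustration of the vector-bank idea, the sketch below (PyTorch, with hypothetical class and parameter names, not the authors' code) composes the low-rank factors of a LoRA-style update from softmax mixtures over a single globally shared bank, so only the bank and small mixture logits are trained.

```python
# Hedged sketch of a vector-bank-parameterized low-rank update (illustrative only).
import torch
import torch.nn as nn

class VectorBankLowRank(nn.Module):
    """A LoRA-style update whose factor rows are mixtures of vectors from one shared bank."""
    def __init__(self, bank: nn.Parameter, in_features: int, out_features: int, rank: int = 4):
        super().__init__()
        self.bank = bank                          # (num_vectors, vector_dim), shared across layers
        num_vectors, vector_dim = bank.shape
        assert in_features % vector_dim == 0 and out_features % vector_dim == 0
        # One logit row per sub-vector slot; softmax over the bank selects the mixture.
        self.logits_A = nn.Parameter(torch.zeros(rank * in_features // vector_dim, num_vectors))
        self.logits_B = nn.Parameter(torch.zeros(rank * out_features // vector_dim, num_vectors))
        self.rank, self.in_features, self.out_features = rank, in_features, out_features

    def _compose(self, logits, rows, cols):
        # Mix bank vectors per slot, then concatenate the slots into a low-rank factor.
        mix = torch.softmax(logits, dim=-1) @ self.bank   # (slots, vector_dim)
        return mix.reshape(rows, cols)

    def forward(self, x):
        A = self._compose(self.logits_A, self.rank, self.in_features)    # (r, d_in)
        B = self._compose(self.logits_B, self.out_features, self.rank)   # (d_out, r)
        return x @ A.t() @ B.t()                          # low-rank update B·A·x

# One bank is shared by every adapted layer, so trainable parameters are dominated
# by the bank and the small logit tensors rather than per-layer matrices.
bank = nn.Parameter(torch.randn(64, 16) * 0.02)
layer = VectorBankLowRank(bank, in_features=32, out_features=32, rank=4)
print(layer(torch.randn(2, 32)).shape)  # torch.Size([2, 32])
```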
Vaccine: Perturbation-aware Alignment for Large Language Models against Harmful Fine-tuning Attack
·2382 words·12 mins
Natural Language Processing Large Language Models 🏢 Georgia Institute of Technology
Vaccine, a novel perturbation-aware alignment technique, safeguards LLMs against harmful fine-tuning attacks by producing perturbation-invariant hidden embeddings.
UQE: A Query Engine for Unstructured Databases
·1692 words·8 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
UQE: A novel query engine uses LLMs for efficient and accurate unstructured data analytics, surpassing existing methods.
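The general pattern of LLM-backed querying over unstructured rows can be sketched as below; the `llm_filter` helper and stub LLM callable are hypothetical, and the actual engine compiles a SQL-like dialect and uses sampling for efficiency, which this toy skips.

```python
# Illustrative pattern: an LLM evaluating a natural-language predicate per row.
from typing import Callable, Iterable

def llm_filter(rows: Iterable[str], predicate: str, ask_llm: Callable[[str], str]) -> list[str]:
    """Keep rows for which the LLM answers 'yes' to the natural-language predicate."""
    kept = []
    for row in rows:
        answer = ask_llm(f"Document: {row}\nQuestion: {predicate}\nAnswer yes or no.")
        if answer.strip().lower().startswith("yes"):
            kept.append(row)
    return kept

# Stub LLM for demonstration; swap in a real client call.
reviews = ["Great battery life", "Screen cracked after a week"]
print(llm_filter(reviews, "Does this review mention a defect?",
                 lambda prompt: "yes" if "cracked" in prompt else "no"))
```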
Unveiling LoRA Intrinsic Ranks via Salience Analysis
·1998 words·10 mins
Natural Language Processing Large Language Models 🏢 Southeast University
SalientLoRA unveils optimal LoRA ranks by analyzing rank salience via time-series analysis, significantly improving fine-tuning efficiency and performance.
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
·2178 words·11 mins
Natural Language Processing Large Language Models 🏢 Yale University
Transformers learn complex tasks surprisingly well through in-context learning, but the mechanism remains unclear. This paper proves that a two-layer transformer trained on n-gram Markov chain data converges to an induction-head mechanism.
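For intuition, the induction-head behavior analyzed here can be mimicked in a few lines (a toy illustration, not the paper's construction): predict the token that followed the most recent earlier occurrence of the current token.

```python
# Toy induction-head rule: copy what followed the last earlier occurrence of the current token.
def induction_predict(tokens):
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

print(induction_predict(list("abcab")))  # 'c' — copies what followed the earlier "ab"
```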
Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?
·2435 words·12 mins
Natural Language Processing Large Language Models 🏢 National University of Defense Technology
LLMs struggle with genuine causal reasoning; the new CausalProbe-2024 benchmark reveals these limitations, and the G2-Reasoner method improves causal reasoning by integrating general knowledge and goal-oriented prompts.
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
·2380 words·12 mins
Natural Language Processing Large Language Models 🏢 University of Washington
This study disentangles best practices for learning from preference feedback in LLMs, revealing that data quality, algorithm choice, and reward model significantly impact performance.
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
·2088 words·10 mins
Large Language Models 🏢 New York University
Unlocking tight generalization bounds for massive LLMs using a novel token-level approach.
Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought
·2755 words·13 mins
Large Language Models 🏢 Chinese University of Hong Kong
Reasoning Boundary Framework (RBF) quantitatively assesses and optimizes chain-of-thought (CoT) in LLMs, offering novel metrics and optimization strategies validated across various models and tasks.
Unleashing Region Understanding in Intermediate Layers for MLLM-based Referring Expression Generation
·2269 words·11 mins
Natural Language Processing Large Language Models 🏢 Tsinghua Shenzhen International Graduate School
Unlocking intermediate layers in MLLMs improves referring expression generation by enhancing accuracy and detail while reducing hallucinations.
Universal In-Context Approximation By Prompting Fully Recurrent Models
·3295 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Oxford
Fully recurrent neural networks can be universal in-context approximators, achieving the same capabilities as transformer models by cleverly using prompts.
UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation
·2138 words·11 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
UniBias unveils and mitigates LLM bias by identifying and eliminating biased internal components (FFN vectors and attention heads), significantly improving in-context learning performance and robustness.
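A minimal sketch of the head-elimination side of this idea, assuming the biased heads have already been identified (the `mask_heads` helper is hypothetical; the paper's identification procedure and FFN-vector manipulation are not shown):

```python
# Illustrative only: zero out the contribution of attention heads flagged as biased.
import torch

def mask_heads(attn_output: torch.Tensor, num_heads: int, biased_heads: list[int]) -> torch.Tensor:
    """attn_output: (batch, seq, num_heads * head_dim), concatenated head outputs
    before the output projection; selected heads are zeroed."""
    b, s, d = attn_output.shape
    head_dim = d // num_heads
    out = attn_output.view(b, s, num_heads, head_dim).clone()
    out[:, :, biased_heads, :] = 0.0
    return out.view(b, s, d)

x = torch.randn(1, 5, 12 * 64)  # e.g. 12 heads of dimension 64
print(mask_heads(x, num_heads=12, biased_heads=[3, 7]).shape)
```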
Understanding Transformers via N-Gram Statistics
·3310 words·16 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs’ inner workings remain elusive. This study uses N-gram statistics to approximate transformer predictions, revealing how LLMs learn from simple to complex statistical rules, and how model variance…
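A toy version of the n-gram rules used in such comparisons (illustrative counts only, not the paper's rule sets): fit context-to-next-token counts and normalize them into a predictive distribution that can be compared against a transformer's predictions.

```python
# Minimal n-gram next-token predictor for comparison against model predictions.
from collections import Counter, defaultdict

def fit_ngram(tokens, n=2):
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n):
        context = tuple(tokens[i:i + n])
        counts[context][tokens[i + n]] += 1
    return counts

def predict(counts, context):
    c = counts.get(tuple(context))
    if not c:
        return {}
    total = sum(c.values())
    return {tok: k / total for tok, k in c.items()}

tokens = "the cat sat on the mat the cat sat on the rug".split()
model = fit_ngram(tokens, n=2)
print(predict(model, ["the", "cat"]))  # {'sat': 1.0}
```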
Understanding the Differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
·1735 words·9 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
Unifying framework reveals hidden connections between attention, recurrent, and state-space models, boosting foundation model efficiency.
Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data
·1955 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Georgia Institute of Technology
Deep learning scaling laws are explained by novel approximation and estimation theories for transformers on low-dimensional data, resolving discrepancies between theory and practice.
Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective
·3071 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Tokyo
Linear probing then fine-tuning (LP-FT) significantly improves language model fine-tuning; this paper uses Neural Tangent Kernel (NTK) theory to explain why.
Understanding Information Storage and Transfer in Multi-Modal Large Language Models
·2906 words·14 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
Researchers unveil how multi-modal LLMs process information, revealing that early layers are key for storage, and introduce MULTEDIT, a model-editing algorithm for correcting errors and inserting new information.
Understanding Emergent Abilities of Language Models from the Loss Perspective
·1924 words·10 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Language model emergent abilities aren’t exclusive to large models; they emerge when pre-training loss falls below a threshold, irrespective of model or data size.
Understanding and Minimising Outlier Features in Transformer Training
·5007 words·24 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
New methods minimize outlier features in transformer training, improving quantization and efficiency without sacrificing convergence speed.
Uncovering Safety Risks of Large Language Models through Concept Activation Vector
·4605 words·22 mins
AI Generated Natural Language Processing Large Language Models 🏢 Renmin University of China
Researchers developed SCAV, a novel framework to effectively reveal safety risks in LLMs by accurately interpreting their safety mechanisms. SCAV-guided attacks significantly improve attack success rates.
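One common way to build a concept direction from hidden activations is the difference of class means, sketched below as a hedged stand-in (SCAV's actual construction and attack guidance may differ; function names here are hypothetical).

```python
# Hedged sketch: a concept direction as the normalized difference of class-mean activations.
import numpy as np

def concept_vector(acts_pos: np.ndarray, acts_neg: np.ndarray) -> np.ndarray:
    """acts_pos/acts_neg: (num_prompts, hidden_dim) activations for prompts
    with and without the safety-relevant concept."""
    v = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    return v / np.linalg.norm(v)

def concept_score(activation: np.ndarray, v: np.ndarray) -> float:
    """Projection onto the concept direction, usable as a simple probe."""
    return float(activation @ v)

rng = np.random.default_rng(0)
v = concept_vector(rng.normal(1.0, 1.0, (32, 128)), rng.normal(0.0, 1.0, (32, 128)))
print(concept_score(rng.normal(size=128), v))
```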