
Natural Language Processing

UQE: A Query Engine for Unstructured Databases
·1692 words·8 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
UQE, a novel query engine, uses LLMs for efficient and accurate unstructured data analytics, surpassing existing methods.
Unveiling LoRA Intrinsic Ranks via Salience Analysis
·1998 words·10 mins
Natural Language Processing Large Language Models 🏢 Southeast University
SalientLoRA unveils optimal LoRA ranks by analyzing rank salience via time-series analysis, improving fine-tuning efficiency and performance significantly.
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
·2178 words·11 mins
Natural Language Processing Large Language Models 🏢 Yale University
Transformers learn complex tasks surprisingly well through in-context learning, but the mechanism remains unclear. This paper proves that a two-layer transformer trained on n-gram Markov chain data co…
Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?
·2435 words·12 mins
Natural Language Processing Large Language Models 🏢 National University of Defense Technology
LLMs struggle with genuine causal reasoning; new benchmark CausalProbe-2024 reveals limitations, and G2-Reasoner method improves causal reasoning by integrating general knowledge and goal-oriented pro…
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
·2380 words·12 mins
Natural Language Processing Large Language Models 🏢 University of Washington
This study disentangles best practices for learning from preference feedback in LLMs, revealing that data quality, algorithm choice, and reward model significantly impact performance.
Unleashing Region Understanding in Intermediate Layers for MLLM-based Referring Expression Generation
·2269 words·11 mins
Natural Language Processing Large Language Models 🏢 Tsinghua Shenzhen International Graduate School
Unlocking intermediate layers in MLLMs improves referring expression generation by enhancing accuracy and detail while reducing hallucinations.
Universal In-Context Approximation By Prompting Fully Recurrent Models
·3295 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Oxford
Fully recurrent neural networks can be universal in-context approximators, achieving the same capabilities as transformer models by cleverly using prompts.
UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation
·2138 words·11 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
UniBias unveils and mitigates LLM bias by identifying and eliminating biased internal components (FFN vectors and attention heads), significantly improving in-context learning performance and robustne…
Understanding Transformers via N-Gram Statistics
·3310 words·16 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs’ inner workings remain elusive. This study uses N-gram statistics to approximate transformer predictions, revealing how LLMs learn from simple to complex statistical rules, and how model variance…
Understanding Transformer Reasoning Capabilities via Graph Algorithms
·2280 words·11 mins
Natural Language Processing Question Answering 🏢 Google Research
Transformers excel at graph reasoning, with logarithmic depth proving necessary and sufficient for parallelizable tasks; single-layer transformers solve retrieval tasks efficiently.
Understanding the Differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
·1735 words·9 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
Unifying framework reveals hidden connections between attention, recurrent, and state-space models, boosting foundation model efficiency.
Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data
·1955 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Georgia Institute of Technology
Deep learning scaling laws are explained by novel approximation and estimation theories for transformers on low-dimensional data, resolving discrepancies between theory and practice.
Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective
·3071 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Tokyo
Linear probing then fine-tuning (LP-FT) significantly improves language model fine-tuning; this paper uses Neural Tangent Kernel (NTK) theory to explain why.
Understanding Information Storage and Transfer in Multi-Modal Large Language Models
·2906 words·14 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
Researchers unveil how multi-modal LLMs process information, revealing that early layers are key for storage, and introduce MULTEDIT, a model-editing algorithm for correcting errors and inserting new …
Understanding Emergent Abilities of Language Models from the Loss Perspective
·1924 words·10 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Language model emergent abilities aren’t exclusive to large models; they emerge when pre-training loss falls below a threshold, irrespective of model or data size.
Understanding and Minimising Outlier Features in Transformer Training
·5007 words·24 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
New methods minimize outlier features in transformer training, improving quantization and efficiency without sacrificing convergence speed.
Uncovering Safety Risks of Large Language Models through Concept Activation Vector
·4605 words·22 mins
AI Generated Natural Language Processing Large Language Models 🏢 Renmin University of China
Researchers developed SCAV, a novel framework to effectively reveal safety risks in LLMs by accurately interpreting their safety mechanisms. SCAV-guided attacks significantly improve attack success r…
Unchosen Experts Can Contribute Too: Unleashing MoE Models’ Power by Self-Contrast
·2047 words·10 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Self-Contrast Mixture-of-Experts (SCMoE) boosts MoE model reasoning by cleverly using ‘unchosen’ experts during inference. This training-free method contrasts outputs from strong and weak expert acti…
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in LLMs
·3889 words·19 mins
Natural Language Processing Large Language Models 🏢 National University of Singapore
The Uncertainty of Thoughts (UoT) algorithm significantly boosts LLMs’ information-seeking abilities, leading to substantial performance gains across diverse tasks.
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
·2935 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Huazhong University of Science and Technology
Twin-Merging dynamically merges modular model expertise, significantly improving multitask performance without retraining, and adapting to diverse data.