
Natural Language Processing

UQE: A Query Engine for Unstructured Databases
·1692 words·8 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
UQE, a novel query engine, uses LLMs for efficient and accurate unstructured data analytics, surpassing existing methods.
Unveiling LoRA Intrinsic Ranks via Salience Analysis
·1998 words·10 mins
Natural Language Processing Large Language Models 🏢 Southeast University
SalientLoRA unveils optimal LoRA ranks by analyzing rank salience via time-series analysis, improving fine-tuning efficiency and performance significantly.
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
·2178 words·11 mins
Natural Language Processing Large Language Models 🏢 Yale University
Transformers learn complex tasks surprisingly well through in-context learning, but the mechanism remains unclear. This paper proves that a two-layer transformer trained on n-gram Markov chain data co…
Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?
·2435 words·12 mins
Natural Language Processing Large Language Models 🏢 National University of Defense Technology
LLMs struggle with genuine causal reasoning; new benchmark CausalProbe-2024 reveals limitations, and G2-Reasoner method improves causal reasoning by integrating general knowledge and goal-oriented pro…
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
·2380 words·12 mins
Natural Language Processing Large Language Models 🏢 University of Washington
This study disentangles best practices for learning from preference feedback in LLMs, revealing that data quality, algorithm choice, and reward model significantly impact performance.
Unleashing Region Understanding in Intermediate Layers for MLLM-based Referring Expression Generation
·2269 words·11 mins
Natural Language Processing Large Language Models 🏢 Tsinghua Shenzhen International Graduate School
Unlocking intermediate layers in MLLMs improves referring expression generation by enhancing accuracy and detail while reducing hallucinations.
Universal In-Context Approximation By Prompting Fully Recurrent Models
·3295 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Oxford
Fully recurrent neural networks can be universal in-context approximators, achieving the same capabilities as transformer models by cleverly using prompts.
UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation
·2138 words·11 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
UniBias unveils and mitigates LLM bias by identifying and eliminating biased internal components (FFN vectors and attention heads), significantly improving in-context learning performance and robustne…
Understanding Transformers via N-Gram Statistics
·3310 words·16 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs’ inner workings remain elusive. This study uses N-gram statistics to approximate transformer predictions, revealing how LLMs learn from simple to complex statistical rules, and how model variance…
Understanding Transformer Reasoning Capabilities via Graph Algorithms
·2280 words·11 mins
Natural Language Processing Question Answering 🏢 Google Research
Transformers excel at graph reasoning, with logarithmic depth proving necessary and sufficient for parallelizable tasks; single-layer transformers solve retrieval tasks efficiently.
Understanding the Differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
·1735 words·9 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
Unifying framework reveals hidden connections between attention, recurrent, and state-space models, boosting foundation model efficiency.
Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data
·1955 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Georgia Institute of Technology
Deep learning scaling laws are explained by novel approximation and estimation theories for transformers on low-dimensional data, resolving discrepancies between theory and practice.
Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective
·3071 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Tokyo
Linear probing then fine-tuning (LP-FT) significantly improves language model fine-tuning; this paper uses Neural Tangent Kernel (NTK) theory to explain why.
Understanding Information Storage and Transfer in Multi-Modal Large Language Models
·2906 words·14 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
Researchers unveil how multi-modal LLMs process information, revealing that early layers are key for storage, and introduce MULTEDIT, a model-editing algorithm for correcting errors and inserting new …
Understanding Emergent Abilities of Language Models from the Loss Perspective
·1924 words·10 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Language model emergent abilities aren’t exclusive to large models; they emerge when pre-training loss falls below a threshold, irrespective of model or data size.
Understanding and Minimising Outlier Features in Transformer Training
·5007 words·24 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
New methods minimize outlier features in transformer training, improving quantization and efficiency without sacrificing convergence speed.
Uncovering Safety Risks of Large Language Models through Concept Activation Vector
·4605 words·22 mins
AI Generated Natural Language Processing Large Language Models 🏢 Renmin University of China
Researchers developed SCAV, a novel framework to effectively reveal safety risks in LLMs by accurately interpreting their safety mechanisms. SCAV-guided attacks significantly improve attack success r…
Unchosen Experts Can Contribute Too: Unleashing MoE Models’ Power by Self-Contrast
·2047 words·10 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Self-Contrast Mixture-of-Experts (SCMoE) boosts MoE model reasoning by cleverly using ‘unchosen’ experts during inference. This training-free method contrasts outputs from strong and weak expert acti…
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in LLMs
·3889 words·19 mins
Natural Language Processing Large Language Models 🏢 National University of Singapore
The Uncertainty of Thoughts (UoT) algorithm significantly boosts LLMs’ information-seeking abilities, leading to substantial performance gains across diverse tasks.
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
·2935 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Huazhong University of Science and Technology
Twin-Merging dynamically merges modular model expertise, significantly improving multitask performance without retraining, and adapting to diverse data.