Natural Language Processing

On Affine Homotopy between Language Encoders

26 September 2024·2070 words·10 mins

AI Generated Natural Language Processing Representation Learning 🏢 ETH Zurich

This paper introduces a novel notion of intrinsic similarity between language encoders, based on affine homotopy, and demonstrates its strong correlation with extrinsic similarity (downstream task per…

OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step

26 September 2024·2170 words·11 mins

Natural Language Processing Large Language Models 🏢 MIT

OccamLLM: LLMs now perform accurate arithmetic in a single step!

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers

26 September 2024·2502 words·12 mins

Natural Language Processing Large Language Models 🏢 Cerebras Systems

By cleverly integrating per-example gradient norm calculations during the backward pass of LayerNorm layers, this research enables efficient and accurate gradient noise scale estimation in Transformer…

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

26 September 2024·2513 words·12 mins

Natural Language Processing Large Language Models 🏢 Rice University

NoMAD-Attention achieves up to 2x speedup in 4-bit quantized LLaMA inference on CPUs by replacing computationally expensive multiply-add operations with ultra-low-latency in-register lookups.

NoiseGPT: Label Noise Detection and Rectification through Probability Curvature

26 September 2024·2389 words·12 mins

Natural Language Processing Large Language Models 🏢 Beijing Institute of Technology

NoiseGPT uses multi-modal LLMs to detect & fix noisy image labels by identifying probability curvature differences between clean and noisy examples.

Noise Contrastive Alignment of Language Models with Explicit Rewards

26 September 2024·2166 words·11 mins

Natural Language Processing Large Language Models 🏢 Tsinghua University

This paper introduces InfoNCA and NCA, novel frameworks for language model alignment using noise contrastive estimation, enabling direct optimization from both explicit rewards and pairwise preference…

No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices

26 September 2024·3353 words·16 mins

Natural Language Processing Large Language Models 🏢 Carnegie Mellon University

LLM watermarking faces inherent trade-offs; this paper reveals simple attacks exploiting common design choices, proposing guidelines and defenses for more secure systems.

Neuro-Symbolic Data Generation for Math Reasoning

26 September 2024·1986 words·10 mins

Natural Language Processing Large Language Models 🏢 Nanjing University

A neuro-symbolic framework generates high-quality mathematical datasets, enhancing LLMs’ mathematical reasoning capabilities and surpassing state-of-the-art counterparts.

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

26 September 2024·2207 words·11 mins

AI Generated Natural Language Processing Large Language Models 🏢 Cohere

NEST, a novel semi-parametric language model, significantly boosts LLM generation quality, provides accurate source attribution, and achieves a 1.8x speedup in inference time by cleverly incorporating…

Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

26 September 2024·3033 words·15 mins

AI Generated Natural Language Processing Large Language Models 🏢 Georgia Tech

Researchers discover ‘safety basins’ in LLMs, proposing a new metric (VISAGE) to quantify finetuning risks and visualize how these basins protect against safety compromise during model training.

Navigating Extremes: Dynamic Sparsity in Large Output Spaces

26 September 2024·2090 words·10 mins

Natural Language Processing Text Classification 🏢 Department of Computer Science, Aalto University

SPARTEX achieves memory-efficient extreme multi-label classification by integrating dynamic sparse training with an auxiliary loss function, enabling end-to-end training with millions of labels on com…

MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encoding

26 September 2024·2545 words·12 mins

Natural Language Processing Information Retrieval 🏢 Google Research

MUVERA: Revolutionizing multi-vector retrieval with single-vector speed and accuracy!

MutaPLM: Protein Language Modeling for Mutation Explanation and Engineering

26 September 2024·2665 words·13 mins

Natural Language Processing Large Language Models 🏢 Tsinghua University

MutaPLM, a novel protein language model, provides human-understandable mutation explanations and designs novel mutations with desirable properties using a unique protein delta network and chain-of-tho…

Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking

26 September 2024·1726 words·9 mins

AI Generated Natural Language Processing Large Language Models 🏢 Cornell University

This paper introduces an efficient multivariate stochastic dominance test using optimal transport, enabling robust model benchmarking by considering metric dependencies.

Multi-turn Reinforcement Learning from Preference Human Feedback

26 September 2024·1515 words·8 mins

Natural Language Processing Dialogue Systems 🏢 Google Research

Multi-turn RLHF surpasses single-turn methods by aligning LLMs with human preferences across entire conversations, not just individual turns. A novel mirror-descent algorithm, MTPO, is introduced, pr…

Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention

26 September 2024·3055 words·15 mins

AI Generated Natural Language Processing Vision-Language Models 🏢 Department of Computer Science, Purdue University

D-LISA: Dynamic modules & language-informed spatial attention revolutionize multi-object 3D grounding, surpassing state-of-the-art accuracy by 12.8%.

Multi-LLM Debate: Framework, Principals, and Interventions

26 September 2024·1604 words·8 mins

Natural Language Processing Large Language Models 🏢 ByteDance Research

Boosting LLM collaboration, this research introduces a novel theoretical framework for multi-LLM debate, revealing key principles like the effect of similar models and interventions to enhance accurac…

Multi-language Diversity Benefits Autoformalization

26 September 2024·1698 words·8 mins

Natural Language Processing Large Language Models 🏢 University of Cambridge

Researchers created MMA, a large multilingual dataset of informal-formal mathematical pairs, leveraging a language model for reverse translation. Fine-tuned models achieved significantly improved aut…

Multi-Head Mixture-of-Experts

26 September 2024·2844 words·14 mins

Natural Language Processing Large Language Models 🏢 Microsoft Research

Multi-Head Mixture-of-Experts (MH-MoE) drastically boosts large language model efficiency by activating almost all expert networks, achieving superior performance compared to existing Sparse Mixture-o…

MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs

26 September 2024·4032 words·19 mins

AI Generated Natural Language Processing Large Language Models 🏢 Chinese University of Hong Kong

MR-Ben: A new benchmark reveals LLMs’ meta-reasoning flaws, pushing the boundaries of AI evaluation beyond simple accuracy.