Natural Language Processing

On Affine Homotopy between Language Encoders
·2070 words·10 mins
AI Generated Natural Language Processing Representation Learning 🏢 ETH Zurich
This paper introduces a novel notion of intrinsic similarity between language encoders, based on affine homotopy, and demonstrates its strong correlation with extrinsic similarity (downstream task per…
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
·2170 words·11 mins
Natural Language Processing Large Language Models 🏢 MIT
OccamLLM lets LLMs perform fast, exact arithmetic in a single step.
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
·2502 words·12 mins
Natural Language Processing Large Language Models 🏢 Cerebras Systems
By cleverly integrating per-example gradient norm calculations during the backward pass of LayerNorm layers, this research enables efficient and accurate gradient noise scale estimation in Transformer…
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
·2513 words·12 mins
Natural Language Processing Large Language Models 🏢 Rice University
NoMAD-Attention achieves up to 2x speedup in 4-bit quantized LLaMA inference on CPUs by replacing computationally expensive multiply-add operations with ultra-low-latency in-register lookups.
NoiseGPT: Label Noise Detection and Rectification through Probability Curvature
·2389 words·12 mins
Natural Language Processing Large Language Models 🏢 Beijing Institute of Technology
NoiseGPT uses multi-modal LLMs to detect and fix noisy image labels by identifying probability curvature differences between clean and noisy examples.
Noise Contrastive Alignment of Language Models with Explicit Rewards
·2166 words·11 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
This paper introduces InfoNCA and NCA, novel frameworks for language model alignment using noise contrastive estimation, enabling direct optimization from both explicit rewards and pairwise preference…
No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices
·3353 words·16 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
LLM watermarking faces inherent trade-offs; this paper reveals simple attacks that exploit common design choices and proposes guidelines and defenses for more secure systems.
Neuro-Symbolic Data Generation for Math Reasoning
·1986 words·10 mins
Natural Language Processing Large Language Models 🏢 Nanjing University
A neuro-symbolic framework generates high-quality mathematical datasets, enhancing LLMs’ mathematical reasoning capabilities and surpassing state-of-the-art counterparts.
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
·2207 words·11 mins
AI Generated Natural Language Processing Large Language Models 🏢 Cohere
NEST, a novel semi-parametric language model, significantly boosts LLM generation quality, provides accurate source attribution, and achieves a 1.8x speedup in inference time by cleverly incorporating…
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
·3033 words·15 mins
AI Generated Natural Language Processing Large Language Models 🏢 Georgia Tech
Researchers discover ‘safety basins’ in LLMs, proposing a new metric (VISAGE) to quantify finetuning risks and visualize how these basins protect against safety compromise during model training.
Navigating Extremes: Dynamic Sparsity in Large Output Spaces
·2090 words·10 mins
Natural Language Processing Text Classification 🏢 Department of Computer Science, Aalto University
SPARTEX achieves memory-efficient extreme multi-label classification by integrating dynamic sparse training with an auxiliary loss function, enabling end-to-end training with millions of labels on com…
MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encoding
·2545 words·12 mins
Natural Language Processing Information Retrieval 🏢 Google Research
MUVERA performs multi-vector retrieval via fixed dimensional encodings, aiming for single-vector speed and accuracy.
MutaPLM: Protein Language Modeling for Mutation Explanation and Engineering
·2665 words·13 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
MutaPLM, a novel protein language model, provides human-understandable mutation explanations and designs novel mutations with desirable properties using a unique protein delta network and chain-of-tho…
Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking
·1726 words·9 mins
AI Generated Natural Language Processing Large Language Models 🏢 Cornell University
This paper introduces an efficient multivariate stochastic dominance test using optimal transport, enabling robust model benchmarking by considering metric dependencies.
Multi-turn Reinforcement Learning with Preference Human Feedback
·1515 words·8 mins
Natural Language Processing Dialogue Systems 🏢 Google Research
Multi-turn RLHF surpasses single-turn methods by aligning LLMs with human preferences across entire conversations, not just individual turns. A novel mirror-descent algorithm, MTPO, is introduced, pr…
Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention
·3055 words·15 mins
AI Generated Natural Language Processing Vision-Language Models 🏢 Department of Computer Science, Purdue University
D-LISA combines dynamic modules and language-informed spatial attention for multi-object 3D grounding, surpassing state-of-the-art accuracy by 12.8%.
Multi-LLM Debate: Framework, Principals, and Interventions
·1604 words·8 mins
Natural Language Processing Large Language Models 🏢 ByteDance Research
Boosting LLM collaboration, this research introduces a novel theoretical framework for multi-LLM debate, revealing key principles like the effect of similar models and interventions to enhance accurac…
Multi-language Diversity Benefits Autoformalization
·1698 words·8 mins
Natural Language Processing Large Language Models 🏢 University of Cambridge
Researchers created MMA, a large multilingual dataset of informal-formal mathematical pairs, leveraging a language model for reverse translation. Fine-tuned models achieved significantly improved aut…
Multi-Head Mixture-of-Experts
·2844 words·14 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
Multi-Head Mixture-of-Experts (MH-MoE) drastically boosts large language model efficiency by activating almost all expert networks, achieving superior performance compared to existing Sparse Mixture-o…
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs
·4032 words·19 mins
AI Generated Natural Language Processing Large Language Models 🏢 Chinese University of Hong Kong
MR-Ben is a new meta-reasoning benchmark that reveals flaws in LLMs’ meta-reasoning, extending evaluation beyond simple accuracy.