Large Language Models
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
·2069 words·10 mins·
Large Language Models
🏢 University of Pennsylvania
One-shot dualization aligns large language models with safety constraints efficiently, eliminating iterative primal-dual methods for improved stability and reduced computational burden.
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
·3294 words·16 mins·
Natural Language Processing
Large Language Models
🏢 Show Lab, National University of Singapore
VideoLISA: A video-based multimodal large language model enabling precise, language-instructed video object segmentation with superior performance.
Once Read is Enough: Domain-specific Pretraining-free Language Models with Cluster-guided Sparse Experts for Long-tail Domain Knowledge
·2658 words·13 mins·
Natural Language Processing
Large Language Models
🏢 University of Oxford
This research introduces Cluster-guided Sparse Experts (CSE), enabling pretrained language models to effectively learn long-tail domain knowledge without domain-specific pretraining, thus achieving su…
On the Worst Prompt Performance of Large Language Models
·2797 words·14 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
LLM performance varies drastically with prompt phrasing; this paper introduces RobustAlpacaEval to evaluate lower-bound performance via worst-case prompt analysis, revealing model inconsist…
On the Power of Decision Trees in Auto-Regressive Language Modeling
·2176 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Massachusetts Institute of Technology
Auto-Regressive Decision Trees (ARDTs) surprisingly outperform Transformers on language tasks!
On the Inductive Bias of Stacking Towards Improving Reasoning
·2018 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Google Research
MIDAS: A novel training method improves language model reasoning by efficiently stacking middle layers, surprisingly boosting downstream task performance without increasing pretraining perplexity.
On Softmax Direct Preference Optimization for Recommendation
·1530 words·8 mins·
Natural Language Processing
Large Language Models
🏢 National University of Singapore
Softmax-DPO boosts LM-based recommender performance by directly optimizing for personalized ranking using a novel loss function that incorporates multiple negative samples, significantly outperforming…
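The multi-negative ranking idea behind Softmax-DPO can be illustrated with a minimal sketch: a DPO-style objective in which the preferred item must beat a log-sum-exp aggregate of several negatives rather than a single sampled one. The function name, tensor shapes, and exact aggregation below are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def softmax_dpo_loss(pos_logp, pos_logp_ref, neg_logps, neg_logps_ref, beta=1.0):
    """DPO-style loss with multiple negative items (hypothetical sketch).

    pos_logp / pos_logp_ref: (batch,) log-probs of the preferred item under
    the policy and a frozen reference model.
    neg_logps / neg_logps_ref: (batch, n_neg) log-probs of sampled negatives.
    """
    # Implicit reward of each item: beta * (log pi_theta - log pi_ref).
    pos_reward = beta * (pos_logp - pos_logp_ref)           # (batch,)
    neg_rewards = beta * (neg_logps - neg_logps_ref)        # (batch, n_neg)

    # The preferred item must outscore the log-sum-exp of all negatives,
    # which is how multiple negatives enter a single softmax-style margin.
    margin = pos_reward - torch.logsumexp(neg_rewards, dim=-1)
    return -F.logsigmoid(margin).mean()
```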
On scalable oversight with weak LLMs judging strong LLMs
·5158 words·25 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Weak LLMs can accurately supervise strong LLMs via debate, outperforming simpler consultancy methods, especially in information-asymmetric tasks.
On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
·2220 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Huazhong University of Science and Technology
Effortlessly boost large language model performance by dynamically fusing knowledge from smaller, task-specific models – achieving near full fine-tuning results with minimal computational cost!
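A rough sense of how logit fusion works: at each decoding step, the shift a small task-tuned model applies relative to its untuned counterpart is added onto the large model's logits. The additive form and scalar weight below are assumptions for illustration; the paper adapts the fusion weight dynamically per token.

```python
def fused_next_token_logits(large_logits, expert_logits, base_logits, weight):
    """Weak-to-strong decoding by logit fusion (illustrative sketch).

    large_logits: next-token logits of the big general model.
    expert_logits / base_logits: logits of a small task-tuned model and its
    untuned counterpart. weight: fusion coefficient, here a fixed scalar.
    """
    # Add the small model's task-specific shift onto the large model's logits.
    return large_logits + weight * (expert_logits - base_logits)
```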
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
·2170 words·11 mins·
Natural Language Processing
Large Language Models
🏢 MIT
OccamLLM: LLMs now perform accurate arithmetic in a single step!
Observational Scaling Laws and the Predictability of Language Model Performance
·4816 words·23 mins·
Large Language Models
🏢 University of Toronto
Researchers predict language model performance by observing existing models, bypassing costly training, revealing surprising predictability in complex scaling phenomena.
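As a rough sketch of the observational approach, one can extract a few principal capability directions from existing models' benchmark scores and fit simple linear maps from log-compute to capabilities and from capabilities to a downstream metric. The dimensions and linear forms below are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def fit_observational_scaling(benchmark_scores, log_compute, target_metric, n_pc=3):
    """Observational scaling fit over existing models (illustrative sketch).

    benchmark_scores: (n_models, n_benchmarks) reported benchmark results.
    log_compute: (n_models,) log training FLOPs.
    target_metric: (n_models,) downstream quantity to predict.
    """
    # Principal capability directions from the existing models' scores.
    X = benchmark_scores - benchmark_scores.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    caps = X @ vt[:n_pc].T                                  # (n_models, n_pc)

    # Linear map: log-compute -> capability coordinates.
    A = np.column_stack([log_compute, np.ones_like(log_compute)])
    compute_to_caps, *_ = np.linalg.lstsq(A, caps, rcond=None)

    # Linear map: capability coordinates -> downstream metric.
    B = np.column_stack([caps, np.ones(len(caps))])
    caps_to_target, *_ = np.linalg.lstsq(B, target_metric, rcond=None)
    return vt[:n_pc], compute_to_caps, caps_to_target
```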
Not All Tokens Are What You Need for Pretraining
·2178 words·11 mins·
Large Language Models
🏢 Tsinghua University
RHO-1, a novel language model, uses selective pretraining focusing on high-value tokens, achieving state-of-the-art results with significantly less data than existing models.
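The selective-pretraining idea admits a short sketch: score each token by how much worse the training model does on it than a frozen reference model, then back-propagate only through the highest-scoring fraction. The keep ratio and scoring rule below are illustrative assumptions rather than RHO-1's exact recipe.

```python
import torch
import torch.nn.functional as F

def selective_lm_loss(logits, ref_logits, targets, keep_ratio=0.6):
    """Token-selective pretraining loss (illustrative sketch).

    logits / ref_logits: (batch*seq, vocab)-shapeable logits of the model
    being trained and of a frozen reference model; targets: token ids.
    """
    vocab = logits.size(-1)
    ce = F.cross_entropy(logits.view(-1, vocab), targets.view(-1), reduction="none")
    with torch.no_grad():
        ref_ce = F.cross_entropy(ref_logits.view(-1, vocab), targets.view(-1),
                                 reduction="none")
        # Excess loss over the reference model marks "high-value" tokens.
        excess = ce.detach() - ref_ce
        k = max(1, int(keep_ratio * excess.numel()))
        keep = torch.zeros_like(excess, dtype=torch.bool)
        keep[excess.topk(k).indices] = True
    # Only selected tokens contribute gradient.
    return ce[keep].mean()
```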
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
·2502 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Cerebras Systems
By cleverly integrating per-example gradient norm calculations during the backward pass of LayerNorm layers, this research enables efficient and accurate gradient noise scale estimation in Transformer…
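Given per-example gradients for a small parameter subset (e.g. LayerNorm gains and biases), the classic "simple" gradient noise scale estimator is the trace of the per-example gradient covariance divided by the squared norm of the mean gradient. A minimal sketch, with shapes assumed for illustration:

```python
import torch

def gradient_noise_scale(per_example_grads):
    """Simple gradient noise scale estimate (illustrative sketch).

    per_example_grads: (batch, dim) flattened per-example gradients for a
    subset of parameters, e.g. the LayerNorm parameters the paper argues
    are sufficient. Uses B_simple ~= tr(Sigma) / ||g||^2, where Sigma is
    the per-example gradient covariance and g the mean gradient.
    """
    g_mean = per_example_grads.mean(dim=0)
    trace_sigma = per_example_grads.var(dim=0, unbiased=True).sum()
    return (trace_sigma / g_mean.pow(2).sum()).item()
```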
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
·2513 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Rice University
NoMAD-Attention achieves up to 2x speedup in 4-bit quantized LLaMA inference on CPUs by replacing computationally expensive multiply-add operations with ultra-low-latency in-register lookups.
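The multiply-add-free idea can be sketched as product-quantized dot products: keys are stored as centroid ids, a small query-centroid table is computed once per query, and each attention score then reduces to a sum of table lookups. Names, shapes, and the NumPy rendering below are assumptions; the actual system performs these lookups in CPU registers.

```python
import numpy as np

def lookup_attention_scores(query, key_codes, centroids):
    """Dot products via codebook lookups instead of multiply-adds (sketch).

    query: (d,) query vector; key_codes: (n_keys, n_sub) centroid ids of
    product-quantized keys; centroids: (n_sub, n_centroids, sub_dim).
    """
    n_sub, n_centroids, sub_dim = centroids.shape
    q_sub = query.reshape(n_sub, sub_dim)
    # Per-subspace lookup table of query-centroid dot products: (n_sub, n_centroids).
    table = np.einsum("sd,scd->sc", q_sub, centroids)
    # Score of key j = sum over subspaces s of table[s, key_codes[j, s]].
    return table[np.arange(n_sub), key_codes].sum(axis=1)
```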
NoiseGPT: Label Noise Detection and Rectification through Probability Curvature
·2389 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Beijing Institute of Technology
NoiseGPT uses multi-modal LLMs to detect & fix noisy image labels by identifying probability curvature differences between clean and noisy examples.
Noise Contrastive Alignment of Language Models with Explicit Rewards
·2166 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Tsinghua University
This paper introduces InfoNCA and NCA, novel frameworks for language model alignment using noise contrastive estimation, enabling direct optimization from both explicit rewards and pairwise preference…
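A minimal sketch of the InfoNCA idea: with K candidate responses per prompt, take a softmax over their explicit rewards as the target distribution and a softmax over the policy's implicit rewards (beta times the log-ratio to a frozen reference) as the prediction, then minimize the cross-entropy between the two. Hyperparameters and shapes below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def infonca_loss(logp, logp_ref, rewards, alpha=1.0, beta=1.0):
    """InfoNCA-style alignment from explicit rewards (illustrative sketch).

    logp / logp_ref: (batch, K) log-probs of K candidate responses under the
    policy and a frozen reference model; rewards: (batch, K) scalar rewards.
    """
    # Target distribution over the K responses, from explicit rewards.
    target = F.softmax(rewards / alpha, dim=-1)
    # Model distribution, from implicit rewards beta * (log pi - log pi_ref).
    implicit = beta * (logp - logp_ref)
    return -(target * F.log_softmax(implicit, dim=-1)).sum(dim=-1).mean()
```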
No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices
·3353 words·16 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
LLM watermarking faces inherent trade-offs; this paper reveals simple attacks exploiting common design choices, proposing guidelines and defenses for more secure systems.
Neuro-Symbolic Data Generation for Math Reasoning
·1986 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Nanjing University
Neuro-symbolic framework generates high-quality mathematical datasets, enhancing LLMs’ mathematical reasoning capabilities and surpassing state-of-the-art counterparts.
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
·2207 words·11 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Cohere
NEST, a novel semi-parametric language model, significantly boosts LLM generation quality, provides accurate source attribution, and achieves a 1.8x speedup in inference time by cleverly incorporating…
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
·3033 words·15 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Georgia Tech
Researchers discover ‘safety basins’ in LLMs, proposing a new metric (VISAGE) to quantify finetuning risks and visualize how these basins protect against safety compromise during model training.