Large Language Models
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
·2069 words·10 mins·
Large Language Models
🏢 University of Pennsylvania
One-shot dualization aligns large language models with safety constraints efficiently, eliminating iterative primal-dual methods for improved stability and reduced computational burden.
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
·3294 words·16 mins·
Natural Language Processing
Large Language Models
🏢 Show Lab, National University of Singapore
VideoLISA: A video-based multimodal large language model enabling precise, language-instructed video object segmentation with superior performance.
Once Read is Enough: Domain-specific Pretraining-free Language Models with Cluster-guided Sparse Experts for Long-tail Domain Knowledge
·2658 words·13 mins·
Natural Language Processing
Large Language Models
🏢 University of Oxford
This research introduces Cluster-guided Sparse Experts (CSE), enabling pretrained language models to effectively learn long-tail domain knowledge without domain-specific pretraining, thus achieving su…
On the Worst Prompt Performance of Large Language Models
·2797 words·14 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
LLM performance varies drastically with prompt phrasing; this paper introduces RobustAlpacaEval to evaluate lower-bound performance via worst-case prompt analysis, revealing model inconsist…
On the Power of Decision Trees in Auto-Regressive Language Modeling
·2176 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Massachusetts Institute of Technology
Auto-Regressive Decision Trees (ARDTs) surprisingly outperform Transformers on language tasks!
On the Inductive Bias of Stacking Towards Improving Reasoning
·2018 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Google Research
MIDAS: A novel training method improves language model reasoning by efficiently stacking middle layers, surprisingly boosting downstream task performance without increasing pretraining perplexity.
On Softmax Direct Preference Optimization for Recommendation
·1530 words·8 mins·
Natural Language Processing
Large Language Models
🏢 National University of Singapore
Softmax-DPO boosts LM-based recommender performance by directly optimizing for personalized ranking using a novel loss function that incorporates multiple negative samples, significantly outperforming…
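The multi-negative ranking idea behind Softmax-DPO can be illustrated with a minimal sketch: a DPO-style objective in which the preferred item must beat a log-sum-exp aggregate of several negatives rather than a single sampled one. The function name, tensor shapes, and exact aggregation below are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def softmax_dpo_loss(pos_logp, pos_logp_ref, neg_logps, neg_logps_ref, beta=1.0):
    """DPO-style loss with multiple negative items (hypothetical sketch).

    pos_logp / pos_logp_ref: (batch,) log-probs of the preferred item under
    the policy and a frozen reference model.
    neg_logps / neg_logps_ref: (batch, n_neg) log-probs of sampled negatives.
    """
    # Implicit reward of each item: beta * (log pi_theta - log pi_ref).
    pos_reward = beta * (pos_logp - pos_logp_ref)           # (batch,)
    neg_rewards = beta * (neg_logps - neg_logps_ref)        # (batch, n_neg)

    # The preferred item must outscore the log-sum-exp of all negatives,
    # which is how multiple negatives enter a single softmax-style margin.
    margin = pos_reward - torch.logsumexp(neg_rewards, dim=-1)
    return -F.logsigmoid(margin).mean()
```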
On scalable oversight with weak LLMs judging strong LLMs
·5158 words·25 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Weak LLMs can accurately supervise strong LLMs via debate, outperforming simpler consultancy methods, especially in information-asymmetric tasks.
On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
·2220 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Huazhong University of Science and Technology
Effortlessly boost large language model performance by dynamically fusing knowledge from smaller, task-specific models – achieving near full fine-tuning results with minimal computational cost!
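A rough sense of how logit fusion works: at each decoding step, the shift a small task-tuned model applies relative to its untuned counterpart is added onto the large model's logits. The additive form and scalar weight below are assumptions for illustration; the paper adapts the fusion weight dynamically per token.

```python
def fused_next_token_logits(large_logits, expert_logits, base_logits, weight):
    """Weak-to-strong decoding by logit fusion (illustrative sketch).

    large_logits: next-token logits of the big general model.
    expert_logits / base_logits: logits of a small task-tuned model and its
    untuned counterpart. weight: fusion coefficient, here a fixed scalar.
    """
    # Add the small model's task-specific shift onto the large model's logits.
    return large_logits + weight * (expert_logits - base_logits)
```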
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
·2170 words·11 mins·
Natural Language Processing
Large Language Models
🏢 MIT
OccamLLM: LLMs now perform accurate arithmetic in a single step!
Observational Scaling Laws and the Predictability of Language Model Performance
·4816 words·23 mins·
Large Language Models
🏢 University of Toronto
Researchers predict language model performance by observing existing models, bypassing costly training, revealing surprising predictability in complex scaling phenomena.
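As a rough sketch of the observational approach, one can extract a few principal capability directions from existing models' benchmark scores and fit simple linear maps from log-compute to capabilities and from capabilities to a downstream metric. The dimensions and linear forms below are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def fit_observational_scaling(benchmark_scores, log_compute, target_metric, n_pc=3):
    """Observational scaling fit over existing models (illustrative sketch).

    benchmark_scores: (n_models, n_benchmarks) reported benchmark results.
    log_compute: (n_models,) log training FLOPs.
    target_metric: (n_models,) downstream quantity to predict.
    """
    # Principal capability directions from the existing models' scores.
    X = benchmark_scores - benchmark_scores.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    caps = X @ vt[:n_pc].T                                  # (n_models, n_pc)

    # Linear map: log-compute -> capability coordinates.
    A = np.column_stack([log_compute, np.ones_like(log_compute)])
    compute_to_caps, *_ = np.linalg.lstsq(A, caps, rcond=None)

    # Linear map: capability coordinates -> downstream metric.
    B = np.column_stack([caps, np.ones(len(caps))])
    caps_to_target, *_ = np.linalg.lstsq(B, target_metric, rcond=None)
    return vt[:n_pc], compute_to_caps, caps_to_target
```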
Not All Tokens Are What You Need for Pretraining
·2178 words·11 mins·
Large Language Models
🏢 Tsinghua University
RHO-1, a novel language model, uses selective pretraining focusing on high-value tokens, achieving state-of-the-art results with significantly less data than existing models.
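The selective-pretraining idea admits a short sketch: score each token by how much worse the training model does on it than a frozen reference model, then back-propagate only through the highest-scoring fraction. The keep ratio and scoring rule below are illustrative assumptions rather than RHO-1's exact recipe.

```python
import torch
import torch.nn.functional as F

def selective_lm_loss(logits, ref_logits, targets, keep_ratio=0.6):
    """Token-selective pretraining loss (illustrative sketch).

    logits / ref_logits: (batch*seq, vocab)-shapeable logits of the model
    being trained and of a frozen reference model; targets: token ids.
    """
    vocab = logits.size(-1)
    ce = F.cross_entropy(logits.view(-1, vocab), targets.view(-1), reduction="none")
    with torch.no_grad():
        ref_ce = F.cross_entropy(ref_logits.view(-1, vocab), targets.view(-1),
                                 reduction="none")
        # Excess loss over the reference model marks "high-value" tokens.
        excess = ce.detach() - ref_ce
        k = max(1, int(keep_ratio * excess.numel()))
        keep = torch.zeros_like(excess, dtype=torch.bool)
        keep[excess.topk(k).indices] = True
    # Only selected tokens contribute gradient.
    return ce[keep].mean()
```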
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
·2502 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Cerebras Systems
By cleverly integrating per-example gradient norm calculations during the backward pass of LayerNorm layers, this research enables efficient and accurate gradient noise scale estimation in Transformer…
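Given per-example gradients for a small parameter subset (e.g. LayerNorm gains and biases), the classic "simple" gradient noise scale estimator is the trace of the per-example gradient covariance divided by the squared norm of the mean gradient. A minimal sketch, with shapes assumed for illustration:

```python
import torch

def gradient_noise_scale(per_example_grads):
    """Simple gradient noise scale estimate (illustrative sketch).

    per_example_grads: (batch, dim) flattened per-example gradients for a
    subset of parameters, e.g. the LayerNorm parameters the paper argues
    are sufficient. Uses B_simple ~= tr(Sigma) / ||g||^2, where Sigma is
    the per-example gradient covariance and g the mean gradient.
    """
    g_mean = per_example_grads.mean(dim=0)
    trace_sigma = per_example_grads.var(dim=0, unbiased=True).sum()
    return (trace_sigma / g_mean.pow(2).sum()).item()
```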
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
·2513 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Rice University
NoMAD-Attention achieves up to 2x speedup in 4-bit quantized LLaMA inference on CPUs by replacing computationally expensive multiply-add operations with ultra-low-latency in-register lookups.
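The multiply-add-free idea can be sketched as product-quantized dot products: keys are stored as centroid ids, a small query-centroid table is computed once per query, and each attention score then reduces to a sum of table lookups. Names, shapes, and the NumPy rendering below are assumptions; the actual system performs these lookups in CPU registers.

```python
import numpy as np

def lookup_attention_scores(query, key_codes, centroids):
    """Dot products via codebook lookups instead of multiply-adds (sketch).

    query: (d,) query vector; key_codes: (n_keys, n_sub) centroid ids of
    product-quantized keys; centroids: (n_sub, n_centroids, sub_dim).
    """
    n_sub, n_centroids, sub_dim = centroids.shape
    q_sub = query.reshape(n_sub, sub_dim)
    # Per-subspace lookup table of query-centroid dot products: (n_sub, n_centroids).
    table = np.einsum("sd,scd->sc", q_sub, centroids)
    # Score of key j = sum over subspaces s of table[s, key_codes[j, s]].
    return table[np.arange(n_sub), key_codes].sum(axis=1)
```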
NoiseGPT: Label Noise Detection and Rectification through Probability Curvature
·2389 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Beijing Institute of Technology
NoiseGPT uses multi-modal LLMs to detect & fix noisy image labels by identifying probability curvature differences between clean and noisy examples.
Noise Contrastive Alignment of Language Models with Explicit Rewards
·2166 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Tsinghua University
This paper introduces InfoNCA and NCA, novel frameworks for language model alignment using noise contrastive estimation, enabling direct optimization from both explicit rewards and pairwise preference…
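A minimal sketch of the InfoNCA idea: with K candidate responses per prompt, take a softmax over their explicit rewards as the target distribution and a softmax over the policy's implicit rewards (beta times the log-ratio to a frozen reference) as the prediction, then minimize the cross-entropy between the two. Hyperparameters and shapes below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def infonca_loss(logp, logp_ref, rewards, alpha=1.0, beta=1.0):
    """InfoNCA-style alignment from explicit rewards (illustrative sketch).

    logp / logp_ref: (batch, K) log-probs of K candidate responses under the
    policy and a frozen reference model; rewards: (batch, K) scalar rewards.
    """
    # Target distribution over the K responses, from explicit rewards.
    target = F.softmax(rewards / alpha, dim=-1)
    # Model distribution, from implicit rewards beta * (log pi - log pi_ref).
    implicit = beta * (logp - logp_ref)
    return -(target * F.log_softmax(implicit, dim=-1)).sum(dim=-1).mean()
```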
No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices
·3353 words·16 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
LLM watermarking faces inherent trade-offs; this paper reveals simple attacks exploiting common design choices, proposing guidelines and defenses for more secure systems.
Neuro-Symbolic Data Generation for Math Reasoning
·1986 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Nanjing University
Neuro-symbolic framework generates high-quality mathematical datasets, enhancing LLMs’ mathematical reasoning capabilities and surpassing state-of-the-art counterparts.
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
·2207 words·11 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Cohere
NEST, a novel semi-parametric language model, significantly boosts LLM generation quality, provides accurate source attribution, and achieves a 1.8x speedup in inference time by cleverly incorporating…
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
·3033 words·15 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Georgia Tech
Researchers discover ‘safety basins’ in LLMs, proposing a new metric (VISAGE) to quantify finetuning risks and visualize how these basins protect against safety compromise during model training.