Natural Language Processing
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
·2790 words·14 mins·
Natural Language Processing
Large Language Models
🏢 National University of Singapore
MomentumSMoE boosts Sparse Mixture of Experts’ (SMoE) performance by integrating momentum, resulting in more stable training and robust models.
MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
·2311 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Harbin Institute of Technology
MoGU is a framework that dynamically balances safety and usability in LLMs by routing benign and malicious instructions to different LLM variants, leading to safer, more useful responses.
MoEUT: Mixture-of-Experts Universal Transformers
·2486 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Stanford University
MoEUT: Mixture-of-Experts Universal Transformers significantly improves the compute efficiency of Universal Transformers, making them competitive with standard Transformers in large-scale language mod…
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
·2786 words·14 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Shanghai Artificial Intelligence Laboratory
MxDNA: Model learns optimal DNA tokenization via gradient descent, outperforming existing methods.
Mobility-LLM: Learning Visiting Intentions and Travel Preference from Human Mobility Data with Large Language Models
·2805 words·14 mins·
loading
·
loading
AI Generated
Natural Language Processing
Large Language Models
🏢 Beijing Jiaotong University
Mobility-LLM leverages LLMs to analyze human mobility data from check-in sequences, significantly outperforming existing models in location prediction, user identification, and time prediction tasks.
Mixture of Tokens: Continuous MoE through Cross-Example Aggregation
·1989 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Warsaw
Mixture of Tokens (MoT) achieves 3x faster LLM training than dense Transformers and matches state-of-the-art MoE performance via continuous token mixing.
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
·1888 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Seoul National University
BinaryMoS: a novel token-adaptive binarization method that boosts LLM accuracy and efficiency by dynamically merging multiple scaling experts for each token.
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
·2372 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
MoICE, a novel plug-in, significantly enhances LLMs’ long context awareness by dynamically routing attention using multiple RoPE angles, achieving superior performance with high inference efficiency.
Mixture of Demonstrations for In-Context Learning
·1953 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Virginia
MoD, a novel Mixture of Demonstrations framework, enhances in-context learning by partitioning demonstration pools and employing expert-wise training, achieving state-of-the-art performance.
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
·3103 words·15 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 National University of Singapore
MixEval improves LLM benchmarking by blending real-world user queries with existing benchmark datasets, yielding a cost-effective, less biased, and dynamic evaluation method.
Mitigating Reward Overoptimization via Lightweight Uncertainty Estimation
·1697 words·8 mins·
Natural Language Processing
Large Language Models
🏢 ByteDance Research
ADVPO tackles reward overoptimization in RLHF with a lightweight uncertainty quantification approach, improving LLM performance and alignment with human values.
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
·3037 words·15 mins·
Natural Language Processing
Large Language Models
🏢 ZIP Lab, Monash University
MiniCache: A novel approach to drastically reduce LLM KV cache memory footprint.
Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training
·2712 words·13 mins·
Natural Language Processing
Large Language Models
🏢 California Institute of Technology
MINI-SEQUENCE TRANSFORMER (MST) drastically reduces memory usage in LLM training by processing mini-sequences iteratively, enabling training with 12-24x longer sequences than conventional methods with…
MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages
·2639 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Shanghai Artificial Intelligence Laboratory
MindMerger efficiently boosts LLM reasoning in non-English languages by merging LLMs with external multilingual language understanding capabilities, achieving significant accuracy improvements, especi…
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
·4302 words·21 mins·
Natural Language Processing
Large Language Models
🏢 Microsoft Research
LLMs’ spatial reasoning abilities are boosted by visualizing their thought processes via ‘Visualization-of-Thought’ prompting, significantly improving performance on navigation and tiling tasks.
Microstructures and Accuracy of Graph Recall by Large Language Models
·2595 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Cornell University
LLMs struggle with graph recall, exhibiting biases like favoring triangles and underperforming compared to humans; advanced models show striking domain dependence.
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
·2050 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Institute of Science and Technology Austria (ISTA)
MICROADAM: A new Adam optimizer variant dramatically cuts memory usage for training large language models without compromising accuracy or provable convergence.
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
·2343 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Google DeepMind
LLMs' accuracy on mathematical problem solving improves significantly when prompt-guided skill labeling is used to select in-context exemplars.
MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models
·3742 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Manchester
MetaAligner: a novel, policy-agnostic, and generalizable method for efficiently aligning LLMs to multiple objectives, even unseen ones, achieving significant and balanced improvements while saving up …
Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration
·3164 words·15 mins·
AI Generated
Natural Language Processing
Text Generation
🏢 University of Washington
Meta-DiffuB enhances sequence-to-sequence text diffusion models by using meta-exploration to learn a contextualized noise schedule, resulting in state-of-the-art performance.