
Large Language Models

Free Process Rewards without Process Labels
·3126 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Train high-performing Process Reward Models (PRMs) cheaply using only outcome-level labels, eliminating the need for costly step-by-step annotations!
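As a rough illustration of the trick the teaser describes: if the outcome reward is parameterized as beta times the log-likelihood ratio between the trained policy and a frozen reference model, per-step rewards fall out as differences of partial sums of token log-ratios, with no step labels needed. A minimal sketch, assuming per-token log-probs are already computed and step boundaries are known (both hypothetical inputs):

```python
import torch

def implicit_step_rewards(logp_policy: torch.Tensor,
                          logp_ref: torch.Tensor,
                          step_ends: list[int],
                          beta: float = 0.1) -> list[float]:
    """Per-step rewards from an outcome-trained policy and a frozen reference.

    logp_policy, logp_ref: (seq_len,) log-probs of the same response tokens.
    step_ends: indices of the last token of each reasoning step (hypothetical).
    """
    ratios = logp_policy - logp_ref            # token-level log-likelihood ratios
    cumulative = torch.cumsum(ratios, dim=0)   # running "outcome" reward
    rewards, prev = [], 0.0
    for end in step_ends:
        total = beta * cumulative[end].item()  # reward of the prefix up to this step
        rewards.append(total - prev)           # credit assigned to this step alone
        prev = total
    return rewards
```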
o1-Coder: an o1 Replication for Coding
·1672 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing Jiaotong University
O1-CODER replicates OpenAI’s o1 model for coding, integrating reinforcement learning and Monte Carlo Tree Search to enhance System-2 thinking and generate high-quality code with reasoning steps.
KV Shifting Attention Enhances Language Modeling
·5293 words·25 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Baichuan Inc.
KV Shifting Attention: A novel attention mechanism significantly enhances language modeling by simplifying induction heads, leading to improved performance and faster convergence, even in large-scale models.
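The core mechanism is small enough to sketch: before standard causal attention, each position's key and value become a learnable mix of the current and previous positions, which lets a single layer imitate the pattern that induction heads normally need two layers to form. A toy PyTorch sketch with scalar mixing weights (the real model learns these; the function names here are hypothetical):

```python
import torch

def shift_mix(x: torch.Tensor, w_self: float, w_prev: float) -> torch.Tensor:
    """x'_t = w_self * x_t + w_prev * x_{t-1} (weights are learnable in the real model)."""
    prev = torch.roll(x, shifts=1, dims=1)  # bring x_{t-1} to position t
    prev[:, 0] = 0.0                        # position 0 has no predecessor
    return w_self * x + w_prev * prev

def kv_shifting_attention(q, k, v, alpha=(1.0, 0.5), beta=(1.0, 0.5)):
    # q, k, v: (batch, seq, dim)
    k = shift_mix(k, *alpha)  # shifted keys
    v = shift_mix(v, *beta)   # shifted values
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    causal = torch.triu(torch.ones(q.shape[1], q.shape[1], dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```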
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
·7526 words·36 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 EPFL
INCLUDE, a new multilingual LLM benchmark, tackles regional knowledge gaps with 197K QA pairs spanning 44 languages, improving cross-lingual evaluation.
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
·2134 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab
Boosting LLMs’ reasoning: A novel token-level contrastive estimation method automatically identifies and penalizes critical tokens leading to errors, significantly enhancing reasoning accuracy.
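One way to read the identification step, sketched below under assumed interfaces (`llm` sampling a completion, `is_correct` checking the final answer): estimate the rollout success rate from each prefix of a failing trajectory, and flag a token as critical when committing it sharply reduces that rate. An illustrative simplification, not the paper's exact estimator:

```python
def critical_tokens(llm, is_correct, prompt: str, tokens: list[str],
                    rollouts: int = 16, drop: float = 0.5) -> list[int]:
    # success[t]: estimated success rate when sampling continuations from the
    # prefix that contains the first t trajectory tokens
    success = []
    for t in range(len(tokens) + 1):
        prefix = prompt + "".join(tokens[:t])
        wins = sum(is_correct(llm(prefix)) for _ in range(rollouts))
        success.append(wins / rollouts)
    # token t is "critical" if committing it sharply reduces the success rate
    return [t for t in range(len(tokens)) if success[t] - success[t + 1] > drop]
```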
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models
·1730 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group
Boost LLM accuracy exponentially with a two-stage algorithm backed by a provable scaling law: generate multiple candidate solutions, then compare them in a knockout tournament!
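The two-stage procedure itself is easy to sketch. The toy Python below assumes a hypothetical `llm` callable and a hypothetical comparison prompt; it shows the generation and knockout stages, not the paper's exact prompts or its theoretical analysis:

```python
import random

def llm_compare(llm, prompt, a, b) -> int:
    """Hypothetical pairwise-comparison call: 1 if the model prefers answer a."""
    verdict = llm(f"Problem: {prompt}\nCandidate A: {a}\nCandidate B: {b}\n"
                  "Which candidate is correct? Answer 'A' or 'B'.")
    return 1 if verdict.strip().upper().startswith("A") else 0

def knockout_tournament(llm, prompt: str, n: int = 8, k: int = 3) -> str:
    # Stage 1: sample N independent candidate solutions.
    candidates = [llm(prompt) for _ in range(n)]
    # Stage 2: knockout rounds; each pair is compared K times, majority advances.
    while len(candidates) > 1:
        random.shuffle(candidates)
        winners = []
        for i in range(0, len(candidates) - 1, 2):
            a, b = candidates[i], candidates[i + 1]
            wins_a = sum(llm_compare(llm, prompt, a, b) for _ in range(k))
            winners.append(a if wins_a * 2 > k else b)
        if len(candidates) % 2:        # odd candidate out gets a bye
            winners.append(candidates[-1])
        candidates = winners
    return candidates[0]
```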
A dynamic parallel method for performance optimization on hybrid CPUs
·1564 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Intel Corporation
A dynamic parallel method boosts LLM inference speed on hybrid CPUs, raising memory-bandwidth utilization above 90% and resolving bottlenecks caused by the cores' imbalanced capabilities.
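The scheduling idea generalizes beyond LLMs: with a static equal split, fast P-cores sit idle while slow E-cores finish, whereas a shared work queue keeps every core busy. A generic Python illustration (not Intel's implementation, and only a scheduling sketch, since Python threads will not speed up CPU-bound work):

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty

def dynamic_parallel(items, worker, num_threads: int):
    """Each thread pulls the next small chunk when free, so faster cores
    naturally process more chunks than slower ones."""
    q = Queue()
    for item in items:
        q.put(item)

    def run():
        results = []
        while True:
            try:
                results.append(worker(q.get_nowait()))
            except Empty:
                return results  # queue drained: this thread is done

    with ThreadPoolExecutor(num_threads) as ex:
        parts = list(ex.map(lambda _: run(), range(num_threads)))
    return [r for part in parts for r in part]
```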
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
·4724 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA
Puzzle: a novel framework accelerates large language model inference by using neural architecture search and knowledge distillation, achieving a 2.17x speedup on a single GPU while preserving 98.4% accuracy.
Training and Evaluating Language Models with Template-based Data Generation
·415 words·2 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Researchers created TemplateGSM, a massive dataset of 7M+ grade-school math problems and solutions, using GPT-4 to generate templates, significantly advancing LLM training for mathematical reasoning.
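A toy version of template-based generation: one parameterized template, instantiated with random numbers, yields an unlimited stream of problem/solution pairs whose answers are verifiable by construction. The template below is hypothetical, not one of TemplateGSM's:

```python
import random

def sample_problem(rng: random.Random) -> dict:
    a, b, price = rng.randint(2, 9), rng.randint(2, 9), rng.randint(1, 5)
    question = (f"Sam buys {a} boxes of pens with {b} pens in each box. "
                f"Each pen costs ${price}. How much does Sam spend in total?")
    answer = a * b * price
    solution = (f"There are {a} * {b} = {a*b} pens. "
                f"At ${price} each, the total is {a*b} * {price} = ${answer}.")
    return {"question": question, "solution": solution, "answer": answer}

# every generated example carries a programmatically verified answer
rng = random.Random(0)
dataset = [sample_problem(rng) for _ in range(3)]
```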
Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding
·2920 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab
Self-VerIfication length Policy (SVIP) dynamically adjusts speculative decoding draft lengths based on token difficulty, achieving up to 20% faster large language model inference.
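One plausible reading of such a length policy, sketched with a hypothetical draft-model interface: keep drafting while the draft model's own next-token distribution has low entropy (an easy token), and stop early when entropy spikes, instead of always drafting a fixed number of tokens. An assumed simplification of SVIP's criterion, not its exact formula:

```python
import torch

def draft_until_uncertain(draft_model, tokens: torch.Tensor,
                          max_draft: int = 8, entropy_threshold: float = 2.0):
    """tokens: (1, seq) token ids; draft_model returns (1, seq, vocab) logits."""
    drafted = []
    for _ in range(max_draft):
        logits = draft_model(tokens)[:, -1, :]            # next-token logits
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)
        if entropy.item() > entropy_threshold:            # token looks hard:
            break                                         # stop drafting early
        nxt = probs.argmax(dim=-1, keepdim=True)          # greedy draft token
        drafted.append(nxt.item())
        tokens = torch.cat([tokens, nxt], dim=1)
    return drafted, tokens  # candidates for the target model to verify
```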
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS
·2022 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
HiAR-ICL, a novel automated reasoning paradigm using Monte Carlo Tree Search, surpasses state-of-the-art accuracy in complex mathematical reasoning by shifting focus from specific examples to abstract thinking patterns.
Star Attention: Efficient LLM Inference over Long Sequences
·5535 words·26 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA
Star Attention: 11x faster LLM inference on long sequences with 95-100% accuracy!
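A toy, single-process rendering of the two-phase idea (the real method is distributed across hosts and adds an anchor block, both omitted here): context tokens attend only within their local block, then query tokens attend globally over the full cached context:

```python
import torch

def star_attention_toy(q_ctx, k_ctx, v_ctx, q_query, block: int = 256):
    # all tensors: (num_tokens, dim)
    scale = k_ctx.shape[-1] ** 0.5
    # Phase 1: each context block attends only to itself (local encoding)
    ctx_out = []
    for s in range(0, k_ctx.shape[0], block):
        qb, kb, vb = q_ctx[s:s+block], k_ctx[s:s+block], v_ctx[s:s+block]
        ctx_out.append(torch.softmax(qb @ kb.T / scale, dim=-1) @ vb)
    ctx_out = torch.cat(ctx_out)
    # Phase 2: query tokens attend globally to the entire cached context
    attn = torch.softmax(q_query @ k_ctx.T / scale, dim=-1)
    return ctx_out, attn @ v_ctx
```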
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
·3397 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab
Low-bit quantization excels for undertrained LLMs but struggles with fully-trained ones; new scaling laws reveal this, directing future research.
Predicting Emergent Capabilities by Finetuning
·6002 words·29 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 UC Berkeley
Predicting emergent LLM capabilities is now possible by finetuning smaller models; this approach shifts the emergence point, enabling accurate predictions of future model performance, even with up to …
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
·2809 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Generative AI Research Lab (GAIR)
Simple distillation from OpenAI's API, combined with fine-tuning, surprisingly surpasses OpenAI's O1-preview on complex mathematical reasoning, a result that urges greater transparency in AI research.
MH-MoE: Multi-Head Mixture-of-Experts
·1899 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Research
MH-MoE: A novel implementation of Multi-Head Mixture-of-Experts achieves superior performance in large language models without increasing model size or computational cost.
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
·1776 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Arizona State University
LLMs are revolutionizing AI evaluation by offering nuanced judgments surpassing traditional methods. This paper provides a taxonomy, benchmark, and future directions for LLM-as-a-judge.
From CISC to RISC: language-model guided assembly transpilation
·3290 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Mohamed Bin Zayed University of Artificial Intelligence
A novel LLM-based transpiler, CRT, efficiently converts x86 assembly to ARM and RISC-V assembly, achieving high accuracy and significant performance improvements over existing virtualization methods.
One to rule them all: natural language to bind communication, perception and action
·1627 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Milan
AI-powered robots now understand and execute complex natural language commands, adapting seamlessly to dynamic environments thanks to a new architecture integrating LLMs, perception, and planning.
MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts
·4779 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Hong Kong Polytechnic University
MolReFlect achieves state-of-the-art molecule-text alignment by using a teacher-student LLM framework that generates fine-grained alignments, improving accuracy and explainability.