Large Language Models

Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding

27 November 2024·2920 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab

Self-VerIfication length Policy (SVIP) dynamically adjusts speculative decoding draft lengths based on token difficulty, achieving up to 20% faster large language model inference.

Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS

27 November 2024·2022 words·10 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

HiAR-ICL, a novel automated reasoning paradigm using Monte Carlo Tree Search, surpasses state-of-the-art accuracy in complex mathematical reasoning by shifting focus from specific examples to abstract…

Star Attention: Efficient LLM Inference over Long Sequences

26 November 2024·5535 words·26 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA

Star Attention speeds up LLM inference on long sequences by up to 11x while preserving 95-100% of accuracy.

Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens

26 November 2024·3397 words·16 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab

Low-bit quantization excels for undertrained LLMs but struggles with fully-trained ones; new scaling laws reveal this, directing future research.

Predicting Emergent Capabilities by Finetuning

25 November 2024·6002 words·29 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 UC Berkeley

Predicting emergent LLM capabilities is now possible by finetuning smaller models; this approach shifts the emergence point, enabling accurate predictions of future model performance, even with up to …

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

25 November 2024·2809 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Generative AI Research Lab (GAIR)

Simple distillation from OpenAI’s API, combined with fine-tuning, surprisingly surpasses OpenAI’s O1-preview on complex mathematical reasoning, urging transparency in AI research.

MH-MoE: Multi-Head Mixture-of-Experts

25 November 2024·1899 words·9 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Research

MH-MoE: A novel implementation of Multi-Head Mixture-of-Experts improves large language model performance without increasing model size or computational cost.

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

25 November 2024·1776 words·9 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Arizona State University

LLMs are revolutionizing AI evaluation by offering nuanced judgments surpassing traditional methods. This paper provides a taxonomy, benchmark, and future directions for LLM-as-a-judge.

From CISC to RISC: language-model guided assembly transpilation

25 November 2024·3290 words·16 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Mohamed Bin Zayed University of Artificial Intelligence

A novel LLM-based transpiler, CRT, efficiently converts x86 assembly to ARM and RISC-V assembly, achieving high accuracy and significant performance improvements over existing virtualization methods.

One to rule them all: natural language to bind communication, perception and action

22 November 2024·1627 words·8 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Milan

AI-powered robots now understand and execute complex natural language commands, adapting seamlessly to dynamic environments thanks to a new architecture integrating LLMs, perception, and planning.

MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

22 November 2024·4779 words·23 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Hong Kong Polytechnic University

MolReFlect achieves state-of-the-art molecule-text alignment by using a teacher-student LLM framework that generates fine-grained alignments, improving accuracy and explainability.

UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages

21 November 2024·2221 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Ajou University

UnifiedCrawl efficiently harvests massive monolingual datasets for low-resource languages from Common Crawl, enabling affordable LLM adaptation via QLoRA, significantly improving performance.

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

21 November 2024·2284 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba International Digital Commerce

Marco-o1: a novel large reasoning model surpasses existing LLMs by using Chain-of-Thought, Monte Carlo Tree Search, and reflection mechanisms to excel in open-ended problem-solving, particularly in co…

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

21 November 2024·5261 words·25 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ETH Zurich

LLMs’ hallucinations stem from entity recognition: SAEs reveal model ‘self-knowledge’, causally affecting whether it hallucinates or refuses to answer. This mechanism is even repurposed by chat finet…

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

20 November 2024·4011 words·19 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 National University of Singapore

AnchorAttention enhances long-context LLMs by mitigating BFloat16’s disruptive effects on RoPE, improving performance and speeding up training.

Patience Is The Key to Large Language Model Reasoning

20 November 2024·477 words·3 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

Boosting Large Language Model (LLM) reasoning without massive datasets: A novel training method encourages ‘patient’ reasoning, improving accuracy by up to 6.7% on benchmark tasks.

Hymba: A Hybrid-head Architecture for Small Language Models

20 November 2024·4219 words·20 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA

Hymba: A hybrid-head architecture improves small language models, delivering an 11.67x cache size reduction and 3.49x higher throughput while surpassing existing models.

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

20 November 2024·2774 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University College London

BALROG benchmark rigorously evaluates LLMs’/VLMs’ abilities in complex games, revealing their strengths and weaknesses in long-term planning and decision-making, highlighting the need for improved vis…

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection

20 November 2024·2311 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Government Technology Agency Singapore

A new data-free methodology creates effective, generalizable LLM guardrails against off-topic prompts, significantly improving LLM safety and responsible use.

Ultra-Sparse Memory Network

19 November 2024·5103 words·24 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ByteDance

UltraMem, a novel ultra-sparse memory network, speeds up LLM inference by up to 6x compared to MoE while maintaining performance, paving the way for efficient large-scale model deployment.