Large Language Models

Training Large Language Models to Reason in a Continuous Latent Space

9 December 2024·2859 words·14 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Meta AI

LLMs are trained to reason using language, but COCONUT lets them reason directly in a continuous latent space, boosting performance on logical tasks requiring complex planning.

Fully Open Source Moxin-7B Technical Report

8 December 2024·334 words·2 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Northeastern University

Moxin-LLM: A fully open-source 7B parameter LLM achieving superior zero-shot performance, promoting transparency and reproducibility in AI research.

EXAONE 3.5: Series of Large Language Models for Real-world Use Cases

6 December 2024·5961 words·28 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 LG AI Research

LG AI Research unveils EXAONE 3.5, a series of instruction-tuned language models (2.4B, 7.8B, and 32B parameters) excelling in real-world tasks, long-context understanding, and general benchmarks.

Evaluating and Aligning CodeLLMs on Human Preference

6 December 2024·3535 words·17 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group

CodeArena, a novel benchmark, evaluates code LLMs based on human preferences, revealing performance gaps between open-source and proprietary models, and a large-scale synthetic instruction corpus impr…

Monet: Mixture of Monosemantic Experts for Transformers

5 December 2024·5131 words·25 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Korea University

MONET improves Transformer interpretability by using Mixture-of-Experts (MoE) with 262K monosemantic experts per layer, achieving parameter efficiency and enabling knowledge manipulation without perfo…

Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement

5 December 2024·7239 words·34 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba International Digital Commerce

Marco-LLM: A groundbreaking multilingual LLM significantly enhances cross-lingual capabilities via massive multilingual training, bridging the performance gap between high- and low-resource languages.

Densing Law of LLMs

5 December 2024·1976 words·10 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

LLMs’ training quality is exponentially improving, enabling models with half the parameters to match state-of-the-art performance every 3 months, thus reducing inference costs.

Weighted-Reward Preference Optimization for Implicit Model Fusion

4 December 2024·4595 words·22 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 School of Computer Science and Engineering, Sun Yat-Sen University

WRPO: Implicitly fuse LLMs, boosting performance without complex alignment or merging!

Robust Multi-bit Text Watermark with LLM-based Paraphrasers

4 December 2024·3046 words·15 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ByteDance Research

Researchers developed a robust multi-bit text watermarking method using LLMs for paraphrasing, achieving over 99.99% detection accuracy while maintaining semantic information and resisting common atta…

Evaluating Language Models as Synthetic Data Generators

4 December 2024·4403 words·21 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Carnegie Mellon University

AGORABENCH: A new benchmark reveals surprising strengths & weaknesses of LMs as synthetic data generators, showing that problem-solving ability isn’t the sole indicator of data quality.

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

3 December 2024·4800 words·23 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Peking University

Imperfect OCR hinders Retrieval-Augmented Generation (RAG). OHRBench, a new benchmark, reveals this cascading impact, showing current OCR solutions insufficient for high-quality RAG knowledge bases. …

Free Process Rewards without Process Labels

2 December 2024·3126 words·15 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

Train high-performing Process Reward Models (PRMs) cheaply using only outcome-level labels, eliminating the need for costly step-by-step annotations!

o1-Coder: an o1 Replication for Coding

29 November 2024·1672 words·8 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing Jiaotong University

O1-CODER replicates OpenAI’s o1 model for coding, integrating reinforcement learning and Monte Carlo Tree Search to enhance System-2 thinking and generate high-quality code with reasoning steps.

KV Shifting Attention Enhances Language Modeling

29 November 2024·5293 words·25 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Baichuan Inc.

KV Shifting Attention: A novel attention mechanism significantly enhances language modeling by simplifying induction heads, leading to improved performance and faster convergence, even in large-scale …

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

29 November 2024·7526 words·36 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 EPFL

New multilingual LLM benchmark, INCLUDE, tackles regional knowledge gaps by using 197K QA pairs from 44 languages, improving cross-lingual evaluation.

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

29 November 2024·2134 words·11 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab

Boosting LLMs’ reasoning: A novel token-level contrastive estimation method automatically identifies and penalizes critical tokens leading to errors, significantly enhancing reasoning accuracy.

A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models

29 November 2024·1730 words·9 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group

Boost LLM accuracy exponentially by using a two-stage algorithm with provable scaling laws: generate multiple candidate solutions then compare them in a knockout tournament!

A dynamic parallel method for performance optimization on hybrid CPUs

29 November 2024·1564 words·8 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Intel Corporation

Dynamic parallel processing boosts LLM inference speed on hybrid CPUs by over 90% memory bandwidth, resolving performance bottlenecks caused by imbalanced hardware capabilities.

Puzzle: Distillation-Based NAS for Inference-Optimized LLMs

28 November 2024·4724 words·23 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA

Puzzle: a novel framework accelerates large language model inference by using neural architecture search and knowledge distillation, achieving a 2.17x speedup on a single GPU while preserving 98.4% ac…

Training and Evaluating Language Models with Template-based Data Generation

27 November 2024·415 words·2 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

Researchers created TemplateGSM, a massive dataset of 7M+ grade-school math problems and solutions, using GPT-4 to generate templates, significantly advancing LLM training for mathematical reasoning.