Natural Language Processing
Drowning in Documents: Consequences of Scaling Reranker Inference
·273 words·2 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Information Retrieval
🏢 Databricks
Scaling reranker inference surprisingly degrades retrieval quality beyond a certain point, prompting the need for more robust reranking techniques.
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
·3206 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
SageAttention2 achieves 4-bit accurate attention, boosting inference speed by 2x compared to FlashAttention2, while maintaining end-to-end accuracy across diverse models.
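The summary names only the speedup, so here is a rough NumPy sketch of the core idea behind 4-bit attention: symmetric per-row INT4 quantization of Q and K, an integer matmul for QK^T, and a rescale back to float. The paper's actual scheme (per-thread quantization, Q/V smoothing, FP8 for the PV product) is considerably more involved; this is an illustration, not the authors' implementation.

import numpy as np

def quantize_int4(x):
    # symmetric per-row quantization to the INT4 range [-7, 7]
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 64)).astype(np.float32)
K = rng.standard_normal((4, 64)).astype(np.float32)

qQ, sQ = quantize_int4(Q)   # sQ: (4, 1) per-row scales
qK, sK = quantize_int4(K)

# integer matmul on the quantized tensors, then rescale to float
approx = (qQ.astype(np.int32) @ qK.T.astype(np.int32)) * (sQ @ sK.T)
exact = Q @ K.T
print(np.abs(approx - exact).max())   # small quantization error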
LLäMmlein: Compact and Competitive German-Only Language Models from Scratch
·3133 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Center for Artificial Intelligence and Data Science
New German-only LLMs, LLäMmlein 120M & 1B, trained from scratch & openly released, show competitive performance and offer insights into efficient model training.
SlimLM: An Efficient Small Language Model for On-Device Document Assistance
·2811 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Auburn University
SlimLM: Efficient small language models (SLMs) optimized for mobile document assistance, achieving comparable or superior performance to existing SLMs.
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
·2885 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
LLaMA-Mesh: Unifying 3D mesh generation with LLMs by directly representing meshes as text, enabling efficient text-to-3D conversion within a single model.
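To make "representing meshes as text" concrete, here is a minimal sketch of serializing a triangle mesh as OBJ-style plain text with integer-quantized coordinates, the kind of token sequence a language model can read and emit. The helper name and the 64-bin quantization are illustrative assumptions, not the paper's exact recipe.

def mesh_to_text(vertices, faces, bins=64):
    # bins=64 is an assumed quantization granularity, not the paper's value
    lo = min(c for v in vertices for c in v)
    hi = max(c for v in vertices for c in v)
    span = (hi - lo) or 1.0
    def q(x):  # map a coordinate into an integer bin
        return int(round((x - lo) / span * (bins - 1)))
    lines = [f"v {q(x)} {q(y)} {q(z)}" for x, y, z in vertices]
    lines += [f"f {a} {b} {c}" for a, b, c in faces]  # OBJ faces are 1-indexed
    return "\n".join(lines)

# a single triangle serialized as text a model could emit token by token
print(mesh_to_text([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(1, 2, 3)]))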
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering
·5666 words·27 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Department of Computer Science, University of Oregon
MedRGB benchmark reveals current LLMs struggle with noisy medical data, emphasizing the need for robust RAG systems in healthcare AI.
Adaptive Decoding via Latent Preference Optimization
·4975 words·24 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Meta AI
LLMs can dynamically adjust decoding temperature using Adaptive Decoding and Latent Preference Optimization, improving performance across creative and factual tasks.
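A minimal PyTorch sketch of the mechanism the summary describes: a small learned head reads the hidden state, softly selects a per-token temperature from a discrete set, and scales the logits. The class name, the temperature set, and the soft mixture are our assumptions for illustration, not Meta's implementation.

import torch
import torch.nn as nn

class AdaptiveTemperatureHead(nn.Module):  # hypothetical name
    def __init__(self, d_model, temps=(0.1, 0.5, 1.0, 1.5)):
        super().__init__()
        self.register_buffer("temps", torch.tensor(temps))
        self.proj = nn.Linear(d_model, len(temps))

    def forward(self, hidden, logits):
        # soft choice over a small discrete set of temperatures, per token
        weights = self.proj(hidden).softmax(dim=-1)           # (..., n_temps)
        t = (weights * self.temps).sum(dim=-1, keepdim=True)  # expected temperature
        return logits / t

head = AdaptiveTemperatureHead(d_model=16)
hidden, logits = torch.randn(2, 16), torch.randn(2, 100)
print(head(hidden, logits).shape)  # torch.Size([2, 100])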
Cut Your Losses in Large-Vocabulary Language Models
·2958 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Apple
Cut Cross-Entropy (CCE) dramatically reduces the memory footprint of training large language models by cleverly computing the cross-entropy loss without materializing the full logit matrix.
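The trick, as summarized, is that cross-entropy only needs the target token's logit plus a logsumexp over the vocabulary, and the logsumexp can be streamed in chunks so the full (tokens x vocab) logit matrix never exists at once. A minimal NumPy sketch of that streaming formulation (the paper works at the kernel level; this only shows the math):

import numpy as np

def chunked_cross_entropy(h, W, targets, chunk=1024):
    # h: (n, d) hidden states; W: (vocab, d) output embedding; targets: (n,)
    n = h.shape[0]
    target_logit = np.einsum("nd,nd->n", h, W[targets])  # one dot product per token
    m = np.full(n, -np.inf)   # running max for a numerically stable logsumexp
    s = np.zeros(n)           # running sum of exp(logit - m)
    for start in range(0, W.shape[0], chunk):
        logits = h @ W[start:start + chunk].T            # only an (n, chunk) slice
        new_m = np.maximum(m, logits.max(axis=1))
        s = s * np.exp(m - new_m) + np.exp(logits - new_m[:, None]).sum(axis=1)
        m = new_m
    return float(np.mean(np.log(s) + m - target_logit))  # mean NLL

rng = np.random.default_rng(0)
h, W = rng.standard_normal((8, 32)), rng.standard_normal((5000, 32))
targets = rng.integers(0, 5000, size=8)
print(chunked_cross_entropy(h, W, targets))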
Can sparse autoencoders be used to decompose and interpret steering vectors?
·2017 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Oxford
Sparse autoencoders fail to accurately decompose and interpret steering vectors due to distribution mismatch and the inability to handle negative feature projections; this paper identifies these issues.
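The negative-projection failure is easy to see: a ReLU-based sparse autoencoder can only assign non-negative activations to its dictionary directions, so a steering vector that subtracts a feature cannot be reconstructed. A toy NumPy illustration with a tied encoder (a simplifying assumption, not the paper's exact setup):

import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 64))            # decoder dictionary: 8 features x 64 dims
D /= np.linalg.norm(D, axis=1, keepdims=True)

# a steering vector that SUBTRACTS feature 0 and adds feature 1
steering = -2.0 * D[0] + 1.0 * D[1]

acts = np.maximum(steering @ D.T, 0.0)      # ReLU encoder: activations >= 0
recon = acts @ D
print(np.linalg.norm(steering - recon))     # large error: the -2.0 part is lost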
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection
·1996 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Inria, Paris, France
CamemBERT 2.0: Two new French language models (CamemBERTav2 & CamemBERTv2) outperform predecessors by addressing temporal concept drift via larger, updated datasets and enhanced tokenization, demonstrating the importance of up-to-date training data.
Top-$nσ$: Not All Logits Are You Need
·2189 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 School of Computer Science and Technology, University of Science and Technology of China
Top-nσ: A novel LLM sampling method outperforms existing approaches by using a statistical threshold on pre-softmax logits, achieving higher accuracy while maintaining diversity, even at high temperatures.
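The thresholding rule is simple enough to sketch: keep only tokens whose pre-softmax logit lies within n standard deviations of the maximum, then sample among the survivors, so the candidate set is unaffected by temperature. A minimal NumPy version of the rule as the summary states it (parameter names are ours):

import numpy as np

def top_n_sigma_sample(logits, n=1.0, temperature=1.0, rng=None):
    rng = rng or np.random.default_rng()
    # keep tokens within n standard deviations of the max pre-softmax logit
    keep = logits >= logits.max() - n * logits.std()
    masked = np.where(keep, logits / temperature, -np.inf)
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

logits = np.random.default_rng(0).standard_normal(50_000)
print(top_n_sigma_sample(logits, n=1.0, temperature=2.0))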
Large Language Models Can Self-Improve in Long-context Reasoning
·3316 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Peking University
LLMs can now self-improve long-context reasoning via SEALONG, a novel method leveraging multiple model outputs and minimum Bayes risk scoring to enable effective supervised fine-tuning or preference optimization.
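Minimum Bayes risk selection here just means: sample several outputs and keep the one most consistent with the rest, which can then serve as a fine-tuning target. A minimal sketch, with token-overlap similarity standing in for whatever scoring function the paper actually uses:

def mbr_select(candidates, similarity):
    # pick the candidate with the highest average similarity to the others
    def score(c):
        others = [o for o in candidates if o is not c]
        return sum(similarity(c, o) for o in others) / max(len(others), 1)
    return max(candidates, key=score)

def jaccard(a, b):  # toy stand-in similarity over word sets
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

outputs = ["the answer is 42", "so the answer is 42", "it is 7"]
print(mbr_select(outputs, jaccard))  # selects one of the two consistent answers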
Direct Preference Optimization Using Sparse Feature-Level Constraints
·2078 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Westlake University
Feature-level constrained Preference Optimization (FPO) boosts LLM alignment efficiency and stability by using sparse autoencoders and feature-level constraints, achieving significant improvements over state-of-the-art baselines.
Stronger Models are NOT Stronger Teachers for Instruction Tuning
·3212 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
Larger language models aren’t always better teachers for instruction tuning; a new metric, CAR, predicts teacher model effectiveness better than existing methods.
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
·2396 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Taobao & Tmall Group of Alibaba
Chinese SimpleQA, a new benchmark, offers a comprehensive evaluation of the factuality of LLMs answering short questions in Chinese, exhibiting diversity, high quality, and ease of evaluation.
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
·2662 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Ohio State University
WEB-DREAMER uses LLMs as world models for safe and efficient web agent planning, achieving substantial performance gains over reactive baselines.
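The one-line idea, made concrete: before acting on a live website, ask the LLM to imagine the page state each candidate action would produce, score the imagined states against the goal, and execute only the winner. A minimal sketch with stubbed simulate and score functions (the names and toy scoring are ours; the paper's prompting is more elaborate):

def plan_with_world_model(state, candidate_actions, simulate, score):
    # "simulate" asks the LLM to predict the next page state for an action;
    # "score" rates how close an imagined state is to the task goal
    return max(candidate_actions, key=lambda a: score(simulate(state, a)))

# stubbed example: prefer the action whose imagined outcome mentions "checkout"
simulate = lambda state, a: f"{state} -> clicked {a}"
score = lambda imagined: float("checkout" in imagined)
print(plan_with_world_model("cart page", ["checkout button", "home link"], simulate, score))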
Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction
·2573 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Oxford
Contrary to common belief, toxicity reduction in language models isn’t simply achieved by dampening toxic neurons; it’s a complex balancing act across multiple neuron groups.
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
·2696 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Singapore University of Technology and Design
M-LongDoc: a new benchmark and retrieval-aware tuning framework for multimodal long-document understanding that improves model accuracy by 4.6%.
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
·2984 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tongyi Lab
IOPO empowers LLMs to master complex instructions via input-output preference optimization, boasting significant performance gains on a new benchmark, TRACE.
Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
·3715 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
Golden Touchstone, a new bilingual benchmark, comprehensively evaluates financial LLMs across eight tasks, revealing model strengths and weaknesses and advancing FinLLM research.