Natural Language Processing
Training Large Language Models to Reason in a Continuous Latent Space
·2859 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Meta AI
LLMs are trained to reason using language, but COCONUT lets them reason directly in a continuous latent space, boosting performance on logical tasks requiring complex planning.
Fully Open Source Moxin-7B Technical Report
·334 words·2 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Northeastern University
Moxin-LLM: A fully open-source 7B parameter LLM achieving superior zero-shot performance, promoting transparency and reproducibility in AI research.
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
·5961 words·28 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 LG AI Research
LG AI Research unveils EXAONE 3.5, a series of instruction-tuned language models (2.4B, 7.8B, and 32B parameters) excelling in real-world tasks, long-context understanding, and general benchmarks.
Evaluating and Aligning CodeLLMs on Human Preference
·3535 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Alibaba Group
CodeArena, a novel benchmark, evaluates code LLMs based on human preferences, revealing performance gaps between open-source and proprietary models, and a large-scale synthetic instruction corpus impr…
DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling
·2286 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Dialogue Systems
🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
DEMO benchmark revolutionizes dialogue modeling by focusing on fine-grained elements (Prelude, Interlocution, Epilogue), enabling comprehensive evaluation and superior agent performance.
Monet: Mixture of Monosemantic Experts for Transformers
·5131 words·25 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Korea University
MONET improves Transformer interpretability by using Mixture-of-Experts (MoE) with 262K monosemantic experts per layer, achieving parameter efficiency and enabling knowledge manipulation without perfo…
Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement
·7239 words·34 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Alibaba International Digital Commerce
Marco-LLM: A groundbreaking multilingual LLM significantly enhances cross-lingual capabilities via massive multilingual training, bridging the performance gap between high- and low-resource languages.
Densing Law of LLMs
·1976 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
LLM capacity density is improving exponentially: roughly every three months, models with half the parameters can match prior state-of-the-art performance, steadily reducing inference costs.
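The trend in this summary can be written as a simple exponential; a minimal sketch, assuming a capacity-density measure ρ(t) and the roughly three-month doubling period stated above (the symbols are illustrative, not the paper's notation):

```latex
\rho(t) = \rho(t_0)\,\cdot\,2^{(t - t_0)/T}, \qquad T \approx 3\ \text{months}
```

Since the parameter count needed to reach a fixed capability scales as $N(t) \propto 1/\rho(t)$, it halves once per period $T$, which is where the "half the parameters every three months" reading comes from.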
Weighted-Reward Preference Optimization for Implicit Model Fusion
·4595 words·22 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 School of Computer Science and Engineering, Sun Yat-Sen University
WRPO: Implicitly fuse LLMs, boosting performance without complex alignment or merging!
Robust Multi-bit Text Watermark with LLM-based Paraphrasers
·3046 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 ByteDance Research
Researchers developed a robust multi-bit text watermarking method using LLMs for paraphrasing, achieving over 99.99% detection accuracy while maintaining semantic information and resisting common attacks.
Evaluating Language Models as Synthetic Data Generators
·4403 words·21 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
AGORABENCH: A new benchmark reveals surprising strengths & weaknesses of LMs as synthetic data generators, showing that problem-solving ability isn’t the sole indicator of data quality.
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
·4800 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Peking University
Imperfect OCR hinders Retrieval-Augmented Generation (RAG). OHRBench, a new benchmark, reveals this cascading impact, showing that current OCR solutions are insufficient for building high-quality RAG knowledge bases.
Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning
·1712 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Classification
🏢 Telecom SudParis
Few-shot learning empowers cross-lingual audio abuse detection using pre-trained models, achieving high accuracy in low-resource Indian languages.
Free Process Rewards without Process Labels
·3126 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Train high-performing Process Reward Models (PRMs) cheaply using only outcome-level labels, eliminating the need for costly step-by-step annotations!
o1-Coder: an o1 Replication for Coding
·1672 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Beijing Jiaotong University
O1-CODER replicates OpenAI’s o1 model for coding, integrating reinforcement learning and Monte Carlo Tree Search to enhance System-2 thinking and generate high-quality code with reasoning steps.
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification
·2350 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Classification
🏢 Jožef Stefan Institute
Researchers developed a multilingual news topic classifier using a teacher-student framework and GPT-4o for automatic data annotation, achieving high performance without manual annotation.
KV Shifting Attention Enhances Language Modeling
·5293 words·25 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Baichuan Inc.
KV Shifting Attention: A novel attention mechanism significantly enhances language modeling by simplifying induction heads, leading to improved performance and faster convergence, even in large-scale …
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
·7526 words·36 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 EPFL
New multilingual LLM benchmark, INCLUDE, tackles regional knowledge gaps by using 197K QA pairs from 44 languages, improving cross-lingual evaluation.
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
·2134 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Boosting LLMs’ reasoning: A novel token-level contrastive estimation method automatically identifies and penalizes critical tokens leading to errors, significantly enhancing reasoning accuracy.
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models
·1730 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Alibaba Group
Boost LLM accuracy with a two-stage algorithm backed by a provable scaling law: generate multiple candidate solutions, then compare them in a knockout tournament, so the failure probability shrinks exponentially as test-time compute grows.
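The two-stage procedure in this summary can be sketched in a few lines; a minimal illustration, not the paper's implementation. `compare(a, b)` stands in for an LLM judge (a hypothetical placeholder), and each pairwise winner is decided by majority vote over `k` comparisons:

```python
import random

def knockout(candidates, compare, k=3):
    """Stage 2 of the two-stage algorithm: pair up candidates and keep
    each pairwise winner (majority vote over k noisy comparisons) until
    one candidate remains. `compare(a, b)` returns True if a beats b."""
    pool = list(candidates)
    while len(pool) > 1:
        next_round = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            wins_a = sum(compare(a, b) for _ in range(k))
            next_round.append(a if wins_a * 2 > k else b)
        if len(pool) % 2 == 1:          # odd candidate gets a bye
            next_round.append(pool[-1])
        pool = next_round
    return pool[0]

# Toy stand-in for an LLM judge: prefers the truly better candidate
# (here: the larger number) 80% of the time.
def noisy_compare(a, b, p=0.8):
    return (a > b) if random.random() < p else (a < b)

random.seed(0)
winner = knockout(range(16), noisy_compare, k=5)
```

Intuition for the exponential law: if each pairwise winner is correct with probability above 1/2, raising `k` (and the number of candidates) drives the tournament's failure probability down exponentially in the total compute spent.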