Large Language Models

Granite Guardian
·4191 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 IBM Research
Granite Guardian: Open-source risk detection models for LLMs, surpassing existing models in accuracy and offering comprehensive coverage across multiple risk dimensions, promoting safer AI.
Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation
·1928 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Pisa
Contextualized AI counterspeech significantly outperforms generic methods by adapting to the moderation context and user, improving persuasiveness without sacrificing other qualities.
Training Large Language Models to Reason in a Continuous Latent Space
·2859 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Meta AI
LLMs are trained to reason using language, but COCONUT lets them reason directly in a continuous latent space, boosting performance on logical tasks requiring complex planning.
Fully Open Source Moxin-7B Technical Report
·334 words·2 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Northeastern University
Moxin-LLM: A fully open-source 7B-parameter LLM achieving superior zero-shot performance, promoting transparency and reproducibility in AI research.
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
·5961 words·28 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 LG AI Research
LG AI Research unveils EXAONE 3.5, a series of instruction-tuned language models (2.4B, 7.8B, and 32B parameters) excelling in real-world tasks, long-context understanding, and general benchmarks.
Evaluating and Aligning CodeLLMs on Human Preference
·3535 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group
CodeArena, a novel benchmark, evaluates code LLMs based on human preferences, revealing performance gaps between open-source and proprietary models, and a large-scale synthetic instruction corpus impr…
Monet: Mixture of Monosemantic Experts for Transformers
·5131 words·25 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Korea University
MONET improves Transformer interpretability by using Mixture-of-Experts (MoE) with 262K monosemantic experts per layer, achieving parameter efficiency and enabling knowledge manipulation without perfo…
Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement
·7239 words·34 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba International Digital Commerce
Marco-LLM: A groundbreaking multilingual LLM significantly enhances cross-lingual capabilities via massive multilingual training, bridging the performance gap between high- and low-resource languages.
Densing Law of LLMs
·1976 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
LLMs’ training quality is improving exponentially: roughly every 3 months, a model with half the parameters can match state-of-the-art performance, which reduces inference costs.
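To make the arithmetic behind this summary concrete, here is a minimal sketch that takes the "half the parameters every 3 months" figure at face value. The function name `params_needed` and the fixed 3-month halving period are assumptions for illustration, not values or code taken from the paper.

```python
def params_needed(current_params: float, months: float,
                  halving_months: float = 3.0) -> float:
    """Parameters needed to match today's performance after `months`,
    assuming the required size halves every `halving_months` (assumed rate)."""
    if halving_months <= 0:
        raise ValueError("halving_months must be positive")
    # Each halving period multiplies the required size by 0.5.
    return current_params * 0.5 ** (months / halving_months)


# Under this assumption, an 8B-parameter model's performance would be
# matched by a model 1/16 the size after 12 months (four halvings).
one_year_later = params_needed(8e9, 12)
```

Under these assumptions, four halvings in a year shrink the required size by a factor of 16; the real rate may differ from the rounded figure in the summary.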
Weighted-Reward Preference Optimization for Implicit Model Fusion
·4595 words·22 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 School of Computer Science and Engineering, Sun Yat-Sen University
WRPO: Implicitly fuse LLMs, boosting performance without complex alignment or merging!
Robust Multi-bit Text Watermark with LLM-based Paraphrasers
·3046 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ByteDance Research
Researchers developed a robust multi-bit text watermarking method using LLMs for paraphrasing, achieving over 99.99% detection accuracy while maintaining semantic information and resisting common atta…
Evaluating Language Models as Synthetic Data Generators
·4403 words·21 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
AGORABENCH: A new benchmark reveals surprising strengths & weaknesses of LMs as synthetic data generators, showing that problem-solving ability isn’t the sole indicator of data quality.
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
·4800 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Peking University
Imperfect OCR hinders Retrieval-Augmented Generation (RAG). OHRBench, a new benchmark, reveals this cascading impact, showing current OCR solutions insufficient for high-quality RAG knowledge bases. …
Free Process Rewards without Process Labels
·3126 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Train high-performing Process Reward Models (PRMs) cheaply using only outcome-level labels, eliminating the need for costly step-by-step annotations!
o1-Coder: an o1 Replication for Coding
·1672 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing Jiaotong University
O1-CODER attempts to replicate OpenAI’s o1 model for coding, integrating reinforcement learning and Monte Carlo Tree Search to enhance System-2 thinking and generate high-quality code with reasoning steps.
KV Shifting Attention Enhances Language Modeling
·5293 words·25 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Baichuan Inc.
KV Shifting Attention: A novel attention mechanism significantly enhances language modeling by simplifying induction heads, leading to improved performance and faster convergence, even in large-scale …
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
·7526 words·36 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 EPFL
INCLUDE, a new multilingual LLM benchmark, tackles regional knowledge gaps with 197K QA pairs across 44 languages, improving cross-lingual evaluation.
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
·2134 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab
Boosting LLMs’ reasoning: A novel token-level contrastive estimation method automatically identifies and penalizes critical tokens leading to errors, significantly enhancing reasoning accuracy.
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models
·1730 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group
A two-stage algorithm with a provable scaling law for test-time compute: generate multiple candidate solutions, then compare them in a knockout tournament, so that the failure probability shrinks as more compute is spent.
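The two-stage idea in this summary (generate candidates, then run a knockout tournament) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: `generate` and `compare` are hypothetical callables standing in for LLM calls, and each pair is compared only once here, which is a simplification.

```python
from typing import Callable, List


def knockout_select(candidates: List[str],
                    compare: Callable[[str, str], str]) -> str:
    """Single-elimination tournament: return the last surviving candidate."""
    if not candidates:
        raise ValueError("need at least one candidate")
    pool = list(candidates)
    while len(pool) > 1:
        next_round = []
        # Pair up neighbours; the winner of each pair advances.
        for i in range(0, len(pool) - 1, 2):
            next_round.append(compare(pool[i], pool[i + 1]))
        # With an odd pool size, the unpaired last candidate gets a bye.
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])
        pool = next_round
    return pool[0]


def two_stage_solve(problem: str,
                    generate: Callable[[str], str],
                    compare: Callable[[str, str, str], str],
                    n_candidates: int = 8) -> str:
    """Stage 1: sample candidates. Stage 2: pick one by knockout."""
    # `generate(problem)` is a hypothetical sampler returning one solution.
    candidates = [generate(problem) for _ in range(n_candidates)]
    # `compare(problem, a, b)` is a hypothetical judge returning a or b.
    return knockout_select(candidates, lambda a, b: compare(problem, a, b))
```

In practice `compare` would itself be a noisy model call, so a more careful version might repeat each comparison and take a majority vote before advancing a candidate.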
A dynamic parallel method for performance optimization on hybrid CPUs
·1564 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Intel Corporation
Dynamic parallel processing speeds up LLM inference on hybrid CPUs, reaching over 90% of memory bandwidth and resolving performance bottlenecks caused by imbalanced hardware capabilities.