
Natural Language Processing

Smaller Language Models Are Better Instruction Evolvers
·5507 words·26 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing University of Posts and Telecommunications
Smaller is better: SLMs outperform LLMs in evolving complex & diverse instructions for AI training.
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
·5380 words·26 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Corporation
New benchmark for evaluating long-context models finds sub-O(n) methods lacking in real-world use cases.
Byte Latent Transformer: Patches Scale Better Than Tokens
·4848 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Washington
BLT: a tokenizer-free LLM whose byte patches scale better than tokens, improving efficiency and robustness.
Word Sense Linking: Disambiguating Outside the Sandbox
·2984 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Word Sense Disambiguation 🏢 Sapienza University of Rome
Word Sense Linking (WSL) tackles the real-world limitations of word sense disambiguation: it combines span identification and sense linking over plain text, making it easier to integrate into downstream applications.
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective
·1893 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 National Library of Norway
Researchers at the National Library of Norway show that training on copyrighted material improves Norwegian LLMs, but raises legal and ethical issues.
Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages
·1855 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Machine Translation 🏢 Indian Institute of Technology Madras
Shiksha: A new multilingual translation dataset and model surpasses existing benchmarks for Indian languages, focusing on scientific, technical, and educational domains.
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
·3495 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 UC Santa Barbara
RuleArena, a new benchmark, rigorously evaluates large language models’ ability to apply complex, real-world rules across diverse scenarios, revealing significant shortcomings in current LLMs’ rule-guided reasoning.
Phi-4 Technical Report
·2630 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Research
Phi-4: a 14B parameter LLM surpassing its teacher model (GPT-4) in STEM-focused QA through innovative synthetic data generation and post-training techniques.
JuStRank: Benchmarking LLM Judges for System Ranking
·13985 words·66 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 IBM Research
JuStRank, a benchmark of LLM judges for system ranking, reveals judge qualities (decisiveness, bias) that affect ranking accuracy, and shows that strong instance-level performance doesn’t guarantee accurate system-level ranking.
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs
·2774 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Saudi Data & Artificial Intelligence Authority
Fine-tuning small language models? Raising the learning-rate-to-batch-size ratio can deliver a reasoning boost!
Granite Guardian
·4191 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 IBM Research
Granite Guardian: Open-source risk detection models for LLMs, surpassing existing models in accuracy and offering comprehensive coverage across multiple risk dimensions, promoting safer AI.
Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation
·1928 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Pisa
Contextualized AI counterspeech significantly outperforms generic methods by adapting to the moderation context and user, improving persuasiveness without sacrificing other qualities.
Training Large Language Models to Reason in a Continuous Latent Space
·2859 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Meta AI
LLMs are trained to reason using language, but COCONUT lets them reason directly in a continuous latent space, boosting performance on logical tasks requiring complex planning.
Fully Open Source Moxin-7B Technical Report
·334 words·2 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Northeastern University
Moxin-LLM: a fully open-source 7B-parameter LLM with strong zero-shot performance, promoting transparency and reproducibility in AI research.
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
·5961 words·28 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 LG AI Research
LG AI Research unveils EXAONE 3.5, a series of instruction-tuned language models (2.4B, 7.8B, and 32B parameters) excelling in real-world tasks, long-context understanding, and general benchmarks.
Evaluating and Aligning CodeLLMs on Human Preference
·3535 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group
CodeArena, a novel benchmark, evaluates code LLMs against human preferences, revealing performance gaps between open-source and proprietary models and showing that a large-scale synthetic instruction corpus improves alignment with human preferences.
DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling
·2286 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Dialogue Systems 🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
DEMO benchmark revolutionizes dialogue modeling by focusing on fine-grained elements (Prelude, Interlocution, Epilogue), enabling comprehensive evaluation and superior agent performance.
Monet: Mixture of Monosemantic Experts for Transformers
·5131 words·25 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Korea University
Monet improves Transformer interpretability by scaling Mixture-of-Experts (MoE) to 262K monosemantic experts per layer, achieving parameter efficiency and enabling knowledge manipulation without performance degradation.
Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement
·7239 words·34 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba International Digital Commerce
Marco-LLM: A groundbreaking multilingual LLM significantly enhances cross-lingual capabilities via massive multilingual training, bridging the performance gap between high- and low-resource languages.
Densing Law of LLMs
·1976 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
LLM capacity density is growing exponentially: roughly every three months, a model with half the parameters can match state-of-the-art performance, reducing inference costs.