Natural Language Processing
Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation
·2883 words·14 mins
AI Generated
Natural Language Processing
Dialogue Systems
🏢 Seoul National University
Unified Spoken Dialog Model (USDM) directly generates coherent spoken responses with natural prosody, surpassing cascaded baselines and enhancing natural conversation in speech-enabled LLMs.
Panacea: Pareto Alignment via Preference Adaptation for LLMs
·2565 words·13 mins
Natural Language Processing
Large Language Models
🏢 Peking University
Panacea: a novel LLM alignment method achieving Pareto optimality via online preference adaptation using a single model.
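A minimal numpy sketch of the core idea as reported: embed the preference vector into the singular values of an SVD-style low-rank adapter, so one set of weights can realize many Pareto trade-offs. All names, shapes, and the scaling factor are illustrative, not the paper's exact recipe.

```python
import numpy as np

def adapt_weights(w0, u, s, v, pref, scale=1.0):
    """Blend a preference vector into the trailing singular values of a
    low-rank (SVD-style) adapter, so one model serves many trade-offs.

    w0:   frozen base weight (m, n)
    u, v: adapter factors, u (m, r) and v (r, n)
    s:    learned singular values for the first r - k dims
    pref: preference simplex weights (k,), e.g. helpfulness vs. harmlessness
    """
    sing = np.concatenate([s, scale * pref])      # embed the preference
    return w0 + u @ np.diag(sing) @ v

rng = np.random.default_rng(0)
m, n, r, k = 8, 6, 4, 2
w0 = rng.normal(size=(m, n))
u, v = rng.normal(size=(m, r)), rng.normal(size=(r, n))
s = rng.normal(size=r - k)

helpful_first = adapt_weights(w0, u, s, v, np.array([0.9, 0.1]))
harmless_first = adapt_weights(w0, u, s, v, np.array([0.1, 0.9]))
print(np.linalg.norm(helpful_first - harmless_first))  # > 0: behavior shifts
```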
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition
·2356 words·12 mins
Natural Language Processing
Named Entity Recognition
🏢 ByteDance
PaDeLLM-NER accelerates LLM-based NER inference by up to 10x, reaching near real-time performance without accuracy loss.
PaCE: Parsimonious Concept Engineering for Large Language Models
·2526 words·12 mins
Natural Language Processing
Large Language Models
🏢 Johns Hopkins University
PaCE, a novel activation engineering framework, efficiently aligns LLMs by removing undesirable concepts from activations using sparse coding, achieving state-of-the-art performance while preserving l…
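A hedged sketch of the activation-editing step, assuming a precomputed concept dictionary: sparse-code the activation against the dictionary (orthogonal matching pursuit here), zero the coefficients of undesirable concepts, and rebuild. The toy dictionary and the benign/undesirable split are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
d, n_concepts = 64, 32
# Toy concept dictionary: columns are unit-norm concept directions.
D = rng.normal(size=(d, n_concepts))
D /= np.linalg.norm(D, axis=0)
undesirable = np.zeros(n_concepts, dtype=bool)
undesirable[:4] = True  # pretend the first 4 concepts are unwanted

# An activation that mixes an unwanted and a benign concept plus noise.
x = D[:, 1] * 2.0 + D[:, 10] * 1.5 + 0.01 * rng.normal(size=d)

# Sparse-code the activation against the dictionary, then rebuild it
# from the benign components only.
coef = orthogonal_mp(D, x, n_nonzero_coefs=8)
coef[undesirable] = 0.0
x_clean = D @ coef
print(np.linalg.norm(x - x_clean))  # mass removed along concept 1
```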
Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
·2892 words·14 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Over-parameterized Distillation Framework (OPDF) boosts knowledge distillation by efficiently over-parameterizing student models via tensor decomposition, significantly improving performance without i…
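A minimal sketch of the generic trick, not the paper's exact tensor format (which uses a tensor decomposition such as MPO): factor a student weight into a chain with a larger inner dimension during distillation, then collapse the chain back for inference so deployment cost is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, inner = 16, 16, 64          # inner > min(m, n): over-parameterized

# Training-time parameters: a chain of factors standing in for one layer.
A = rng.normal(size=(m, inner)) / np.sqrt(inner)
B = rng.normal(size=(inner, n)) / np.sqrt(inner)

train_params = A.size + B.size     # 2 * 16 * 64 = 2048 during distillation
infer_params = m * n               # 256 after collapsing

# Gradients flow through A and B while distilling; at deployment the
# chain is merged into a single dense weight of the original shape.
W = A @ B
print(train_params, infer_params, W.shape)
```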
Order-Independence Without Fine Tuning
·1791 words·9 mins
Natural Language Processing
Large Language Models
🏢 Harvard University
Set-Based Prompting guarantees order-independent LLM outputs by modifying input representations, eliminating unwanted inconsistencies without fine-tuning.
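A sketch of the input-side construction as described: parallel sub-sequences share position offsets, and an attention mask keeps them from attending to each other, so permuting them cannot change the model's view. Tokens are simplified to lengths only; the layout function is illustrative.

```python
import numpy as np

def set_based_layout(prefix_len, option_lens):
    """Position IDs and attention mask for order-independent options.

    Every option starts at the same position offset, and tokens in one
    option cannot attend to tokens in another, so permuting the options
    leaves the model's view of the input unchanged.
    """
    pos = list(range(prefix_len))
    start = prefix_len
    for L in option_lens:
        pos.extend(range(start, start + L))  # same offsets for each option
    total = prefix_len + sum(option_lens)
    mask = np.zeros((total, total), dtype=bool)
    mask[:, :prefix_len] = True              # everyone sees the prefix
    i = prefix_len
    for L in option_lens:
        mask[i:i + L, i:i + L] = True        # options see only themselves
        i += L
    causal = np.tril(np.ones((total, total), dtype=bool))
    return np.array(pos), causal & mask

pos, mask = set_based_layout(prefix_len=3, option_lens=[2, 2])
print(pos)   # [0 1 2 3 4 3 4]: both options occupy positions 3-4
```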
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
·1896 words·9 mins
Natural Language Processing
Large Language Models
🏢 Google Research
Orchid: a novel deep learning architecture using data-dependent convolution achieves quasilinear scalability and outperforms attention-based models on various sequence modeling tasks.
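A numpy sketch of the core primitive: a long convolution whose filter is conditioned on the input and applied via FFT in O(L log L). The dense projection that generates the filter here is a stand-in for the paper's conditioning network (which keeps the whole layer quasilinear).

```python
import numpy as np

def data_dependent_conv(x, W_filter):
    """Long convolution with an input-conditioned filter via FFT.

    x:        input sequence (L,)
    W_filter: stand-in projection mapping the input to a length-L filter
    The FFT step costs O(L log L), vs. O(L^2) for attention.
    """
    L = len(x)
    h = np.tanh(W_filter @ x)                 # filter generated from x
    X = np.fft.rfft(x, n=2 * L)               # zero-pad to avoid wrap-around
    H = np.fft.rfft(h, n=2 * L)
    return np.fft.irfft(X * H, n=2 * L)[:L]   # causal linear convolution

rng = np.random.default_rng(0)
L = 128
x = rng.normal(size=L)
y = data_dependent_conv(x, rng.normal(size=(L, L)) / np.sqrt(L))
print(y.shape)  # (128,)
```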
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning
·4495 words·22 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 KAIST
LLMs boost tabular data prediction by generating optimized features via decision tree reasoning, outperforming existing methods.
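A hedged sketch of the search loop as summarized: an LLM proposes a candidate feature as a rule, a decision tree scores it on held-out folds, and improvements are kept. `llm_propose_feature` is a hypothetical stub standing in for the actual LLM call.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def llm_propose_feature(history):
    """Hypothetical stand-in for the LLM that reasons over past rules
    and their scores; here it just returns a canned candidate rule."""
    return "x[:, 0] * x[:, 1]"

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y = (x[:, 0] * x[:, 1] > 0).astype(int)       # truth needs the interaction

history = []
best = cross_val_score(DecisionTreeClassifier(max_depth=3), x, y, cv=5).mean()
for _ in range(3):                            # a few proposal rounds
    rule = llm_propose_feature(history)
    feat = eval(rule)                         # new column from the rule
    x_new = np.column_stack([x, feat])
    score = cross_val_score(DecisionTreeClassifier(max_depth=3),
                            x_new, y, cv=5).mean()
    history.append((rule, score))
    if score > best:                          # keep only improvements
        x, best = x_new, score
print(best, history[-1])
```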
Open LLMs are Necessary for Current Private Adaptations and Outperform their Closed Alternatives
·2599 words·13 mins
Natural Language Processing
Large Language Models
🏢 CISPA Helmholtz Center for Information Security
Open LLMs outperform closed alternatives for private data adaptation, offering superior privacy, performance, and lower costs.
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
·1619 words·8 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
This paper proposes a novel, reward-free RLHF framework using a general preference oracle, surpassing existing reward-based approaches in efficiency and generalizability.
Online Adaptation of Language Models with a Memory of Amortized Contexts
·2374 words·12 mins
Natural Language Processing
Large Language Models
🏢 KAIST
MAC: Efficiently updates large language models (LLMs) using a memory of compressed contexts for improved real-time knowledge retention and adaptation.
OneBit: Towards Extremely Low-bit Large Language Models
·2001 words·10 mins
Natural Language Processing
Large Language Models
🏢 Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
OneBit achieves surprisingly strong performance in 1-bit quantized LLMs through a novel 1-bit parameter representation and an effective parameter-initialization method.
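A numpy sketch of the initialization as described, a sign-value-independent decomposition: the sign matrix is the 1-bit payload, and two full-precision vectors from a rank-1 fit of |W| restore the magnitudes. A reconstruction from the summary, not the paper's verbatim recipe.

```python
import numpy as np

def svid_init(W):
    """Sign-Value-Independent Decomposition: W ~= sign(W) * (a b^T).

    The sign matrix carries direction in 1 bit per weight; a and b are
    the two full-precision value vectors restoring row/column scale.
    """
    S = np.sign(W)
    U, sv, Vt = np.linalg.svd(np.abs(W), full_matrices=False)
    a = U[:, 0] * np.sqrt(sv[0])       # rank-1 factors of |W|
    b = Vt[0] * np.sqrt(sv[0])
    return S, a, b

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
S, a, b = svid_init(W)
W_hat = S * np.outer(a, b)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.3f}")
```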
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
·3294 words·16 mins
Natural Language Processing
Large Language Models
🏢 Show Lab, National University of Singapore
VideoLISA: A video-based multimodal large language model enabling precise, language-instructed video object segmentation with superior performance.
Once Read is Enough: Domain-specific Pretraining-free Language Models with Cluster-guided Sparse Experts for Long-tail Domain Knowledge
·2658 words·13 mins
Natural Language Processing
Large Language Models
🏢 University of Oxford
This research introduces Cluster-guided Sparse Experts (CSE), enabling pretrained language models to effectively learn long-tail domain knowledge without domain-specific pretraining, thus achieving su…
On the Worst Prompt Performance of Large Language Models
·2797 words·14 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
LLMs’ performance varies drastically with prompt phrasing; this paper introduces ROBUSTALPACAEVAL to evaluate lower-bound performance via worst-case prompt analysis, revealing model inconsist…
On the Power of Decision Trees in Auto-Regressive Language Modeling
·2176 words·11 mins
Natural Language Processing
Large Language Models
🏢 Massachusetts Institute of Technology
Auto-Regressive Decision Trees (ARDTs) surprisingly outperform Transformers on language tasks!
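A toy character-level version of the idea: a fixed context window becomes the tree's features, the tree predicts the next token, and generation loops the tree on its own output. Purely illustrative scale.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

text = "the cat sat on the mat. the cat sat on the mat. " * 20
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}
k = 4                                   # context window

# (context of k char ids) -> next char id
X = np.array([[idx[c] for c in text[i:i + k]] for i in range(len(text) - k)])
y = np.array([idx[text[i + k]] for i in range(len(text) - k)])

tree = DecisionTreeClassifier(max_depth=12).fit(X, y)

ctx = list(text[:k])                    # generate autoregressively
out = ctx[:]
for _ in range(40):
    nxt = chars[tree.predict([[idx[c] for c in ctx]])[0]]
    out.append(nxt)
    ctx = ctx[1:] + [nxt]
print("".join(out))
```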
On the Inductive Bias of Stacking Towards Improving Reasoning
·2018 words·10 mins
Natural Language Processing
Large Language Models
🏢 Google Research
MIDAS: A novel training method that improves language model reasoning by efficiently stacking middle layers, surprisingly boosting downstream task performance while leaving pretraining perplexity essentially unchanged.
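A sketch of the growth operator under one reading of the summary: instead of copying the last layers (plain gradual stacking), copy a middle run of layers when growing the model. The boundaries of the "middle" are illustrative.

```python
import copy

def grow_by_middle_stacking(layers):
    """One MIDAS-style growth step: duplicate the middle run of layers.

    Plain gradual stacking copies the *last* layers; copying the middle
    ones is the variant reported to help downstream reasoning.
    """
    n = len(layers)
    lo, hi = n // 4, n - n // 4            # middle half of the network
    middle = [copy.deepcopy(l) for l in layers[lo:hi]]
    return layers[:hi] + middle + layers[hi:]

layers = [f"block{i}" for i in range(4)]
print(grow_by_middle_stacking(layers))
# ['block0', 'block1', 'block2', 'block1', 'block2', 'block3']
```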
On Softmax Direct Preference Optimization for Recommendation
·1530 words·8 mins
Natural Language Processing
Large Language Models
🏢 National University of Singapore
Softmax-DPO boosts LM-based recommender performance by directly optimizing for personalized ranking using a novel loss function that incorporates multiple negative samples, significantly outperforming…
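A PyTorch sketch of a DPO-style loss generalized to multiple negatives, pooling all rejected items through a logsumexp; this is a reconstruction of the form from the summary, not the paper's verbatim code.

```python
import torch
import torch.nn.functional as F

def s_dpo_loss(pos_logp, pos_ref_logp, neg_logps, neg_ref_logps, beta=1.0):
    """DPO generalized to many negatives per positive item.

    pos_logp / pos_ref_logp:   log-prob of the preferred item under the
                               policy / the frozen reference    (batch,)
    neg_logps / neg_ref_logps: same for D rejected items     (batch, D)
    The logsumexp pools all negatives into one softmax-style contrast.
    """
    pos_margin = beta * (pos_logp - pos_ref_logp)            # (batch,)
    neg_margins = beta * (neg_logps - neg_ref_logps)         # (batch, D)
    pooled = torch.logsumexp(neg_margins - pos_margin.unsqueeze(1), dim=1)
    return -F.logsigmoid(-pooled).mean()

b, d = 4, 8
loss = s_dpo_loss(torch.randn(b), torch.randn(b),
                  torch.randn(b, d), torch.randn(b, d))
print(loss)
```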
On scalable oversight with weak LLMs judging strong LLMs
·5158 words·25 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Weak LLMs can accurately supervise strong LLMs via debate, outperforming simpler consultancy methods, especially in information-asymmetric tasks.
On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
·2220 words·11 mins
Natural Language Processing
Large Language Models
🏢 Huazhong University of Science and Technology
Effortlessly boost large language model performance by dynamically fusing knowledge from smaller, task-specific models, achieving near full fine-tuning results with minimal computational cost!
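A numpy sketch of per-token logits fusion: add the tuned-minus-untuned delta of the small model to the large model's logits with a per-step weight. The paper solves for this weight adaptively at each step; the entropy-based weight below is a simple stand-in.

```python
import numpy as np

def fuse_logits(big, small_tuned, small_base, max_w=1.0):
    """Per-token weak-to-strong fusion at decoding time.

    big:         logits of the large general model           (vocab,)
    small_tuned: logits of the small task-tuned model        (vocab,)
    small_base:  logits of the small untuned model           (vocab,)
    The tuned-minus-base delta carries the task knowledge; here the
    fusion weight grows with the big model's uncertainty (a stand-in
    for the per-token weights the paper solves for).
    """
    p = np.exp(big - big.max())
    p /= p.sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    w = max_w * entropy / np.log(len(big))   # defer more when unsure
    return big + w * (small_tuned - small_base)

rng = np.random.default_rng(0)
v = 16
fused = fuse_logits(rng.normal(size=v), rng.normal(size=v),
                    rng.normal(size=v))
print(fused.argmax())
```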