Natural Language Processing
Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation
·2883 words·14 mins
AI Generated
Natural Language Processing
Dialogue Systems
🏢 Seoul National University
Unified Spoken Dialog Model (USDM) directly generates coherent spoken responses with natural prosody, surpassing cascaded baselines and enhancing natural conversation in speech-enabled LLMs.
Panacea: Pareto Alignment via Preference Adaptation for LLMs
·2565 words·13 mins
Natural Language Processing
Large Language Models
🏢 Peking University
Panacea: a novel LLM alignment method achieving Pareto optimality via online preference adaptation using a single model.
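A minimal numpy sketch of the core idea as reported: embed the preference vector into the singular values of an SVD-style low-rank adapter, so one set of weights can realize many Pareto trade-offs. All names, shapes, and the scaling factor are illustrative, not the paper's exact recipe.

```python
import numpy as np

def adapt_weights(w0, u, s, v, pref, scale=1.0):
    """Blend a preference vector into the trailing singular values of a
    low-rank (SVD-style) adapter, so one model serves many trade-offs.

    w0:   frozen base weight (m, n)
    u, v: adapter factors, u (m, r) and v (r, n)
    s:    learned singular values for the first r - k dims
    pref: preference simplex weights (k,), e.g. helpfulness vs. harmlessness
    """
    sing = np.concatenate([s, scale * pref])      # embed the preference
    return w0 + u @ np.diag(sing) @ v

rng = np.random.default_rng(0)
m, n, r, k = 8, 6, 4, 2
w0 = rng.normal(size=(m, n))
u, v = rng.normal(size=(m, r)), rng.normal(size=(r, n))
s = rng.normal(size=r - k)

helpful_first = adapt_weights(w0, u, s, v, np.array([0.9, 0.1]))
harmless_first = adapt_weights(w0, u, s, v, np.array([0.1, 0.9]))
print(np.linalg.norm(helpful_first - harmless_first))  # > 0: behavior shifts
```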
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition
·2356 words·12 mins
Natural Language Processing
Named Entity Recognition
🏢 ByteDance
PaDeLLM-NER accelerates LLM-based NER inference by up to 10x, reaching near real-time performance without accuracy loss.
PaCE: Parsimonious Concept Engineering for Large Language Models
·2526 words·12 mins
Natural Language Processing
Large Language Models
🏢 Johns Hopkins University
PaCE, a novel activation engineering framework, efficiently aligns LLMs by removing undesirable concepts from activations using sparse coding, achieving state-of-the-art performance while preserving l…
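A hedged sketch of the activation-editing step, assuming a precomputed concept dictionary: sparse-code the activation against the dictionary (orthogonal matching pursuit here), zero the coefficients of undesirable concepts, and rebuild. The toy dictionary and the benign/undesirable split are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
d, n_concepts = 64, 32
# Toy concept dictionary: columns are unit-norm concept directions.
D = rng.normal(size=(d, n_concepts))
D /= np.linalg.norm(D, axis=0)
undesirable = np.zeros(n_concepts, dtype=bool)
undesirable[:4] = True  # pretend the first 4 concepts are unwanted

# An activation that mixes an unwanted and a benign concept plus noise.
x = D[:, 1] * 2.0 + D[:, 10] * 1.5 + 0.01 * rng.normal(size=d)

# Sparse-code the activation against the dictionary, then rebuild it
# from the benign components only.
coef = orthogonal_mp(D, x, n_nonzero_coefs=8)
coef[undesirable] = 0.0
x_clean = D @ coef
print(np.linalg.norm(x - x_clean))  # mass removed along concept 1
```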
Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
·2892 words·14 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Over-parameterized Distillation Framework (OPDF) boosts knowledge distillation by efficiently over-parameterizing student models via tensor decomposition, significantly improving performance without i…
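A minimal sketch of the generic trick, not the paper's exact tensor format (which uses a tensor decomposition such as MPO): factor a student weight into a chain with a larger inner dimension during distillation, then collapse the chain back for inference so deployment cost is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, inner = 16, 16, 64          # inner > min(m, n): over-parameterized

# Training-time parameters: a chain of factors standing in for one layer.
A = rng.normal(size=(m, inner)) / np.sqrt(inner)
B = rng.normal(size=(inner, n)) / np.sqrt(inner)

train_params = A.size + B.size     # 2 * 16 * 64 = 2048 during distillation
infer_params = m * n               # 256 after collapsing

# Gradients flow through A and B while distilling; at deployment the
# chain is merged into a single dense weight of the original shape.
W = A @ B
print(train_params, infer_params, W.shape)
```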
Order-Independence Without Fine Tuning
·1791 words·9 mins
Natural Language Processing
Large Language Models
🏢 Harvard University
Set-Based Prompting guarantees order-independent LLM outputs by modifying input representations, eliminating unwanted inconsistencies without fine-tuning.
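A sketch of the input-side construction as described: parallel sub-sequences share position offsets, and an attention mask keeps them from attending to each other, so permuting them cannot change the model's view. Tokens are simplified to lengths only; the layout function is illustrative.

```python
import numpy as np

def set_based_layout(prefix_len, option_lens):
    """Position IDs and attention mask for order-independent options.

    Every option starts at the same position offset, and tokens in one
    option cannot attend to tokens in another, so permuting the options
    leaves the model's view of the input unchanged.
    """
    pos = list(range(prefix_len))
    start = prefix_len
    for L in option_lens:
        pos.extend(range(start, start + L))  # same offsets for each option
    total = prefix_len + sum(option_lens)
    mask = np.zeros((total, total), dtype=bool)
    mask[:, :prefix_len] = True              # everyone sees the prefix
    i = prefix_len
    for L in option_lens:
        mask[i:i + L, i:i + L] = True        # options see only themselves
        i += L
    causal = np.tril(np.ones((total, total), dtype=bool))
    return np.array(pos), causal & mask

pos, mask = set_based_layout(prefix_len=3, option_lens=[2, 2])
print(pos)   # [0 1 2 3 4 3 4]: both options occupy positions 3-4
```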
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
·1896 words·9 mins
Natural Language Processing
Large Language Models
🏢 Google Research
Orchid: a novel deep learning architecture using data-dependent convolution achieves quasilinear scalability and outperforms attention-based models on various sequence modeling tasks.
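A numpy sketch of the core primitive: a long convolution whose filter is conditioned on the input and applied via FFT in O(L log L). The dense projection that generates the filter here is a stand-in for the paper's conditioning network (which keeps the whole layer quasilinear).

```python
import numpy as np

def data_dependent_conv(x, W_filter):
    """Long convolution with an input-conditioned filter via FFT.

    x:        input sequence (L,)
    W_filter: stand-in projection mapping the input to a length-L filter
    The FFT step costs O(L log L), vs. O(L^2) for attention.
    """
    L = len(x)
    h = np.tanh(W_filter @ x)                 # filter generated from x
    X = np.fft.rfft(x, n=2 * L)               # zero-pad to avoid wrap-around
    H = np.fft.rfft(h, n=2 * L)
    return np.fft.irfft(X * H, n=2 * L)[:L]   # causal linear convolution

rng = np.random.default_rng(0)
L = 128
x = rng.normal(size=L)
y = data_dependent_conv(x, rng.normal(size=(L, L)) / np.sqrt(L))
print(y.shape)  # (128,)
```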
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning
·4495 words·22 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 KAIST
LLMs boost tabular data prediction by generating optimized features via decision tree reasoning, outperforming existing methods.
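A hedged sketch of the search loop as summarized: an LLM proposes a candidate feature as a rule, a decision tree scores it on held-out folds, and improvements are kept. `llm_propose_feature` is a hypothetical stub standing in for the actual LLM call.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def llm_propose_feature(history):
    """Hypothetical stand-in for the LLM that reasons over past rules
    and their scores; here it just returns a canned candidate rule."""
    return "x[:, 0] * x[:, 1]"

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y = (x[:, 0] * x[:, 1] > 0).astype(int)       # truth needs the interaction

history = []
best = cross_val_score(DecisionTreeClassifier(max_depth=3), x, y, cv=5).mean()
for _ in range(3):                            # a few proposal rounds
    rule = llm_propose_feature(history)
    feat = eval(rule)                         # new column from the rule
    x_new = np.column_stack([x, feat])
    score = cross_val_score(DecisionTreeClassifier(max_depth=3),
                            x_new, y, cv=5).mean()
    history.append((rule, score))
    if score > best:                          # keep only improvements
        x, best = x_new, score
print(best, history[-1])
```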
Open LLMs are Necessary for Current Private Adaptations and Outperform their Closed Alternatives
·2599 words·13 mins
Natural Language Processing
Large Language Models
🏢 CISPA Helmholtz Center for Information Security
Open LLMs outperform closed alternatives for private data adaptation, offering superior privacy, performance, and lower costs.
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
·1619 words·8 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
This paper proposes a novel, reward-free RLHF framework using a general preference oracle, surpassing existing reward-based approaches in efficiency and generalizability.
Online Adaptation of Language Models with a Memory of Amortized Contexts
·2374 words·12 mins
Natural Language Processing
Large Language Models
🏢 KAIST
MAC: Efficiently updates large language models (LLMs) using a memory of compressed contexts for improved real-time knowledge retention and adaptation.
OneBit: Towards Extremely Low-bit Large Language Models
·2001 words·10 mins
Natural Language Processing
Large Language Models
🏢 Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
OneBit achieves surprisingly strong performance in 1-bit quantized LLMs through a novel 1-bit parameter representation and an effective parameter-initialization method.
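A numpy sketch of the initialization as described, a sign-value-independent decomposition: the sign matrix is the 1-bit payload, and two full-precision vectors from a rank-1 fit of |W| restore the magnitudes. A reconstruction from the summary, not the paper's verbatim recipe.

```python
import numpy as np

def svid_init(W):
    """Sign-Value-Independent Decomposition: W ~= sign(W) * (a b^T).

    The sign matrix carries direction in 1 bit per weight; a and b are
    the two full-precision value vectors restoring row/column scale.
    """
    S = np.sign(W)
    U, sv, Vt = np.linalg.svd(np.abs(W), full_matrices=False)
    a = U[:, 0] * np.sqrt(sv[0])       # rank-1 factors of |W|
    b = Vt[0] * np.sqrt(sv[0])
    return S, a, b

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
S, a, b = svid_init(W)
W_hat = S * np.outer(a, b)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.3f}")
```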
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
·3294 words·16 mins
Natural Language Processing
Large Language Models
🏢 Show Lab, National University of Singapore
VideoLISA: A video-based multimodal large language model enabling precise, language-instructed video object segmentation with superior performance.
Once Read is Enough: Domain-specific Pretraining-free Language Models with Cluster-guided Sparse Experts for Long-tail Domain Knowledge
·2658 words·13 mins
Natural Language Processing
Large Language Models
🏢 University of Oxford
This research introduces Cluster-guided Sparse Experts (CSE), enabling pretrained language models to effectively learn long-tail domain knowledge without domain-specific pretraining, thus achieving su…
On the Worst Prompt Performance of Large Language Models
·2797 words·14 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
LLMs’ performance varies drastically with prompt phrasing; this paper introduces ROBUSTALPACAEVAL to evaluate lower-bound performance via worst-case prompt analysis, revealing model inconsist…
On the Power of Decision Trees in Auto-Regressive Language Modeling
·2176 words·11 mins
Natural Language Processing
Large Language Models
🏢 Massachusetts Institute of Technology
Auto-Regressive Decision Trees (ARDTs) surprisingly outperform Transformers on language tasks!
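A toy character-level version of the idea: a fixed context window becomes the tree's features, the tree predicts the next token, and generation loops the tree on its own output. Purely illustrative scale.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

text = "the cat sat on the mat. the cat sat on the mat. " * 20
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}
k = 4                                   # context window

# (context of k char ids) -> next char id
X = np.array([[idx[c] for c in text[i:i + k]] for i in range(len(text) - k)])
y = np.array([idx[text[i + k]] for i in range(len(text) - k)])

tree = DecisionTreeClassifier(max_depth=12).fit(X, y)

ctx = list(text[:k])                    # generate autoregressively
out = ctx[:]
for _ in range(40):
    nxt = chars[tree.predict([[idx[c] for c in ctx]])[0]]
    out.append(nxt)
    ctx = ctx[1:] + [nxt]
print("".join(out))
```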
On the Inductive Bias of Stacking Towards Improving Reasoning
·2018 words·10 mins
Natural Language Processing
Large Language Models
🏢 Google Research
MIDAS: A novel training method that improves language model reasoning by efficiently stacking middle layers, surprisingly boosting downstream task performance while leaving pretraining perplexity essentially unchanged.
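A sketch of the growth operator under one reading of the summary: instead of copying the last layers (plain gradual stacking), copy a middle run of layers when growing the model. The boundaries of the "middle" are illustrative.

```python
import copy

def grow_by_middle_stacking(layers):
    """One MIDAS-style growth step: duplicate the middle run of layers.

    Plain gradual stacking copies the *last* layers; copying the middle
    ones is the variant reported to help downstream reasoning.
    """
    n = len(layers)
    lo, hi = n // 4, n - n // 4            # middle half of the network
    middle = [copy.deepcopy(l) for l in layers[lo:hi]]
    return layers[:hi] + middle + layers[hi:]

layers = [f"block{i}" for i in range(4)]
print(grow_by_middle_stacking(layers))
# ['block0', 'block1', 'block2', 'block1', 'block2', 'block3']
```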
On Softmax Direct Preference Optimization for Recommendation
·1530 words·8 mins
Natural Language Processing
Large Language Models
🏢 National University of Singapore
Softmax-DPO boosts LM-based recommender performance by directly optimizing for personalized ranking using a novel loss function that incorporates multiple negative samples, significantly outperforming…
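A PyTorch sketch of a DPO-style loss generalized to multiple negatives, pooling all rejected items through a logsumexp; this is a reconstruction of the form from the summary, not the paper's verbatim code.

```python
import torch
import torch.nn.functional as F

def s_dpo_loss(pos_logp, pos_ref_logp, neg_logps, neg_ref_logps, beta=1.0):
    """DPO generalized to many negatives per positive item.

    pos_logp / pos_ref_logp:   log-prob of the preferred item under the
                               policy / the frozen reference    (batch,)
    neg_logps / neg_ref_logps: same for D rejected items     (batch, D)
    The logsumexp pools all negatives into one softmax-style contrast.
    """
    pos_margin = beta * (pos_logp - pos_ref_logp)            # (batch,)
    neg_margins = beta * (neg_logps - neg_ref_logps)         # (batch, D)
    pooled = torch.logsumexp(neg_margins - pos_margin.unsqueeze(1), dim=1)
    return -F.logsigmoid(-pooled).mean()

b, d = 4, 8
loss = s_dpo_loss(torch.randn(b), torch.randn(b),
                  torch.randn(b, d), torch.randn(b, d))
print(loss)
```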
On scalable oversight with weak LLMs judging strong LLMs
·5158 words·25 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Weak LLMs can accurately supervise strong LLMs via debate, outperforming simpler consultancy methods, especially in information-asymmetric tasks.
On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
·2220 words·11 mins
Natural Language Processing
Large Language Models
🏢 Huazhong University of Science and Technology
Effortlessly boost large language model performance by dynamically fusing knowledge from smaller, task-specific models, achieving near full fine-tuning results with minimal computational cost!
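A numpy sketch of per-token logits fusion: add the tuned-minus-untuned delta of the small model to the large model's logits with a per-step weight. The paper solves for this weight adaptively at each step; the entropy-based weight below is a simple stand-in.

```python
import numpy as np

def fuse_logits(big, small_tuned, small_base, max_w=1.0):
    """Per-token weak-to-strong fusion at decoding time.

    big:         logits of the large general model           (vocab,)
    small_tuned: logits of the small task-tuned model        (vocab,)
    small_base:  logits of the small untuned model           (vocab,)
    The tuned-minus-base delta carries the task knowledge; here the
    fusion weight grows with the big model's uncertainty (a stand-in
    for the per-token weights the paper solves for).
    """
    p = np.exp(big - big.max())
    p /= p.sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    w = max_w * entropy / np.log(len(big))   # defer more when unsure
    return big + w * (small_tuned - small_base)

rng = np.random.default_rng(0)
v = 16
fused = fuse_logits(rng.normal(size=v), rng.normal(size=v),
                    rng.normal(size=v))
print(fused.argmax())
```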