Large Language Models
Doing Experiments and Revising Rules with Natural Language and Probabilistic Reasoning
·3039 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Cornell University
This paper introduces ActiveACRE, a model that uses LLMs and probabilistic inference to infer natural language rules through online experimentation, demonstrating higher accuracy than existing methods…
DoFIT: Domain-aware Federated Instruction Tuning with Alleviated Catastrophic Forgetting
·2536 words·12 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Nanjing University of Science and Technology
DoFIT: A novel domain-aware framework significantly reduces catastrophic forgetting in federated instruction tuning by finely aggregating overlapping weights and using a proximal perturbation initiali…
Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models
·2327 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Microsoft Research
LLMs’ reasoning abilities are assessed via a novel framework that leverages probabilities of causation, revealing that while capable, their understanding of causality falls short of human-level reason…
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
·2914 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Department of Computer Science, University of Chicago
LLMs’ fact retrieval is easily manipulated by context, highlighting their associative-memory behavior; the paper studies this behavior in transformers, showing how self-attention and value matrices support …
Do LLMs Build World Representations? Probing Through the Lens of State Abstraction
·2243 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Mila, McGill University
LLMs prioritize task completion over full world-state understanding by using goal-oriented abstractions.
DALD: Improving Logits-based Detector without Logits from Black-box LLMs
·2559 words·13 mins·
Natural Language Processing
Large Language Models
🏢 MBZUAI
DALD: A novel framework for black-box LLM text detection, achieving state-of-the-art performance without relying on source model logits, by aligning surrogate model distributions.
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation
·2382 words·12 mins·
Large Language Models
🏢 Harbin Institute of Technology
FUNCODER: a novel code generation framework that uses a divide-and-conquer approach with functional consensus to generate code that meets complex requirements.
Divergences between Language Models and Human Brains
·2519 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Language models struggle with social/emotional intelligence and physical commonsense, unlike human brains. Fine-tuning models on these aspects improves their brain response prediction accuracy.
Distributional Preference Alignment of LLMs via Optimal Transport
·2204 words·11 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
LLMs are aligned to human preferences distributionally using Optimal Transport, achieving state-of-the-art performance.
DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models
·3179 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Samsung Research
DISP-LLM: A novel dimension-independent structural pruning method for LLMs achieves accuracy similar to semi-structural pruning while improving flexibility and efficiency, outperforming state-of-the-a…
Discrete Flow Matching
·2076 words·10 mins·
Large Language Models
🏢 Meta FAIR
Discrete Flow Matching (DFM) advances discrete data generation with a novel flow paradigm that surpasses existing methods. DFM leverages flexible probability paths, enabling efficient …
Discovery of the Hidden World with Large Language Models
·6303 words·30 mins·
Natural Language Processing
Large Language Models
🏢 Hong Kong Baptist University
COAT leverages LLMs to identify high-level causal factors from unstructured data, enabling causal discovery in real-world scenarios where well-defined variables are lacking.
Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models
·1939 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
DSA, a novel automated framework, discovers optimal sparsity allocation for layer-wise LLM pruning, achieving significant performance gains across various models and tasks.
Discovering Preference Optimization Algorithms with and for Large Language Models
·4948 words·24 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Sakana AI
LLMs discover novel offline preference optimization algorithms, achieving state-of-the-art performance on various tasks.
Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models
·2902 words·14 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Diffusion-of-Thought (DoT) boosts reasoning in diffusion language models by enabling parallel reasoning steps, outperforming larger autoregressive models in speed and accuracy.
Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models
·2220 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Qing Yuan Research Institute, SEIEE, Shanghai Jiao Tong University
Diff-eRank: A novel rank-based metric that assesses how efficiently LLMs eliminate redundant information during training, showing improved correlation with model size and performance.
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
·2779 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Baidu Inc.
Decoupled-Head Attention (DHA) drastically cuts LLM inference costs by adaptively sharing key/value heads, achieving 97.6% of original performance with only 0.25% pre-training.
DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ
·2333 words·11 mins·
Large Language Models
🏢 University of Mannheim
DeTikZify: AI synthesizes publication-ready scientific figures from sketches and existing figures, automatically generating semantics-preserving TikZ code.
DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning
·2740 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 ByteDance
DeTeCtive: a novel multi-task contrastive learning framework, achieves state-of-the-art AI-generated text detection by distinguishing diverse writing styles instead of simple binary classification.
DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning
·3087 words·15 mins·
Natural Language Processing
Large Language Models
🏢 National University of Singapore
DETAIL: A novel attribution method reveals the impact of individual demonstrations in in-context learning, boosting interpretability and improving transformer-based model performance.