Large Language Models

CriticEval: Evaluating Large-scale Language Model as Critic
·4755 words·23 mins
Natural Language Processing Large Language Models 🏢 Beijing Institute of Technology
CRITICEVAL: A new benchmark reliably evaluates LLMs’ ability to identify and correct flaws in their responses, addressing limitations of existing methods by offering comprehensive and reliable evaluation.
Crafting Interpretable Embeddings for Language Neuroscience by Asking LLMs Questions
·1981 words·10 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
LLM-based text embeddings are powerful but lack interpretability. This paper introduces QA-Emb, a novel method that uses an LLM to answer yes/no questions about a text, thereby producing an interpretable embedding.
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning
·2973 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 King Abdullah University of Science and Technology
CorDA: Context-oriented weight decomposition enhances large language model fine-tuning by integrating task context, improving efficiency and mitigating catastrophic forgetting.
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
·6245 words·30 mins
Natural Language Processing Large Language Models 🏢 ETH Zurich
The GOVSIM benchmark reveals that LLM agents struggle to cooperate sustainably, and that communication and ‘universalization’ reasoning improve outcomes.
Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models
·4422 words·21 mins
Machine Learning Large Language Models 🏢 University of Cambridge
Context-Aware Testing (CAT) revolutionizes ML model testing by using contextual information to identify relevant failures, surpassing traditional data-only methods.
ConStat: Performance-Based Contamination Detection in Large Language Models
·4433 words·21 mins
AI Generated Natural Language Processing Large Language Models 🏢 ETH Zurich
ConStat: Exposing hidden LLM contamination!
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
·13838 words·65 mins
AI Generated Natural Language Processing Large Language Models 🏢 UC Berkeley
LLMs surprisingly infer censored knowledge from implicit training data hints, posing safety challenges.
Confidence Regulation Neurons in Language Models
·3393 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 ETH Zurich
LLMs regulate uncertainty via specialized ‘entropy’ and ‘token frequency’ neurons, impacting prediction confidence without directly altering logits.
Compressing Large Language Models using Low Rank and Low Precision Decomposition
·2393 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Stanford University
CALDERA: a new post-training LLM compression algorithm achieving state-of-the-art zero-shot performance using low-rank, low-precision decomposition.
Compositional 3D-aware Video Generation with LLM Director
·2894 words·14 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
LLM-directed compositional 3D-aware video generation (C3V) achieves high-fidelity video generation with diverse motion and flexible concept control by decomposing prompts, generating 3D concepts, and …
Compact Language Models via Pruning and Knowledge Distillation
·4214 words·20 mins
AI Generated Natural Language Processing Large Language Models 🏢 NVIDIA
MINITRON: Efficiently creating smaller, high-performing LLMs via pruning & distillation, slashing training costs by up to 40x!
CoMERA: Computing- and Memory-Efficient Training via Rank-Adaptive Tensor Optimization
·2398 words·12 mins
Natural Language Processing Large Language Models 🏢 University at Albany, SUNY
CoMERA achieves 2-3x faster AI model training via rank-adaptive tensor optimization, significantly improving both computing and memory efficiency.
COLD: Causal reasOning in cLosed Daily activities
·3472 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 Indian Institute of Technology Kanpur
COLD framework rigorously evaluates LLMs’ causal reasoning in everyday scenarios using 9 million causal queries derived from human-generated scripts of daily activities.
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
·2454 words·12 mins
Natural Language Processing Large Language Models 🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
CORY: a novel sequential cooperative multi-agent RL framework that boosts LLM fine-tuning.
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
·3695 words·18 mins
Natural Language Processing Large Language Models 🏢 Cornell University
New program synthesis method, REX, leverages Thompson Sampling to balance exploration and exploitation in iterative LLM code refinement, solving more problems with fewer model calls.
Co-occurrence is not Factual Association in Language Models
·1941 words·10 mins
Large Language Models 🏢 Tsinghua University
Language models struggle to learn facts; this study reveals they prioritize word co-occurrence over true factual associations, and proposes new training strategies for improved factual knowledge generalization.
CLUES: Collaborative Private-domain High-quality Data Selection for LLMs via Training Dynamics
·2368 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Cambridge
CLUES: Collaborative learning selects high-quality private data for LLM fine-tuning via training dynamics, significantly boosting performance in diverse domains.
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
·1676 words·8 mins
Natural Language Processing Large Language Models 🏢 Shanghai University of Finance and Economics
CherryQ, a novel quantization method, leverages parameter heterogeneity in LLMs to achieve superior performance by selectively quantizing less critical parameters while preserving essential ones.
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
·2344 words·12 mins
Natural Language Processing Large Language Models 🏢 Zhejiang University
Chat-Scene: Bridging 3D scenes and LLMs using object identifiers for efficient, object-level interaction and improved scene comprehension.
Chain-of-Thought Reasoning Without Prompting
·2324 words·11 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs can reason effectively without prompting by simply adjusting the decoding process to reveal inherent chain-of-thought paths.