Natural Language Processing
Artemis: Towards Referential Understanding in Complex Videos
·3373 words·16 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Chinese Academy of Sciences
Artemis: A new MLLM excels at video-based referential understanding, accurately describing targets within complex videos using natural language questions and bounding boxes, surpassing existing models…
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction
·2152 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Peking University
ArkVale boosts LLM inference efficiency by intelligently evicting and recalling key-value pairs from the cache, improving latency and throughput without significant accuracy loss.
Are More LLM Calls All You Need? Towards the Scaling Properties of Compound AI Systems
·1725 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Stanford University
More LM calls don’t always mean better results for compound AI systems; this study shows performance can first rise and then fall as the number of calls grows, highlighting the importance of predicting the optimal number of calls.
Approaching Human-Level Forecasting with Language Models
·4201 words·20 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 UC Berkeley
Language models (LMs) can now forecast future events as accurately as expert human forecasters! This groundbreaking research unveils a retrieval-augmented LM system surpassing human forecasters in spe…
Apathetic or Empathetic? Evaluating LLMs' Emotional Alignments with Humans
·3664 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
LLMs’ emotional alignment with humans is assessed using emotion appraisal theory, revealing that while LLMs respond appropriately in some cases, they lack alignment with human emotional behaviors and …
AP-Adapter: Improving Generalization of Automatic Prompts on Unseen Text-to-Image Diffusion Models
·2738 words·13 mins·
Natural Language Processing
Text Generation
🏢 State Key Laboratory for Novel Software Technology, Nanjing University
AP-Adapter boosts text-to-image diffusion model generalization by using a two-stage prompt optimization method that leverages large language models and inter-model differences.
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
·2347 words·12 mins·
Natural Language Processing
Large Language Models
🏢 EPFL, Switzerland
This study reveals that modifying optimizers to normalize updates based on angular changes and gradient signal-to-noise ratio significantly reduces the need for learning rate warmup in GPT training.
Analysing the Generalisation and Reliability of Steering Vectors
·2935 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Department of Computer Science, University College London
Steering vectors, while promising for controlling LLMs, prove unreliable both in- and out-of-distribution, highlighting crucial limitations for real-world applications.
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
·2583 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
ANAH-v2 tackles LLM hallucination by introducing a self-training framework that iteratively scales annotation datasets and improves annotator accuracy, achieving state-of-the-art results.
An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding
·2754 words·13 mins·
Natural Language Processing
Large Language Models
🏢 State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
CREAM, a simple, training-efficient positional encoding method, extends LLM context windows and outperforms existing methods by focusing on crucial mid-context information.
AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback
·3399 words·16 mins·
Natural Language Processing
Question Answering
🏢 Tsinghua University
AMOR, an adaptable modular knowledge agent built on LLMs, excels through FSM-based reasoning and process feedback, enabling human supervision and domain adaptation.
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
·1725 words·9 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Georgia Institute of Technology
AmoebaLLM: Instantly create optimally-sized LLMs for any platform!
ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models
·3567 words·17 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 MIT
ALPS: An optimization-based framework achieves state-of-the-art one-shot LLM pruning, significantly reducing test perplexity and improving zero-shot performance.
ALPINE: Unveiling The Planning Capability of Autoregressive Learning in Language Models
·2122 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Microsoft Research
ALPINE reveals how Transformer-based LLMs learn planning by embedding graph information into their weights, but also highlights their inability to handle transitive relationships.
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
·3772 words·18 mins·
Natural Language Processing
Large Language Models
🏢 Nankai University
AlphaPruning leverages Heavy-Tailed Self-Regularization theory to allocate optimal layer-wise sparsity ratios in LLMs, achieving 80% sparsity in LLaMA-7B with reasonable perplexity.
AlphaMath Almost Zero: Process Supervision without Process
·2731 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Tongyi Lab
AlphaMath enables LLMs to excel at math reasoning using Monte Carlo Tree Search, without human-annotated process supervision.
Alignment for Honesty
·3666 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
This paper introduces a novel framework for aligning LLMs with honesty, proposing new metrics and training techniques to make LLMs more truthful and less prone to confidently incorrect responses.
Alignment at Pre-training! Towards Native Alignment for Arabic LLMs
·2342 words·11 mins·
Natural Language Processing
Large Language Models
🏢 King Abdullah University of Science and Technology
This study introduces ‘native alignment’ for Arabic LLMs, achieving state-of-the-art results by aligning models during pre-training rather than post-training.
Aligning to Thousands of Preferences via System Message Generalization
·3279 words·16 mins·
Natural Language Processing
Large Language Models
🏢 KAIST AI
JANUS, a 7B LLM, achieves high alignment to thousands of user preferences by generalizing from diverse system messages, outperforming existing LLMs on various benchmarks.
Aligning LLM Agents by Learning Latent Preference from User Edits
·2688 words·13 mins·
loading
·
loading
Natural Language Processing
Large Language Models
🏢 Microsoft Research
PRELUDE, a novel framework, leverages user edits of LLM outputs to learn latent preferences, improving agent alignment and minimizing edit costs. CIPHER, its efficient algorithm, infers preferences f…