Natural Language Processing
Artemis: Towards Referential Understanding in Complex Videos
·3373 words·16 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Chinese Academy of Sciences
Artemis: A new MLLM excels at video-based referential understanding, accurately describing targets within complex videos using natural language questions and bounding boxes, surpassing existing models…
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction
·2152 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Peking University
ArkVale boosts LLM inference efficiency by intelligently evicting and recalling key-value pairs from the cache, improving latency and throughput without significant accuracy loss.
Are More LLM Calls All You Need? Towards the Scaling Properties of Compound AI Systems
·1725 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Stanford University
More LM calls don’t always mean better results for compound AI systems; this study shows performance can first rise and then fall as the number of calls grows, highlighting the importance of predicting the optimal number of calls.
Approaching Human-Level Forecasting with Language Models
·4201 words·20 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 UC Berkeley
Language models (LMs) can now forecast future events as accurately as expert human forecasters! This groundbreaking research unveils a retrieval-augmented LM system surpassing human forecasters in spe…
Apathetic or Empathetic? Evaluating LLMs' Emotional Alignments with Humans
·3664 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
LLMs’ emotional alignment with humans is assessed using emotion appraisal theory, revealing that while LLMs respond appropriately in some cases, they lack alignment with human emotional behaviors and …
AP-Adapter: Improving Generalization of Automatic Prompts on Unseen Text-to-Image Diffusion Models
·2738 words·13 mins·
Natural Language Processing
Text Generation
🏢 State Key Laboratory for Novel Software Technology, Nanjing University
AP-Adapter boosts text-to-image diffusion model generalization by using a two-stage prompt optimization method that leverages large language models and inter-model differences.
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
·2347 words·12 mins·
Natural Language Processing
Large Language Models
🏢 EPFL, Switzerland
This study reveals that modifying optimizers to normalize updates based on angular changes and gradient signal-to-noise ratio significantly reduces the need for learning rate warmup in GPT training.
Analysing the Generalisation and Reliability of Steering Vectors
·2935 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Department of Computer Science, University College London
Steering vectors, while promising for controlling LLMs, prove unreliable both in- and out-of-distribution, highlighting crucial limitations for real-world applications.
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
·2583 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
ANAH-v2 tackles LLM hallucination by introducing a self-training framework that iteratively scales annotation datasets and improves annotator accuracy, achieving state-of-the-art results.
An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding
·2754 words·13 mins·
Natural Language Processing
Large Language Models
🏢 State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
CREAM, a simple, training-efficient positional encoding method, extends LLM context windows and outperforms existing methods by focusing on crucial mid-context information.
AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback
·3399 words·16 mins·
Natural Language Processing
Question Answering
🏢 Tsinghua University
AMOR, an adaptable modular knowledge agent built on LLMs, excels through FSM-based reasoning and process feedback, enabling human supervision and domain adaptation.
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
·1725 words·9 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Georgia Institute of Technology
AmoebaLLM: Instantly create optimally-sized LLMs for any platform!
ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models
·3567 words·17 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 MIT
ALPS: An optimization-based framework achieves state-of-the-art one-shot LLM pruning, significantly reducing test perplexity and improving zero-shot performance.
ALPINE: Unveiling The Planning Capability of Autoregressive Learning in Language Models
·2122 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Microsoft Research
ALPINE reveals how Transformer-based LLMs learn planning by embedding graph information into their weights, but also highlights their inability to handle transitive relationships.
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
·3772 words·18 mins·
Natural Language Processing
Large Language Models
🏢 Nankai University
AlphaPruning leverages Heavy-Tailed Self-Regularization theory to allocate optimal layer-wise sparsity ratios in LLMs, achieving 80% sparsity in LLaMA-7B with reasonable perplexity.
AlphaMath Almost Zero: Process Supervision without Process
·2731 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Tongyi Lab
AlphaMath enables LLMs to excel at math reasoning using Monte Carlo Tree Search, without human-annotated process supervision.
Alignment for Honesty
·3666 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
This paper introduces a novel framework for aligning LLMs with honesty, proposing new metrics and training techniques to make LLMs more truthful and less prone to confidently incorrect responses.
Alignment at Pre-training! Towards Native Alignment for Arabic LLMs
·2342 words·11 mins·
Natural Language Processing
Large Language Models
🏢 King Abdullah University of Science and Technology
This study introduces ‘native alignment’ for Arabic LLMs, achieving state-of-the-art results by aligning models during pre-training rather than post-training.
Aligning to Thousands of Preferences via System Message Generalization
·3279 words·16 mins·
Natural Language Processing
Large Language Models
🏢 KAIST AI
JANUS, a 7B LLM, achieves high alignment to thousands of user preferences by generalizing from diverse system messages, outperforming existing LLMs on various benchmarks.
Aligning LLM Agents by Learning Latent Preference from User Edits
·2688 words·13 mins·
loading
·
loading
Natural Language Processing
Large Language Models
🏢 Microsoft Research
PRELUDE, a novel framework, leverages user edits of LLM outputs to learn latent preferences, improving agent alignment and minimizing edit costs. CIPHER, its efficient algorithm, infers preferences f…