
Large Language Models

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
·2672 words·13 mins
AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University
ZipCache: Efficient KV cache quantization for LLMs using salient token identification, achieving 4.98x compression with minimal accuracy loss!
Zero-Shot Tokenizer Transfer
·2795 words·14 mins
Natural Language Processing Large Language Models 🏢 University of Cambridge
Zero-Shot Tokenizer Transfer (ZeTT) detaches language models from their tokenizers via a hypernetwork, enabling efficient on-the-fly tokenizer swapping without retraining, significantly improving LLM …
You Only Cache Once: Decoder-Decoder Architectures for Language Models
·2411 words·12 mins
Large Language Models 🏢 Tsinghua University
YOCO: A decoder-decoder architecture for LLMs dramatically reduces memory usage and improves inference speed by caching key-value pairs only once.
xLSTM: Extended Long Short-Term Memory
·4451 words·21 mins
Large Language Models 🏢 ELLIS Unit, LIT AI Lab
xLSTM (Extended Long Short-Term Memory) introduces exponential gating and novel memory structures to overcome LSTM limitations, achieving performance comparable to state-of-the-art Transformers and St…
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment
·3322 words·16 mins
Natural Language Processing Large Language Models 🏢 Cornell University
WorldCoder: an LLM agent builds world models via code generation and interaction, proving to be highly sample-efficient and enabling knowledge transfer.
WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena
·2352 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Microsoft Corporation
WizardArena simulates offline chatbot arena battles to efficiently post-train LLMs, dramatically reducing costs and improving model performance.
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
·3638 words·18 mins
AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University
WISE, a novel dual-memory architecture, solves the impossible triangle of reliability, generalization, and locality in lifelong LLM editing by employing a side memory for knowledge updates and a route…
Who's asking? User personas and the mechanics of latent misalignment
·3650 words·18 mins
Large Language Models 🏢 Google Research
User personas significantly affect the safety behavior of large language models; persona-conditioned prompts bypass safety filters more effectively than direct prompting methods.
Where does In-context Learning Happen in Large Language Models?
·2289 words·11 mins
Natural Language Processing Large Language Models 🏢 Johns Hopkins University
LLMs learn tasks via in-context learning, but the task recognition location is unknown. This paper reveals that LLMs transition from task recognition to task performance at specific layers, enabling s…
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
·2980 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Purdue University
RLbreaker uses deep reinforcement learning to efficiently create highly effective jailbreaking prompts, outperforming existing methods against multiple state-of-the-art LLMs and defenses.
What Rotary Position Embedding Can Tell Us: Identifying Query and Key Weights Corresponding to Basic Syntactic or High-level Semantic Information
·1978 words·10 mins
Natural Language Processing Large Language Models 🏢 Dept. of CSE & School of AI & MoE Key Lab of AI, Shanghai Jiao Tong University
LLM fine-tuning made easy! This paper reveals how analyzing weight vector angles in RoPE positional embeddings helps optimize LLMs, reducing parameter count and improving efficiency.
What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
·10141 words·48 mins
Natural Language Processing Large Language Models 🏢 University of Oxford
Safety fine-tuning for LLMs is shown to minimally transform weights, clustering inputs based on safety, but is easily bypassed by adversarial attacks.
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
·2463 words·12 mins
Natural Language Processing Large Language Models 🏢 Shanghai Artificial Intelligence Laboratory
Align LLMs efficiently via test-time search using smaller models!
Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles
·1675 words·8 mins
Natural Language Processing Large Language Models 🏢 Australian Institute for Machine Learning, University of Adelaide
SPLAT, a new benchmark using situation puzzles, effectively evaluates and elicits lateral thinking in LLMs through a multi-turn player-judge framework, revealing significant performance improvements o…
WaterMax: breaking the LLM watermark detectability-robustness-quality trade-off
·2720 words·13 mins
Natural Language Processing Large Language Models 🏢 Inria, CNRS, IRISA
WaterMax: a novel LLM watermarking scheme achieving high detectability and preserving text quality by cleverly generating multiple texts and selecting the most suitable one.
Watermarking Makes Language Models Radioactive
·3285 words·16 mins
Large Language Models 🏢 Meta FAIR
LLM watermarking leaves detectable traces in subsequently trained models, enabling detection of synthetic data usage—a phenomenon termed ‘radioactivity’.
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
·2783 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
LLM-based agents are vulnerable to diverse backdoor attacks that manipulate their reasoning and outputs, highlighting the urgent need for targeted defenses.
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
·2439 words·12 mins
Natural Language Processing Large Language Models 🏢 IBM Research
WAGLE: A novel weight attribution-guided LLM unlearning framework boosts unlearning performance by strategically identifying and manipulating influential model weights, achieving a better balance betw…
Verified Code Transpilation with LLMs
·2009 words·10 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
LLMLIFT: An LLM-powered approach builds verified lifting tools for DSLs, outperforming prior symbolic methods in benchmark transpilation and requiring less development effort.
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
·1706 words·9 mins
Natural Language Processing Large Language Models 🏢 Huawei Noah's Ark Lab
VeLoRA: Train massive LLMs efficiently by compressing intermediate activations!