
Large Language Models

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
·2672 words·13 mins
AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University
ZipCache: Efficient KV cache quantization for LLMs using salient token identification, achieving 4.98x compression with minimal accuracy loss!
Zero-Shot Tokenizer Transfer
·2795 words·14 mins
Natural Language Processing Large Language Models 🏢 University of Cambridge
Zero-Shot Tokenizer Transfer (ZeTT) detaches language models from their tokenizers via a hypernetwork, enabling efficient on-the-fly tokenizer swapping without retraining, significantly improving LLM …
You Only Cache Once: Decoder-Decoder Architectures for Language Models
·2411 words·12 mins
Large Language Models 🏢 Tsinghua University
YOCO: A decoder-decoder architecture for LLMs dramatically reduces memory usage and improves inference speed by caching key-value pairs only once.
xLSTM: Extended Long Short-Term Memory
·4451 words·21 mins
Large Language Models 🏢 ELLIS Unit, LIT AI Lab
xLSTM (Extended Long Short-Term Memory) introduces exponential gating and novel memory structures to overcome LSTM limitations, achieving performance comparable to state-of-the-art Transformers and St…
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment
·3322 words·16 mins
Natural Language Processing Large Language Models 🏢 Cornell University
WorldCoder: an LLM agent builds world models via code generation and interaction, proving to be highly sample-efficient and enabling knowledge transfer.
WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena
·2352 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Microsoft Corporation
WizardArena simulates offline chatbot arena battles to efficiently post-train LLMs, dramatically reducing costs and improving model performance.
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
·3638 words·18 mins
AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University
WISE, a novel dual-memory architecture, solves the impossible triangle of reliability, generalization, and locality in lifelong LLM editing by employing a side memory for knowledge updates and a route…
Who's asking? User personas and the mechanics of latent misalignment
·3650 words·18 mins
Large Language Models 🏢 Google Research
User personas significantly affect the safety behavior of large language models; persona-conditioned prompts bypass safety filters more effectively than direct prompting methods.
Where does In-context Learning Happen in Large Language Models?
·2289 words·11 mins
Natural Language Processing Large Language Models 🏢 Johns Hopkins University
LLMs learn tasks via in-context learning, but the task recognition location is unknown. This paper reveals that LLMs transition from task recognition to task performance at specific layers, enabling s…
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
·2980 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Purdue University
RLbreaker uses deep reinforcement learning to efficiently create highly effective jailbreaking prompts, outperforming existing methods against multiple state-of-the-art LLMs and defenses.
What Rotary Position Embedding Can Tell Us: Identifying Query and Key Weights Corresponding to Basic Syntactic or High-level Semantic Information
·1978 words·10 mins
Natural Language Processing Large Language Models 🏢 Dept. of CSE & School of AI & MoE Key Lab of AI, Shanghai Jiao Tong University
LLM fine-tuning made easy! This paper reveals how analyzing weight vector angles in RoPE positional embeddings helps optimize LLMs, reducing parameter count and improving efficiency.
What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
·10141 words·48 mins
Natural Language Processing Large Language Models 🏢 University of Oxford
Safety fine-tuning for LLMs is shown to minimally transform weights, clustering inputs based on safety, but is easily bypassed by adversarial attacks.
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
·2463 words·12 mins
Natural Language Processing Large Language Models 🏢 Shanghai Artificial Intelligence Laboratory
Align LLMs efficiently via test-time search using smaller models!
Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles
·1675 words·8 mins
Natural Language Processing Large Language Models 🏢 Australian Institute for Machine Learning, University of Adelaide
SPLAT, a new benchmark using situation puzzles, effectively evaluates and elicits lateral thinking in LLMs through a multi-turn player-judge framework, revealing significant performance improvements o…
WaterMax: breaking the LLM watermark detectability-robustness-quality trade-off
·2720 words·13 mins
Natural Language Processing Large Language Models 🏢 Inria, CNRS, IRISA
WaterMax: a novel LLM watermarking scheme achieving high detectability and preserving text quality by cleverly generating multiple texts and selecting the most suitable one.
Watermarking Makes Language Models Radioactive
·3285 words·16 mins
Large Language Models 🏢 Meta FAIR
LLM watermarking leaves detectable traces in subsequently trained models, enabling detection of synthetic data usage—a phenomenon termed ‘radioactivity’.
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
·2783 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
LLM-based agents are vulnerable to diverse backdoor attacks that manipulate their reasoning and outputs, highlighting the urgent need for targeted defenses.
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
·2439 words·12 mins
Natural Language Processing Large Language Models 🏢 IBM Research
WAGLE: A novel weight attribution-guided LLM unlearning framework boosts unlearning performance by strategically identifying and manipulating influential model weights, achieving a better balance betw…
Verified Code Transpilation with LLMs
·2009 words·10 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
LLMLIFT: An LLM-powered approach builds verified lifting tools for DSLs, outperforming prior symbolic methods in benchmark transpilation and requiring less development effort.
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
·1706 words·9 mins
Natural Language Processing Large Language Models 🏢 Huawei Noah's Ark Lab
VeLoRA: Train massive LLMs efficiently by compressing intermediate activations!