
Large Language Models

I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
·1828 words·9 mins
Natural Language Processing Large Language Models 🏢 HPI / University of Potsdam
A new calibration method using a special [IDK] token explicitly models uncertainty, mitigating hallucinations and improving factual precision while maintaining knowledge retention.
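To make the mechanism concrete, here is a minimal sketch of abstention with a reserved uncertainty token, assuming a vocabulary already extended during calibration; `idk_id`, the threshold, and the tensor shapes are illustrative, not the paper's actual objective.

```python
import torch
import torch.nn.functional as F

# Illustrative only: suppose the vocabulary was extended with a special
# [IDK] token during calibration, so the model can place probability mass
# on "I don't know" instead of a wrong answer.
def predict_or_abstain(logits: torch.Tensor, idk_id: int, threshold: float = 0.5):
    """Return the argmax token id, or None (abstain) if [IDK] mass is high."""
    probs = F.softmax(logits, dim=-1)
    if probs[idk_id] >= threshold:
        return None  # the model explicitly signals uncertainty
    return int(probs.argmax())

vocab_size, idk_id = 32001, 32000   # hypothetical: last slot is [IDK]
logits = torch.randn(vocab_size)
logits[idk_id] += 5.0               # simulate a highly uncertain prediction
print(predict_or_abstain(logits, idk_id))  # -> None (abstain)
```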
HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis
·2870 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 UC San Diego
HYSYNTH: A hybrid approach uses LLMs to create context-free surrogate models that guide efficient program synthesis, outperforming LLMs alone and existing synthesizers across multiple domains.
HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
·1914 words·9 mins
Large Language Models 🏢 University of Texas at Austin
HydraLoRA: Asymmetric LoRA boosts LLM fine-tuning efficiency by sharing parameters across tasks while specializing others, outperforming existing methods.
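A rough sketch of the asymmetric structure under the usual LoRA conventions: one shared down-projection `A`, several specialized up-projections `B`, and a router to mix them. All names and hyperparameters here are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AsymmetricLoRA(nn.Module):
    """Sketch of an asymmetric LoRA update: one shared down-projection A,
    several task-specialized up-projections B, mixed by a learned router."""
    def __init__(self, d_model: int, rank: int = 8, n_heads: int = 3):
        super().__init__()
        self.A = nn.Linear(d_model, rank, bias=False)            # shared
        self.Bs = nn.ModuleList(
            nn.Linear(rank, d_model, bias=False) for _ in range(n_heads)
        )                                                        # specialized
        self.router = nn.Linear(d_model, n_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.router(x), dim=-1)          # (..., n_heads)
        h = self.A(x)                                            # (..., rank)
        updates = torch.stack([B(h) for B in self.Bs], dim=-1)   # (..., d, n_heads)
        return (updates * weights.unsqueeze(-2)).sum(-1)         # weighted mix

x = torch.randn(4, 16, 64)
delta = AsymmetricLoRA(64)(x)   # added to the frozen layer's output
print(delta.shape)              # torch.Size([4, 16, 64])
```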
HYDRA: Model Factorization Framework for Black-Box LLM Personalization
·2980 words·14 mins
Natural Language Processing Large Language Models 🏢 Georgia Institute of Technology
HYDRA, a novel model factorization framework, significantly improves black-box LLM personalization by capturing both user-specific behavior and shared knowledge, achieving a 9.01% average relative improvement.
Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
·2522 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
Hydra redefines bidirectional sequence modeling with quasiseparable matrix mixers, outperforming existing models on various benchmarks.
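For intuition, a naive sketch of what a quasiseparable sequence mixer looks like when materialized: low-rank blocks below and above the diagonal plus a diagonal term. Real implementations exploit this structure for linear-time computation; the random parameters here only illustrate the matrix shape.

```python
import torch

def quasiseparable_mixer(x: torch.Tensor, rank: int = 2, seed: int = 0):
    """Naive sketch: materialize a quasiseparable mixing matrix M
    (low-rank below the diagonal + low-rank above + a diagonal) and
    apply y = M @ x across the sequence axis."""
    L, d = x.shape
    g = torch.Generator().manual_seed(seed)
    U, V = torch.randn(L, rank, generator=g), torch.randn(L, rank, generator=g)
    P, Q = torch.randn(L, rank, generator=g), torch.randn(L, rank, generator=g)
    diag = torch.randn(L, generator=g)
    M = torch.tril(U @ V.T, diagonal=-1)      # causal (past -> present) part
    M = M + torch.triu(P @ Q.T, diagonal=1)   # anti-causal (future) part
    M = M + torch.diag(diag)                  # per-position term
    return M @ x

y = quasiseparable_mixer(torch.randn(10, 4))
print(y.shape)  # torch.Size([10, 4])
```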
HuRef: HUman-REadable Fingerprint for Large Language Models
·2598 words·13 mins
Natural Language Processing Large Language Models 🏢 Shanghai Jiao Tong University
HuRef: Generate unique, human-readable fingerprints for LLMs to protect copyright without exposing model parameters or impeding training.
How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad
·3573 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 Apple
Transformers struggle with complex reasoning tasks. This paper introduces ‘globality degree’ to measure task difficulty and shows that high globality hinders efficient learning; however, ‘inductive scratchpads’ can help break this barrier.
How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers
·3480 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
Pre-trained language models’ base capabilities are significantly influenced by architecture, not just scale; a novel Combination Enhanced Architecture (CEA) improves performance by addressing FFN-Wider Transformers’ weaknesses.
How do Large Language Models Handle Multilingualism?
·2895 words·14 mins
Natural Language Processing Large Language Models 🏢 DAMO Academy, Alibaba Group, Singapore
LLMs surprisingly process multilingual queries via an English-centric intermediate stage before generating responses in the original language, a phenomenon explained by the proposed MWork framework.
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
·4632 words·22 mins
Natural Language Processing Large Language Models 🏢 KAIST
LLMs’ factual knowledge acquisition during pretraining is surprisingly non-linear: more data doesn’t guarantee better knowledge retention, and forgetting follows a power law.
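As a quick illustration of the power-law claim: power-law decay is a straight line in log-log space, so its exponent can be recovered with a linear fit. The retention values below are synthetic, not the paper's measurements.

```python
import numpy as np

# Illustrative sketch: if forgetting follows a power law, retention r(t)
# is linear in log-log space, so a straight-line fit recovers the exponent.
t = np.array([1, 2, 4, 8, 16, 32], dtype=float)     # steps since exposure
r = 0.9 * t ** -0.35                                # synthetic retention data
slope, intercept = np.polyfit(np.log(t), np.log(r), 1)
print(f"estimated decay exponent: {-slope:.2f}")    # ~0.35
```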
HonestLLM: Toward an Honest and Helpful Large Language Model
·3514 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
HonestLLM boosts LLM honesty & helpfulness by 65.3% (Llama3-8b) and 124.7% (Mistral-7b) using training-free and fine-tuning methods, establishing principles and a new dataset (HONESET) for honesty evaluation.
HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction
·2361 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Tsinghua University
HLM-Cite: A hybrid language model workflow boosts scientific citation prediction accuracy by 17.6% and scales to 100K candidate papers, surpassing existing methods.
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
·3025 words·15 mins
Natural Language Processing Large Language Models 🏢 Ohio State University
HippoRAG, a neurobiologically inspired framework, dramatically improves LLM long-term memory and multi-hop question answering by synergistically orchestrating LLMs, knowledge graphs, and the Personalized PageRank algorithm.
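The retrieval core is easy to sketch: seed a Personalized PageRank walk at the knowledge-graph nodes mentioned in a query and rank passages by the scores of their attached nodes. The toy graph and seed weights below are invented for illustration.

```python
import networkx as nx

# Sketch of the retrieval idea: seed a Personalized PageRank walk at the
# knowledge-graph nodes mentioned in the query, then read off the passages
# attached to high-scoring nodes.
G = nx.Graph()
G.add_edges_from([
    ("Stanford", "California"), ("California", "USA"),
    ("Stanford", "Fei-Fei Li"), ("Fei-Fei Li", "ImageNet"),
])
query_entities = {"Stanford": 1.0}          # entities detected in the query
scores = nx.pagerank(G, personalization=query_entities)
print(sorted(scores, key=scores.get, reverse=True)[:3])
```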
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
·3990 words·19 mins
Large Language Models 🏢 University of British Columbia
Adam’s superior performance on language models stems from its resilience to heavy-tailed class imbalance, unlike SGD, which struggles with infrequent word losses.
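A toy version of the paper's setting can be set up in a few lines: draw labels from a Zipf-like (heavy-tailed) distribution and compare how well SGD and Adam fit the rare classes. Dimensions, learning rates, and step counts below are arbitrary, and this toy scale may not reproduce the full effect.

```python
import torch
import torch.nn as nn

# Toy sketch: Zipfian (heavy-tailed) class frequencies, comparing how well
# SGD vs Adam fit the rare tail classes. All sizes are illustrative.
torch.manual_seed(0)
n_classes, d, n = 50, 32, 4096
freqs = 1.0 / torch.arange(1, n_classes + 1).float()      # Zipf-like tail
y = torch.multinomial(freqs / freqs.sum(), n, replacement=True)
means = torch.randn(n_classes, d)
X = means[y] + 0.5 * torch.randn(n, d)                    # separable-ish data

def train(opt_name: str) -> float:
    model = nn.Linear(d, n_classes)
    opt = (torch.optim.SGD(model.parameters(), lr=0.1) if opt_name == "sgd"
           else torch.optim.Adam(model.parameters(), lr=1e-3))
    for _ in range(200):
        loss = nn.functional.cross_entropy(model(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    rare = y >= n_classes // 2                             # tail classes
    return nn.functional.cross_entropy(model(X[rare]), y[rare]).item()

print("SGD  rare-class loss:", train("sgd"))
print("Adam rare-class loss:", train("adam"))
```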
HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection
·2304 words·11 mins
Large Language Models 🏢 University of Wisconsin-Madison
HaloScope leverages unlabeled LLM outputs to accurately detect AI hallucinations without human annotation, significantly outperforming existing methods.
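One way to picture the scoring step: factor the activation matrix of unlabeled generations and use the projection onto a top singular direction as a hallucination score, from which weak labels can be derived to train a detector. The activations and threshold below are random stand-ins, not the paper's exact estimator.

```python
import numpy as np

# Sketch of the scoring idea: center the activation matrix of unlabeled
# generations, take its top singular direction, and use the projection onto
# it as a per-sample score; thresholding yields weak labels for a detector.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 64))                 # one row per generation
acts -= acts.mean(axis=0)
_, _, Vt = np.linalg.svd(acts, full_matrices=False)
scores = acts @ Vt[0]                             # score per sample
weak_labels = scores > np.quantile(scores, 0.8)   # crude threshold
print(weak_labels.sum(), "flagged as likely hallucinations")
```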
GTBench: Uncovering the Strategic Reasoning Capabilities of LLMs via Game-Theoretic Evaluations
·2898 words·14 mins
Natural Language Processing Large Language Models 🏢 Drexel University
GTBench reveals LLMs’ strategic reasoning weaknesses via game-theoretic evaluations, showing strength in probabilistic scenarios but struggling with deterministic ones; code pretraining helps.
Group Robust Preference Optimization in Reward-free RLHF
·2045 words·10 mins
Natural Language Processing Large Language Models 🏢 University College London (UCL)
Group Robust Preference Optimization (GRPO) enhances reward-free RLHF by aligning LLMs to diverse group preferences, maximizing worst-case performance, and significantly improving fairness.
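The group-robust idea can be sketched independently of the preference objective: keep a weight per group, up-weight the worst-performing group with an exponentiated-gradient step, and minimize the weighted loss. `group_losses` stands in for per-group DPO-style losses; `eta` and the group count are illustrative.

```python
import torch

# Sketch of the group-robust idea: maintain weights over preference groups,
# shift mass toward the worst-performing group (exponentiated-gradient step),
# and minimize the weighted loss so the worst case improves.
def grpo_step(group_losses: torch.Tensor, weights: torch.Tensor, eta: float = 0.1):
    weights = weights * torch.exp(eta * group_losses.detach())
    weights = weights / weights.sum()          # stay on the simplex
    robust_loss = (weights * group_losses).sum()
    return robust_loss, weights

weights = torch.full((3,), 1 / 3)              # three user groups
group_losses = torch.tensor([0.2, 0.9, 0.4], requires_grad=True)
loss, weights = grpo_step(group_losses, weights)
loss.backward()                                # gradients favor the worst group
print(weights)                                 # mass shifts toward group 1
```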
Grokking of Implicit Reasoning in Transformers: A Mechanistic Journey to the Edge of Generalization
·2486 words·12 mins
Natural Language Processing Large Language Models 🏢 Ohio State University
Transformers can learn implicit reasoning through ‘grokking’, achieving high accuracy in composition and comparison tasks; however, generalization varies across reasoning types.
GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration
·1719 words·9 mins
Large Language Models 🏢 Princeton University
GREATS: a novel online batch selection method significantly speeds up LLM training by greedily selecting high-quality data batches in every iteration, improving both convergence and generalization performance.
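A heavily simplified sketch of online batch selection: score each candidate example by how well its gradient aligns with a held-out validation gradient and greedily keep the top-k. The paper makes this cheap with a ghost inner-product trick; this naive version recomputes per-example gradients and is for illustration only.

```python
import torch
import torch.nn as nn

# Naive sketch: score candidates by gradient alignment with a validation
# batch, then greedily keep the top-k for this training step.
torch.manual_seed(0)
model = nn.Linear(8, 2)
loss_fn = nn.CrossEntropyLoss()

def flat_grad(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Flattened gradient of the loss on (x, y) w.r.t. model parameters."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

X_val, y_val = torch.randn(16, 8), torch.randint(0, 2, (16,))
g_val = flat_grad(X_val, y_val)

X_cand, y_cand = torch.randn(64, 8), torch.randint(0, 2, (64,))
scores = torch.stack([
    g_val @ flat_grad(X_cand[i:i+1], y_cand[i:i+1]) for i in range(64)
])
batch_idx = scores.topk(8).indices            # train on these 8 this step
print(batch_idx.tolist())
```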
GraphVis: Boosting LLMs with Visual Knowledge Graph Integration
·2376 words·12 mins
Natural Language Processing Large Language Models 🏢 UC Los Angeles
GraphVis boosts LLMs by visualizing knowledge graphs, improving accuracy in textual and visual question answering.
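The input side is straightforward to sketch: render a relevant knowledge-graph snippet to an image that a vision-language model can consume alongside the question. The toy triples and styling below are illustrative, not the paper's pipeline.

```python
import networkx as nx
import matplotlib.pyplot as plt

# Sketch of the input side: draw a knowledge-graph snippet to an image file,
# which a vision-language model can then consume alongside the question.
G = nx.DiGraph()
G.add_edge("aspirin", "headache", label="treats")
G.add_edge("aspirin", "stomach upset", label="may cause")
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=2000)
nx.draw_networkx_edge_labels(G, pos,
                             edge_labels=nx.get_edge_attributes(G, "label"))
plt.savefig("kg_snippet.png")   # attach this image to the VQA prompt
```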