
Large Language Models

I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
·1828 words·9 mins
Natural Language Processing Large Language Models 🏢 HPI / University of Potsdam
A new calibration method using a special [IDK] token explicitly models uncertainty, mitigating hallucinations and improving factual precision while maintaining knowledge retention.
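To make the mechanism concrete, here is a minimal sketch of abstention with a reserved uncertainty token, assuming a vocabulary already extended during calibration; `idk_id`, the threshold, and the tensor shapes are illustrative, not the paper's actual objective.

```python
import torch
import torch.nn.functional as F

# Illustrative only: suppose the vocabulary was extended with a special
# [IDK] token during calibration, so the model can place probability mass
# on "I don't know" instead of a wrong answer.
def predict_or_abstain(logits: torch.Tensor, idk_id: int, threshold: float = 0.5):
    """Return the argmax token id, or None (abstain) if [IDK] mass is high."""
    probs = F.softmax(logits, dim=-1)
    if probs[idk_id] >= threshold:
        return None  # the model explicitly signals uncertainty
    return int(probs.argmax())

vocab_size, idk_id = 32001, 32000   # hypothetical: last slot is [IDK]
logits = torch.randn(vocab_size)
logits[idk_id] += 5.0               # simulate a highly uncertain prediction
print(predict_or_abstain(logits, idk_id))  # -> None (abstain)
```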
HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis
·2870 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 UC San Diego
HYSYNTH: A hybrid approach uses LLMs to create context-free surrogate models that guide efficient program synthesis, outperforming LLMs alone and existing synthesizers across multiple domains.
HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
·1914 words·9 mins
Large Language Models 🏢 University of Texas at Austin
HydraLoRA: Asymmetric LoRA boosts LLM fine-tuning efficiency by sharing parameters across tasks while specializing others, outperforming existing methods.
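A rough sketch of the asymmetric structure under the usual LoRA conventions: one shared down-projection `A`, several specialized up-projections `B`, and a router to mix them. All names and hyperparameters here are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AsymmetricLoRA(nn.Module):
    """Sketch of an asymmetric LoRA update: one shared down-projection A,
    several task-specialized up-projections B, mixed by a learned router."""
    def __init__(self, d_model: int, rank: int = 8, n_heads: int = 3):
        super().__init__()
        self.A = nn.Linear(d_model, rank, bias=False)            # shared
        self.Bs = nn.ModuleList(
            nn.Linear(rank, d_model, bias=False) for _ in range(n_heads)
        )                                                        # specialized
        self.router = nn.Linear(d_model, n_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.router(x), dim=-1)          # (..., n_heads)
        h = self.A(x)                                            # (..., rank)
        updates = torch.stack([B(h) for B in self.Bs], dim=-1)   # (..., d, n_heads)
        return (updates * weights.unsqueeze(-2)).sum(-1)         # weighted mix

x = torch.randn(4, 16, 64)
delta = AsymmetricLoRA(64)(x)   # added to the frozen layer's output
print(delta.shape)              # torch.Size([4, 16, 64])
```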
HYDRA: Model Factorization Framework for Black-Box LLM Personalization
·2980 words·14 mins
Natural Language Processing Large Language Models 🏢 Georgia Institute of Technology
HYDRA, a novel model factorization framework, significantly improves black-box LLM personalization by capturing both user-specific behavior and shared knowledge, achieving a 9.01% average relative improvement.
Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
·2522 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
Hydra redefines bidirectional sequence modeling with quasiseparable matrix mixers, outperforming existing models on various benchmarks.
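For intuition, a naive sketch of what a quasiseparable sequence mixer looks like when materialized: low-rank blocks below and above the diagonal plus a diagonal term. Real implementations exploit this structure for linear-time computation; the random parameters here only illustrate the matrix shape.

```python
import torch

def quasiseparable_mixer(x: torch.Tensor, rank: int = 2, seed: int = 0):
    """Naive sketch: materialize a quasiseparable mixing matrix M
    (low-rank below the diagonal + low-rank above + a diagonal) and
    apply y = M @ x across the sequence axis."""
    L, d = x.shape
    g = torch.Generator().manual_seed(seed)
    U, V = torch.randn(L, rank, generator=g), torch.randn(L, rank, generator=g)
    P, Q = torch.randn(L, rank, generator=g), torch.randn(L, rank, generator=g)
    diag = torch.randn(L, generator=g)
    M = torch.tril(U @ V.T, diagonal=-1)      # causal (past -> present) part
    M = M + torch.triu(P @ Q.T, diagonal=1)   # anti-causal (future) part
    M = M + torch.diag(diag)                  # per-position term
    return M @ x

y = quasiseparable_mixer(torch.randn(10, 4))
print(y.shape)  # torch.Size([10, 4])
```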
HuRef: HUman-REadable Fingerprint for Large Language Models
·2598 words·13 mins
Natural Language Processing Large Language Models 🏢 Shanghai Jiao Tong University
HuRef: Generate unique, human-readable fingerprints for LLMs to protect copyright without exposing model parameters or impeding training.
How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad
·3573 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 Apple
Transformers struggle with complex reasoning tasks. This paper introduces ‘globality degree’ to measure task difficulty and shows that high globality hinders efficient learning; however, ‘inductive scratchpads’ can help break this barrier.
How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers
·3480 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
Pre-trained language models’ base capabilities are significantly influenced by architecture, not just scale; a novel Combination Enhanced Architecture (CEA) improves performance by addressing FFN-Wider Transformers’ weaknesses.
How do Large Language Models Handle Multilingualism?
·2895 words·14 mins
Natural Language Processing Large Language Models 🏢 DAMO Academy, Alibaba Group, Singapore
LLMs surprisingly process multilingual queries via an English-centric intermediate stage before generating responses in the original language, a phenomenon explained by the proposed MWork framework.
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
·4632 words·22 mins
Natural Language Processing Large Language Models 🏢 KAIST
LLMs’ factual knowledge acquisition during pretraining is surprisingly non-linear: more data doesn’t guarantee better knowledge retention, and forgetting follows a power law.
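As a quick illustration of the power-law claim: power-law decay is a straight line in log-log space, so its exponent can be recovered with a linear fit. The retention values below are synthetic, not the paper's measurements.

```python
import numpy as np

# Illustrative sketch: if forgetting follows a power law, retention r(t)
# is linear in log-log space, so a straight-line fit recovers the exponent.
t = np.array([1, 2, 4, 8, 16, 32], dtype=float)     # steps since exposure
r = 0.9 * t ** -0.35                                # synthetic retention data
slope, intercept = np.polyfit(np.log(t), np.log(r), 1)
print(f"estimated decay exponent: {-slope:.2f}")    # ~0.35
```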
HonestLLM: Toward an Honest and Helpful Large Language Model
·3514 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
HonestLLM boosts LLM honesty & helpfulness by 65.3% (Llama3-8b) and 124.7% (Mistral-7b) using training-free and fine-tuning methods, establishing principles and a new dataset (HONESET) for honesty evaluation.
HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction
·2361 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Tsinghua University
HLM-Cite: A hybrid language model workflow boosts scientific citation prediction accuracy by 17.6% and scales to 100K candidate papers, surpassing existing methods.
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
·3025 words·15 mins
Natural Language Processing Large Language Models 🏢 Ohio State University
HippoRAG, a neurobiologically inspired framework, dramatically improves LLM long-term memory and multi-hop question answering by synergistically orchestrating LLMs, knowledge graphs, and the Personalized PageRank algorithm.
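The retrieval core is easy to sketch: seed a Personalized PageRank walk at the knowledge-graph nodes mentioned in a query and rank passages by the scores of their attached nodes. The toy graph and seed weights below are invented for illustration.

```python
import networkx as nx

# Sketch of the retrieval idea: seed a Personalized PageRank walk at the
# knowledge-graph nodes mentioned in the query, then read off the passages
# attached to high-scoring nodes.
G = nx.Graph()
G.add_edges_from([
    ("Stanford", "California"), ("California", "USA"),
    ("Stanford", "Fei-Fei Li"), ("Fei-Fei Li", "ImageNet"),
])
query_entities = {"Stanford": 1.0}          # entities detected in the query
scores = nx.pagerank(G, personalization=query_entities)
print(sorted(scores, key=scores.get, reverse=True)[:3])
```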
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
·3990 words·19 mins
Large Language Models 🏢 University of British Columbia
Adam’s superior performance on language models stems from its resilience to heavy-tailed class imbalance, unlike SGD, which struggles with infrequent word losses.
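A toy version of the paper's setting can be set up in a few lines: draw labels from a Zipf-like (heavy-tailed) distribution and compare how well SGD and Adam fit the rare classes. Dimensions, learning rates, and step counts below are arbitrary, and this toy scale may not reproduce the full effect.

```python
import torch
import torch.nn as nn

# Toy sketch: Zipfian (heavy-tailed) class frequencies, comparing how well
# SGD vs Adam fit the rare tail classes. All sizes are illustrative.
torch.manual_seed(0)
n_classes, d, n = 50, 32, 4096
freqs = 1.0 / torch.arange(1, n_classes + 1).float()      # Zipf-like tail
y = torch.multinomial(freqs / freqs.sum(), n, replacement=True)
means = torch.randn(n_classes, d)
X = means[y] + 0.5 * torch.randn(n, d)                    # separable-ish data

def train(opt_name: str) -> float:
    model = nn.Linear(d, n_classes)
    opt = (torch.optim.SGD(model.parameters(), lr=0.1) if opt_name == "sgd"
           else torch.optim.Adam(model.parameters(), lr=1e-3))
    for _ in range(200):
        loss = nn.functional.cross_entropy(model(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    rare = y >= n_classes // 2                             # tail classes
    return nn.functional.cross_entropy(model(X[rare]), y[rare]).item()

print("SGD  rare-class loss:", train("sgd"))
print("Adam rare-class loss:", train("adam"))
```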
HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection
·2304 words·11 mins
Large Language Models 🏢 University of Wisconsin-Madison
HaloScope leverages unlabeled LLM outputs to accurately detect AI hallucinations without human annotation, significantly outperforming existing methods.
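One way to picture the scoring step: factor the activation matrix of unlabeled generations and use the projection onto a top singular direction as a hallucination score, from which weak labels can be derived to train a detector. The activations and threshold below are random stand-ins, not the paper's exact estimator.

```python
import numpy as np

# Sketch of the scoring idea: center the activation matrix of unlabeled
# generations, take its top singular direction, and use the projection onto
# it as a per-sample score; thresholding yields weak labels for a detector.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 64))                 # one row per generation
acts -= acts.mean(axis=0)
_, _, Vt = np.linalg.svd(acts, full_matrices=False)
scores = acts @ Vt[0]                             # score per sample
weak_labels = scores > np.quantile(scores, 0.8)   # crude threshold
print(weak_labels.sum(), "flagged as likely hallucinations")
```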
GTBench: Uncovering the Strategic Reasoning Capabilities of LLMs via Game-Theoretic Evaluations
·2898 words·14 mins
Natural Language Processing Large Language Models 🏢 Drexel University
GTBench reveals LLMs’ strategic reasoning weaknesses via game-theoretic evaluations, showing strength in probabilistic scenarios but struggling with deterministic ones; code pretraining helps.
Group Robust Preference Optimization in Reward-free RLHF
·2045 words·10 mins
Natural Language Processing Large Language Models 🏢 University College London (UCL)
Group Robust Preference Optimization (GRPO) enhances reward-free RLHF by aligning LLMs to diverse group preferences, maximizing worst-case performance, and significantly improving fairness.
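The group-robust idea can be sketched independently of the preference objective: keep a weight per group, up-weight the worst-performing group with an exponentiated-gradient step, and minimize the weighted loss. `group_losses` stands in for per-group DPO-style losses; `eta` and the group count are illustrative.

```python
import torch

# Sketch of the group-robust idea: maintain weights over preference groups,
# shift mass toward the worst-performing group (exponentiated-gradient step),
# and minimize the weighted loss so the worst case improves.
def grpo_step(group_losses: torch.Tensor, weights: torch.Tensor, eta: float = 0.1):
    weights = weights * torch.exp(eta * group_losses.detach())
    weights = weights / weights.sum()          # stay on the simplex
    robust_loss = (weights * group_losses).sum()
    return robust_loss, weights

weights = torch.full((3,), 1 / 3)              # three user groups
group_losses = torch.tensor([0.2, 0.9, 0.4], requires_grad=True)
loss, weights = grpo_step(group_losses, weights)
loss.backward()                                # gradients favor the worst group
print(weights)                                 # mass shifts toward group 1
```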
Grokking of Implicit Reasoning in Transformers: A Mechanistic Journey to the Edge of Generalization
·2486 words·12 mins
Natural Language Processing Large Language Models 🏢 Ohio State University
Transformers can learn implicit reasoning through ‘grokking’, achieving high accuracy in composition and comparison tasks; however, generalization varies across reasoning types.
GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration
·1719 words·9 mins
Large Language Models 🏢 Princeton University
GREATS: a novel online batch selection method significantly speeds up LLM training by greedily selecting high-quality data batches in every iteration, improving both convergence and generalization performance.
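A heavily simplified sketch of online batch selection: score each candidate example by how well its gradient aligns with a held-out validation gradient and greedily keep the top-k. The paper makes this cheap with a ghost inner-product trick; this naive version recomputes per-example gradients and is for illustration only.

```python
import torch
import torch.nn as nn

# Naive sketch: score candidates by gradient alignment with a validation
# batch, then greedily keep the top-k for this training step.
torch.manual_seed(0)
model = nn.Linear(8, 2)
loss_fn = nn.CrossEntropyLoss()

def flat_grad(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Flattened gradient of the loss on (x, y) w.r.t. model parameters."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

X_val, y_val = torch.randn(16, 8), torch.randint(0, 2, (16,))
g_val = flat_grad(X_val, y_val)

X_cand, y_cand = torch.randn(64, 8), torch.randint(0, 2, (64,))
scores = torch.stack([
    g_val @ flat_grad(X_cand[i:i+1], y_cand[i:i+1]) for i in range(64)
])
batch_idx = scores.topk(8).indices            # train on these 8 this step
print(batch_idx.tolist())
```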
GraphVis: Boosting LLMs with Visual Knowledge Graph Integration
·2376 words·12 mins
Natural Language Processing Large Language Models 🏢 UC Los Angeles
GraphVis boosts LLMs by visualizing knowledge graphs, improving accuracy in textual and visual question answering.
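The input side is straightforward to sketch: render a relevant knowledge-graph snippet to an image that a vision-language model can consume alongside the question. The toy triples and styling below are illustrative, not the paper's pipeline.

```python
import networkx as nx
import matplotlib.pyplot as plt

# Sketch of the input side: draw a knowledge-graph snippet to an image file,
# which a vision-language model can then consume alongside the question.
G = nx.DiGraph()
G.add_edge("aspirin", "headache", label="treats")
G.add_edge("aspirin", "stomach upset", label="may cause")
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=2000)
nx.draw_networkx_edge_labels(G, pos,
                             edge_labels=nx.get_edge_attributes(G, "label"))
plt.savefig("kg_snippet.png")   # attach this image to the VQA prompt
```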