Natural Language Processing

ORID: Organ-Regional Information Driven Framework for Radiology Report Generation

20 November 2024·3437 words·17 mins

AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 University of Sydney

ORID framework leverages organ-regional information to boost radiology report generation, achieving state-of-the-art accuracy by integrating multi-modal data and reducing noise from unrelated organs.

Hymba: A Hybrid-head Architecture for Small Language Models

20 November 2024·4219 words·20 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA

Hymba: A hybrid-head architecture that improves small language models, delivering an 11.67x cache size reduction and 3.49x throughput while surpassing existing models.

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

20 November 2024·2774 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University College London

BALROG benchmark rigorously evaluates LLMs’/VLMs’ abilities in complex games, revealing their strengths and weaknesses in long-term planning and decision-making, highlighting the need for improved vis…

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection

20 November 2024·2311 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Government Technology Agency Singapore

A new data-free methodology creates effective, generalizable LLM guardrails against off-topic prompts, significantly improving LLM safety and responsible use.

Ultra-Sparse Memory Network

19 November 2024·5103 words·24 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ByteDance

UltraMem, a novel ultra-sparse memory network, drastically speeds up LLM inference by 6x compared to MoE while maintaining performance, paving the way for efficient large-scale model deployment.

RedPajama: an Open Dataset for Training Large Language Models

19 November 2024·7625 words·36 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University

RedPajama comprises two massive open-source datasets released for training LLMs, improving transparency and facilitating the development of high-performing open-source models.

Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages

19 November 2024·3728 words·18 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Assam Kaziranga University

The SUTRA tokenizer outperforms the tokenizers of other LLMs across Indian languages, improving efficiency and facilitating better model performance.

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

18 November 2024·2024 words·10 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Chinese Information Processing Laboratory

Verifier engineering: A new post-training paradigm for foundation models using automated verifiers to provide effective supervision signals, enhancing capabilities beyond traditional data-centric meth…

Drowning in Documents: Consequences of Scaling Reranker Inference

18 November 2024·273 words·2 mins

AI Generated 🤗 Daily Papers Natural Language Processing Information Retrieval 🏢 Databricks

Scaling reranker inference surprisingly degrades retrieval quality beyond a certain point, highlighting the need for more robust reranking techniques.

SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration

17 November 2024·3206 words·16 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

SageAttention2 achieves accurate 4-bit attention, boosting inference speed by 2x compared to FlashAttention2 while maintaining end-to-end accuracy across diverse models.

LLäMmlein: Compact and Competitive German-Only Language Models from Scratch

17 November 2024·3133 words·15 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Center for Artificial Intelligence and Data Science

New German-only LLMs, LLäMmlein 120M & 1B, trained from scratch & openly released, show competitive performance and offer insights into efficient model training.

SlimLM: An Efficient Small Language Model for On-Device Document Assistance

15 November 2024·2811 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Auburn University

SlimLM: Efficient small language models (SLMs) optimized for mobile document assistance, achieving comparable or superior performance to existing SLMs.

LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

14 November 2024·2885 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

LLaMA-Mesh: Unifying 3D mesh generation with LLMs by directly representing meshes as text, enabling efficient text-to-3D conversion within a single model.

Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering

14 November 2024·5666 words·27 mins

AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Department of Computer Science, University of Oregon

The MedRGB benchmark reveals that current LLMs struggle with noisy medical data, emphasizing the need for robust RAG systems in healthcare AI.

Adaptive Decoding via Latent Preference Optimization

14 November 2024·4975 words·24 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Meta AI

LLMs can dynamically adjust decoding temperature using Adaptive Decoding and Latent Preference Optimization, improving performance across creative and factual tasks.

Cut Your Losses in Large-Vocabulary Language Models

13 November 2024·2958 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Apple

Cut Cross-Entropy (CCE) dramatically reduces the memory footprint of training large language models by cleverly computing the cross-entropy loss without materializing the full logit matrix.

Can sparse autoencoders be used to decompose and interpret steering vectors?

13 November 2024·2017 words·10 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Oxford

Sparse autoencoders fail to accurately decompose and interpret steering vectors due to distribution mismatch and the inability to handle negative feature projections; this paper identifies these issue…

CamemBERT 2.0: A Smarter French Language Model Aged to Perfection

13 November 2024·1996 words·10 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Inria, Paris, France

CamemBERT 2.0: Two new French language models (CamemBERTav2 & CamemBERTv2) outperform predecessors by addressing temporal concept drift via larger, updated datasets and enhanced tokenization, demonstr…

Top-$nσ$: Not All Logits Are You Need

12 November 2024·2189 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 School of Computer Science and Technology, University of Science and Technology of China

Top-nσ: A novel LLM sampling method outperforms existing approaches by using a statistical threshold on pre-softmax logits, achieving higher accuracy while maintaining diversity, even at high temperat…

Large Language Models Can Self-Improve in Long-context Reasoning

12 November 2024·3316 words·16 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Peking University

LLMs can now self-improve long-context reasoning via SEALONG, a novel method leveraging multiple model outputs and minimum Bayes risk scoring to enable effective supervised fine-tuning or preference o…