
Natural Language Processing

Balancing Pipeline Parallelism with Vocabulary Parallelism
·3226 words·16 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 National University of Singapore
Boost large language model training speed by up to 51% with Vocabulary Parallelism, a novel technique that balances computation and memory usage across pipeline stages.
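As a rough illustration of the idea (a minimal numpy sketch of our own, not the paper's implementation, and names like `num_stages` are ours): shard the output-embedding matrix along the vocabulary dimension so every pipeline stage computes an equal slice of the logits, then concatenate.

```python
import numpy as np

def sharded_logits(hidden, weight, num_stages):
    """Vocabulary-sharded LM-head projection: each pipeline stage holds
    one slice of the (vocab, d_model) weight and computes partial logits,
    so vocabulary compute and memory are balanced across stages."""
    shards = np.array_split(np.arange(weight.shape[0]), num_stages)
    partial = [hidden @ weight[idx].T for idx in shards]  # per-stage work
    return np.concatenate(partial, axis=1)                # exact full logits

# Sanity check: sharding does not change the result.
rng = np.random.default_rng(0)
h, W = rng.normal(size=(2, 8)), rng.normal(size=(1000, 8))
assert np.allclose(sharded_logits(h, W, 4), h @ W.T)
```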
RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval
·523 words·3 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Information Extraction · 🏢 IIT Kharagpur
RetrieveGPT enhances code-mixed information retrieval by merging GPT-3.5 Turbo prompts with a novel mathematical model, improving the accuracy of relevant document extraction from complex, sequenced c…
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
·5600 words·27 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 INF
OpenCoder, a top-tier open-source code LLM, is introduced, providing not only model weights and code but also reproducible training data, data processing pipelines, and training protocols, enabling co…
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
·6075 words·29 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 University of Cambridge
Can LLMs effectively handle information spread across near-million-token contexts? This research investigates the question by evaluating 17 LLMs on novel ‘needle threading’ tasks. These task…
Hardware and Software Platform Inference
·2667 words·13 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Imperial College London
Researchers developed Hardware and Software Platform Inference (HSPI) to identify the underlying GPU and software stack used to serve LLMs, enhancing transparency in the industry.
DELIFT: Data Efficient Language model Instruction Fine Tuning
·1830 words·9 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 IBM Research
DELIFT (Data Efficient Language Model Instruction Fine-Tuning) drastically reduces the data needed for effective LLM fine-tuning without sacrificing performance.
BitNet a4.8: 4-bit Activations for 1-bit LLMs
·2844 words·14 mins
AI Generated · Natural Language Processing · Large Language Models · 🏢 Microsoft Research
BitNet a4.8 achieves comparable performance to existing 1-bit LLMs, but with significantly faster inference, by using a hybrid quantization and sparsification strategy for 4-bit activations.
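For intuition, a toy version of "sparsify, then quantize activations to 4 bits" might look like the sketch below (a generic illustration under our own threshold and scaling assumptions, not BitNet a4.8's actual recipe).

```python
import numpy as np

def quantize_activations_4bit(x, sparsity=0.5):
    """Toy hybrid scheme: drop the smallest-magnitude activations,
    then quantize the survivors to signed 4-bit integers (-8..7)
    with a per-tensor absmax scale. Illustrative only."""
    x = np.asarray(x, dtype=np.float64)
    k = int(x.size * (1 - sparsity))                       # values to keep
    thresh = np.sort(np.abs(x), axis=None)[-k] if k > 0 else np.inf
    mask = np.abs(x) >= thresh                             # sparsification mask
    scale = np.abs(x[mask]).max() / 7.0 if mask.any() else 1.0
    q = np.clip(np.round(x / scale), -8, 7) * mask         # 4-bit levels
    return q.astype(np.int8), scale, mask

x = np.random.default_rng(1).normal(size=(4, 8))
q, scale, mask = quantize_activations_4bit(x)
dequantized = q * scale   # approximate reconstruction of the kept activations
```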
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
·2200 words·11 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Question Answering · 🏢 Renmin University of China
HtmlRAG boosts RAG system accuracy by using HTML, not plain text, to model retrieved knowledge, improving knowledge representation and mitigating LLM hallucination.
Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
·2051 words·10 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 UC San Francisco
Zebra-Llama, a context-aware LLM, democratizes rare disease knowledge by providing highly precise, context-rich information about Ehlers-Danlos Syndrome, significantly improving diagnostic support.
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
·3659 words·18 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Tsinghua University
WebRL, a self-evolving online curriculum reinforcement learning framework, empowers open LLMs to excel as high-performing web agents, surpassing proprietary models.
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
·4028 words·19 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Tsinghua University
Researchers discovered predictable scaling laws for activation sparsity in LLMs, showing how data, architecture, and model size influence sparsity, paving the way for more efficient and interpretable …
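"Activation sparsity" here is simply the fraction of (near-)zero entries in a layer's post-activation output; a quick toy measurement follows (our own illustration with an assumed tolerance `eps`, not the paper's exact metric).

```python
import numpy as np

def activation_sparsity(h, eps=1e-3):
    """Fraction of post-activation entries with magnitude below eps."""
    return float(np.mean(np.abs(h) < eps))

# ReLU activations of a toy layer: roughly half the entries are exact zeros.
h = np.maximum(np.random.default_rng(0).normal(size=(32, 4096)), 0)
print(f"sparsity: {activation_sparsity(h):.2%}")
```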
Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
·1998 words·10 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Norwegian University of Science and Technology
Boosting unit test generation efficiency, this study empirically evaluates various parameter-efficient fine-tuning methods on LLMs, demonstrating comparable performance to full fine-tuning at signific…
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
·1756 words·9 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Tencent AI Lab
Tencent unveils Hunyuan-Large, a groundbreaking open-source MoE LLM with 389B total parameters, 52B of which are activated, surpassing existing models across various benchmarks.
DynaSaur: Large Language Agents Beyond Predefined Actions
·2738 words·13 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 University of Maryland
DynaSaur: a novel LLM agent framework enabling dynamic action creation, surpassing prior methods with greater flexibility and top performance on the GAIA benchmark.
Sample-Efficient Alignment for LLMs
·2536 words·12 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Sea AI Lab
Sample-efficient LLM alignment achieved via a novel Thompson sampling algorithm (SEA), outperforming existing methods.
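For readers new to the underlying primitive: Thompson sampling keeps a posterior over each candidate's quality, samples from it, and acts greedily on the sample. Below is a generic Beta-Bernoulli sketch (not SEA itself, which applies the idea to LLM preference feedback; the win rates are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(42)
true_win_rates = [0.3, 0.5, 0.7]         # hidden quality of each candidate
alpha = np.ones(3)                       # Beta(1, 1) priors per arm
beta = np.ones(3)

for _ in range(2000):
    # Sample a plausible win rate per arm, then act greedily on the sample.
    arm = int(np.argmax(rng.beta(alpha, beta)))
    reward = rng.random() < true_win_rates[arm]   # binary preference feedback
    alpha[arm] += reward                          # posterior update
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))  # the best arm dominates
```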
Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks
·4411 words·21 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 University of British Columbia
Swan & ArabicMTEB: New dialect-aware Arabic embedding models and benchmark achieve state-of-the-art performance, addressing limitations of existing multilingual models.
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
·2387 words·12 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 FPT Software AI Center
LibMoE: a new library that streamlines MoE research with standardized training and evaluation pipelines and a modular design, enabling efficient benchmarking of MoE algorithms for LLMs.
GRS-QA -- Graph Reasoning-Structured Question Answering Dataset
·5467 words·26 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Question Answering · 🏢 University of California Santa Cruz
GRS-QA, a new benchmark dataset that pairs questions with explicit reasoning-structure graphs, reveals how LLM reasoning performance varies with the complexity of those structures.
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
·5414 words·26 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Carnegie Mellon University
Specialized Sparse Autoencoders (SSAEs) decode foundation models’ ‘dark matter’ features, efficiently extracting rare subdomain concepts for improved interpretability and safety.
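At its core, a sparse autoencoder learns an overcomplete dictionary of features for a model's activations; the forward pass is just a ReLU encoder plus a linear decoder. A minimal sketch follows (our own toy dimensions and initialization; the training loop with the sparsity penalty is omitted).

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec):
    """One forward pass of a sparse autoencoder: an overcomplete ReLU
    code (trained elsewhere with an L1 sparsity penalty) plus a linear
    reconstruction of the model activation x."""
    f = np.maximum(x @ W_enc + b_enc, 0)   # sparse feature activations
    x_hat = f @ W_dec                      # reconstruction of x
    return f, x_hat

rng = np.random.default_rng(0)
d, m = 64, 512                             # activation dim, dictionary size
W_enc = rng.normal(size=(d, m)) / np.sqrt(d)
b_enc = np.zeros(m)
W_dec = rng.normal(size=(m, d)) / np.sqrt(m)
f, x_hat = sae_forward(rng.normal(size=(8, d)), W_enc, b_enc, W_dec)
```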
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
·3802 words·18 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Dialogue Systems · 🏢 University of Michigan
Teaching AI agents with diverse and informative language feedback dramatically improves their learning, generalization, and adaptability.