Natural Language Processing
Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
·2051 words·10 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC San Francisco
Zebra-Llama, a context-aware LLM, democratizes rare disease knowledge by providing highly precise, context-rich information about Ehlers-Danlos Syndrome, significantly improving diagnostic support.
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
·3659 words·18 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
WebRL: A self-evolving online curriculum reinforcement learning framework empowers open LLMs to excel as high-performing web agents, surpassing proprietary models.
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
·4028 words·19 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Researchers discovered predictable scaling laws for activation sparsity in LLMs, showing how data, architecture, and model size influence sparsity, paving the way for more efficient and interpretable LLMs.
Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
·1998 words·10 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Norwegian University of Science and Technology
This study empirically evaluates parameter-efficient fine-tuning methods for LLM-based unit test generation, demonstrating performance comparable to full fine-tuning at significantly lower cost.
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
·1756 words·9 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Tencent unveils Hunyuan-Large, a groundbreaking open-source MoE LLM boasting 389B parameters and 52B activated parameters, surpassing existing models in performance across various benchmarks.
DynaSaur: Large Language Agents Beyond Predefined Actions
·2738 words·13 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Maryland
DynaSaur: a novel LLM agent framework that enables dynamic action creation, offering greater flexibility than predefined-action approaches and achieving top performance on the GAIA benchmark.
Sample-Efficient Alignment for LLMs
·2536 words·12 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Sea AI Lab
Sample-efficient LLM alignment achieved via a novel Thompson sampling algorithm (SEA), outperforming existing methods.
Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks
·4411 words·21 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of British Columbia
Swan & ArabicMTEB: New dialect-aware Arabic embedding models and benchmark achieve state-of-the-art performance, addressing limitations of existing multilingual models.
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
·2387 words·12 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 FPT Software AI Center
LibMoE: A new library streamlines MoE research by offering standardized training, evaluation, and a modular design, enabling efficient benchmarking of various MoE algorithms for LLMs.
GRS-QA -- Graph Reasoning-Structured Question Answering Dataset
·5467 words·26 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 University of California Santa Cruz
GRS-QA: a new benchmark dataset with explicit reasoning structures that reveals the limitations of LLM reasoning.
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
·5414 words·26 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Specialized Sparse Autoencoders (SSAEs) decode foundation models’ ‘dark matter’ features, efficiently extracting rare subdomain concepts for improved interpretability and safety.
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
·3802 words·18 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Dialogue Systems
🏢 University of Michigan
Teaching AI agents with diverse and informative language feedback dramatically improves their learning, generalization, and adaptability.
LLaMo: Large Language Model-based Molecular Graph Assistant
·3401 words·16 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Korea University
LLaMo: a novel large molecular graph-language model that seamlessly integrates molecular graph encoders and LLMs, achieving state-of-the-art performance in molecule description generation and property prediction.
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
·1865 words·9 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 LMU Munich & Munich Center for Machine Learning
GlotCC: Open multilingual corpus & pipeline for minority languages, exceeding 1000 languages.
Constraint Back-translation Improves Complex Instruction Following of Large Language Models
·3717 words·18 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Constraint Back-translation enhances complex instruction following in LLMs by leveraging inherent constraints in existing datasets for efficient high-quality data creation.
BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments
·6027 words·29 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Fudan University
BitStack: fine-grained, dynamic size control for compressed LLMs in variable memory environments.
Controlling Language and Diffusion Models by Transporting Activations
·11502 words·54 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Apple
Steering large language and diffusion models is made easy and efficient via Activation Transport (ACT), a novel framework that uses optimal transport theory to precisely control model activations.
AAAR-1.0: Assessing AI's Potential to Assist Research
·5113 words·25 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Pennsylvania State University
AAAR-1.0 benchmark rigorously evaluates LLMs’ ability to assist in four core research tasks, revealing both potential and limitations.
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents
·2316 words·11 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Dialogue Systems
🏢 Computer Science and Engineering Department, IIT Kharagpur
This research introduces MLMCID, a novel pointer network architecture that excels at jointly extracting multiple intent spans and detecting multi-label, multi-class intents from complex, multilingual utterances.
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
·2943 words·14 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Alberta
NeuZip dynamically compresses neural network weights, achieving memory-efficient training and inference without performance loss, significantly reducing the memory footprint of large language models.