Large Language Models
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
·2984 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tongyi Lab
IOPO empowers LLMs to master complex instructions via input-output preference optimization, boasting significant performance gains on a new benchmark, TRACE.
Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
·3715 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
Golden Touchstone, a new bilingual benchmark, comprehensively evaluates financial LLMs across eight tasks, revealing model strengths and weaknesses and advancing FinLLM research.
Balancing Pipeline Parallelism with Vocabulary Parallelism
·3226 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 National University of Singapore
Boost large language model training speed by 51% with Vocabulary Parallelism, a novel technique that balances computation and memory usage across pipeline stages.
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
·5600 words·27 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 INF
OpenCoder, a top-tier open-source code LLM, is introduced, providing not only model weights and code but also reproducible training data, data processing pipelines, and training protocols, enabling co…
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
·6075 words·29 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Cambridge
Can LLMs effectively handle information spread across vast, almost million-scale datasets? This research investigates this question by evaluating 17 LLMs on novel “needle threading” tasks. These task…
Hardware and Software Platform Inference
·2667 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Imperial College London
Researchers developed Hardware and Software Platform Inference (HSPI) to identify the underlying GPU and software stack used to serve LLMs, enhancing transparency in the industry.
DELIFT: Data Efficient Language model Instruction Fine Tuning
·1830 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 IBM Research
DELIFT (Data Efficient Language Model Instruction Fine-Tuning) drastically reduces the data needed for effective LLM fine-tuning without sacrificing performance.
BitNet a4.8: 4-bit Activations for 1-bit LLMs
·2844 words·14 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Microsoft Research
BitNet a4.8 achieves comparable performance to existing 1-bit LLMs, but with significantly faster inference, by using a hybrid quantization and sparsification strategy for 4-bit activations.
Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
·2051 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC San Francisco
Zebra-Llama, a context-aware LLM, democratizes rare disease knowledge by providing highly precise, context-rich information about Ehlers-Danlos Syndrome, significantly improving diagnostic support.
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
·3659 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
WebRL: A self-evolving online curriculum reinforcement learning framework that empowers open LLMs to excel as high-performing web agents, surpassing proprietary models.
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
·4028 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Researchers discovered predictable scaling laws for activation sparsity in LLMs, showing how data, architecture, and model size influence sparsity, paving the way for more efficient and interpretable …
Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
·1998 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Norwegian University of Science and Technology
Boosting unit test generation efficiency, this study empirically evaluates various parameter-efficient fine-tuning methods on LLMs, demonstrating comparable performance to full fine-tuning at signific…
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
·1756 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Tencent unveils Hunyuan-Large, a groundbreaking open-source MoE LLM boasting 389B parameters and 52B activated parameters, surpassing existing models in performance across various benchmarks.
DynaSaur: Large Language Agents Beyond Predefined Actions
·2738 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Maryland
DynaSaur: a novel LLM agent framework enabling dynamic action creation, surpassing prior methods with greater flexibility and top performance on the GAIA benchmark.
Sample-Efficient Alignment for LLMs
·2536 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Sea AI Lab
Sample-efficient LLM alignment achieved via a novel Thompson sampling algorithm (SEA), outperforming existing methods.
Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks
·4411 words·21 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of British Columbia
Swan & ArabicMTEB: New dialect-aware Arabic embedding models and benchmark achieve state-of-the-art performance, addressing limitations of existing multilingual models.
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
·2387 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 FPT Software AI Center
LibMoE: A new library streamlines MoE research by offering standardized training, evaluation, and a modular design, enabling efficient benchmarking of various MoE algorithms for LLMs.
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
·5414 words·26 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Specialized Sparse Autoencoders (SSAEs) decode foundation models’ ‘dark matter’ features, efficiently extracting rare subdomain concepts for improved interpretability and safety.
LLaMo: Large Language Model-based Molecular Graph Assistant
·3401 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Korea University
LLaMo: a novel large molecular graph-language model seamlessly integrates molecular graph encoders and LLMs, achieving state-of-the-art performance in molecule description generation, property predict…
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
·1865 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 LMU Munich & Munich Center for Machine Learning
GlotCC: An open multilingual corpus and pipeline for minority languages, covering more than 1000 languages.