Large Language Models
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
·2885 words·14 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
LLaMA-Mesh: Unifying 3D mesh generation with LLMs by directly representing meshes as text, enabling efficient text-to-3D conversion within a single model.
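The core trick is treating mesh data as ordinary text the model can read and emit. As a rough illustration of that representation (the paper works with OBJ-style text; the helper below is a hypothetical sketch, not the authors' code):

```python
def mesh_to_text(vertices, faces):
    """Serialize a triangle mesh as OBJ-style plain text, the kind of
    representation an LLM can consume in a prompt or produce as output."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    # OBJ face indices are 1-based.
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]
    return "\n".join(lines)

# A unit tetrahedron round-trips through the text form.
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tris = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(mesh_to_text(verts, tris))
```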
Cut Your Losses in Large-Vocabulary Language Models
·2958 words·14 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Apple
Cut Cross-Entropy (CCE) dramatically reduces the memory footprint of training large language models by cleverly computing the cross-entropy loss without materializing the full logit matrix.
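The memory saving comes from never storing the [tokens × vocab] logit matrix: only the per-token log-sum-exp is accumulated, chunk by chunk over the vocabulary. A minimal PyTorch sketch of that idea (the paper implements it as a fused kernel; names here are illustrative):

```python
import torch

def chunked_cross_entropy(hidden, classifier_weight, targets, chunk_size=8192):
    """Cross-entropy over a huge vocabulary without materializing the
    full [num_tokens, vocab] logit matrix.

    hidden:            [num_tokens, d_model] final hidden states
    classifier_weight: [vocab, d_model] unembedding matrix
    targets:           [num_tokens] gold token ids
    """
    # Logit of the correct token: one dot product per position.
    correct_logit = (hidden * classifier_weight[targets]).sum(dim=-1)
    # Accumulate log-sum-exp over vocabulary chunks.
    lse = torch.full((hidden.shape[0],), float("-inf"), device=hidden.device)
    for start in range(0, classifier_weight.shape[0], chunk_size):
        logits = hidden @ classifier_weight[start:start + chunk_size].T
        lse = torch.logaddexp(lse, torch.logsumexp(logits, dim=-1))
    # Standard cross-entropy: log-sum-exp minus the correct logit.
    return (lse - correct_logit).mean()
```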
Can sparse autoencoders be used to decompose and interpret steering vectors?
·2017 words·10 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Oxford
Sparse autoencoders fail to accurately decompose and interpret steering vectors due to distribution mismatch and the inability to handle negative feature projections; this paper identifies these issue…
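The negative-projection failure is easy to see in a toy setting: an SAE encoder applies a ReLU, so any component of a steering vector that points against a feature direction is silently clipped to zero. A hypothetical illustration (toy weights, not a trained SAE):

```python
import torch

d_model, n_features = 16, 64
W_enc = torch.randn(n_features, d_model)   # toy SAE encoder weights
b_enc = torch.zeros(n_features)

# A steering vector pointing *against* feature 0.
steering_vec = -W_enc[0]

codes = torch.relu(W_enc @ steering_vec + b_enc)
print(codes[0].item())  # 0.0 -- the negative projection is invisible to the SAE
```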
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection
·1996 words·10 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Inria, Paris, France
CamemBERT 2.0: Two new French language models (CamemBERTav2 & CamemBERTv2) outperform predecessors by addressing temporal concept drift via larger, updated datasets and enhanced tokenization, demonstr…
Large Language Models Can Self-Improve in Long-context Reasoning
·3316 words·16 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Peking University
LLMs can now self-improve long-context reasoning via SEALONG, a novel method leveraging multiple model outputs and minimum Bayes risk scoring to enable effective supervised fine-tuning or preference o…
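Minimum Bayes risk scoring here means sampling several outputs and keeping the one that agrees most with the rest, which then serves as a self-generated training signal. A sketch under that reading (the similarity function and names are assumptions, not the paper's exact recipe):

```python
def mbr_select(candidates, similarity):
    """Keep the sampled output that agrees most, on average, with the rest."""
    def expected_agreement(i):
        return sum(similarity(candidates[i], candidates[j])
                   for j in range(len(candidates)) if j != i)
    return candidates[max(range(len(candidates)), key=expected_agreement)]

def overlap(a, b):
    """Toy token-overlap (Jaccard) similarity."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

samples = ["the answer is 42", "answer: 42", "it is 7"]
print(mbr_select(samples, overlap))   # a 42-answer wins the consensus
```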
Direct Preference Optimization Using Sparse Feature-Level Constraints
·2078 words·10 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Westlake University
Feature-level constrained Preference Optimization (FPO) boosts LLM alignment efficiency and stability by using sparse autoencoders and feature-level constraints, achieving significant improvements ove…
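The constraint lives in the SAE's sparse feature space rather than on raw logits: the policy is penalized for drifting from the reference model's feature activations. A hypothetical sketch of where such a term plugs in (the actual FPO objective differs in detail):

```python
import torch
import torch.nn.functional as F

def feature_level_penalty(h_policy, h_ref, W_enc, b_enc):
    """Encode both models' hidden states with a frozen SAE encoder and
    penalize divergence in the sparse feature space."""
    f_policy = torch.relu(h_policy @ W_enc.T + b_enc)
    f_ref = torch.relu(h_ref @ W_enc.T + b_enc)
    return F.mse_loss(f_policy, f_ref)

# Assumed usage: loss = dpo_loss + beta * feature_level_penalty(...)
```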
Stronger Models are NOT Stronger Teachers for Instruction Tuning
·3212 words·16 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
Larger language models aren’t always better teachers for instruction tuning; a new metric, CAR, predicts teacher model effectiveness better than existing methods.
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
·2396 words·12 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Taobao & Tmall Group of Alibaba
Chinese SimpleQA, a new benchmark, offers a comprehensive evaluation of the factuality of LLMs answering short questions in Chinese, exhibiting diversity, high quality, and ease of evaluation.
Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction
·2573 words·13 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Oxford
Contrary to common belief, toxicity reduction in language models isn’t simply achieved by dampening toxic neurons; it’s a complex balancing act across multiple neuron groups.
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
·2984 words·15 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tongyi Lab
IOPO empowers LLMs to master complex instructions via input-output preference optimization, boasting significant performance gains on a new benchmark, TRACE.
Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
·3715 words·18 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
Golden Touchstone, a new bilingual benchmark, comprehensively evaluates financial LLMs across eight tasks, revealing model strengths and weaknesses and advancing FinLLM research.
Balancing Pipeline Parallelism with Vocabulary Parallelism
·3226 words·16 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 National University of Singapore
Boost large language model training speed by 51% with Vocabulary Parallelism, a novel technique that balances computation and memory usage across pipeline stages.
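The imbalance being fixed: the output-vocabulary layers concentrate compute and memory in a single pipeline stage. Sharding the vocabulary lets every stage hold a slice and exchange only per-token scalars. A toy single-process illustration (a real system would use an all-reduce; names are assumptions):

```python
import torch

def sharded_logsumexp(hidden, weight_shards):
    """Each 'device' holds a vocabulary slice, computes partial logits,
    and contributes one log-sum-exp scalar per token."""
    partial = [torch.logsumexp(hidden @ shard.T, dim=-1, keepdim=True)
               for shard in weight_shards]
    # In a distributed run this concat+reduce is an all-reduce over devices.
    return torch.logsumexp(torch.cat(partial, dim=-1), dim=-1)

hidden = torch.randn(5, 32)      # [tokens, d_model]
full_w = torch.randn(1000, 32)   # [vocab, d_model]
shards = full_w.chunk(4)         # 4-way vocabulary split
assert torch.allclose(sharded_logsumexp(hidden, shards),
                      torch.logsumexp(hidden @ full_w.T, dim=-1), atol=1e-5)
```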
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
·5600 words·27 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 INF
OpenCoder, a top-tier open-source code LLM, is introduced, providing not only model weights and code but also reproducible training data, data processing pipelines, and training protocols, enabling co…
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
·6075 words·29 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Cambridge
Can LLMs effectively follow threads of information through contexts approaching a million tokens? This research investigates that question by evaluating 17 LLMs on novel “needle threading” tasks. These task…
Hardware and Software Platform Inference
·2667 words·13 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Imperial College London
Researchers developed Hardware and Software Platform Inference (HSPI) to identify the underlying GPU and software stack used to serve LLMs, enhancing transparency in the industry.
DELIFT: Data Efficient Language model Instruction Fine Tuning
·1830 words·9 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 IBM Research
DELIFT (Data Efficient Language Model Instruction Fine-Tuning) drastically reduces the data needed for effective LLM fine-tuning without sacrificing performance.
BitNet a4.8: 4-bit Activations for 1-bit LLMs
·2844 words·14 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Microsoft Research
BitNet a4.8 achieves comparable performance to existing 1-bit LLMs, but with significantly faster inference, by using a hybrid quantization and sparsification strategy for 4-bit activations.
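A generic per-token absmax scheme shows what 4-bit activations look like in practice (the paper's hybrid quantization and sparsification strategy is more involved; this is only a sketch):

```python
import torch

def quantize_int4(x, eps=1e-5):
    """Per-token absmax quantization to the signed 4-bit range [-8, 7]."""
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=eps) / 7.0
    q = torch.round(x / scale).clamp(-8, 7)
    return q, scale            # dequantize as q * scale

x = torch.randn(2, 8)
q, s = quantize_int4(x)
print((q * s - x).abs().max())  # quantization error
```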
Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
·2051 words·10 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC San Francisco
Zebra-Llama, a context-aware LLM, democratizes rare disease knowledge by providing highly precise, context-rich information about Ehlers-Danlos Syndrome, significantly improving diagnostic support.
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
·3659 words·18 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
WebRL, a self-evolving online curriculum reinforcement learning framework, empowers open LLMs to excel as high-performing web agents, surpassing proprietary models.
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
·4028 words·19 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Researchers discovered predictable scaling laws for activation sparsity in LLMs, showing how data, architecture, and model size influence sparsity, paving the way for more efficient and interpretable …
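The quantity being fit is simply the fraction of activations that are (near-)zero. A minimal way to measure it for a ReLU-style layer (illustrative, not the paper's exact metric):

```python
import torch

def activation_sparsity(acts, threshold=0.0):
    """Fraction of entries at or below `threshold` in magnitude."""
    return (acts.abs() <= threshold).float().mean().item()

acts = torch.relu(torch.randn(4, 1024))   # ReLU zeroes inactive units
print(f"sparsity: {activation_sparsity(acts):.2%}")   # roughly 50%
```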