2025-02-19

SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
·2481 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 KAIST
SafeRoute efficiently enhances LLM safety by adaptively routing between a small and a large safety guard model, maximizing accuracy while minimizing cost.
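The routing idea fits in a short sketch. Below is a minimal confidence-threshold variant in Python; SafeRoute itself trains a dedicated router on examples the small model gets wrong, and the two guard functions here are hypothetical stand-ins:

```python
def small_guard_score(prompt: str) -> float:
    """Hypothetical stand-in: P(harmful) from a small guard model."""
    return 0.5  # placeholder

def large_guard_is_harmful(prompt: str) -> bool:
    """Hypothetical stand-in: verdict from a large, more accurate guard model."""
    return False  # placeholder

def route(prompt: str, low: float = 0.2, high: float = 0.8) -> bool:
    """Trust the small model on confident cases; escalate ambiguous ones."""
    p = small_guard_score(prompt)
    if p <= low:
        return False  # confidently safe: cheap path
    if p >= high:
        return True   # confidently harmful: cheap path
    return large_guard_is_harmful(prompt)  # uncertain: pay for the large model

print(route("How do I bake bread?"))
```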
Rethinking Diverse Human Preference Learning through Principal Component Analysis
·2799 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Rice University
Decomposed Reward Models (DRMs) extract diverse human preferences from binary comparisons using PCA, enabling flexible and interpretable LLM alignment.
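The core mechanism is compact enough to sketch: treat each binary comparison as a difference of response embeddings and run PCA, so each principal component acts as one decomposed reward direction. A toy NumPy version, with random vectors standing in for a real encoder:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, dim = 1000, 64
chosen = rng.normal(size=(n_pairs, dim))    # embeddings of preferred responses
rejected = rng.normal(size=(n_pairs, dim))  # embeddings of rejected responses

# Each comparison becomes a preference-direction vector.
diffs = chosen - rejected
diffs -= diffs.mean(axis=0)                 # center before PCA

# PCA via SVD: right singular vectors are orthogonal "reward directions".
_, _, vt = np.linalg.svd(diffs, full_matrices=False)
reward_basis = vt[:10]                      # top-10 decomposed reward heads

def reward(embedding: np.ndarray, axis: int = 0) -> float:
    """Score a response embedding against one decomposed preference axis."""
    return float(reward_basis[axis] @ embedding)
```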
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
·5226 words·25 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 University of Sydney
RealSyn: A new, scalable multimodal dataset revolutionizes vision-language learning by effectively using interleaved image-text documents.
Pre-training Auto-regressive Robotic Models with 4D Representations
·2752 words·13 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 UC Berkeley
ARM4R pre-trains autoregressive robotic models using low-level 4D representations from human videos, achieving efficient transfer learning and improved task performance across various environments.
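As a rough illustration of what "4D representations" means here (3D point tracks evolving over time, with time as the fourth dimension), the sketch below lifts tracked 2D video points to camera-frame 3D via a standard pinhole unprojection; the intrinsics and track values are made up:

```python
import numpy as np

fx = fy = 500.0          # focal lengths (pixels), illustrative
cx, cy = 320.0, 240.0    # principal point, illustrative

def unproject(u: float, v: float, depth: float) -> np.ndarray:
    """Lift one pixel with known depth to a 3D camera-frame point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# One tracked point across 3 frames -> a 3D trajectory over time.
track_2d = [(300, 220, 1.2), (310, 225, 1.1), (325, 230, 1.0)]  # (u, v, depth)
traj_3d = np.stack([unproject(u, v, d) for u, v, d in track_2d])
print(traj_3d.shape)  # (3, 3): 3 timesteps x (x, y, z)
```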
Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research
·3084 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Hong Kong University of Science and Technology
Perovskite-LLM: a new knowledge-enhanced system boosts perovskite solar cell research by integrating a domain-specific knowledge graph, high-quality datasets, and specialized LLMs for superior knowledge retrieval and reasoning.
PAFT: Prompt-Agnostic Fine-Tuning
·3569 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
PAFT dynamically adjusts prompts during LLM fine-tuning, improving model robustness and generalization across diverse prompts without sacrificing performance or efficiency.
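The training-time idea reduces to sampling a different prompt template for each example, as in the minimal sketch below (the templates are illustrative, not taken from the paper):

```python
import random

# Instead of one fixed template, sample a template per example so the model
# cannot overfit to any single prompt wording.
TEMPLATES = [
    "Question: {q}\nAnswer:",
    "{q}\nRespond concisely:",
    "You are a helpful assistant. {q}",
]

def format_example(question: str) -> str:
    """Pick a random template at training time."""
    return random.choice(TEMPLATES).format(q=question)

# In a training loop, each epoch sees the same data under varied prompts:
for q in ["What is 2+2?", "Name a prime number."]:
    print(format_example(q))
```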
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation
·2594 words·13 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Hong Kong University of Science and Technology
mmMamba: a novel framework creates linear-complexity multimodal models via distillation, drastically improving efficiency without sacrificing performance.
Magma: A Foundation Model for Multimodal AI Agents
·5533 words·26 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Microsoft Research
Magma: a new foundation model for multimodal AI agents excels at bridging verbal and spatial intelligence, achieving state-of-the-art performance across various tasks, including UI navigation and robotic manipulation.
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
·4689 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 California Institute of Technology
HeadInfer achieves memory-efficient LLM inference by offloading the key-value cache to the CPU head by head, enabling 4-million-token inference on a single consumer GPU.
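A stripped-down sketch of head-wise offloading: the full KV cache lives in CPU memory, and only the head currently being attended over is resident on the GPU. The real system overlaps these transfers with compute via CUDA streams; shapes and the attention call below are simplified, and the snippet falls back to CPU so it runs anywhere:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
n_heads, seq, d_head = 8, 4096, 64

# Full KV cache kept on the CPU, one slice per attention head.
k_cpu = torch.randn(n_heads, seq, d_head, pin_memory=(device == "cuda"))
v_cpu = torch.randn(n_heads, seq, d_head, pin_memory=(device == "cuda"))
q = torch.randn(n_heads, 1, d_head, device=device)  # one decoding step

outputs = []
for h in range(n_heads):
    # Only head h's cache occupies GPU memory at any moment.
    k = k_cpu[h].to(device, non_blocking=True)
    v = v_cpu[h].to(device, non_blocking=True)
    attn = torch.softmax(q[h] @ k.T / d_head**0.5, dim=-1)
    outputs.append(attn @ v)
out = torch.cat(outputs, dim=0)  # concatenated per-head results
```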
Eager Updates For Overlapped Communication and Computation in DiLoCo
·3815 words·18 mins
AI Generated 🤗 Daily Papers Machine Learning Federated Learning 🏢 Google DeepMind
Eager updates drastically speed up the training of massive language models by overlapping communication and computation in DiLoCo, achieving near-optimal performance even at low bandwidth.
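A toy single-process simulation of the scheduling trick, under the assumption that the all-reduce of outer gradients takes one full round: the worker applies its own fresh outer gradient eagerly and folds in the other workers' contribution with one round of delay. This mirrors the spirit of the method; the paper's exact formulation may weight terms differently:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4                          # number of workers
theta = np.zeros(8)            # shared parameters
delayed_others = np.zeros(8)   # remote contribution still "in flight"

for _ in range(5):
    grads = [rng.normal(size=8) for _ in range(M)]  # per-worker outer grads
    mine = grads[0]                                 # this worker's own grad
    # Eager outer step: fresh local term + one-round-delayed remote term.
    theta -= 0.1 * (mine / M + delayed_others)
    # The reduce of this round's remote grads completes during the next round.
    delayed_others = np.mean(grads[1:], axis=0) * (M - 1) / M
```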
Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge
·3819 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 City University of Hong Kong
Crowd-based comparative evaluation significantly boosts LLM-as-a-judge accuracy by using crowd responses to expose deeper details, resulting in more reliable and efficient auto-evaluation.
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
·2814 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 AIRI
LLMs can losslessly compress 1568 tokens into a single vector, surpassing prior methods by two orders of magnitude.
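A toy version of the setup: freeze a tiny language model and optimize a single trainable input vector until the model reconstructs the target sequence from it. Everything below (the GRU "LM", sizes, training budget) is illustrative; the paper does this with real LLMs and far longer sequences:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d = 50, 32
emb = nn.Embedding(vocab, d)
rnn = nn.GRU(d, d, batch_first=True)
head = nn.Linear(d, vocab)
for p in [*emb.parameters(), *rnn.parameters(), *head.parameters()]:
    p.requires_grad_(False)  # the "LM" stays frozen

tokens = torch.randint(0, vocab, (1, 16))  # sequence to compress
mem = nn.Parameter(torch.randn(1, 1, d))   # the single compressed vector
opt = torch.optim.Adam([mem], lr=0.1)

for _ in range(200):
    inp = torch.cat([mem, emb(tokens[:, :-1])], dim=1)  # [mem] as prefix
    hidden, _ = rnn(inp)
    logits = head(hidden)                               # predict each token
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab), tokens.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```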
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
·2710 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 School of Computer Science, Fudan University
Contrary to popular belief, longer reasoning chains don’t always boost Large Language Model (LLM) accuracy; this research shows that parallel scaling with shorter solutions outperforms sequential scaling.
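The comparison is easy to sketch: parallel scaling samples many short solutions independently and majority-votes, instead of extending one long reasoning chain. The solver below is a random stub standing in for model samples:

```python
import random
from collections import Counter

def solve_once(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in: one short, independent solution attempt."""
    return rng.choice(["42", "42", "41"])  # noisy but right more often than not

def parallel_scaling(question: str, n: int = 16) -> str:
    """Sample n short solutions independently and majority-vote."""
    rng = random.Random(0)
    answers = [solve_once(question, rng) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(parallel_scaling("What is 6*7?"))  # -> "42"
```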
FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading
·2535 words·12 mins
AI Generated 🤗 Daily Papers AI Applications Finance 🏢 Harvard University
FLAG-Trader fuses LLMs and RL for enhanced financial trading, outperforming traditional methods by efficiently integrating multimodal data and adapting to market dynamics.
Continuous Diffusion Model for Language Modeling
·1809 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 Korea Advanced Institute of Science and Technology
RDLM: A novel continuous diffusion model for language modeling leverages the geometry of categorical distributions, outperforming existing discrete approaches and approaching autoregressive model performance.
Atom of Thoughts for Markov LLM Test-Time Scaling
·2660 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Hong Kong University of Science and Technology
Atom of Thoughts (AOT) revolutionizes LLM test-time scaling by decomposing complex reasoning into independent sub-questions, drastically reducing computation while maintaining high accuracy.
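In outline, each AoT step decomposes the current question into independent sub-questions, answers them, and contracts the results into a new, simpler question, so the next state depends only on the current one (the Markov property). A hedged sketch with a stub `llm` helper (not a real API):

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a single model call."""
    return "stub answer"  # placeholder

def aot(question: str, max_steps: int = 3) -> str:
    state = question
    for _ in range(max_steps):
        subs = llm(f"Split into independent sub-questions: {state}").split(";")
        answers = [llm(f"Answer briefly: {s}") for s in subs]
        # Contract: fold sub-answers back into one simpler question.
        state = llm(
            f"Given facts {list(zip(subs, answers))}, "
            f"restate '{state}' as one simpler self-contained question.")
    return llm(f"Answer: {state}")

print(aot("Who was older, the author of X or the author of Y?"))
```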
FinMTEB: Finance Massive Text Embedding Benchmark
·3630 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Hong Kong University of Science and Technology
FinMTEB: A new benchmark reveals that general-purpose embedding models struggle in the finance domain; domain-specific models excel, and surprisingly, simple BoW outperforms sophisticated models on certain tasks.
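The BoW finding is easy to reproduce in miniature: financial text leans on exact terms (tickers, ratios, filing jargon), so plain lexical overlap is a strong similarity signal. A scikit-learn sketch with made-up sentences:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Q3 revenue rose 12% on strong EPS growth",
    "Earnings per share grew as Q3 revenue increased 12%",
    "The central bank held interest rates steady",
]
bow = CountVectorizer().fit_transform(docs)  # sparse bag-of-words counts
print(cosine_similarity(bow[0], bow[1]))  # high: shared financial terms
print(cosine_similarity(bow[0], bow[2]))  # low: little lexical overlap
```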
Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages
·2355 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Minzu University of China
XLM-SWCM: A novel framework efficiently adapts multilingual encoders for text generation in extremely low-resource languages by cleverly sharing weights between encoder and decoder, achieving superior performance.
Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey
·1603 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Northeastern University
This survey paper comprehensively analyzes methods for injecting domain-specific knowledge into LLMs, categorizing them into four key approaches and evaluating their trade-offs to enhance performance.
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
·4310 words·21 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Peking University
HealthGPT: A novel medical vision-language model unifying comprehension and generation via heterogeneous knowledge adaptation, achieving superior performance on various medical tasks.