
🏢 Baidu Inc.

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
·2779 words·14 mins
Natural Language Processing Large Language Models 🏢 Baidu Inc.
Decoupled-Head Attention (DHA) drastically cuts LLM inference costs by adaptively sharing key/value heads, achieving 97.6% of the original model's performance with only 0.25% of its pre-training budget.
Automated Multi-level Preference for MLLMs
·2098 words·10 mins
Multimodal Learning Vision-Language Models 🏢 Baidu Inc.
The Automated Multi-level Preference (AMP) framework significantly improves multimodal large language model (MLLM) performance by using multi-level preferences during training, reducing hallucinations and…