🏢 Baidu Inc.
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
2779 words · 14 mins
Natural Language Processing
Large Language Models
🏢 Baidu Inc.
Decoupled-Head Attention (DHA) drastically cuts LLM inference costs by adaptively sharing key/value heads, achieving 97.6% of the original model's performance with only 0.25% of the pre-training budget.
Automated Multi-level Preference for MLLMs
2098 words · 10 mins
Multimodal Learning
Vision-Language Models
🏢 Baidu Inc.
The Automated Multi-level Preference (AMP) framework significantly improves multimodal large language model (MLLM) performance by using multi-level preferences during training, reducing hallucinations and…