🏢 Baidu Inc.
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
2779 words · 14 mins
Natural Language Processing
Large Language Models
🏢 Baidu Inc.
Decoupled-Head Attention (DHA) drastically cuts LLM inference costs by adaptively sharing key/value heads, achieving 97.6% of the original model's performance with only 0.25% of the pre-training budget.
Automated Multi-level Preference for MLLMs
2098 words · 10 mins
Multimodal Learning
Vision-Language Models
🏢 Baidu Inc.
The Automated Multi-level Preference (AMP) framework significantly improves multimodal large language model (MLLM) performance by using multi-level preferences during training, reducing hallucinations and…