Skip to main content

🏢 Baidu

Octopus: A Multi-modal LLM with Parallel Recognition and Sequential Understanding
·1696 words·8 mins· loading · loading
Multimodal Learning Vision-Language Models 🏢 Baidu
Octopus, a novel multi-modal LLM, uses parallel visual recognition and sequential understanding to achieve 5x speedup on visual grounding and improved accuracy on various MLLM tasks.