🏢 Baidu
Octopus: A Multi-modal LLM with Parallel Recognition and Sequential Understanding
·1696 words·8 mins·
Multimodal Learning
Vision-Language Models
Octopus, a novel multi-modal LLM, combines parallel visual recognition with sequential understanding to achieve a 5x speedup on visual grounding and improved accuracy on various MLLM tasks.