🏢 DAMO Academy, Alibaba Group

Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers

2 March 2025 · 2242 words · 11 mins

AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 DAMO Academy, Alibaba Group

Babel: an open multilingual LLM that supports over 90% of global speakers, filling the language coverage gap and setting new performance standards.

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

22 January 2025 · 4124 words · 20 mins

AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 DAMO Academy, Alibaba Group

VideoLLaMA 3: Vision-centric training yields state-of-the-art image and video understanding.

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

31 December 2024 · 3571 words · 17 mins

AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 DAMO Academy, Alibaba Group

VideoRefer Suite boosts video LLM understanding by introducing a large-scale, high-quality object-level video instruction dataset, a versatile spatial-temporal object encoder model, and a comprehensiv…