
🏢 Skywork AI

Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

26 September 2024·3475 words·17 mins

Multimodal Learning Vision-Language Models 🏢 Skywork AI

VITRON, a unified pixel-level Vision LLM, excels at understanding, generating, segmenting, and editing both images and videos.

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

26 September 2024·3418 words·17 mins

Multimodal Learning Vision-Language Models 🏢 Skywork AI

OMG-LLaVA: a single model that bridges image-level, object-level, and pixel-level reasoning for stronger visual understanding.