🏢 Skywork AI
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
·3475 words·17 mins·
Multimodal Learning
Vision-Language Models
🏢 Skywork AI
VITRON is a unified pixel-level vision LLM that handles understanding, generating, segmenting, and editing of both images and videos.
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
·3418 words·17 mins·
Multimodal Learning
Vision-Language Models
🏢 Skywork AI
OMG-LLaVA is a single model that bridges image-level, object-level, and pixel-level reasoning and understanding.