One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
·3294 words·16 mins·
Natural Language Processing
Large Language Models
🏢 Show Lab, National University of Singapore
VideoLISA: A video-based multimodal large language model enabling precise, language-instructed video object segmentation with superior performance.
LOVA3: Learning to Visual Question Answering, Asking and Assessment
·3398 words·16 mins·
Multimodal Learning
Vision-Language Models
🏢 Show Lab, National University of Singapore
LOVA³ enhances MLLMs by teaching them to ask and assess image-based questions, improving their multimodal understanding and performance on various benchmarks.
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
·2448 words·12 mins·
Multimodal Learning
Vision-Language Models
🏢 Show Lab, National University of Singapore
Visual tokens can encode text far more compactly than text tokens, extending the effective context length of multi-modal models.
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
·2805 words·14 mins·
Multimodal Learning
Vision-Language Models
🏢 Show Lab, National University of Singapore
EvolveDirector trains competitive text-to-image models using publicly available data by cleverly leveraging large vision-language models to curate and refine training datasets, dramatically reducing d…