🏢 University of Waterloo
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
·3029 words·15 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 University of Waterloo
VISTA synthesizes long-duration, high-resolution video instruction data, creating VISTA-400K and HRVideoBench to significantly boost video LMM performance.
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
·3438 words·17 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Waterloo
OmniEdit, a novel instruction-based image editing model, surpasses existing methods by leveraging specialist supervision and high-quality data, achieving superior performance across diverse editing ta…