Skip to main content

🏢 University of Waterloo

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
·3029 words·15 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 University of Waterloo
VISTA synthesizes long-duration, high-resolution video instruction data, creating VISTA-400K and HRVideoBench to significantly boost video LMM performance.
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
·3438 words·17 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Waterloo
OmniEdit, a novel instruction-based image editing model, surpasses existing methods by leveraging specialist supervision and high-quality data, achieving superior performance across diverse editing ta…