🏢 University of Waterloo

LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
·2412 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Waterloo
LOCATEdit refines cross-attention maps with graph Laplacian regularization, achieving precise, localized text-guided image editing without artifacts.
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
·2529 words·12 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 University of Waterloo
VisualWebInstruct scales up multimodal instruction data via web search, enhancing VLMs’ reasoning on complex tasks.
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
·3269 words·16 mins
AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 University of Waterloo
AceCoder uses automated test-case synthesis to create a large-scale dataset for training reward models, enabling effective reinforcement learning to significantly boost code generation model performan…
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
·3029 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 University of Waterloo
VISTA synthesizes long-duration, high-resolution video instruction data, creating VISTA-400K and HRVideoBench to significantly boost video LMM performance.
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
·3438 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Waterloo
OmniEdit, a novel instruction-based image editing model, surpasses existing methods by leveraging specialist supervision and high-quality data, achieving superior performance across diverse editing ta…