🏢 University of Toronto
MMFactory: A Universal Solution Search Engine for Vision-Language Tasks
2929 words · 14 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 University of Toronto
MMFactory: A universal framework for vision-language tasks, offering diverse programmatic solutions based on user needs and constraints, outperforming existing methods.
Wonderland: Navigating 3D Scenes from a Single Image
3153 words · 15 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Toronto
Generate wide-scope 3D scenes from single images in a snap!
Mind the Time: Temporally-Controlled Multi-Event Video Generation
4541 words · 22 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 University of Toronto
MinT: Generating coherent videos containing multiple precisely timed events via temporal control, surpassing existing methods.
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
2596 words · 13 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 University of Toronto
AC3D achieves precise 3D camera control in video diffusion transformers by analyzing camera motion’s spectral properties, optimizing pose conditioning, and using a curated dataset of dynamic videos.
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
3777 words · 18 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Toronto
SG-I2V: Zero-shot controllable image-to-video generation using a self-guided approach that leverages pre-trained models for precise object and camera motion control.
Minimum Entropy Coupling with Bottleneck
2581 words · 13 mins
AI Generated
🤗 Daily Papers
AI Theory
Optimization
🏢 University of Toronto
A new lossy compression framework that handles divergence in the reconstruction distribution by integrating a bottleneck, extending minimum entropy coupling and offering guaranteed performance.