🏢 University of Technology Sydney

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

3 March 2025·1959 words·10 mins

AI Generated · 🤗 Daily Papers · Multimodal Learning · Multimodal Generation · 🏢 University of Technology Sydney

VideoUFO is a user-focused, million-scale dataset that aims to improve text-to-video generation by aligning training data with real user interests and preferences.

TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation

5 November 2024·2197 words·11 mins

AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 University of Technology Sydney

TIP-I2V is a million-scale dataset of 1.7 million real user text and image prompts for image-to-video generation, intended to support model development and safety.