Skip to main content

🏢 University of Technology Sydney

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
·1959 words·10 mins· loading · loading
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Generation 🏢 University of Technology Sydney
VideoUFO: A new user-focused, million-scale dataset that improves text-to-video generation by aligning training data with real user interests and preferences!
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
·2197 words·11 mins· loading · loading
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 University of Technology Sydney
TIP-I2V: A million-scale dataset provides 1.7 million real user text & image prompts for image-to-video generation, boosting model development and safety.