🏢 University of Technology Sydney
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
TIP-I2V is a million-scale dataset of 1.7 million real user text and image prompts for image-to-video generation, intended to support model development and safety research.