🏢 University of Technology Sydney

TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models
TIP-I2V is a million-scale dataset of 1.7 million real user-provided text and image prompts for image-to-video generation, intended to support model development and safety research.