🏢 Meta
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
·4248 words·20 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Meta
Scaling visual tokenizers dramatically improves image and video generation, achieving state-of-the-art results and outperforming existing methods with less computation by focusing on decoder scaling…
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
·3304 words·16 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Meta
Through-The-Mask uses mask-based motion trajectories to generate realistic videos from images and text, overcoming limitations of existing methods in handling complex multi-object motion.
VisualLens: Personalization through Visual History
·2160 words·11 mins
AI Generated
🤗 Daily Papers
Computer Vision
Visual Question Answering
🏢 Meta
VisualLens leverages a user's visual history for personalized recommendations, improving on the state of the art by 5-10% and exceeding GPT-4’s performance.