
🏢 Meta GenAI

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
3592 words · 17 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Meta GenAI
CrossFlow: Directly evolve any modality into another using flow matching, achieving state-of-the-art results across various tasks!
Apollo: An Exploration of Video Understanding in Large Multimodal Models
1887 words · 9 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Meta GenAI
Apollo LMMs achieve state-of-the-art results on video understanding tasks by exploring and optimizing the design and training of video LMMs.