🏢 Meta AI

Intuitive physics understanding emerges from self-supervised pretraining on natural videos
·4400 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Meta AI
AI models learn intuitive physics from self-supervised video pretraining.
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
·3144 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Meta AI
Boosting language model reasoning: A novel hybrid approach using latent tokens drastically shortens reasoning traces, improving model performance and efficiency.
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
·3510 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Meta AI
VideoJAM enhances video generation by jointly learning appearance and motion representations, achieving state-of-the-art motion coherence.
MLLM-as-a-Judge for Image Safety without Human Labeling
·6596 words·31 mins
AI Generated 🤗 Daily Papers Computer Vision Image Classification 🏢 Meta AI
CLUE, a novel method built on MLLMs, achieves zero-shot image safety judgment by objectifying safety rules, significantly reducing the need for human labeling.
PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
·3061 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Meta AI
PartGen generates compositional 3D objects with meaningful parts from text, images, or unstructured 3D data using multi-view diffusion models, enabling flexible 3D part editing.
Training Large Language Models to Reason in a Continuous Latent Space
·2859 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Meta AI
LLMs are trained to reason using language, but COCONUT lets them reason directly in a continuous latent space, boosting performance on logical tasks requiring complex planning.
Efficient Track Anything
·2319 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Meta AI
EfficientTAMs achieve video object segmentation accuracy comparable to SAM 2 with a ~2x speedup, using lightweight ViTs and efficient cross-attention.
Adaptive Decoding via Latent Preference Optimization
·4975 words·24 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Meta AI
LLMs can dynamically adjust decoding temperature using Adaptive Decoding and Latent Preference Optimization, improving performance across creative and factual tasks.
Adaptive Caching for Faster Video Generation with Diffusion Transformers
·3142 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Meta AI
Adaptive Caching (AdaCache) dramatically speeds up video generation with diffusion transformers by cleverly caching and reusing computations, tailoring the process to each video’s complexity and motio…