🏢 Meta AI

Intuitive physics understanding emerges from self-supervised pretraining on natural videos

17 February 2025·4400 words·21 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Meta AI

AI models learn intuitive physics from self-supervised video pretraining.

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

5 February 2025·3144 words·15 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Meta AI

Boosting language model reasoning: A novel hybrid approach using latent tokens drastically shortens reasoning traces, improving model performance and efficiency.

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

4 February 2025·3510 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Meta AI

VideoJAM enhances video generation by jointly learning appearance and motion representations, achieving state-of-the-art motion coherence.

MLLM-as-a-Judge for Image Safety without Human Labeling

31 December 2024·6596 words·31 mins

AI Generated 🤗 Daily Papers Computer Vision Image Classification 🏢 Meta AI

CLUE enables zero-shot image safety judgment with MLLMs by objectifying safety rules, significantly reducing the need for human labeling.

PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models

24 December 2024·3061 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Meta AI

PartGen generates compositional 3D objects with meaningful parts from text, images, or unstructured 3D data using multi-view diffusion models, enabling flexible 3D part editing.

Training Large Language Models to Reason in a Continuous Latent Space

9 December 2024·2859 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Meta AI

LLMs are trained to reason using language, but COCONUT lets them reason directly in a continuous latent space, boosting performance on logical tasks requiring complex planning.

Efficient Track Anything

28 November 2024·2319 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Meta AI

EfficientTAMs achieve video object segmentation accuracy comparable to SAM 2 with a ~2x speedup, using lightweight ViTs and efficient cross-attention.

Adaptive Decoding via Latent Preference Optimization

14 November 2024·4975 words·24 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Meta AI

LLMs can dynamically adjust decoding temperature using Adaptive Decoding and Latent Preference Optimization, improving performance across creative and factual tasks.

Adaptive Caching for Faster Video Generation with Diffusion Transformers

4 November 2024·3142 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Meta AI

Adaptive Caching (AdaCache) dramatically speeds up video generation with diffusion transformers by cleverly caching and reusing computations, tailoring the process to each video’s complexity and motio…