🏢 Meta AI
MLLM-as-a-Judge for Image Safety without Human Labeling
·6596 words·31 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Classification
🏢 Meta AI
CLUE enables zero-shot image safety judgment with MLLMs by objectifying safety rules, significantly reducing the need for human labeling.
PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
·3061 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Meta AI
PartGen generates compositional 3D objects with meaningful parts from text, images, or unstructured 3D data using multi-view diffusion models, enabling flexible 3D part editing.
Training Large Language Models to Reason in a Continuous Latent Space
·2859 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Meta AI
LLMs are trained to reason using language, but COCONUT lets them reason directly in a continuous latent space, boosting performance on logical tasks requiring complex planning.
Efficient Track Anything
·2319 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 Meta AI
EfficientTAMs achieve video object segmentation accuracy comparable to SAM 2 with a ~2x speedup, using lightweight ViTs and efficient cross-attention.
Adaptive Decoding via Latent Preference Optimization
·4975 words·24 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Meta AI
LLMs can dynamically adjust decoding temperature using Adaptive Decoding and Latent Preference Optimization, improving performance across creative and factual tasks.
Adaptive Caching for Faster Video Generation with Diffusion Transformers
·3142 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Meta AI
Adaptive Caching (AdaCache) dramatically speeds up video generation with diffusion transformers by cleverly caching and reusing computations, tailoring the process to each video’s complexity and motio…