Computer Vision

AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation
·2125 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Alibaba Tongyi Lab
AnyStory: A unified framework enables high-fidelity personalized image generation for single and multiple subjects, addressing subject fidelity challenges in existing methods.
RepVideo: Rethinking Cross-Layer Representation for Video Generation
·2785 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University
RepVideo enhances text-to-video generation by enriching feature representations, resulting in significantly improved temporal coherence and spatial detail.
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
·2366 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 University of Rochester
Ouroboros-Diffusion: A novel tuning-free long video generation framework achieving unprecedented content consistency by cleverly integrating information across frames via latent sampling and cross-frame attention.
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities
·3972 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent AI Lab
CityDreamer4D generates realistic, unbounded 4D city models by cleverly separating dynamic objects (like vehicles) from static elements (buildings, roads), using multiple neural fields for enhanced realism.
GameFactory: Creating New Games with Generative Interactive Videos
·3286 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 University of Hong Kong
GameFactory uses AI to generate entirely new games within diverse, open-domain scenes by learning action controls from a small dataset and transferring them to pre-trained video models.
Do generative video models learn physical principles from watching videos?
·3121 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Google DeepMind
Generative video models struggle to understand physics despite producing visually realistic videos; the Physics-IQ benchmark reveals this critical limitation, highlighting the need for improved physical reasoning.
The GAN is dead; long live the GAN! A Modern GAN Baseline
·2531 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Brown University
R3GAN: A modernized GAN baseline achieves state-of-the-art results with a simple, stable loss function and modern architecture, debunking the myth that GANs are hard to train.
An Empirical Study of Autoregressive Pre-training from Videos
·5733 words·27 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 UC Berkeley
Toto, a new autoregressive video model, achieves competitive performance across various benchmarks by pre-training on over 1 trillion visual tokens, demonstrating the effectiveness of scaling video models.
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
·2783 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Stability AI
SPAR3D: Fast, accurate single-image 3D reconstruction via a novel two-stage approach using point clouds for high-fidelity mesh generation.
On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis
·285 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
This paper unveils critical thresholds for efficient visual autoregressive model computation, proving sub-quartic time is impossible beyond a certain input matrix norm while establishing efficient approximation criteria below it.
MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
·3325 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Electronics and Telecommunications Research Institute
MoDec-GS: a novel framework achieving 70% model size reduction in dynamic 3D Gaussian splatting while improving visual quality by cleverly decomposing complex motions and optimizing temporal intervals.
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
·3489 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Fudan University
DOLPHIN: AI automates scientific research from idea generation to experimental validation.
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
·3018 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology
Diffusion as Shader (DaS) achieves versatile video control by using 3D tracking videos as control signals in a unified video diffusion model, enabling precise manipulation across diverse tasks.
Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation
·5463 words·26 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Cambridge
Chirpy3D: Generating creative, high-quality 3D birds with intricate details by learning a continuous part latent space from 2D images.
TransPixar: Advancing Text-to-Video Generation with Transparency
·2458 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology
TransPixar generates high-quality videos with transparency by jointly training RGB and alpha channels, outperforming sequential generation methods.
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
·3304 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta
Through-The-Mask uses mask-based motion trajectories to generate realistic videos from images and text, overcoming limitations of existing methods in handling complex multi-object motion.
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
·3762 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanjing University
STAR: A novel approach uses text-to-video models for realistic, temporally consistent real-world video super-resolution, improving image quality and detail.
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
·3666 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tsinghua University
MotionBench, a new benchmark, reveals that existing video models struggle with fine-grained motion understanding. To address this, the authors propose TE Fusion, a novel architecture that improves motion understanding.
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking
·2867 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Multimedia Laboratory, the Chinese University of Hong Kong
GS-DiT: Generating high-quality videos with advanced 4D control through efficient dense 3D point tracking and pseudo 4D Gaussian fields.
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
·2694 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China
DepthMaster tames diffusion models for faster, more accurate monocular depth estimation by aligning generative features with high-quality semantic features and adaptively balancing low- and high-frequency information.