Computer Vision
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
·4169 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Fudan University
MagicMotion: A controllable video generation framework enabling precise object motion control through dense-to-sparse trajectory guidance.
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
·1572 words·8 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance Intelligent Creation
InfU: A new framework for flexible photo re-creation while preserving identity using Diffusion Transformers (DiTs).
Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction
·2606 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 National University of Singapore
Coarse-to-Fine Token Prediction improves autoregressive image generation by assigning the same coarse label to similar tokens, balancing generation quality and computational efficiency.
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
·3624 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Copenhagen
GFS-VL: Enhancing few-shot 3D segmentation by synergizing vision-language models with few-shot learning for robust real-world application.
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
·4277 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance Seed
Expert Race: A flexible routing strategy for scaling diffusion transformers with mixture of experts.
CLS-RL: Image Classification with Rule-Based Reinforcement Learning
·2967 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Classification
🏢 Shanghai AI Laboratory
CLS-RL: Rule-based RL tackles catastrophic forgetting in MLLM image classification, outperforming SFT with better generalization and efficiency.
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
·3405 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Hong Kong
TokenBridge bridges continuous and discrete tokens for autoregressive visual generation, achieving high-quality synthesis with simple autoregressive modeling.
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
·3897 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 National University of Singapore
4DGS-1K: Achieves 1000+ FPS for dynamic scene rendering via a compact, memory-efficient framework, offering a 41x storage reduction and 9x faster speed.
Temporal Regularization Makes Your Video Generator Stronger
·3350 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
FluxFlow: Make your video generator stronger via temporal regularization!
Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
·2233 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Shanghai Artificial Intelligence Laboratory
FakeVLM: A multimodal model & artifact-annotated dataset for detecting synthetic images with interpretable explanations, setting a new benchmark.
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
·3386 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Action Recognition
🏢 Zhejiang University
MotionStreamer: Streaming motion generation with a diffusion-based autoregressive model in causal latent space.
LEGION: Learning to Ground and Explain for Synthetic Image Detection
·3727 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Shanghai Jiao Tong University
LEGION: Grounding and explaining synthetic image detection and refinement via multimodal learning.
Efficient Personalization of Quantized Diffusion Model without Backpropagation
·6238 words·30 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Seoul National University
Personalize diffusion models efficiently on devices without backpropagation.
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
·2721 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tsinghua University
DeepMesh: RL-guided auto-regressive creation of artist-quality 3D meshes, enhanced by tokenization & DPO for human-aligned aesthetics.
Cube: A Roblox View of 3D Intelligence
·2896 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Roblox
Roblox presents Cube, a 3D intelligence model using shape tokenization for text-to-shape, shape-to-text, and text-to-scene generation.
Make Your Training Flexible: Towards Deployment-Efficient Video Models
·5609 words·27 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Shanghai Jiao Tong University
FluxViT: Flexible video models via adaptive token selection for efficient deployment!
MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation
·3052 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Peking University
MagicComp: Dual-phase refinement enables training-free compositional video generation.
Impossible Videos
·4228 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 National University of Singapore
Impossible videos expose AI limits!
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
·365 words·2 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tsinghua University
DiffMoE: Dynamically selects tokens for scalable diffusion transformers, unlocking new efficiency levels in image generation.
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
·4257 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 NVIDIA
Cosmos-Transfer1: An adaptable conditional world generation model using multimodal control.