Computer Vision

OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models
·2382 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 OriginAI, Tel-Aviv, Israel
OmnimatteZero: Real-time omnimatte using pre-trained video diffusion, no training needed!
RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation
·3176 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Pattern Recognition Center, WeChat AI, Tencent
RDTF: Efficient animated sticker generation via dual-mask training, outperforming parameter-efficient tuning under constrained resources.
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
·4361 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 King Abdullah University of Science and Technology
4D-Bench: The first benchmark for assessing MLLMs in 4D object understanding, revealing weak temporal understanding and the need for advancements.
When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO
·1831 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Fudan University
Aligns diffusion models with minority-aware adaptive DPO when preferences diverge.
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
·3002 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Alibaba Group
TaoAvatar: Lifelike talking avatars in AR, using 3D Gaussian Splatting for real-time rendering and high fidelity.
Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID
·1496 words·8 mins
AI Generated 🤗 Daily Papers Computer Vision Object Detection 🏢 University of Melbourne
Presents a strong baseline for multi-UAV tracking in thermal infrared video using YOLOv12 and BoT-SORT, achieving competitive results without complex enhancements.
Optimized Minimal 3D Gaussian Splatting
·3465 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Sungkyunkwan University
OMG: optimized minimal 3D Gaussian splatting, enabling fast and efficient rendering with minimal storage.
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
·2762 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Oxford
Motion blur, usually a problem, is now a solution! This paper estimates camera motion from motion-blurred images, acting like an IMU.
FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields
·2851 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 KAIST, Visual Media Lab
FFaceNeRF: Enables few-shot face editing in NeRFs via geometry adapter & latent mixing, enhancing control & quality with limited training data.
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
·2987 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University
Zero-1-to-A: Animatable avatars from a single image using video diffusion, robust to spatial & temporal inconsistencies!
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
·2005 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 AIRI
Efficient image representation via adaptive token reduction.
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
·1204 words·6 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 EverEx
VideoRFSplat: Direct text-to-3D Gaussian Splatting with flexible pose and multi-view joint modeling, bypassing SDS refinement!
Unleashing Vecset Diffusion Model for Fast Shape Generation
·3881 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 MMLab, CUHK
FlashVDM enables fast 3D shape generation by accelerating both VAE decoding and diffusion sampling.
Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens
·3099 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 DP Technology
Uni-3DAR: Autoregressive framework unifies 3D generation/understanding, compressing spatial tokens for faster, versatile AI.
Ultra-Resolution Adaptation with Ease
·2457 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 National University of Singapore
URA: Ultra-resolution adaptation made easy! Uses synthetic data & minor weight tuning for efficient, high-res text-to-image diffusion models.
Tokenize Image as a Set
·3037 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Science and Technology of China
TokenSet: Tokenizing images as unordered sets for dynamic capacity allocation and robust generation, breaking from fixed-position latent codes.
Sonata: Self-Supervised Learning of Reliable Point Representations
·2429 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Hong Kong
Sonata: Reliable 3D point cloud self-supervised learning through self-distillation, achieving SOTA with less data.
Scale-wise Distillation of Diffusion Models
·3863 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Yandex Research
SWD: Scale-wise distillation of diffusion models achieves faster image generation by upscaling resolution during denoising, outperforming counterparts with similar computation.
SALT: Singular Value Adaptation with Low-Rank Transformation
·1957 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Mohamed Bin Zayed University of Artificial Intelligence
SALT: Fine-tuning SAM for medical images using Singular Value Adaptation with Low-Rank Transformation for efficient, robust segmentation.
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes
·4268 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Simon Fraser University
NuiScene: Enables efficient & unbounded outdoor scene generation by encoding scene chunks as uniform vector sets and outpainting.