
Computer Vision

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
·2715 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology
LeviTor: Revolutionizing image-to-video synthesis with intuitive 3D trajectory control, generating realistic videos from static images by abstracting object masks into depth-aware control points.
DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
·2004 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent PCG
DI-PCG uses a lightweight diffusion transformer to efficiently and accurately estimate parameters of procedural generators from images, enabling high-fidelity 3D asset creation.
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
·3907 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Harvard University
Affordance-Aware Object Insertion uses a novel Mask-Aware Dual Diffusion model & SAM-FB dataset to realistically place objects in scenes, considering contextual relationships.
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
·4162 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University
Prompting unlocks 4K metric depth from low-cost LiDAR.
FashionComposer: Compositional Fashion Image Generation
·2265 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong
FashionComposer revolutionizes fashion image creation through flexible composition of garments, faces, and poses.
AniDoc: Animation Creation Made Easier
·2223 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology
AniDoc automates the colorization of line-art videos for cartoon animation, making animation creation easier!
VidTok: A Versatile and Open-Source Video Tokenizer
·2918 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Microsoft Research
VidTok: an open-source, top-performing video tokenizer.
Move-in-2D: 2D-Conditioned Human Motion Generation
·2569 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Adobe Research
Move-in-2D generates realistic human motion sequences conditioned on a 2D scene image and text prompt, overcoming limitations of existing approaches and improving video synthesis.
MIVE: New Design and Benchmark for Multi-Instance Video Editing
·7714 words·37 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 KAIST
Edit many objects at once in videos! MIVE does it accurately without affecting other areas, a big step for AI video editing.
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
·1458 words·7 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tongyi Lab
ChatDiT enables zero-shot, multi-turn image generation using pretrained diffusion transformers and a novel multi-agent framework.
Wonderland: Navigating 3D Scenes from a Single Image
·3153 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Toronto
Generate wide-scope 3D scenes from single images in a snap!
StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors
·2185 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Nanjing University
Create realistic 3D heads with specific hairstyles from text, no 3D hair data needed!
Sequence Matters: Harnessing Video Models in 3D Super-Resolution
·4603 words·22 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Sungkyunkwan University
Leveraging video models, researchers achieve state-of-the-art 3D super-resolution by generating ‘video-like’ sequences from unordered images, reducing artifacts and computational demands.
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
·3969 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Peking University
MOVIS enhances 3D scene generation by improving cross-view consistency in multi-object novel view synthesis.
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
·3912 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Chinese University of Hong Kong
IDArb: A diffusion model for decomposing images into intrinsic components like albedo, normal, and material properties, handling varying views and lighting.
ColorFlow: Retrieval-Augmented Image Sequence Colorization
·2655 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
ColorFlow, a new AI model, accurately colorizes black-and-white image sequences while preserving character identity.
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
·3380 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology
Training-free method adds physical properties to 3D models using vision-language models.
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images
·2021 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of British Columbia
New attack fools breast ultrasound AI using subtle text prompts.
BrushEdit: All-In-One Image Inpainting and Editing
·3281 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
BrushEdit revolutionizes interactive image editing with instructions & inpainting.
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
·3868 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Chinese University of Hong Kong
Neural LightRig uses multi-light diffusion to accurately estimate object normals and materials from a single image, outperforming existing methods.