Computer Vision
Move-in-2D: 2D-Conditioned Human Motion Generation
·2569 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Adobe Research
Move-in-2D generates realistic human motion sequences conditioned on a 2D scene image and text prompt, overcoming limitations of existing approaches and improving video synthesis.
MIVE: New Design and Benchmark for Multi-Instance Video Editing
·7714 words·37 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 KAIST
Edit many objects at once in videos! MIVE does it accurately without affecting other areas, a big step for AI video editing.
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
·1458 words·7 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tongyi Lab
ChatDiT enables zero-shot, multi-turn image generation using pretrained diffusion transformers and a novel multi-agent framework.
Wonderland: Navigating 3D Scenes from a Single Image
·3153 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Toronto
Generate wide-scope 3D scenes from single images in a snap!
StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors
·2185 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Nanjing University
Create realistic 3D heads with specific hairstyles from text, no 3D hair data needed!
Sequence Matters: Harnessing Video Models in 3D Super-Resolution
·4603 words·22 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Sungkyunkwan University
Leveraging video models, researchers achieve state-of-the-art 3D super-resolution by generating ‘video-like’ sequences from unordered images, reducing artifacts and computational demands.
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
·3969 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Peking University
MOVIS enhances 3D scene generation by improving cross-view consistency in multi-object novel view synthesis.
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
·3912 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Chinese University of Hong Kong
IDArb: A diffusion model for decomposing images into intrinsic components like albedo, normal, and material properties, handling varying views and lighting.
ColorFlow: Retrieval-Augmented Image Sequence Colorization
·2655 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tsinghua University
ColorFlow, a new AI model, accurately colorizes black-and-white image sequences while preserving character identity.
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
·3380 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Hong Kong University of Science and Technology
Training-free method adds physical properties to 3D models using vision-language models.
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images
·2021 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of British Columbia
New attack fools breast ultrasound AI using subtle text prompts.
BrushEdit: All-In-One Image Inpainting and Editing
·3281 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
BrushEdit revolutionizes interactive image editing with instructions & inpainting.
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
·3868 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Chinese University of Hong Kong
Neural LightRig uses multi-light diffusion to accurately estimate object normals and materials from a single image, outperforming existing methods.
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
·2785 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ETH Zurich
LoRACLR merges multiple LoRA models for high-fidelity multi-concept image generation, using a contrastive objective to ensure concept distinctiveness and prevent interference.
Learned Compression for Compressed Learning
·2966 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Classification
🏢 University of Texas at Austin
WaLLOC: a novel neural codec boosts compressed-domain learning by combining wavelet transforms with asymmetric autoencoders, achieving high compression ratios with minimal computation and uniform dime…
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
·4018 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Nanjing University
InstanceCap improves text-to-video generation through detailed, instance-aware captions.
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
·6779 words·32 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Georgia Institute of Technology
Gaze-LLE achieves state-of-the-art gaze estimation by using a frozen DINOv2 encoder and a lightweight decoder, simplifying architecture and improving efficiency.
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
·4390 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tencent AI Lab
FreeSplatter: a novel feed-forward framework reconstructs high-quality 3D scenes from uncalibrated sparse-view images, estimating camera poses in seconds.
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
·2401 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Nanyang Technological University
FreeScale generates stunning 8K images and high-fidelity videos without retraining.
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
·2812 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Virginia Tech
Edit images precisely with AI, no masks needed!