Computer Vision

How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization?

26 September 2024·2761 words·13 mins

AI Generated Computer Vision Image Generation 🏢 Mohamed Bin Zayed University of Artificial Intelligence

The Concept-Incremental Flexible Customization (CIFC) model tackles catastrophic forgetting and concept neglect when continually adapting text-to-image diffusion models, enabling flexible personalization.

How Diffusion Models Learn to Factorize and Compose

26 September 2024·3926 words·19 mins

AI Generated Computer Vision Image Generation 🏢 MIT

Diffusion models surprisingly learn factorized representations, enabling compositional generalization, but struggle with interpolation; training with independent factors drastically improves data effi…

HOPE: Shape Matching Via Aligning Different K-hop Neighbourhoods

26 September 2024·1940 words·10 mins

Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology

HOPE: a novel shape matching method achieving both accuracy and smoothness by aligning different k-hop neighborhoods and refining maps via local map distortion.

Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models

26 September 2024·2415 words·12 mins

Computer Vision Image Generation 🏢 Qualcomm AI Research

Hollowed Net efficiently personalizes text-to-image diffusion models on-device by temporarily removing deep U-Net layers during training, drastically reducing memory usage without sacrificing performa…

HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness

26 September 2024·4260 words·20 mins

AI Generated Computer Vision Video Understanding 🏢 University of Texas at Austin

HOI-Swap: a novel diffusion model that swaps objects in videos while preserving natural hand-object interactions, producing high-quality edits.

Historical Test-time Prompt Tuning for Vision Foundation Models

26 September 2024·2286 words·11 mins

Computer Vision Image Segmentation 🏢 Nanyang Technological University

HisTPT: Historical Test-Time Prompt Tuning memorizes past learning, enabling robust online prompt adaptation for vision models, overcoming performance degradation in continuously changing data streams…

High-Resolution Image Harmonization with Adaptive-Interval Color Transformation

26 September 2024·3030 words·15 mins

Computer Vision Image Generation 🏢 Harbin Institute of Technology

AICT: Adaptive-Interval Color Transformation harmonizes high-resolution images by predicting pixel-wise color changes, adaptively adjusting sampling intervals to capture local variations, and using a …

Hierarchical Uncertainty Exploration via Feedforward Posterior Trees

26 September 2024·5486 words·26 mins

AI Generated Computer Vision Image Generation 🏢 Technion-Israel Institute of Technology

Visualizing high-dimensional posterior distributions is challenging. This paper introduces ‘Posterior Trees,’ a novel method using tree-structured neural network predictions for hierarchical uncertai…

Hierarchical Selective Classification

26 September 2024·2174 words·11 mins

Computer Vision Image Classification 🏢 Technion

Hierarchical Selective Classification (HSC) improves deep learning model reliability for risk-sensitive tasks by leveraging hierarchical class relationships to provide more informative predictions eve…

HiCoM: Hierarchical Coherent Motion for Dynamic Streamable Scenes with 3D Gaussian Splatting

26 September 2024·2356 words·12 mins

Computer Vision 3D Vision 🏢 Peking University

HiCoM, a novel framework, achieves high-fidelity streamable dynamic scene reconstruction by using a hierarchical coherent motion mechanism and parallel processing to significantly reduce training time…

HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation

26 September 2024·3034 words·15 mins

Computer Vision Image Generation 🏢 360 AI Research

HiCo: Hierarchical Controllable Diffusion Model achieves superior layout-to-image generation by disentangling spatial layouts through a multi-branch network structure, resulting in high-quality images…

HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting

26 September 2024·1800 words·9 mins

Computer Vision 3D Vision 🏢 Johns Hopkins University

HDR-GS: 1000x faster HDR novel view synthesis via Gaussian splatting!

Harnessing small projectors and multiple views for efficient vision pretraining

26 September 2024·2903 words·14 mins

Computer Vision Self-Supervised Learning 🏢 Mila - Quebec AI Institute & Computer Science, McGill University

Boost self-supervised visual learning: This paper introduces theoretical insights and practical recommendations to significantly improve SSL’s efficiency and reduce data needs.

Harmonizing Stochasticity and Determinism: Scene-responsive Diverse Human Motion Prediction

26 September 2024·2828 words·14 mins

AI Generated Computer Vision 3D Vision 🏢 Zhejiang University

DiMoP3D: Predicting diverse, physically realistic human motions in 3D scenes by harmonizing stochasticity and determinism.

Happy: A Debiased Learning Framework for Continual Generalized Category Discovery

26 September 2024·2362 words·12 mins

Computer Vision Image Classification 🏢 Institute of Automation, Chinese Academy of Sciences

Happy, a novel debiased learning framework, excels at continually discovering new categories from unlabeled data while retaining knowledge of previously learned ones, overcoming existing bias issues a…

Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba

26 September 2024·3671 words·18 mins

AI Generated Computer Vision 3D Vision 🏢 Carnegie Mellon University

Hamba, a novel graph-guided framework for single-view 3D hand reconstruction, significantly outperforms existing methods by efficiently modeling spatial relationships between joints using a fraction o…

Hallo3D: Multi-Modal Hallucination Detection and Mitigation for Consistent 3D Content Generation

26 September 2024·2871 words·14 mins

AI Generated Computer Vision 3D Vision 🏢 Chinese Academy of Sciences

Hallo3D: a tuning-free method resolving 3D generation hallucinations via multi-modal inconsistency detection and mitigation for consistent 3D content.

HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach

26 September 2024·2844 words·14 mins

Computer Vision Image Generation 🏢 HSE University

HairFastGAN achieves realistic and robust hairstyle transfer in near real-time using a novel encoder-based approach, significantly outperforming optimization-based methods.

HairDiffusion: Vivid Multi-Colored Hair Editing via Latent Diffusion

26 September 2024·3966 words·19 mins

AI Generated Computer Vision Image Generation 🏢 Shenzhen University

HairDiffusion uses latent diffusion models and a multi-stage blending technique to achieve vivid, multi-colored hair editing in images, preserving other facial features.

GVKF: Gaussian Voxel Kernel Functions for Highly Efficient Surface Reconstruction in Open Scenes

26 September 2024·2497 words·12 mins

Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology

GVKF: A novel method achieves highly efficient and accurate 3D surface reconstruction in open scenes by integrating fast 3D Gaussian splatting with continuous scene representation using kernel regres…