
Computer Vision

How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization?
2761 words · 13 mins
AI Generated Computer Vision Image Generation 🏢 Mohamed Bin Zayed University of Artificial Intelligence
The Concept-Incremental Flexible Customization (CIFC) model tackles catastrophic forgetting and concept neglect when continually adapting text-to-image diffusion models, enabling flexible personalization.
How Diffusion Models Learn to Factorize and Compose
3926 words · 19 mins
AI Generated Computer Vision Image Generation 🏢 MIT
Diffusion models surprisingly learn factorized representations, enabling compositional generalization, but struggle with interpolation; training with independent factors drastically improves data effi…
HOPE: Shape Matching Via Aligning Different K-hop Neighbourhoods
1940 words · 10 mins
Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology
HOPE: a novel shape matching method achieving both accuracy and smoothness by aligning different k-hop neighborhoods and refining maps via local map distortion.
Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models
2415 words · 12 mins
Computer Vision Image Generation 🏢 Qualcomm AI Research
Hollowed Net efficiently personalizes text-to-image diffusion models on-device by temporarily removing deep U-Net layers during training, drastically reducing memory usage without sacrificing performa…
HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness
4260 words · 20 mins
AI Generated Computer Vision Video Understanding 🏢 University of Texas at Austin
HOI-Swap: a novel diffusion model that swaps objects in videos while preserving natural hand-object interactions, producing high-quality edits.
Historical Test-time Prompt Tuning for Vision Foundation Models
2286 words · 11 mins
Computer Vision Image Segmentation 🏢 Nanyang Technological University
HisTPT: Historical Test-Time Prompt Tuning memorizes previously learned knowledge, enabling robust online prompt adaptation for vision models and overcoming performance degradation in continuously changing data streams…
High-Resolution Image Harmonization with Adaptive-Interval Color Transformation
3030 words · 15 mins
Computer Vision Image Generation 🏢 Harbin Institute of Technology
AICT: Adaptive-Interval Color Transformation harmonizes high-resolution images by predicting pixel-wise color changes, adaptively adjusting sampling intervals to capture local variations, and using a …
Hierarchical Uncertainty Exploration via Feedforward Posterior Trees
5486 words · 26 mins
AI Generated Computer Vision Image Generation 🏢 Technion-Israel Institute of Technology
Visualizing high-dimensional posterior distributions is challenging. This paper introduces ‘Posterior Trees,’ a novel method using tree-structured neural network predictions for hierarchical uncertai…
Hierarchical Selective Classification
2174 words · 11 mins
Computer Vision Image Classification 🏢 Technion
Hierarchical Selective Classification (HSC) improves deep learning model reliability for risk-sensitive tasks by leveraging hierarchical class relationships to provide more informative predictions eve…
HiCoM: Hierarchical Coherent Motion for Dynamic Streamable Scenes with 3D Gaussian Splatting
2356 words · 12 mins
Computer Vision 3D Vision 🏢 Peking University
HiCoM, a novel framework, achieves high-fidelity streamable dynamic scene reconstruction by using a hierarchical coherent motion mechanism and parallel processing to significantly reduce training time…
HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation
3034 words · 15 mins
Computer Vision Image Generation 🏢 360 AI Research
HiCo: Hierarchical Controllable Diffusion Model achieves superior layout-to-image generation by disentangling spatial layouts through a multi-branch network structure, resulting in high-quality images…
HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting
1800 words · 9 mins
Computer Vision 3D Vision 🏢 Johns Hopkins University
HDR-GS: 1000x faster HDR novel view synthesis via Gaussian splatting!
Harnessing small projectors and multiple views for efficient vision pretraining
2903 words · 14 mins
Computer Vision Self-Supervised Learning 🏢 Mila - Quebec AI Institute & Computer Science, McGill University
Boost self-supervised visual learning: this paper offers theoretical insights and practical recommendations that significantly improve the efficiency of self-supervised learning (SSL) and reduce its data requirements.
Harmonizing Stochasticity and Determinism: Scene-responsive Diverse Human Motion Prediction
2828 words · 14 mins
AI Generated Computer Vision 3D Vision 🏢 Zhejiang University
DiMoP3D: Predicting diverse, physically realistic human motions in 3D scenes by harmonizing stochasticity and determinism.
Happy: A Debiased Learning Framework for Continual Generalized Category Discovery
2362 words · 12 mins
Computer Vision Image Classification 🏢 Institute of Automation, Chinese Academy of Sciences
Happy, a novel debiased learning framework, excels at continually discovering new categories from unlabeled data while retaining knowledge of previously learned ones, overcoming existing bias issues a…
Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
3671 words · 18 mins
AI Generated Computer Vision 3D Vision 🏢 Carnegie Mellon University
Hamba, a novel graph-guided framework for single-view 3D hand reconstruction, significantly outperforms existing methods by efficiently modeling spatial relationships between joints using a fraction o…
Hallo3D: Multi-Modal Hallucination Detection and Mitigation for Consistent 3D Content Generation
2871 words · 14 mins
AI Generated Computer Vision 3D Vision 🏢 Chinese Academy of Sciences
Hallo3D: a tuning-free method that resolves hallucinations in 3D generation by detecting and mitigating multi-modal inconsistencies, yielding consistent 3D content.
HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach
2844 words · 14 mins
Computer Vision Image Generation 🏢 HSE University
HairFastGAN achieves realistic and robust hairstyle transfer in near real-time using a novel encoder-based approach, significantly outperforming optimization-based methods.
HairDiffusion: Vivid Multi-Colored Hair Editing via Latent Diffusion
3966 words · 19 mins
AI Generated Computer Vision Image Generation 🏢 Shenzhen University
HairDiffusion uses latent diffusion models and a multi-stage blending technique to achieve vivid, multi-colored hair editing in images, preserving other facial features.
GVKF: Gaussian Voxel Kernel Functions for Highly Efficient Surface Reconstruction in Open Scenes
2497 words · 12 mins
Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology
GVKF, a novel method, achieves highly efficient and accurate 3D surface reconstruction in open scenes by integrating fast 3D Gaussian splatting with continuous scene representation using kernel regres…