Computer Vision

Improving the Training of Rectified Flows

26 September 2024·4681 words·22 mins

AI Generated Computer Vision Image Generation 🏢 Carnegie Mellon University

Researchers significantly boosted the efficiency and quality of rectified flow, a method for generating samples from diffusion models, by introducing novel training techniques that surpass state-of-th…

Improving the Learning Capability of Small-size Image Restoration Network by Deep Fourier Shifting

26 September 2024·1825 words·9 mins

Computer Vision Image Restoration 🏢 AIRI

Deep Fourier Shifting boosts small image restoration networks by using an information-lossless Fourier cycling shift operator, improving performance across various low-level tasks while reducing compu…

Improving Robustness of 3D Point Cloud Recognition from a Fourier Perspective

26 September 2024·2312 words·11 mins

Computer Vision 3D Vision 🏢 Chinese Academy of Sciences

Frequency Adversarial Training (FAT) improves the robustness of 3D point cloud recognition by using frequency-domain adversarial examples to make models more resilient to corruptions, achieving state-of…

ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images

26 September 2024·2938 words·14 mins

Computer Vision 3D Vision 🏢 Tsinghua University

ImOV3D: Open-vocabulary 3D object detection on point clouds, learned from 2D images alone.

Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment

26 September 2024·3506 words·17 mins

AI Generated Computer Vision Image Generation 🏢 UC Berkeley

Immiscible Diffusion speeds up diffusion model training by up to 3x by assigning noise to images so that data does not mix in noise space, which improves optimization.

IMAGPose: A Unified Conditional Framework for Pose-Guided Person Generation

26 September 2024·2991 words·15 mins

AI Generated Computer Vision Image Generation 🏢 Nanjing University of Science and Technology

IMAGPose: A unified framework that generates high-fidelity person images from single or multiple source images and poses, addressing limitations of existing methods.

Image Understanding Makes for A Good Tokenizer for Image Generation

26 September 2024·2230 words·11 mins

Computer Vision Image Generation 🏢 ByteDance

Leveraging image understanding models for image tokenizer training dramatically boosts image generation quality, achieving state-of-the-art results.

Image Reconstruction Via Autoencoding Sequential Deep Image Prior

26 September 2024·2498 words·12 mins

Computer Vision Image Generation 🏢 University of Michigan

aSeqDIP: A new unsupervised image reconstruction method using sequential deep image priors, achieving competitive performance with fewer data needs and faster runtimes.

Image Copy Detection for Diffusion Models

26 September 2024·3883 words·19 mins

AI Generated Computer Vision Image Generation 🏢 University of Technology Sydney

ICDiff, a novel Image Copy Detection system, tackles the unique challenge of identifying replicated content in diffusion model outputs, introducing a specialized dataset and deep embedding method for …

IllumiNeRF: 3D Relighting Without Inverse Rendering

26 September 2024·2411 words·12 mins

Computer Vision 3D Vision 🏢 Google Research

IllumiNeRF: Relightable 3D reconstruction without inverse rendering using image diffusion and NeRF.

Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models

26 September 2024·3521 words·17 mins

Computer Vision Image Generation 🏢 KAIST

MuDI: a framework for multi-subject image personalization that uses segmented subjects to decouple identities and prevent them from mixing, together with a new evaluation metric.

Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model

26 September 2024·3058 words·15 mins

Computer Vision Image Generation 🏢 Tsinghua University

Researchers solve the conditional image leakage problem in image-to-video diffusion models by proposing a new inference strategy and a time-dependent noise distribution for training. This yields video…

ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling

26 September 2024·1848 words·9 mins

Computer Vision 3D Vision 🏢 Imperial College London

ID-to-3D: Generate expressive, identity-consistent 3D human heads from just a few in-the-wild images using score distillation sampling and 2D diffusion models.

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

26 September 2024·2714 words·13 mins

AI Generated Computer Vision Image Generation 🏢 ByteDance

Hyper-SD boosts diffusion model speed by using trajectory segmented consistency distillation and human feedback, achieving state-of-the-art performance.

HydraViT: Stacking Heads for a Scalable ViT

26 September 2024·2612 words·13 mins

Computer Vision Image Classification 🏢 Kiel University

HydraViT: Stacking attention heads creates a scalable Vision Transformer, adapting to diverse hardware by dynamically selecting subnetworks during inference, improving accuracy and efficiency.

Hybrid Mamba for Few-Shot Segmentation

26 September 2024·2385 words·12 mins

Computer Vision Image Segmentation 🏢 Nanyang Technological University

Hybrid Mamba Network (HMNet) boosts few-shot segmentation accuracy by efficiently fusing support and query features using a novel hybrid Mamba architecture, significantly outperforming current state-o…

HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

26 September 2024·2066 words·10 mins

Computer Vision 3D Vision 🏢 ByteDance

HumanSplat: single image-based 3D human reconstruction using Gaussian Splatting with structural priors, achieving state-of-the-art quality and speed.

Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models

26 September 2024·2348 words·12 mins

AI Generated Computer Vision Scene Understanding 🏢 ReLER, AAII, University of Technology Sydney

DIFFUSIONHOI: A novel HOI detector using text-to-image diffusion models to improve compositional reasoning and handling of novel concepts, achieving state-of-the-art performance.

Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

26 September 2024·4637 words·22 mins

AI Generated Computer Vision 3D Vision 🏢 University of Tübingen

Human-3Diffusion generates realistic 3D avatars from single RGB images using coupled 2D multi-view and 3D consistent diffusion models, achieving high-fidelity geometry and texture.

How to Use Diffusion Priors under Sparse Views?

26 September 2024·2930 words·14 mins

Computer Vision 3D Vision 🏢 Beihang University

Inline Prior Guided Score Matching (IPSM) improves sparse-view 3D reconstruction by leveraging visual inline priors from pose relationships to rectify rendered image distribution and effectively guide…