Computer Vision

Improving the Training of Rectified Flows
·4681 words·22 mins
AI Generated Computer Vision Image Generation 🏒 Carnegie Mellon University
Researchers significantly boosted the efficiency and quality of rectified flow, a method for generating samples from diffusion models, by introducing novel training techniques that surpass state-of-th…
Improving the Learning Capability of Small-size Image Restoration Network by Deep Fourier Shifting
·1825 words·9 mins
Computer Vision Image Restoration 🏒 AIRI
Deep Fourier Shifting boosts small image restoration networks by using an information-lossless Fourier cycling shift operator, improving performance across various low-level tasks while reducing compu…
Improving Robustness of 3D Point Cloud Recognition from a Fourier Perspective
·2312 words·11 mins
Computer Vision 3D Vision 🏒 Chinese Academy of Sciences
Boosting 3D point cloud recognition robustness, Frequency Adversarial Training (FAT) leverages frequency-domain adversarial examples to improve model resilience against corruptions, achieving state-of…
ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images
·2938 words·14 mins
Computer Vision 3D Vision 🏒 Tsinghua University
ImOV3D: Learns open-vocabulary 3D object detection from 2D images alone.
Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment
·3506 words·17 mins
AI Generated Computer Vision Image Generation 🏒 UC Berkeley
Immiscible Diffusion boosts diffusion model training efficiency up to 3x by cleverly assigning noise to images, preventing the mixing of data in noise space and thus improving optimization.
IMAGPose: A Unified Conditional Framework for Pose-Guided Person Generation
·2991 words·15 mins
AI Generated Computer Vision Image Generation 🏒 Nanjing University of Science and Technology
IMAGPose: A unified framework generating high-fidelity person images from single or multiple source images & poses, addressing existing methods’ limitations.
Image Understanding Makes for A Good Tokenizer for Image Generation
·2230 words·11 mins
Computer Vision Image Generation 🏒 ByteDance
Leveraging image understanding models for image tokenizer training dramatically boosts image generation quality, achieving state-of-the-art results.
Image Reconstruction Via Autoencoding Sequential Deep Image Prior
·2498 words·12 mins
Computer Vision Image Generation 🏒 University of Michigan
aSeqDIP: A new unsupervised image reconstruction method using sequential deep image priors, achieving competitive performance with fewer data needs and faster runtimes.
Image Copy Detection for Diffusion Models
·3883 words·19 mins
AI Generated Computer Vision Image Generation 🏒 University of Technology Sydney
ICDiff, a novel Image Copy Detection system, tackles the unique challenge of identifying replicated content in diffusion model outputs, introducing a specialized dataset and deep embedding method for …
IllumiNeRF: 3D Relighting Without Inverse Rendering
·2411 words·12 mins
Computer Vision 3D Vision 🏒 Google Research
IllumiNeRF: Relightable 3D reconstruction without inverse rendering using image diffusion and NeRF.
Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
·3521 words·17 mins
Computer Vision Image Generation 🏒 KAIST
MuDI: a novel framework for multi-subject image personalization, effectively decoupling identities to prevent mixing using segmented subjects and a new evaluation metric.
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
·3058 words·15 mins
Computer Vision Image Generation 🏒 Tsinghua University
Researchers solve the conditional image leakage problem in image-to-video diffusion models by proposing a new inference strategy and a time-dependent noise distribution for training. This yields video…
ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling
·1848 words·9 mins
Computer Vision 3D Vision 🏒 Imperial College London
ID-to-3D: Generate expressive, identity-consistent 3D human heads from just a few in-the-wild images using score distillation sampling and 2D diffusion models.
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
·2714 words·13 mins
AI Generated Computer Vision Image Generation 🏒 ByteDance
Hyper-SD boosts diffusion model speed by using trajectory segmented consistency distillation and human feedback, achieving state-of-the-art performance.
HydraViT: Stacking Heads for a Scalable ViT
·2612 words·13 mins
Computer Vision Image Classification 🏒 Kiel University
HydraViT: Stacking attention heads creates a scalable Vision Transformer, adapting to diverse hardware by dynamically selecting subnetworks during inference, improving accuracy and efficiency.
Hybrid Mamba for Few-Shot Segmentation
·2385 words·12 mins
Computer Vision Image Segmentation 🏒 Nanyang Technological University
Hybrid Mamba Network (HMNet) boosts few-shot segmentation accuracy by efficiently fusing support and query features using a novel hybrid Mamba architecture, significantly outperforming current state-o…
HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors
·2066 words·10 mins
Computer Vision 3D Vision 🏒 ByteDance
HumanSplat: single image-based 3D human reconstruction using Gaussian Splatting with structural priors, achieving state-of-the-art quality and speed.
Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models
·2348 words·12 mins
AI Generated Computer Vision Scene Understanding 🏒 ReLER, AAII, University of Technology Sydney
DIFFUSIONHOI: A novel HOI detector using text-to-image diffusion models to improve compositional reasoning and handling of novel concepts, achieving state-of-the-art performance.
Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models
·4637 words·22 mins
AI Generated Computer Vision 3D Vision 🏒 University of Tübingen
Human-3Diffusion generates realistic 3D avatars from single RGB images using coupled 2D multi-view and 3D consistent diffusion models, achieving high-fidelity geometry and texture.
How to Use Diffusion Priors under Sparse Views?
·2930 words·14 mins
Computer Vision 3D Vision 🏒 Beihang University
Inline Prior Guided Score Matching (IPSM) improves sparse-view 3D reconstruction by leveraging visual inline priors from pose relationships to rectify rendered image distribution and effectively guide…