Computer Vision
Improving the Training of Rectified Flows
·4681 words·22 mins·
AI Generated
Computer Vision
Image Generation
Carnegie Mellon University
Researchers significantly boosted the efficiency and quality of rectified flow, a method for generating samples from diffusion models, by introducing novel training techniques that surpass state-of-th…
Improving the Learning Capability of Small-size Image Restoration Network by Deep Fourier Shifting
·1825 words·9 mins·
Computer Vision
Image Restoration
AIRI
Deep Fourier Shifting boosts small image restoration networks by using an information-lossless Fourier cycling shift operator, improving performance across various low-level tasks while reducing compu…
Improving Robustness of 3D Point Cloud Recognition from a Fourier Perspective
·2312 words·11 mins·
Computer Vision
3D Vision
Chinese Academy of Sciences
Boosting 3D point cloud recognition robustness, Frequency Adversarial Training (FAT) leverages frequency-domain adversarial examples to improve model resilience against corruptions, achieving state-of…
ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images
·2938 words·14 mins·
Computer Vision
3D Vision
Tsinghua University
ImOV3D learns open-vocabulary 3D object detection on point clouds from 2D images alone.
Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment
·3506 words·17 mins·
AI Generated
Computer Vision
Image Generation
UC Berkeley
Immiscible Diffusion boosts diffusion model training efficiency up to 3x by cleverly assigning noise to images, preventing the mixing of data in noise space and thus improving optimization.
IMAGPose: A Unified Conditional Framework for Pose-Guided Person Generation
·2991 words·15 mins·
AI Generated
Computer Vision
Image Generation
Nanjing University of Science and Technology
IMAGPose: A unified framework that generates high-fidelity person images from single or multiple source images and poses, addressing the limitations of existing methods.
Image Understanding Makes for A Good Tokenizer for Image Generation
·2230 words·11 mins·
Computer Vision
Image Generation
ByteDance
Leveraging image understanding models for image tokenizer training dramatically boosts image generation quality, achieving state-of-the-art results.
Image Reconstruction Via Autoencoding Sequential Deep Image Prior
·2498 words·12 mins·
Computer Vision
Image Generation
University of Michigan
aSeqDIP: A new unsupervised image reconstruction method using sequential deep image priors, achieving competitive performance with fewer data needs and faster runtimes.
Image Copy Detection for Diffusion Models
·3883 words·19 mins·
AI Generated
Computer Vision
Image Generation
University of Technology Sydney
ICDiff, a novel Image Copy Detection system, tackles the unique challenge of identifying replicated content in diffusion model outputs, introducing a specialized dataset and deep embedding method for …
IllumiNeRF: 3D Relighting Without Inverse Rendering
·2411 words·12 mins·
Computer Vision
3D Vision
Google Research
IllumiNeRF: Relightable 3D reconstruction without inverse rendering using image diffusion and NeRF.
Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
·3521 words·17 mins·
Computer Vision
Image Generation
KAIST
MuDI: a novel framework for multi-subject image personalization, effectively decoupling identities to prevent mixing using segmented subjects and a new evaluation metric.
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
·3058 words·15 mins·
Computer Vision
Image Generation
Tsinghua University
Researchers solve the conditional image leakage problem in image-to-video diffusion models by proposing a new inference strategy and a time-dependent noise distribution for training. This yields video…
ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling
·1848 words·9 mins·
Computer Vision
3D Vision
Imperial College London
ID-to-3D: Generate expressive, identity-consistent 3D human heads from just a few in-the-wild images using score distillation sampling and 2D diffusion models.
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
·2714 words·13 mins·
AI Generated
Computer Vision
Image Generation
ByteDance
Hyper-SD boosts diffusion model speed by using trajectory segmented consistency distillation and human feedback, achieving state-of-the-art performance.
HydraViT: Stacking Heads for a Scalable ViT
·2612 words·13 mins·
Computer Vision
Image Classification
Kiel University
HydraViT: Stacking attention heads creates a scalable Vision Transformer, adapting to diverse hardware by dynamically selecting subnetworks during inference, improving accuracy and efficiency.
Hybrid Mamba for Few-Shot Segmentation
·2385 words·12 mins·
Computer Vision
Image Segmentation
Nanyang Technological University
Hybrid Mamba Network (HMNet) boosts few-shot segmentation accuracy by efficiently fusing support and query features using a novel hybrid Mamba architecture, significantly outperforming current state-o…
HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors
·2066 words·10 mins·
Computer Vision
3D Vision
ByteDance
HumanSplat: single image-based 3D human reconstruction using Gaussian Splatting with structural priors, achieving state-of-the-art quality and speed.
Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models
·2348 words·12 mins·
AI Generated
Computer Vision
Scene Understanding
ReLER, AAII, University of Technology Sydney
DIFFUSIONHOI: A novel HOI detector using text-to-image diffusion models to improve compositional reasoning and handling of novel concepts, achieving state-of-the-art performance.
Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models
·4637 words·22 mins·
AI Generated
Computer Vision
3D Vision
University of Tübingen
Human-3Diffusion generates realistic 3D avatars from single RGB images using coupled 2D multi-view and 3D consistent diffusion models, achieving high-fidelity geometry and texture.
How to Use Diffusion Priors under Sparse Views?
·2930 words·14 mins·
Computer Vision
3D Vision
Beihang University
Inline Prior Guided Score Matching (IPSM) improves sparse-view 3D reconstruction by leveraging visual inline priors from pose relationships to rectify rendered image distribution and effectively guide…