
Image Generation

TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
·2847 words·14 mins
Computer Vision Image Generation 🏢 Korea Advanced Institute of Science and Technology (KAIST)
Test-time Procrustes Calibration (TPC) boosts diffusion-based human image animation, ensuring high-quality outputs by aligning reference and target images to overcome common compositional misalignmen…
TinyLUT: Tiny Look-Up Table for Efficient Image Restoration at the Edge
·1979 words·10 mins
Computer Vision Image Generation 🏢 School of Integrated Circuits, Xidian University
TinyLUT achieves 10x lower memory consumption and superior accuracy in image restoration on edge devices using innovative separable mapping and dynamic discretization of LUTs.
Time-Varying LoRA: Towards Effective Cross-Domain Fine-Tuning of Diffusion Models
·3031 words·15 mins
Computer Vision Image Generation 🏢 Southern University of Science and Technology
Terra, a novel time-varying low-rank adapter, enables effective cross-domain fine-tuning of diffusion models by creating a continuous parameter manifold, facilitating efficient knowledge sharing and g…
The GAN is dead; long live the GAN! A Modern GAN Baseline
·3072 words·15 mins
Computer Vision Image Generation 🏢 Brown University
R3GAN, a minimalist GAN baseline, surpasses state-of-the-art models by using a novel regularized relativistic GAN loss and modern architectures, proving GANs can be trained efficiently without relying…
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
·2216 words·11 mins
Image Generation 🏢 Institute of Information Engineering, Chinese Academy of Sciences
TextCtrl: a novel diffusion-based scene text editing method using prior guidance control, achieving superior style fidelity and accuracy with a new real-world benchmark dataset, ScenePair.
Taming Generative Diffusion Prior for Universal Blind Image Restoration
·4450 words·21 mins
AI Generated Computer Vision Image Generation 🏢 Fudan University
BIR-D tames generative diffusion models for universal blind image restoration, dynamically updating parameters to handle various complex degradations without assuming degradation model types.
Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs
·2142 words·11 mins
Computer Vision Image Generation 🏢 Advanced Micro Devices Inc.
DoSSR: A novel SR model boosts efficiency by 5-7x, achieving state-of-the-art performance with only 5 sampling steps by cleverly integrating a domain shift equation into pretrained diffusion models.
SyncTweedies: A General Generative Framework Based on Synchronized Diffusions
·4065 words·20 mins
AI Generated Computer Vision Image Generation 🏢 KAIST
SyncTweedies: a zero-shot diffusion synchronization framework generates diverse visual content (images, panoramas, 3D textures) by synchronizing multiple diffusion processes without fine-tuning, demon…
Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques
·3213 words·16 mins
Computer Vision Image Generation 🏢 Institute of Information Engineering, CAS
Boosting diffusion model features: This paper introduces GATE, a novel method to suppress ‘content shift’ in diffusion features, improving their quality via off-the-shelf generation techniques.
Stylus: Automatic Adapter Selection for Diffusion Models
·3022 words·15 mins
Image Generation 🏢 University of California, Berkeley
Stylus: an automatic adapter selection system for diffusion models, boosts image quality and diversity by intelligently composing task-specific adapters based on prompt keywords.
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
·2393 words·12 mins
Image Generation 🏢 Nankai University
StoryDiffusion enhances long-range image & video generation by introducing a simple yet effective self-attention mechanism and a semantic motion predictor, achieving high content consistency without t…
StepbaQ: Stepping backward as Correction for Quantized Diffusion Models
·2381 words·12 mins
AI Generated Computer Vision Image Generation 🏢 MediaTek
StepbaQ enhances quantized diffusion models by correcting accumulated quantization errors via a novel sampling step correction mechanism, significantly improving model accuracy without modifying exist…
Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation
·3451 words·17 mins
Computer Vision Image Generation 🏢 Munich Center for Machine Learning
Stable-Pose: precise human pose guidance for text-to-image synthesis via pose-guided transformers.
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
·2011 words·10 mins
Computer Vision Image Generation 🏢 School of Data Science, University of Science and Technology of China
DiGIT stabilizes image autoregressive models’ latent space using a novel discrete tokenizer from self-supervised learning, achieving state-of-the-art image generation.
Stability and Generalizability in SDE Diffusion Models with Measure-Preserving Dynamics
·2499 words·12 mins
Computer Vision Image Generation 🏢 University of Oxford
D³GM, a novel score-based diffusion model, enhances stability & generalizability in solving inverse problems by leveraging measure-preserving dynamics, enabling robust image reconstruction across dive…
SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening
·2088 words·10 mins
Computer Vision Image Generation 🏢 University of Electronic Science and Technology of China
SSDiff: A novel spatial-spectral integrated diffusion model for superior remote sensing pansharpening.
SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams
·2631 words·13 mins
Image Generation 🏢 Peking University
SpikeReveal: Self-supervised learning unlocks sharp video sequences from blurry, real-world spike camera data, overcoming limitations of prior supervised approaches.
Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras
·2395 words·12 mins
Computer Vision Image Generation 🏢 Peking University
STIR: A novel spatio-temporal network reconstructs high-quality images from spiking camera data by jointly refining motion and intensity information for efficient and accurate high-speed imaging.
Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention
·2786 words·14 mins
Computer Vision Image Generation 🏢 University of Washington
Smoothed Energy Guidance (SEG) improves unconditional image generation by reducing self-attention’s energy curvature, leading to higher-quality outputs with fewer artifacts.
Slight Corruption in Pre-training Data Makes Better Diffusion Models
·4250 words·20 mins
Image Generation 🏢 Carnegie Mellon University
Slightly corrupting pre-training data significantly improves diffusion models’ image generation quality, diversity, and fidelity.