Computer Vision

Taming Generative Diffusion Prior for Universal Blind Image Restoration
·4450 words·21 mins
AI Generated Computer Vision Image Generation 🏒 Fudan University
BIR-D tames generative diffusion models for universal blind image restoration, dynamically updating parameters to handle various complex degradations without assuming degradation model types.
Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs
·2142 words·11 mins
Computer Vision Image Generation 🏒 Advanced Micro Devices Inc.
DoSSR: A super-resolution (SR) model that improves efficiency by 5-7x, achieving state-of-the-art performance with only 5 sampling steps by integrating a domain shift equation into pretrained diffusion models.
SyncVIS: Synchronized Video Instance Segmentation
·2160 words·11 mins
Computer Vision Video Understanding 🏒 University of Hong Kong
SyncVIS: A new framework for video instance segmentation achieves state-of-the-art results by synchronously modeling video and frame-level information, overcoming limitations of asynchronous approache…
SyncTweedies: A General Generative Framework Based on Synchronized Diffusions
·4065 words·20 mins
AI Generated Computer Vision Image Generation 🏒 KAIST
SyncTweedies: a zero-shot diffusion synchronization framework generates diverse visual content (images, panoramas, 3D textures) by synchronizing multiple diffusion processes without fine-tuning, demon…
Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques
·3213 words·16 mins
Computer Vision Image Generation 🏒 Institute of Information Engineering, CAS
Boosting diffusion model features: This paper introduces GATE, a novel method to suppress ‘content shift’ in diffusion features, improving their quality via off-the-shelf generation techniques.
SuperVLAD: Compact and Robust Image Descriptors for Visual Place Recognition
·3456 words·17 mins
AI Generated Computer Vision Visual Place Recognition 🏒 Tsinghua University
SuperVLAD: A new visual place recognition method that is more robust and compact, outperforming state-of-the-art techniques while significantly reducing parameters and dimensions.
Subsurface Scattering for Gaussian Splatting
·2275 words·11 mins
Computer Vision 3D Vision 🏒 University of Tübingen
Real-time rendering of objects with subsurface scattering effects is now possible with SSS-GS, a novel method combining explicit surface geometry and implicit subsurface scattering for high-quality no…
Structured Unrestricted-Rank Matrices for Parameter Efficient Finetuning
·3674 words·18 mins
AI Generated Computer Vision Image Classification 🏒 Google Research
Structured Unrestricted-Rank Matrices (SURMs) revolutionize parameter-efficient fine-tuning by offering greater flexibility and accuracy than existing methods like LoRA, achieving significant gains in…
StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences
·2803 words·14 mins
AI Generated Computer Vision Video Understanding 🏒 Peking University
StreamFlow accelerates video optical flow estimation by 44% via a streamlined in-batch multi-frame pipeline and innovative spatiotemporal modeling, achieving state-of-the-art results.
STONE: A Submodular Optimization Framework for Active 3D Object Detection
·2151 words·11 mins
AI Generated Computer Vision 3D Vision 🏒 University of Texas at Dallas
STONE: A novel submodular optimization framework drastically cuts 3D object detection training costs by cleverly selecting the most informative LiDAR point cloud data for labeling, achieving state-of-…
StepbaQ: Stepping backward as Correction for Quantized Diffusion Models
·2381 words·12 mins
AI Generated Computer Vision Image Generation 🏒 MediaTek
StepbaQ enhances quantized diffusion models by correcting accumulated quantization errors via a novel sampling step correction mechanism, significantly improving model accuracy without modifying exist…
START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation
·2428 words·12 mins
Computer Vision Domain Generalization 🏒 Nanjing University
START, a novel SSM-based architecture with saliency-driven token-aware transformation, achieves state-of-the-art domain generalization performance with efficient linear complexity.
Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation
·3451 words·17 mins
Computer Vision Image Generation 🏒 Munich Center for Machine Learning
Stable-Pose: Precise human pose guidance for text-to-image synthesis.
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
·2011 words·10 mins
Computer Vision Image Generation 🏒 School of Data Science, University of Science and Technology of China
DiGIT stabilizes image autoregressive models’ latent space using a novel discrete tokenizer from self-supervised learning, achieving state-of-the-art image generation.
Stability and Generalizability in SDE Diffusion Models with Measure-Preserving Dynamics
·2499 words·12 mins
Computer Vision Image Generation 🏒 University of Oxford
D³GM, a novel score-based diffusion model, enhances stability and generalizability in solving inverse problems by leveraging measure-preserving dynamics, enabling robust image reconstruction across dive…
SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening
·2088 words·10 mins
Computer Vision Image Generation 🏒 University of Electronic Science and Technology of China
SSDiff: A novel spatial-spectral integrated diffusion model for superior remote sensing pansharpening.
SSA-Seg: Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation
·2332 words·11 mins
Computer Vision Image Segmentation 🏒 Huawei Noah's Ark Lab, Zhejiang University
SSA-Seg improves semantic segmentation by adapting pixel-level classifiers to the test image’s semantic and spatial features, achieving state-of-the-art performance with minimal extra computational co…
SplitNeRF: Split Sum Approximation Neural Field for Joint Geometry, Illumination, and Material Estimation
·5201 words·25 mins
AI Generated Computer Vision 3D Vision 🏒 King Abdullah University of Science and Technology
SplitNeRF: One-hour training on a single GPU yields state-of-the-art scene geometry, lighting, and material property estimation!
Splatter a Video: Video Gaussian Representation for Versatile Processing
·2610 words·13 mins
Computer Vision Video Understanding 🏒 University of Hong Kong
Researchers introduce Video Gaussian Representation (VGR) for versatile video processing, embedding videos into explicit 3D Gaussians for intuitive motion and appearance modeling.
Spiking Transformer with Experts Mixture
·2017 words·10 mins
Computer Vision Image Classification 🏒 Peking University
Spiking Experts Mixture Mechanism (SEMM) boosts Spiking Transformers by integrating Mixture-of-Experts for efficient, sparse conditional computation, achieving significant performance improvements on …