Computer Vision

ReGS: Reference-based Controllable Scene Stylization with Gaussian Splatting

26 September 2024·1952 words·10 mins

Computer Vision 3D Vision 🏢 Johns Hopkins University

ReGS: Real-time reference-based 3D scene stylization using Gaussian Splatting for high-fidelity texture editing and free-view navigation.

ReFIR: Grounding Large Restoration Models with Retrieval Augmentation

26 September 2024·3091 words·15 mins

Computer Vision Image Generation 🏢 Tsinghua University

ReFIR enhances Large Restoration Models’ accuracy by incorporating retrieved images as external knowledge, mitigating hallucination without retraining.

RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance

26 September 2024·4522 words·22 mins

AI Generated Computer Vision Image Generation 🏢 Georgia Tech

RefDrop: A training-free method enhances image and video generation consistency by directly controlling the influence of reference features on the diffusion process, enabling precise manipulation of c…

ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration

26 September 2024·2816 words·14 mins

Computer Vision Image Generation 🏢 MediaTek

ReF-LDM uses reference images to improve the accuracy of face image restoration, achieving high-quality results faithful to the subject’s true appearance.

Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery

26 September 2024·2697 words·13 mins

Computer Vision Image Segmentation 🏢 Google DeepMind

SynCx, a novel recurrent autoencoder with complex weights, surpasses state-of-the-art models in unsupervised object discovery by iteratively refining phase relationships to achieve robust object bindi…

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

26 September 2024·2658 words·13 mins

Computer Vision Image Generation 🏢 Peking University

RectifID personalizes image generation by guiding a rectified flow model with off-the-shelf classifiers, preserving identity without extra training data.

Recovering Complete Actions for Cross-dataset Skeleton Action Recognition

26 September 2024·2959 words·14 mins

Computer Vision Action Recognition 🏢 Tsinghua University

Boosts cross-dataset skeleton action recognition accuracy by recovering complete actions and resampling, outperforming existing methods.

Reconstruction of Manipulated Garment with Guided Deformation Prior

26 September 2024·2931 words·14 mins

Computer Vision 3D Vision 🏢 Computer Vision Lab, EPFL

Researchers developed a novel method for reconstructing the 3D shape of manipulated garments, achieving superior accuracy compared to existing techniques, particularly for complex, non-rigid deformati…

Reconstructing the Image Stitching Pipeline: Integrating Fusion and Rectangling into a Unified Inpainting Model

26 September 2024·2463 words·12 mins

Computer Vision Image Generation 🏢 College of Computer Science and Technology, Tongji University

SRStitcher revolutionizes image stitching by integrating fusion and rectangling into a unified inpainting model, eliminating model training and achieving superior performance and stability.

RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models

26 September 2024·2585 words·13 mins

Computer Vision Image Generation 🏢 Tsinghua University

RealCompo: A novel training-free framework dynamically balances realism and compositionality in text-to-image generation, achieving state-of-the-art results.

Real-time Stereo-based 3D Object Detection for Streaming Perception

26 September 2024·2407 words·12 mins

Computer Vision Object Detection 🏢 Sun Yat-Sen University

StreamDSGN: a real-time stereo 3D object detection framework significantly boosts streaming perception accuracy by leveraging historical information, a feature-flow fusion method, and a motion consist…

Real-time Core-Periphery Guided ViT with Smart Data Layout Selection on Mobile Devices

26 September 2024·1912 words·9 mins

Computer Vision Image Classification 🏢 University of Georgia

ECP-ViT: Real-time Vision Transformer on Mobile Devices via Core-Periphery Attention and Smart Data Layout.

RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees

26 September 2024·2208 words·11 mins

Computer Vision Image Generation 🏢 University of Minnesota

RAW: A novel watermark framework ensures the authenticity of AI-generated images by embedding learnable watermarks directly into the image data, providing provable guarantees even under adversarial at…

RAMP: Boosting Adversarial Robustness Against Multiple $l_p$ Perturbations for Universal Robustness

26 September 2024·3379 words·16 mins

Computer Vision Image Classification 🏢 University of Illinois Urbana-Champaign

RAMP: A novel training framework significantly boosts DNN robustness against diverse adversarial attacks by mitigating accuracy-robustness tradeoffs and improving generalization.

QUEEN: QUantized Efficient ENcoding for Streaming Free-viewpoint Videos

26 September 2024·3903 words·19 mins

AI Generated Computer Vision 3D Vision 🏢 University of Maryland

QUEEN: A novel framework for quantized and efficient streaming of free-viewpoint videos achieving high compression, quality, and speed.

Quality-Improved and Property-Preserved Polarimetric Imaging via Complementarily Fusing

26 September 2024·1809 words·9 mins

Computer Vision Image Enhancement 🏢 Peking University

This paper introduces a novel three-phase neural network framework that significantly enhances the quality of polarimetric images by complementarily fusing degraded noisy and blurry snapshots, preserv…

QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model

26 September 2024·2714 words·13 mins

AI Generated Computer Vision Image Classification 🏢 Shanghai Jiao Tong University

QuadMamba: A novel vision model leveraging quadtree-based scanning for superior performance in visual tasks, achieving state-of-the-art results with linear-time complexity.

QT-ViT: Improving Linear Attention in ViT with Quadratic Taylor Expansion

26 September 2024·1611 words·8 mins

AI Generated Computer Vision Image Classification 🏢 Advanced Micro Devices, Inc.

QT-ViT boosts Vision Transformer efficiency by using quadratic Taylor expansion to approximate self-attention, achieving state-of-the-art accuracy and speed.

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

26 September 2024·3805 words·18 mins

AI Generated Computer Vision Image Generation 🏢 ByteDance Inc.

PuLID: Lightning-fast, tuning-free ID customization for text-to-image generation.

PTQ4DiT: Post-training Quantization for Diffusion Transformers

26 September 2024·2510 words·12 mins

AI Generated Computer Vision Image Generation 🏢 University of Illinois Chicago

PTQ4DiT achieves 8-bit and even 4-bit weight precision for Diffusion Transformers, significantly improving efficiency for image generation without sacrificing quality.