Computer Vision
ReGS: Reference-based Controllable Scene Stylization with Gaussian Splatting
·1952 words·10 mins·
Computer Vision
3D Vision
🏢 Johns Hopkins University
ReGS: Real-time reference-based 3D scene stylization using Gaussian Splatting for high-fidelity texture editing and free-view navigation.
ReFIR: Grounding Large Restoration Models with Retrieval Augmentation
·3091 words·15 mins·
Computer Vision
Image Generation
🏢 Tsinghua University
ReFIR enhances Large Restoration Models’ accuracy by incorporating retrieved images as external knowledge, mitigating hallucination without retraining.
RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance
·4522 words·22 mins·
AI Generated
Computer Vision
Image Generation
🏢 Georgia Tech
RefDrop: A training-free method enhances image and video generation consistency by directly controlling the influence of reference features on the diffusion process, enabling precise manipulation of c…
ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration
·2816 words·14 mins·
Computer Vision
Image Generation
🏢 MediaTek
ReF-LDM uses reference images to improve the accuracy of face image restoration, achieving high-quality results faithful to the subject’s true appearance.
Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery
·2697 words·13 mins·
Computer Vision
Image Segmentation
🏢 Google DeepMind
SynCx, a novel recurrent autoencoder with complex weights, surpasses state-of-the-art models in unsupervised object discovery by iteratively refining phase relationships to achieve robust object bindi…
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
·2658 words·13 mins·
Computer Vision
Image Generation
🏢 Peking University
RectifID personalizes image generation by guiding a diffusion model with off-the-shelf classifiers, preserving identity without extra training data.
Recovering Complete Actions for Cross-dataset Skeleton Action Recognition
·2959 words·14 mins·
Computer Vision
Action Recognition
🏢 Tsinghua University
Recovering complete actions and resampling boosts skeleton action recognition accuracy across datasets, outperforming existing methods.
Reconstruction of Manipulated Garment with Guided Deformation Prior
·2931 words·14 mins·
Computer Vision
3D Vision
🏢 Computer Vision Lab, EPFL
Researchers developed a novel method for reconstructing the 3D shape of manipulated garments, achieving superior accuracy compared to existing techniques, particularly for complex, non-rigid deformati…
Reconstructing the Image Stitching Pipeline: Integrating Fusion and Rectangling into a Unified Inpainting Model
·2463 words·12 mins·
Computer Vision
Image Generation
🏢 College of Computer Science and Technology, Tongji University
SRStitcher integrates fusion and rectangling into a unified inpainting model for image stitching, eliminating model training and achieving superior performance and stability.
RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models
·2585 words·13 mins·
Computer Vision
Image Generation
🏢 Tsinghua University
RealCompo: A novel training-free framework dynamically balances realism and compositionality in text-to-image generation, achieving state-of-the-art results.
Real-time Stereo-based 3D Object Detection for Streaming Perception
·2407 words·12 mins·
Computer Vision
Object Detection
🏢 Sun Yat-Sen University
StreamDSGN: a real-time stereo 3D object detection framework significantly boosts streaming perception accuracy by leveraging historical information, a feature-flow fusion method, and a motion consist…
Real-time Core-Periphery Guided ViT with Smart Data Layout Selection on Mobile Devices
·1912 words·9 mins·
Computer Vision
Image Classification
🏢 University of Georgia
ECP-ViT: Real-time Vision Transformer on Mobile Devices via Core-Periphery Attention and Smart Data Layout.
RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees
·2208 words·11 mins·
Computer Vision
Image Generation
🏢 University of Minnesota
RAW: A novel watermark framework ensures the authenticity of AI-generated images by embedding learnable watermarks directly into the image data, providing provable guarantees even under adversarial at…
RAMP: Boosting Adversarial Robustness Against Multiple $l_p$ Perturbations for Universal Robustness
·3379 words·16 mins·
Computer Vision
Image Classification
🏢 University of Illinois Urbana-Champaign
RAMP: A novel training framework significantly boosts DNN robustness against diverse adversarial attacks by mitigating accuracy-robustness tradeoffs and improving generalization.
QUEEN: QUantized Efficient ENcoding for Streaming Free-viewpoint Videos
·3903 words·19 mins·
AI Generated
Computer Vision
3D Vision
🏢 University of Maryland
QUEEN: A novel framework for quantized and efficient streaming of free-viewpoint videos achieving high compression, quality, and speed.
Quality-Improved and Property-Preserved Polarimetric Imaging via Complementarily Fusing
·1809 words·9 mins·
Computer Vision
Image Enhancement
🏢 Peking University
This paper introduces a novel three-phase neural network framework that significantly enhances the quality of polarimetric images by complementarily fusing degraded noisy and blurry snapshots, preserv…
QuadMamba: Learning Quadtree-based Selective Scan for Visual State Space Model
·2714 words·13 mins·
AI Generated
Computer Vision
Image Classification
🏢 Shanghai Jiao Tong University
QuadMamba: A novel vision model leveraging quadtree-based scanning for superior performance in visual tasks, achieving state-of-the-art results with linear-time complexity.
QT-ViT: Improving Linear Attention in ViT with Quadratic Taylor Expansion
·1611 words·8 mins·
AI Generated
Computer Vision
Image Classification
🏢 Advanced Micro Devices, Inc.
QT-ViT boosts Vision Transformer efficiency by using quadratic Taylor expansion to approximate self-attention, achieving state-of-the-art accuracy and speed.
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
·3805 words·18 mins·
AI Generated
Computer Vision
Image Generation
🏢 ByteDance Inc.
PuLID: Lightning-fast, tuning-free ID customization for text-to-image generation.
PTQ4DiT: Post-training Quantization for Diffusion Transformers
·2510 words·12 mins·
AI Generated
Computer Vision
Image Generation
🏢 University of Illinois Chicago
PTQ4DiT achieves 8-bit and even 4-bit weight precision for Diffusion Transformers, significantly improving efficiency for image generation without sacrificing quality.