Computer Vision

Spiking Neural Network as Adaptive Event Stream Slicer

26 September 2024·2956 words·14 mins

Computer Vision Object Detection 🏢 Hong Kong University of Science and Technology

SpikeSlicer: An adaptive event stream slicer using a spiking neural network (SNN) to efficiently split events for improved downstream processing in object tracking and recognition.

Spherical Frustum Sparse Convolution Network for LiDAR Point Cloud Semantic Segmentation

26 September 2024·2477 words·12 mins

Computer Vision 3D Vision 🏢 Shanghai Jiao Tong University

SFCNet, a novel spherical frustum sparse convolution network, tackles LiDAR point cloud semantic segmentation by eliminating quantized information loss, leading to superior performance, especially for…
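
The quantization problem is easier to see in code. A conventional spherical (range-image) projection keeps only one point per pixel, so every other point that lands in the same pixel is lost. The sketch below is our own illustration, not SFCNet's code. It projects points with the common range-image formulas and keeps every point that falls into a pixel, grouped per pixel, which is the intuition behind a spherical frustum. The image size (64 × 2048) and vertical field of view (+3° to −25°) are assumed values typical of a 64-beam sensor. The function name `spherical_frustums` is hypothetical.

```python
import math
from collections import defaultdict


def spherical_frustums(points, height=64, width=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Group point indices by projected pixel, keeping ALL points per pixel.

    points: iterable of (x, y, z) tuples in the sensor frame.
    Returns a dict mapping (row, col) -> list of point indices (the "frustum").
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down  # total vertical field of view in radians
    frustums = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # skip degenerate points at the sensor origin
        yaw = math.atan2(y, x)    # horizontal angle in [-pi, pi]
        pitch = math.asin(z / r)  # vertical angle
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        row = int((1.0 - (pitch - fov_down) / fov) * height)
        # Clamp to the image bounds
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        # A plain range image would keep only one index here; we keep them all
        frustums[(row, col)].append(idx)
    return dict(frustums)
```

For example, two points on the same ray at ranges 10 and 20 share one pixel and both survive, whereas a range image would typically keep only the nearer one.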

SpelsNet: Surface Primitive Elements Segmentation by B-Rep Graph Structure Supervision

26 September 2024·1917 words·9 mins

Computer Vision 3D Vision 🏢 University of Luxembourg

SpelsNet, a novel neural architecture, achieves accurate 3D point cloud segmentation into surface primitives by incorporating B-Rep graph structure supervision, leading to topologically consistent res…

SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection

26 September 2024·2151 words·11 mins

Computer Vision Face Recognition 🏢 Institute of Information Engineering, Chinese Academy of Sciences

SpeechForensics leverages audio-visual speech representation learning to achieve superior face forgery detection, outperforming state-of-the-art methods in cross-dataset generalization and robustness.

Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting

26 September 2024·3727 words·18 mins

AI Generated Computer Vision 3D Vision 🏢 Zhejiang University

Spec-Gaussian enhances 3D Gaussian splatting by using anisotropic spherical Gaussians for view-dependent appearance modeling, achieving superior real-time rendering of scenes with specular and anisotr…
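
The building block named above can be written down directly. In the standard anisotropic spherical Gaussian (ASG) form, a lobe has an orthonormal frame [x, y, z], two bandwidths λ and μ, and an amplitude c, and at a unit direction v it evaluates to c · max(v·z, 0) · exp(−λ(v·x)² − μ(v·y)²). Using different λ and μ stretches the lobe unequally along x and y, which is what makes it anisotropic. The sketch below evaluates a single lobe only. How Spec-Gaussian parameterizes, combines, and decodes these lobes per Gaussian is not shown, and the function name `asg` is hypothetical.

```python
import numpy as np


def asg(v, x, y, z, lam, mu, c):
    """Evaluate one anisotropic spherical Gaussian lobe at unit direction v.

    x, y, z: orthonormal frame vectors (z is the lobe axis).
    lam, mu: bandwidths along x and y (larger = sharper falloff).
    c: lobe amplitude.
    """
    smooth = max(float(np.dot(v, z)), 0.0)  # zero on the hemisphere facing away from z
    return c * smooth * np.exp(-lam * np.dot(v, x) ** 2 - mu * np.dot(v, y) ** 2)
```

With λ = 10 and μ = 1, tilting v toward x attenuates the lobe much faster than tilting it by the same amount toward y.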

Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras

26 September 2024·2395 words·12 mins

Computer Vision Image Generation 🏢 Peking University

STIR: A novel spatio-temporal network reconstructs high-quality images from spiking camera data by jointly refining motion and intensity information for efficient and accurate high-speed imaging.

Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis

26 September 2024·2245 words·11 mins

Computer Vision 3D Vision 🏢 Carnegie Mellon University

SparseAGS: High-fidelity 3D reconstruction & camera pose estimation from sparse views via generative synthesis.

SOI: Scaling Down Computational Complexity by Estimating Partial States of the Model

26 September 2024·2817 words·14 mins

Computer Vision Action Recognition 🏢 Samsung AI Center Warsaw

Scattered Online Inference (SOI) drastically cuts down ANN computational complexity by leveraging data continuity and prediction seasonality, enabling faster real-time inference on low-power devices.

Soft Tensor Product Representations for Fully Continuous, Compositional Visual Representations

26 September 2024·8738 words·42 mins

Computer Vision Representation Learning 🏢 UNSW, Sydney

Soft Tensor Product Representations (Soft TPRs) revolutionize compositional visual representation learning by seamlessly blending continuous vector spaces and compositional structures, leading to supe…
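
For readers new to tensor product representations, the classic (non-soft) construction that the Soft TPR work takes as its starting point is short. Each filler vector fᵢ is bound to a role vector rᵢ by an outer product, and the bindings are summed: T = Σᵢ fᵢ ⊗ rᵢ. When the roles are orthonormal, a filler is recovered exactly by contracting T with its role. The sketch below shows only this classic binding and unbinding, not the Soft TPR relaxation itself. The function names `bind` and `unbind` are hypothetical.

```python
import numpy as np


def bind(fillers, roles):
    """Classic TPR: sum of outer products f_i (x) r_i.

    fillers: (n, d_f) array, one filler per row.
    roles:   (n, d_r) array, one role per row.
    Returns a (d_f, d_r) tensor.
    """
    return np.einsum("if,ir->fr", fillers, roles)


def unbind(tpr, role):
    """Recover the filler bound to `role` (exact when roles are orthonormal)."""
    return tpr @ role
```

Orthonormality is what makes unbinding exact: the cross terms vanish because rᵢ · rⱼ = 0 for i ≠ j.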

Soft Superpixel Neighborhood Attention

26 September 2024·3657 words·18 mins

AI Generated Computer Vision Image Segmentation 🏢 Purdue University

Soft Superpixel Neighborhood Attention (SNA) optimally denoises images by incorporating superpixel probabilities into an attention module, outperforming traditional methods.

Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention

26 September 2024·2786 words·14 mins

Computer Vision Image Generation 🏢 University of Washington

Smoothed Energy Guidance (SEG) improves unconditional image generation by reducing self-attention’s energy curvature, leading to higher-quality outputs with fewer artifacts.

Slot State Space Models

26 September 2024·2613 words·13 mins

Computer Vision Video Understanding 🏢 Rutgers University

SlotSSMs: a novel framework for modular sequence modeling, achieving significant performance gains by incorporating independent mechanisms and sparse interactions into State Space Models.

SlimSAM: 0.1% Data Makes Segment Anything Slim

26 September 2024·2447 words·12 mins

Computer Vision Image Segmentation 🏢 National University of Singapore

SlimSAM achieves near original SAM performance using 0.1% of its training data by employing a novel alternate slimming framework and disturbed Taylor pruning, significantly advancing data-efficient mo…
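
Disturbed Taylor pruning builds on Taylor-expansion importance estimates. As a hedged illustration of only that underlying idea, the sketch below implements plain first-order Taylor importance, not SlimSAM's disturbed variant or its alternate slimming schedule. It scores each output channel of a weight matrix by the squared sum of weight × gradient, a first-order estimate of how much the loss would change if the channel were removed, and keeps the highest-scoring channels. The function names and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np


def taylor_channel_importance(weight, grad):
    """First-order Taylor importance per output channel.

    weight, grad: (out_channels, in_features) arrays of the same shape.
    Returns (out_channels,) scores: (sum_j w_j * g_j) ** 2 per row.
    """
    return np.sum(weight * grad, axis=1) ** 2


def channels_to_keep(weight, grad, keep_ratio):
    """Indices (ascending) of the top-scoring channels to retain."""
    scores = taylor_channel_importance(weight, grad)
    k = max(1, int(round(keep_ratio * len(scores))))
    top = np.argsort(scores)[::-1][:k]  # highest scores first
    return np.sort(top)
```

In practice the gradients would come from a backward pass on a calibration batch. Here they are passed in directly to keep the sketch self-contained.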

Slicing Vision Transformer for Flexible Inference

26 September 2024·2922 words·14 mins

Computer Vision Image Classification 🏢 Snap Inc.

Scala: One-shot training enables flexible ViT inference!

Single Image Reflection Separation via Dual-Stream Interactive Transformers

26 September 2024·2158 words·11 mins

Computer Vision Image Generation 🏢 College of Intelligence and Computing, Tianjin University

Dual-Stream Interactive Transformers (DSIT) revolutionizes single image reflection separation by using a novel dual-attention mechanism that captures inter- and intra-layer correlations, significantly…

Simple and Fast Distillation of Diffusion Models

26 September 2024·3151 words·15 mins

Computer Vision Image Generation 🏢 Zhejiang University

Simple and Fast Distillation (SFD) simplifies diffusion model distillation and shortens its fine-tuning time by up to 1000x, achieving state-of-the-art results in few-step image generation.

ShowMaker: Creating High-Fidelity 2D Human Video via Fine-Grained Diffusion Modeling

26 September 2024·2221 words·11 mins

Computer Vision Image Generation 🏢 Tsinghua University

ShowMaker: Generating high-fidelity 2D human conversational videos using fine-grained diffusion modeling and 2D key points.

SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models

26 September 2024·3049 words·15 mins

AI Generated Computer Vision Image Generation 🏢 DAMO Academy, Alibaba Group

SHMT: Self-supervised Hierarchical Makeup Transfer uses latent diffusion models to realistically and precisely apply diverse makeup styles to faces, even without paired training data, achieving high f…

Sharing Key Semantics in Transformer Makes Efficient Image Restoration

26 September 2024·3184 words·15 mins

Computer Vision Image Restoration 🏢 Peking University

SemanIR boosts image restoration efficiency by cleverly sharing key semantic information within Transformer layers, achieving state-of-the-art results across multiple tasks.

SfPUEL: Shape from Polarization under Unknown Environment Light

26 September 2024·2725 words·13 mins

Computer Vision 3D Vision 🏢 Peking University

SfPUEL: A novel end-to-end SfP method achieves robust single-shot surface normal estimation under diverse lighting, integrating PS priors and material segmentation.