Computer Vision
Spiking Neural Network as Adaptive Event Stream Slicer
·2956 words·14 mins·
Computer Vision
Object Detection
🏢 Hong Kong University of Science and Technology
SpikeSlicer: An adaptive event stream slicer using a spiking neural network (SNN) to efficiently split events for improved downstream processing in object tracking and recognition.
Spherical Frustum Sparse Convolution Network for LiDAR Point Cloud Semantic Segmentation
·2477 words·12 mins·
Computer Vision
3D Vision
🏢 Shanghai Jiao Tong University
SFCNet, a novel spherical frustum sparse convolution network, tackles LiDAR point cloud semantic segmentation by eliminating quantized information loss, leading to superior performance, especially for…
SpelsNet: Surface Primitive Elements Segmentation by B-Rep Graph Structure Supervision
·1917 words·9 mins·
Computer Vision
3D Vision
🏢 University of Luxembourg
SpelsNet, a novel neural architecture, achieves accurate 3D point cloud segmentation into surface primitives by incorporating B-Rep graph structure supervision, leading to topologically consistent res…
SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection
·2151 words·11 mins·
Computer Vision
Face Recognition
🏢 Institute of Information Engineering, Chinese Academy of Sciences
SpeechForensics leverages audio-visual speech representation learning to achieve superior face forgery detection, outperforming state-of-the-art methods in cross-dataset generalization and robustness.
Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting
·3727 words·18 mins·
AI Generated
Computer Vision
3D Vision
🏢 Zhejiang University
Spec-Gaussian enhances 3D Gaussian splatting by using anisotropic spherical Gaussians for view-dependent appearance modeling, achieving superior real-time rendering of scenes with specular and anisotr…
Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras
·2395 words·12 mins·
Computer Vision
Image Generation
🏢 Peking University
STIR: A novel spatio-temporal network reconstructs high-quality images from spiking camera data by jointly refining motion and intensity information for efficient and accurate high-speed imaging.
Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis
·2245 words·11 mins·
Computer Vision
3D Vision
🏢 Carnegie Mellon University
SparseAGS: High-fidelity 3D reconstruction & camera pose estimation from sparse views via generative synthesis.
SOI: Scaling Down Computational Complexity by Estimating Partial States of the Model
·2817 words·14 mins·
Computer Vision
Action Recognition
🏢 Samsung AI Center Warsaw
Scattered Online Inference (SOI) drastically cuts down ANN computational complexity by leveraging data continuity and prediction seasonality, enabling faster real-time inference on low-power devices.
Soft Tensor Product Representations for Fully Continuous, Compositional Visual Representations
·8738 words·42 mins·
Computer Vision
Representation Learning
🏢 UNSW, Sydney
Soft Tensor Product Representations (Soft TPRs) revolutionize compositional visual representation learning by seamlessly blending continuous vector spaces and compositional structures, leading to supe…
Soft Superpixel Neighborhood Attention
·3657 words·18 mins·
AI Generated
Computer Vision
Image Segmentation
🏢 Purdue University
Soft Superpixel Neighborhood Attention (SNA) optimally denoises images by incorporating superpixel probabilities into an attention module, outperforming traditional methods.
Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention
·2786 words·14 mins·
Computer Vision
Image Generation
🏢 University of Washington
Smoothed Energy Guidance (SEG) improves unconditional image generation by reducing self-attention’s energy curvature, leading to higher-quality outputs with fewer artifacts.
Slot State Space Models
·2613 words·13 mins·
Computer Vision
Video Understanding
🏢 Rutgers University
SlotSSMs: a novel framework for modular sequence modeling, achieving significant performance gains by incorporating independent mechanisms and sparse interactions into State Space Models.
SlimSAM: 0.1% Data Makes Segment Anything Slim
·2447 words·12 mins·
Computer Vision
Image Segmentation
🏢 National University of Singapore
SlimSAM achieves near-original SAM performance using 0.1% of its training data by employing a novel alternate slimming framework and disturbed Taylor pruning, significantly advancing data-efficient mo…
Slicing Vision Transformer for Flexible Inference
·2922 words·14 mins·
Computer Vision
Image Classification
🏢 Snap Inc.
Scala: One-shot training enables flexible ViT inference!
Single Image Reflection Separation via Dual-Stream Interactive Transformers
·2158 words·11 mins·
Computer Vision
Image Generation
🏢 College of Intelligence and Computing, Tianjin University
Dual-Stream Interactive Transformers (DSIT) revolutionizes single image reflection separation by using a novel dual-attention mechanism that captures inter- and intra-layer correlations, significantly…
Simple and Fast Distillation of Diffusion Models
·3151 words·15 mins·
Computer Vision
Image Generation
🏢 Zhejiang University
Simple and Fast Distillation (SFD) drastically accelerates diffusion model training by 1000x, achieving state-of-the-art results in few-step image generation with minimal fine-tuning.
ShowMaker: Creating High-Fidelity 2D Human Video via Fine-Grained Diffusion Modeling
·2221 words·11 mins·
Computer Vision
Image Generation
🏢 Tsinghua University
ShowMaker: Generating high-fidelity 2D human conversational videos using fine-grained diffusion modeling and 2D key points.
SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models
·3049 words·15 mins·
AI Generated
Computer Vision
Image Generation
🏢 DAMO Academy, Alibaba Group
SHMT: Self-supervised Hierarchical Makeup Transfer uses latent diffusion models to realistically and precisely apply diverse makeup styles to faces, even without paired training data, achieving high f…
Sharing Key Semantics in Transformer Makes Efficient Image Restoration
·3184 words·15 mins·
Computer Vision
Image Restoration
🏢 Peking University
SemanIR boosts image restoration efficiency by cleverly sharing key semantic information within Transformer layers, achieving state-of-the-art results across multiple tasks.
SfPUEL: Shape from Polarization under Unknown Environment Light
·2725 words·13 mins·
Computer Vision
3D Vision
🏢 Peking University
SfPUEL: A novel end-to-end SfP method achieves robust single-shot surface normal estimation under diverse lighting, integrating PS priors and material segmentation.