Computer Vision

SF-V: Single Forward Video Generation Model

26 September 2024·1607 words·8 mins

Computer Vision Video Understanding 🏢 Snap Inc.

Researchers developed SF-V, a single-step image-to-video generation model, achieving a 23x speedup compared to existing models without sacrificing quality, paving the way for real-time video synthesis…

Semi-Open 3D Object Retrieval via Hierarchical Equilibrium on Hypergraph

26 September 2024·2346 words·12 mins

AI Generated Computer Vision 3D Vision 🏢 Tsinghua University

HERT: a novel framework for semi-open 3D object retrieval using hierarchical hypergraph equilibrium, achieving state-of-the-art performance on four new benchmark datasets.

SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow

26 September 2024·2658 words·13 mins

AI Generated Computer Vision Image Generation 🏢 Peking University

SemFlow: a unified framework that uses rectified flow to bridge semantic segmentation and image synthesis, achieving competitive results and offering reversible image-mask transformations.

Semantic Feature Learning for Universal Unsupervised Cross-Domain Retrieval

26 September 2024·1809 words·9 mins

Computer Vision Cross-Modal Retrieval 🏢 Northwestern University

The Universal Unsupervised Cross-Domain Retrieval (U2CDR) framework learns semantic features to enable accurate retrieval even when category spaces differ across domains.

Self-Play Fine-tuning of Diffusion Models for Text-to-image Generation

26 September 2024·4025 words·19 mins

AI Generated Computer Vision Image Generation 🏢 UC Los Angeles

Self-Play Fine-Tuning (SPIN-Diffusion) revolutionizes diffusion model training, achieving superior text-to-image results with less data via iterative self-improvement, surpassing supervised and RLHF m…

Self-Guided Masked Autoencoder

26 September 2024·3698 words·18 mins

AI Generated Computer Vision Self-Supervised Learning 🏢 Seoul National University

Self-guided MAE boosts self-supervised learning by intelligently masking image patches based on internal clustering patterns, dramatically accelerating training without external data.

Self-Distilled Depth Refinement with Noisy Poisson Fusion

26 September 2024·2691 words·13 mins

Computer Vision 3D Vision 🏢 Huazhong University of Science and Technology

Self-Distilled Depth Refinement (SDDR) tackles noisy depth maps via a novel noisy Poisson fusion approach, achieving significant improvements in depth accuracy and edge quality.

Segment Anything without Supervision

26 September 2024·1959 words·10 mins

Computer Vision Image Segmentation 🏢 UC Berkeley

Unsupervised SAM (UnSAM) achieves competitive image segmentation results without human annotation, surpassing previous unsupervised methods and even improving supervised SAM’s accuracy.

Segment Any Change

26 September 2024·2244 words·11 mins

Computer Vision Image Segmentation 🏢 Stanford University

AnyChange achieves zero-shot image change detection by adapting the Segment Anything Model (SAM) via a training-free bitemporal latent matching method, significantly outperforming previous state-of-th…

SE(3)-bi-equivariant Transformers for Point Cloud Assembly

26 September 2024·3085 words·15 mins

AI Generated Computer Vision 3D Vision 🏢 University of Gothenburg

SE(3)-bi-equivariant Transformers (BITR) revolutionize point cloud assembly by guaranteeing robust alignment even with non-overlapping clouds, thanks to their unique equivariance properties.

SCube: Instant Large-Scale Scene Reconstruction using VoxSplats

26 September 2024·3116 words·15 mins

Computer Vision 3D Vision 🏢 University of Toronto

SCube: Instant large-scale 3D scene reconstruction from sparse images using VoxSplats, a novel 3D Gaussian splat representation.

Score Distillation via Reparametrized DDIM

26 September 2024·4128 words·20 mins

Computer Vision Image Generation 🏢 MIT

Researchers improved 3D shape generation from 2D diffusion models by showing that existing Score Distillation Sampling is a reparameterized version of DDIM and fixing its high-variance noise issue via…

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

26 September 2024·3730 words·18 mins

Computer Vision Image Generation 🏢 State Grid Corporation of China

Logistic Schedule: a novel noise schedule that improves DDIM inversion for image editing, enhancing content preservation and edit fidelity without model retraining.

Scaling White-Box Transformers for Vision

26 September 2024·2209 words·11 mins

Computer Vision Image Classification 🏢 UC Santa Cruz

CRATE-α: A new white-box vision transformer architecture achieves 85.1% ImageNet accuracy by strategically scaling model size and datasets, outperforming prior white-box models and preserving interpre…

Scaling the Codebook Size of VQ-GAN to 100,000 with a Utilization Rate of 99%

26 September 2024·2947 words·14 mins

Computer Vision Image Generation 🏢 Microsoft Research

VQGAN-LC massively scales VQGAN’s codebook to 100,000 entries while maintaining a 99% utilization rate, significantly boosting image generation and downstream task performance.

ScaleKD: Strong Vision Transformers Could Be Excellent Teachers

26 September 2024·3783 words·18 mins

AI Generated Computer Vision Image Classification 🏢 Intel Labs

ScaleKD: Pre-trained vision transformers make excellent teachers for diverse student networks, improving efficiency and performance in knowledge distillation.

Samba: Severity-aware Recurrent Modeling for Cross-domain Medical Image Grading

26 September 2024·2230 words·11 mins

Computer Vision Image Classification 🏢 Westlake University

Samba, a novel severity-aware recurrent model, tackles cross-domain medical image grading by sequentially encoding image patches and recalibrating states using EM, significantly improving accuracy.

SAM-Guided Masked Token Prediction for 3D Scene Understanding

26 September 2024·1740 words·9 mins

AI Generated Computer Vision 3D Vision 🏢 Clemson University

This paper introduces SAM-guided masked token prediction, a novel framework for 3D scene understanding that leverages foundation models to significantly improve 3D object detection and semantic segmen…

RTify: Aligning Deep Neural Networks with Human Behavioral Decisions

26 September 2024·1884 words·9 mins

Computer Vision Image Classification 🏢 Brown University

RTify: A novel framework aligns deep neural networks’ dynamics with human reaction times for improved visual decision-making models.

Robustly overfitting latents for flexible neural image compression

26 September 2024·4120 words·20 mins

AI Generated Computer Vision Image Compression 🏢 Vrije Universiteit Amsterdam

SGA+ significantly boosts neural image compression by refining latents, offering a flexible, hyperparameter-insensitive approach with improved rate-distortion trade-off.