Computer Vision
SF-V: Single Forward Video Generation Model
·1607 words·8 mins·
Computer Vision
Video Understanding
🏢 Snap Inc.
Researchers developed SF-V, a single-step image-to-video generation model, achieving a 23x speedup compared to existing models without sacrificing quality, paving the way for real-time video synthesis…
Semi-Open 3D Object Retrieval via Hierarchical Equilibrium on Hypergraph
·2346 words·12 mins·
AI Generated
Computer Vision
3D Vision
🏢 Tsinghua University
HERT: a novel framework for semi-open 3D object retrieval using hierarchical hypergraph equilibrium, achieving state-of-the-art performance on four new benchmark datasets.
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
·2658 words·13 mins·
AI Generated
Computer Vision
Image Generation
🏢 Peking University
SemFlow: A unified framework that uses rectified flow to bridge semantic segmentation and image synthesis, achieving competitive results and offering reversible image-mask transformations.
Semantic Feature Learning for Universal Unsupervised Cross-Domain Retrieval
·1809 words·9 mins·
Computer Vision
Cross-Modal Retrieval
🏢 Northwestern University
A Universal Unsupervised Cross-Domain Retrieval (U2CDR) framework learns semantic features to enable accurate retrieval even when category spaces differ across domains.
Self-Play Fine-tuning of Diffusion Models for Text-to-image Generation
·4025 words·19 mins·
AI Generated
Computer Vision
Image Generation
🏢 UC Los Angeles
Self-Play Fine-Tuning (SPIN-Diffusion) revolutionizes diffusion model training, achieving superior text-to-image results with less data via iterative self-improvement, surpassing supervised and RLHF m…
Self-Guided Masked Autoencoder
·3698 words·18 mins·
AI Generated
Computer Vision
Self-Supervised Learning
🏢 Seoul National University
Self-guided MAE boosts self-supervised learning by intelligently masking image patches based on internal clustering patterns, dramatically accelerating training without external data.
Self-Distilled Depth Refinement with Noisy Poisson Fusion
·2691 words·13 mins·
Computer Vision
3D Vision
🏢 Huazhong University of Science and Technology
Self-Distilled Depth Refinement (SDDR) tackles noisy depth maps via a novel noisy Poisson fusion approach, achieving significant improvements in depth accuracy and edge quality.
Segment Anything without Supervision
·1959 words·10 mins·
Computer Vision
Image Segmentation
🏢 UC Berkeley
Unsupervised SAM (UnSAM) achieves competitive image segmentation results without human annotation, surpassing previous unsupervised methods and even improving supervised SAM’s accuracy.
Segment Any Change
·2244 words·11 mins·
Computer Vision
Image Segmentation
🏢 Stanford University
AnyChange achieves zero-shot image change detection by adapting the Segment Anything Model (SAM) via a training-free bitemporal latent matching method, significantly outperforming previous state-of-th…
SE(3)-bi-equivariant Transformers for Point Cloud Assembly
·3085 words·15 mins·
AI Generated
Computer Vision
3D Vision
🏢 University of Gothenburg
SE(3)-bi-equivariant Transformers (BITR) advance point cloud assembly by guaranteeing robust alignment even with non-overlapping clouds, thanks to their equivariance properties.
SCube: Instant Large-Scale Scene Reconstruction using VoxSplats
·3116 words·15 mins·
Computer Vision
3D Vision
🏢 University of Toronto
SCube: Instant large-scale 3D scene reconstruction from sparse images using VoxSplats, a novel 3D Gaussian splat representation.
Score Distillation via Reparametrized DDIM
·4128 words·20 mins·
Computer Vision
Image Generation
🏢 MIT
Researchers improved 3D shape generation from 2D diffusion models by showing that existing Score Distillation Sampling is a reparameterized version of DDIM and fixing its high-variance noise issue via…
Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing
·3730 words·18 mins·
Computer Vision
Image Generation
🏢 State Grid Corporation of China
Logistic Schedule: A novel noise schedule revolutionizes image editing by improving DDIM inversion, enhancing content preservation and edit fidelity without model retraining!
Scaling White-Box Transformers for Vision
·2209 words·11 mins·
Computer Vision
Image Classification
🏢 UC Santa Cruz
CRATE-α: A new white-box vision transformer architecture achieves 85.1% ImageNet accuracy by strategically scaling model size and datasets, outperforming prior white-box models and preserving interpre…
Scaling the Codebook Size of VQ-GAN to 100,000 with a Utilization Rate of 99%
·2947 words·14 mins·
Computer Vision
Image Generation
🏢 Microsoft Research
VQGAN-LC massively scales VQGAN’s codebook to 100,000 entries while maintaining a 99% utilization rate, significantly boosting image generation and downstream task performance.
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers
·3783 words·18 mins·
AI Generated
Computer Vision
Image Classification
🏢 Intel Labs
ScaleKD: Pre-trained vision transformers make excellent teachers for diverse student networks, improving efficiency and performance in knowledge distillation.
Samba: Severity-aware Recurrent Modeling for Cross-domain Medical Image Grading
·2230 words·11 mins·
Computer Vision
Image Classification
🏢 Westlake University
Samba, a novel severity-aware recurrent model, tackles cross-domain medical image grading by sequentially encoding image patches and recalibrating states using EM, significantly improving accuracy.
SAM-Guided Masked Token Prediction for 3D Scene Understanding
·1740 words·9 mins·
AI Generated
Computer Vision
3D Vision
🏢 Clemson University
This paper introduces SAM-guided masked token prediction, a novel framework for 3D scene understanding that leverages foundation models to significantly improve 3D object detection and semantic segmen…
RTify: Aligning Deep Neural Networks with Human Behavioral Decisions
·1884 words·9 mins·
Computer Vision
Image Classification
🏢 Brown University
RTify: A novel framework aligns deep neural networks’ dynamics with human reaction times for improved visual decision-making models.
Robustly overfitting latents for flexible neural image compression
·4120 words·20 mins·
AI Generated
Computer Vision
Image Compression
🏢 Vrije Universiteit Amsterdam
SGA+ significantly boosts neural image compression by refining latents, offering a flexible, hyperparameter-insensitive approach with improved rate-distortion trade-off.