Computer Vision

Vivid-ZOO: Multi-View Video Generation with Diffusion Model
·2634 words·13 mins
Computer Vision Image Generation 🏢 King Abdullah University of Science and Technology
Vivid-ZOO: Generating high-quality multi-view videos from text using a novel diffusion model.
Visual Prompt Tuning in Null Space for Continual Learning
·2254 words·11 mins
AI Generated Computer Vision Visual Question Answering 🏢 School of Computer Science, Northwestern Polytechnical University
This paper presents NSP², a novel method for visual prompt tuning in continual learning that leverages orthogonal projection to prevent catastrophic forgetting by tuning prompts orthogonal to previous…
Visual Pinwheel Center Act as Geometric Saliency Detector
·2189 words·11 mins
Computer Vision Image Classification 🏢 Research Institute of Intelligent Complex Systems, Fudan University
Visual pinwheel centers in the cortex act as efficient geometric saliency detectors, responding faster and more strongly to complex spatial textures than other structures.
Visual Fourier Prompt Tuning
·4269 words·21 mins
Computer Vision Image Classification 🏢 Rochester Institute of Technology
Visual Fourier Prompt Tuning (VFPT) leverages the Fast Fourier Transform to seamlessly integrate spatial and frequency information for superior parameter-efficient vision model fine-tuning, even with …
Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion
·4160 words·20 mins
Computer Vision Image Generation 🏢 Department of Biomedical Engineering, Southern University of Science and Technology
Researchers developed a novel zero-shot EEG-based framework for visual reconstruction using a tailored brain encoder and a two-stage image generation strategy, achieving state-of-the-art performance i…
Visual Data Diagnosis and Debiasing with Concept Graphs
·2767 words·13 mins
Computer Vision Image Classification 🏢 Carnegie Mellon University
CONBIAS tackles dataset bias by representing visual data as concept graphs, diagnosing imbalances via clique analysis, and debiasing through targeted data augmentation for improved model generalizatio…
Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights
·4915 words·24 mins
Computer Vision Image Classification 🏢 Singapore University of Technology and Design (SUTD)
OoD-ViT-NAS: a new benchmark reveals how ViT architecture impacts out-of-distribution generalization, highlighting the importance of embedding dimension and challenging the reliance on in-distribution…
Vision Mamba Mender
·2136 words·11 mins
AI Generated Computer Vision Face Recognition 🏢 College of Computer Science and Technology, Zhejiang University
Vision Mamba Mender systematically optimizes the Mamba model by identifying and repairing internal and external state flaws, significantly improving its performance in visual recognition tasks.
Vision Foundation Model Enables Generalizable Object Pose Estimation
·3435 words·17 mins
AI Generated Computer Vision 3D Vision 🏢 Chinese University of Hong Kong
VFM-6D: a novel framework achieving generalizable object pose estimation for unseen categories by leveraging vision-language models.
Virtual Scanning: Unsupervised Non-line-of-sight Imaging from Irregularly Undersampled Transients
·2316 words·11 mins
Computer Vision Image Generation 🏢 Tianjin University
Unsupervised learning framework enables high-fidelity non-line-of-sight (NLOS) imaging from irregularly undersampled transients, surpassing state-of-the-art methods in speed and robustness.
Video Token Merging for Long Video Understanding
·2290 words·11 mins
Computer Vision Video Understanding 🏢 Korea University
Researchers boost long-form video understanding efficiency by 6.89x and reduce memory usage by 84% using a novel learnable video token merging algorithm.
Video Diffusion Models are Training-free Motion Interpreter and Controller
·2252 words·11 mins
Computer Vision Video Understanding 🏢 Peking University
Training-free video motion control achieved via novel Motion Feature (MOFT) extraction from existing video diffusion models, offering architecture-agnostic insights and high performance.
VFIMamba: Video Frame Interpolation with State Space Models
·2179 words·11 mins
Computer Vision Video Understanding 🏢 Tencent AI Lab
VFIMamba uses state-space models for efficient and dynamic video frame interpolation, achieving state-of-the-art results by introducing a novel Mixed-SSM Block and curriculum learning.
VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction
·2586 words·13 mins
Computer Vision 3D Vision 🏢 National University of Singapore
VCR-GauS: Novel view-consistent depth-normal regularizer for superior, real-time 3D surface reconstruction using Gaussian splatting.
Variational Multi-scale Representation for Estimating Uncertainty in 3D Gaussian Splatting
·2343 words·11 mins
AI Generated Computer Vision 3D Vision 🏢 Hong Kong Baptist University
New uncertainty estimation method for 3D Gaussian Splatting improves scene reconstruction quality by leveraging variational multi-scale representation and efficiently removing noisy data.
UV-free Texture Generation with Denoising and Geodesic Heat Diffusion
·2448 words·12 mins
Computer Vision 3D Vision 🏢 Imperial College London
UV3-TeD generates high-quality 3D textures directly on object surfaces using a novel diffusion probabilistic model, eliminating UV-mapping limitations.
UPS: Unified Projection Sharing for Lightweight Single-Image Super-resolution and Beyond
·2752 words·13 mins
Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology
UPS: A novel algorithm for lightweight single-image super-resolution, decoupling feature extraction and similarity modeling for enhanced efficiency and robustness.
Upping the Game: How 2D U-Net Skip Connections Flip 3D Segmentation
·3785 words·18 mins
Computer Vision Image Segmentation 🏢 Hangzhou Dianzi University
A novel U-shaped Connection (uC) integrates 2D U-Net skip connections into 3D CNNs to boost 3D medical image segmentation, improving axial-slice plane feature extraction and surpassing state-of-the-art …
Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization
·1822 words·9 mins
AI Generated Computer Vision Vision Transformers 🏢 University of Tokyo
Vision Transformers (ViTs) generalize surprisingly well, even when overfitting training data; this work provides the first theoretical explanation by characterizing the optimization dynamics of ViTs a…
Untrained Neural Nets for Snapshot Compressive Imaging: Theory and Algorithms
·3555 words·17 mins
AI Generated Computer Vision Image Generation 🏢 ECE Department, Rutgers University
Untrained neural networks revolutionize snapshot compressive imaging (SCI) by enabling high-dimensional data recovery from a single 2D measurement, achieving state-of-the-art results without needing e…