Computer Vision

Vivid-ZOO: Multi-View Video Generation with Diffusion Model

26 September 2024·2634 words·13 mins

Computer Vision Image Generation 🏢 King Abdullah University of Science and Technology

Vivid-ZOO: Generating high-quality multi-view videos from text using a novel diffusion model.

Visual Prompt Tuning in Null Space for Continual Learning

26 September 2024·2254 words·11 mins

AI Generated Computer Vision Visual Question Answering 🏢 School of Computer Science, Northwestern Polytechnical University

This paper presents NSP², a novel method for visual prompt tuning in continual learning that leverages orthogonal projection to prevent catastrophic forgetting by tuning prompts orthogonal to previous…

Visual Pinwheel Centers Act as Geometric Saliency Detectors

26 September 2024·2189 words·11 mins

Computer Vision Image Classification 🏢 Research Institute of Intelligent Complex Systems, Fudan University

Visual pinwheel centers in the cortex act as efficient geometric saliency detectors, responding faster and stronger to complex spatial textures than other structures.

Visual Fourier Prompt Tuning

26 September 2024·4269 words·21 mins

Computer Vision Image Classification 🏢 Rochester Institute of Technology

Visual Fourier Prompt Tuning (VFPT) leverages the Fast Fourier Transform to seamlessly integrate spatial and frequency information for superior parameter-efficient vision model fine-tuning, even with …

Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion

26 September 2024·4160 words·20 mins

Computer Vision Image Generation 🏢 Department of Biomedical Engineering, Southern University of Science and Technology

Researchers developed a novel zero-shot EEG-based framework for visual reconstruction using a tailored brain encoder and a two-stage image generation strategy, achieving state-of-the-art performance i…

Visual Data Diagnosis and Debiasing with Concept Graphs

26 September 2024·2767 words·13 mins

Computer Vision Image Classification 🏢 Carnegie Mellon University

CONBIAS tackles dataset bias by representing visual data as concept graphs, diagnosing imbalances via clique analysis, and debiasing through targeted data augmentation for improved model generalizatio…

Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights

26 September 2024·4915 words·24 mins

Computer Vision Image Classification 🏢 Singapore University of Technology and Design (SUTD)

OoD-ViT-NAS: a new benchmark reveals how ViT architecture impacts out-of-distribution generalization, highlighting the importance of embedding dimension and challenging the reliance on in-distribution…

Vision Mamba Mender

26 September 2024·2136 words·11 mins

AI Generated Computer Vision Face Recognition 🏢 College of Computer Science and Technology, Zhejiang University

Vision Mamba Mender systematically optimizes the Mamba model by identifying and repairing internal and external state flaws, significantly improving its performance in visual recognition tasks.

Vision Foundation Model Enables Generalizable Object Pose Estimation

26 September 2024·3435 words·17 mins

AI Generated Computer Vision 3D Vision 🏢 Chinese University of Hong Kong

VFM-6D: a novel framework achieving generalizable object pose estimation for unseen categories by leveraging vision-language models.

Virtual Scanning: Unsupervised Non-line-of-sight Imaging from Irregularly Undersampled Transients

26 September 2024·2316 words·11 mins

Computer Vision Image Generation 🏢 Tianjin University

Unsupervised learning framework enables high-fidelity non-line-of-sight (NLOS) imaging from irregularly undersampled transients, surpassing state-of-the-art methods in speed and robustness.

Video Token Merging for Long Video Understanding

26 September 2024·2290 words·11 mins

Computer Vision Video Understanding 🏢 Korea University

Researchers boost long-form video understanding efficiency by 6.89x and reduce memory usage by 84% using a novel learnable video token merging algorithm.

Video Diffusion Models are Training-free Motion Interpreter and Controller

26 September 2024·2252 words·11 mins

Computer Vision Video Understanding 🏢 Peking University

Training-free video motion control achieved via novel Motion Feature (MOFT) extraction from existing video diffusion models, offering architecture-agnostic insights and high performance.

VFIMamba: Video Frame Interpolation with State Space Models

26 September 2024·2179 words·11 mins

Computer Vision Video Understanding 🏢 Tencent AI Lab

VFIMamba uses state-space models for efficient and dynamic video frame interpolation, achieving state-of-the-art results by introducing a novel Mixed-SSM Block and curriculum learning.

VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction

26 September 2024·2586 words·13 mins

Computer Vision 3D Vision 🏢 National University of Singapore

VCR-GauS: Novel view-consistent depth-normal regularizer for superior, real-time 3D surface reconstruction using Gaussian splatting.

Variational Multi-scale Representation for Estimating Uncertainty in 3D Gaussian Splatting

26 September 2024·2343 words·11 mins

AI Generated Computer Vision 3D Vision 🏢 Hong Kong Baptist University

New uncertainty estimation method for 3D Gaussian Splatting improves scene reconstruction quality by leveraging variational multi-scale representation and efficiently removing noisy data.

UV-free Texture Generation with Denoising and Geodesic Heat Diffusion

26 September 2024·2448 words·12 mins

Computer Vision 3D Vision 🏢 Imperial College London

UV3-TeD generates high-quality 3D textures directly on object surfaces using a novel diffusion probabilistic model, eliminating UV-mapping limitations.

UPS: Unified Projection Sharing for Lightweight Single-Image Super-resolution and Beyond

26 September 2024·2752 words·13 mins

Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology

UPS: A novel algorithm for lightweight single-image super-resolution, decoupling feature extraction and similarity modeling for enhanced efficiency and robustness.

Upping the Game: How 2D U-Net Skip Connections Flip 3D Segmentation

26 September 2024·3785 words·18 mins

Computer Vision Image Segmentation 🏢 Hangzhou Dianzi University

A novel U-shaped Connection (uC) boosts 3D medical image segmentation by integrating 2D U-Net skip connections into 3D CNNs, improving axial-slice plane feature extraction and surpassing state-of-the-art …

Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization

26 September 2024·1822 words·9 mins

AI Generated Computer Vision Vision Transformers 🏢 University of Tokyo

Vision Transformers (ViTs) generalize surprisingly well, even when overfitting training data; this work provides the first theoretical explanation by characterizing the optimization dynamics of ViTs a…

Untrained Neural Nets for Snapshot Compressive Imaging: Theory and Algorithms

26 September 2024·3555 words·17 mins

AI Generated Computer Vision Image Generation 🏢 ECE Department, Rutgers University

Untrained neural networks revolutionize snapshot compressive imaging (SCI) by enabling high-dimensional data recovery from a single 2D measurement, achieving state-of-the-art results without needing e…