Computer Vision

AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer

26 September 2024·3672 words·18 mins

Computer Vision Image Classification 🏢 École Polytechnique Fédérale De Lausanne

Boosting Vision Transformer robustness against attacks & noisy data, AdaNCA uses Neural Cellular Automata as plug-and-play adaptors between ViT layers, achieving significant accuracy improvement with …

Activating Self-Attention for Multi-Scene Absolute Pose Regression

26 September 2024·2130 words·10 mins

Computer Vision 3D Vision 🏢 SungKyunKwan University

Boosting Multi-Scene Pose Regression: Novel methods activate transformer self-attention, significantly improving camera pose estimation accuracy and efficiency.

Action Imitation in Common Action Space for Customized Action Image Synthesis

26 September 2024·1901 words·9 mins

Computer Vision Image Generation 🏢 Zhejiang University

TwinAct: Decoupling actions and actors for customizable text-guided action image generation.

ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation

26 September 2024·3953 words·19 mins

AI Generated Computer Vision Action Recognition 🏢 Pohang University of Science and Technology

ActFusion: a unified diffusion model achieving state-of-the-art performance in both action segmentation and anticipation by jointly learning visible and invisible parts of video sequences.

ActAnywhere: Subject-Aware Video Background Generation

26 September 2024·1990 words·10 mins

Computer Vision Video Understanding 🏢 Stanford University

ActAnywhere, a novel video diffusion model, seamlessly integrates foreground subjects into new backgrounds by generating realistic video backgrounds tailored to subject motion, significantly reducing …

ACFun: Abstract-Concrete Fusion Facial Stylization

26 September 2024·2192 words·11 mins

Computer Vision Image Generation 🏢 Xidian University

ACFun: A novel facial stylization method fusing abstract & concrete features for high-quality, artistically pleasing results from only one style & one face image.

Accelerating Non-Maximum Suppression: A Graph Theory Perspective

26 September 2024·3325 words·16 mins

AI Generated Computer Vision Object Detection 🏢 School of Computer Science and Technology, MOEKLINNS Lab, Xi'an Jiaotong University

This paper presents QSI-NMS and BOE-NMS, novel graph theory-based algorithms that significantly speed up non-maximum suppression (NMS) in object detection without significant accuracy loss, and introd…

Accelerating Augmentation Invariance Pretraining

26 September 2024·1854 words·9 mins

Computer Vision Self-Supervised Learning 🏢 University of Wisconsin-Madison

Boost Vision Transformer pretraining speed by 4x with novel sequence compression techniques!

A Unified Framework for 3D Scene Understanding

26 September 2024·2347 words·12 mins

Computer Vision 3D Vision 🏢 Huazhong University of Science and Technology

UniSeg3D: One model to rule them all! This unified framework masters six 3D segmentation tasks (panoptic, semantic, instance, interactive, referring, and open-vocabulary) simultaneously, outperforming…

A Surprisingly Simple Approach to Generalized Few-Shot Semantic Segmentation

26 September 2024·2178 words·11 mins

AI Generated Computer Vision Image Segmentation 🏢 IBM Research

Simple rule-based base-class mining (BCM) significantly boosts generalized few-shot semantic segmentation (GFSS) performance, surpassing complex existing methods.

A Simple yet Universal Framework for Depth Completion

26 September 2024·2167 words·11 mins

Computer Vision 3D Vision 🏢 AI Graduate School GIST

UniDC framework achieves universal depth completion across various sensors and scenes using minimal labeled data, leveraging a foundation model and hyperbolic embedding for enhanced generalization.

A Siamese Transformer with Hierarchical Refinement for Lane Detection

26 September 2024·2636 words·13 mins

AI Generated Computer Vision Object Detection 🏢 Shanghai Jiao Tong University

Siamese Transformer with Hierarchical Refinement achieves state-of-the-art lane detection accuracy by integrating global and local features and a novel Curve-IoU loss.

A robust inlier identification algorithm for point cloud registration via l_0-minimization

26 September 2024·2507 words·12 mins

Computer Vision 3D Vision 🏢 Huazhong University of Science and Technology

This paper introduces a novel, robust inlier identification algorithm for point cloud registration that leverages l_0-minimization.

A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation

26 September 2024·2616 words·13 mins

AI Generated Computer Vision Object Detection 🏢 Faculty of Computer and Information Science, University of Ljubljana

GeCo: A novel single-stage low-shot counter achieving ~25% improvement in count accuracy, via unified object detection, segmentation, and counting.

A Motion-aware Spatio-temporal Graph for Video Salient Object Ranking

26 September 2024·2245 words·11 mins

Computer Vision Video Understanding 🏢 School of Computer Science and Engineering, Southeast University

A novel motion-aware spatio-temporal graph model surpasses existing methods in video salient object ranking by jointly optimizing multi-scale spatial and temporal features, thus accurately prioritizin…

A Modular Conditional Diffusion Framework for Image Reconstruction

26 September 2024·4235 words·20 mins

AI Generated Computer Vision Image Generation 🏢 MTS AI

A novel modular diffusion framework for image reconstruction dramatically cuts computational costs and achieves state-of-the-art perceptual quality across various tasks by cleverly combining pre-train…

A Label is Worth A Thousand Images in Dataset Distillation

26 September 2024·2824 words·14 mins

Computer Vision Image Classification 🏢 Harvard University

Soft labels, not sophisticated data synthesis, are the key to successful dataset distillation, significantly improving data-efficient learning and challenging existing methods.

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding

26 September 2024·1812 words·9 mins

Computer Vision 3D Vision 🏢 Zhejiang University

Depth-range-free MVS network using pose embedding achieves robust and accurate 3D reconstruction.

A General Protocol to Probe Large Vision Models for 3D Physical Understanding

26 September 2024·4012 words·19 mins

AI Generated Computer Vision 3D Vision 🏢 University of Oxford

Researchers developed a lightweight protocol to probe large vision models’ 3D physical understanding by training classifiers on model features for various scene properties (geometry, material, lightin…

A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration

26 September 2024·2500 words·12 mins

Computer Vision 3D Vision 🏢 Zhejiang University

CAST: a novel consistency-aware spot-guided Transformer achieves state-of-the-art accuracy and efficiency in point cloud registration.