Computer Vision

AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer
·3672 words·18 mins
Computer Vision Image Classification 🏒 École Polytechnique Fédérale De Lausanne
Boosting Vision Transformer robustness against attacks & noisy data, AdaNCA uses Neural Cellular Automata as plug-and-play adaptors between ViT layers, achieving significant accuracy improvement with …
Activating Self-Attention for Multi-Scene Absolute Pose Regression
·2130 words·10 mins
Computer Vision 3D Vision 🏒 SungKyunKwan University
Boosting Multi-Scene Pose Regression: Novel methods activate transformer self-attention, significantly improving camera pose estimation accuracy and efficiency.
Action Imitation in Common Action Space for Customized Action Image Synthesis
·1901 words·9 mins
Computer Vision Image Generation 🏒 Zhejiang University
TwinAct: Decoupling actions and actors for customizable text-guided action image generation.
ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation
·3953 words·19 mins
AI Generated Computer Vision Action Recognition 🏒 Pohang University of Science and Technology
ActFusion: a unified diffusion model achieving state-of-the-art performance in both action segmentation and anticipation by jointly learning visible and invisible parts of video sequences.
ActAnywhere: Subject-Aware Video Background Generation
·1990 words·10 mins
Computer Vision Video Understanding 🏒 Stanford University
ActAnywhere, a novel video diffusion model, seamlessly integrates foreground subjects into new backgrounds by generating realistic video backgrounds tailored to subject motion, significantly reducing …
ACFun: Abstract-Concrete Fusion Facial Stylization
·2192 words·11 mins
Computer Vision Image Generation 🏒 Xidian University
ACFun: A novel facial stylization method fusing abstract & concrete features for high-quality, artistically pleasing results from only one style & one face image.
Accelerating Non-Maximum Suppression: A Graph Theory Perspective
·3325 words·16 mins
AI Generated Computer Vision Object Detection 🏒 School of Computer Science and Technology, MOE KLINNS Lab, Xi'an Jiaotong University
This paper presents QSI-NMS and BOE-NMS, novel graph theory-based algorithms that significantly speed up non-maximum suppression (NMS) in object detection without significant accuracy loss, and introd…
Accelerating Augmentation Invariance Pretraining
·1854 words·9 mins
Computer Vision Self-Supervised Learning 🏒 University of Wisconsin-Madison
Boost Vision Transformer pretraining speed by 4x with novel sequence compression techniques!
A Unified Framework for 3D Scene Understanding
·2347 words·12 mins
Computer Vision 3D Vision 🏒 Huazhong University of Science and Technology
UniSeg3D: One model to rule them all! This unified framework masters six 3D segmentation tasks (panoptic, semantic, instance, interactive, referring, and open-vocabulary) simultaneously, outperforming…
A Surprisingly Simple Approach to Generalized Few-Shot Semantic Segmentation
·2178 words·11 mins
AI Generated Computer Vision Image Segmentation 🏒 IBM Research
Simple rule-based base-class mining (BCM) significantly boosts generalized few-shot semantic segmentation (GFSS) performance, surpassing complex existing methods.
A Simple yet Universal Framework for Depth Completion
·2167 words·11 mins
Computer Vision 3D Vision 🏒 AI Graduate School GIST
UniDC framework achieves universal depth completion across various sensors and scenes using minimal labeled data, leveraging a foundation model and hyperbolic embedding for enhanced generalization.
A Siamese Transformer with Hierarchical Refinement for Lane Detection
·2636 words·13 mins
AI Generated Computer Vision Object Detection 🏒 Shanghai Jiao Tong University
Siamese Transformer with Hierarchical Refinement achieves state-of-the-art lane detection accuracy by integrating global and local features and a novel Curve-IoU loss.
A robust inlier identification algorithm for point cloud registration via l_0-minimization
·2507 words·12 mins
Computer Vision 3D Vision 🏒 Huazhong University of Science and Technology
This paper introduces a novel, robust inlier identification algorithm for point cloud registration that leverages l_0-minimization.
A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation
·2616 words·13 mins
AI Generated Computer Vision Object Detection 🏒 Faculty of Computer and Information Science, University of Ljubljana
GeCo: A novel single-stage low-shot counter achieving ~25% improvement in count accuracy, via unified object detection, segmentation, and counting.
A Motion-aware Spatio-temporal Graph for Video Salient Object Ranking
·2245 words·11 mins
Computer Vision Video Understanding 🏒 School of Computer Science and Engineering, Southeast University
A novel motion-aware spatio-temporal graph model surpasses existing methods in video salient object ranking by jointly optimizing multi-scale spatial and temporal features, thus accurately prioritizin…
A Modular Conditional Diffusion Framework for Image Reconstruction
·4235 words·20 mins
AI Generated Computer Vision Image Generation 🏒 MTS AI
A novel modular diffusion framework for image reconstruction dramatically cuts computational costs and achieves state-of-the-art perceptual quality across various tasks by cleverly combining pre-train…
A Label is Worth A Thousand Images in Dataset Distillation
·2824 words·14 mins
Computer Vision Image Classification 🏒 Harvard University
Soft labels, not sophisticated data synthesis, are the key to successful dataset distillation, significantly improving data-efficient learning and challenging existing methods.
A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding
·1812 words·9 mins
Computer Vision 3D Vision 🏒 Zhejiang University
Depth-range-free MVS network using pose embedding achieves robust and accurate 3D reconstruction.
A General Protocol to Probe Large Vision Models for 3D Physical Understanding
·4012 words·19 mins
AI Generated Computer Vision 3D Vision 🏒 University of Oxford
Researchers developed a lightweight protocol to probe large vision models’ 3D physical understanding by training classifiers on model features for various scene properties (geometry, material, lightin…
A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration
·2500 words·12 mins
Computer Vision 3D Vision 🏒 Zhejiang University
CAST: a novel consistency-aware spot-guided Transformer achieves state-of-the-art accuracy and efficiency in point cloud registration.