Computer Vision
AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer
·3672 words·18 mins·
Computer Vision
Image Classification
École Polytechnique Fédérale de Lausanne
Boosting Vision Transformer robustness against attacks & noisy data, AdaNCA uses Neural Cellular Automata as plug-and-play adaptors between ViT layers, achieving significant accuracy improvement with …
Activating Self-Attention for Multi-Scene Absolute Pose Regression
·2130 words·10 mins·
Computer Vision
3D Vision
SungKyunKwan University
Boosting Multi-Scene Pose Regression: Novel methods activate transformer self-attention, significantly improving camera pose estimation accuracy and efficiency.
Action Imitation in Common Action Space for Customized Action Image Synthesis
·1901 words·9 mins·
Computer Vision
Image Generation
Zhejiang University
TwinAct: Decoupling actions and actors for customizable text-guided action image generation.
ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation
·3953 words·19 mins·
AI Generated
Computer Vision
Action Recognition
Pohang University of Science and Technology
ActFusion: a unified diffusion model achieving state-of-the-art performance in both action segmentation and anticipation by jointly learning visible and invisible parts of video sequences.
ActAnywhere: Subject-Aware Video Background Generation
·1990 words·10 mins·
Computer Vision
Video Understanding
Stanford University
ActAnywhere, a novel video diffusion model, seamlessly integrates foreground subjects into new backgrounds by generating realistic video backgrounds tailored to subject motion, significantly reducing …
ACFun: Abstract-Concrete Fusion Facial Stylization
·2192 words·11 mins·
Computer Vision
Image Generation
Xidian University
ACFun: A novel facial stylization method fusing abstract & concrete features for high-quality, artistically pleasing results from only one style & one face image.
Accelerating Non-Maximum Suppression: A Graph Theory Perspective
·3325 words·16 mins·
AI Generated
Computer Vision
Object Detection
School of Computer Science and Technology, MOEKLINNS Lab, Xi'an Jiaotong University
This paper presents QSI-NMS and BOE-NMS, novel graph theory-based algorithms that significantly speed up non-maximum suppression (NMS) in object detection without significant accuracy loss, and introd…
Accelerating Augmentation Invariance Pretraining
·1854 words·9 mins·
Computer Vision
Self-Supervised Learning
University of Wisconsin-Madison
Boost Vision Transformer pretraining speed by 4x with novel sequence compression techniques!
A Unified Framework for 3D Scene Understanding
·2347 words·12 mins·
Computer Vision
3D Vision
Huazhong University of Science and Technology
UniSeg3D: One model to rule them all! This unified framework masters six 3D segmentation tasks (panoptic, semantic, instance, interactive, referring, and open-vocabulary) simultaneously, outperforming…
A Surprisingly Simple Approach to Generalized Few-Shot Semantic Segmentation
·2178 words·11 mins·
AI Generated
Computer Vision
Image Segmentation
IBM Research
Simple rule-based base-class mining (BCM) significantly boosts generalized few-shot semantic segmentation (GFSS) performance, surpassing complex existing methods.
A Simple yet Universal Framework for Depth Completion
·2167 words·11 mins·
Computer Vision
3D Vision
AI Graduate School GIST
UniDC framework achieves universal depth completion across various sensors and scenes using minimal labeled data, leveraging a foundation model and hyperbolic embedding for enhanced generalization.
A Siamese Transformer with Hierarchical Refinement for Lane Detection
·2636 words·13 mins·
AI Generated
Computer Vision
Object Detection
Shanghai Jiao Tong University
Siamese Transformer with Hierarchical Refinement achieves state-of-the-art lane detection accuracy by integrating global and local features and a novel Curve-IoU loss.
A robust inlier identification algorithm for point cloud registration via l_0-minimization
·2507 words·12 mins·
Computer Vision
3D Vision
Huazhong University of Science and Technology
This paper introduces a novel, robust inlier identification algorithm for point cloud registration that leverages l_0-minimization.
A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation
·2616 words·13 mins·
AI Generated
Computer Vision
Object Detection
Faculty of Computer and Information Science, University of Ljubljana
GeCo: A novel single-stage low-shot counter achieving ~25% improvement in count accuracy, via unified object detection, segmentation, and counting.
A Motion-aware Spatio-temporal Graph for Video Salient Object Ranking
·2245 words·11 mins·
Computer Vision
Video Understanding
School of Computer Science and Engineering, Southeast University
A novel motion-aware spatio-temporal graph model surpasses existing methods in video salient object ranking by jointly optimizing multi-scale spatial and temporal features, thus accurately prioritizin…
A Modular Conditional Diffusion Framework for Image Reconstruction
·4235 words·20 mins·
AI Generated
Computer Vision
Image Generation
MTS AI
A novel modular diffusion framework for image reconstruction dramatically cuts computational costs and achieves state-of-the-art perceptual quality across various tasks by cleverly combining pre-train…
A Label is Worth A Thousand Images in Dataset Distillation
·2824 words·14 mins·
Computer Vision
Image Classification
Harvard University
Soft labels, not sophisticated data synthesis, are the key to successful dataset distillation, significantly improving data-efficient learning and challenging existing methods.
A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding
·1812 words·9 mins·
Computer Vision
3D Vision
Zhejiang University
Depth-range-free MVS network using pose embedding achieves robust and accurate 3D reconstruction.
A General Protocol to Probe Large Vision Models for 3D Physical Understanding
·4012 words·19 mins·
AI Generated
Computer Vision
3D Vision
University of Oxford
Researchers developed a lightweight protocol to probe large vision models’ 3D physical understanding by training classifiers on model features for various scene properties (geometry, material, lightin…
A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration
·2500 words·12 mins·
Computer Vision
3D Vision
Zhejiang University
CAST: a novel consistency-aware spot-guided Transformer achieves state-of-the-art accuracy and efficiency in point cloud registration.