Computer Vision

Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects

26 September 2024·9001 words·43 mins

AI Generated Computer Vision Visual Question Answering 🏢 Brown University

Vision transformers surprisingly struggle with visual relations; this study reveals ViTs use distinct perceptual and relational processing stages to solve same/different tasks, highlighting a previous…

Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection

26 September 2024·2466 words·12 mins

Computer Vision Video Understanding 🏢 Chongqing University of Posts and Telecommunications

Beyond Euclidean spaces, Dual-Space Representation Learning (DSRL) enhances weakly supervised video violence detection by integrating Euclidean and hyperbolic geometries for superior discrimi…

Beyond Accuracy: Tracking more like Human via Visual Search

26 September 2024·2966 words·14 mins

Computer Vision Video Understanding 🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences

CPDTrack: human-like visual search improves object tracking.

BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

26 September 2024·3663 words·18 mins

Computer Vision 3D Vision 🏢 ETH Zurich

BetterDepth: A plug-and-play diffusion refiner boosts zero-shot monocular depth estimation by adding fine details while preserving accurate geometry.

BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models

26 September 2024·3084 words·15 mins

AI Generated Computer Vision Image Generation 🏢 Zhejiang University

O-BELM, a novel diffusion model sampler, achieves mathematically exact inversion with superior sampling quality, offering a new gold standard for diffusion model applications.

Be Confident in What You Know: Bayesian Parameter Efficient Fine-Tuning of Vision Foundation Models

26 September 2024·3138 words·15 mins

Computer Vision Few-Shot Learning 🏢 Rochester Institute of Technology

Bayesian-PEFT boosts vision model accuracy and confidence in few-shot learning by integrating Bayesian components into parameter-efficient fine-tuning (PEFT), addressing the underconfidence problem.

AverNet: All-in-one Video Restoration for Time-varying Unknown Degradations

26 September 2024·2558 words·13 mins

AI Generated Computer Vision Video Understanding 🏢 College of Computer Science, Sichuan University, China

AverNet: all-in-one video restoration for videos whose unknown degradations vary over time.

Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs

26 September 2024·3190 words·15 mins

AI Generated Computer Vision Image Segmentation 🏢 Fudan University

GNNs automate label unification across multiple semantic segmentation datasets, improving training efficiency and performance by resolving conflicts between their label spaces.

AUCSeg: AUC-oriented Pixel-level Long-tail Semantic Segmentation

26 September 2024·3221 words·16 mins

Computer Vision Image Segmentation 🏢 Key Lab. of Intelligent Information Processing, Institute of Computing Technology, CAS

AUCSeg tackles pixel-level long-tail semantic segmentation by introducing an AUC-oriented loss function and a Tail-Classes Memory Bank to efficiently manage memory and improve performance on imbalance…

Attention Temperature Matters in ViT-Based Cross-Domain Few-Shot Learning

26 September 2024·2330 words·11 mins

Computer Vision Few-Shot Learning 🏢 Huazhong University of Science and Technology

A simple yet effective method boosts the Vision Transformer’s transferability in cross-domain few-shot learning: strategically adjusting the attention temperature to remedy ineffective target…

Attack-Resilient Image Watermarking Using Stable Diffusion

26 September 2024·3069 words·15 mins

Computer Vision Image Generation 🏢 University of Massachusetts Amherst

ZoDiac: a novel image watermarking framework that leverages pre-trained Stable Diffusion models to embed robust, invisible watermarks resistant to state-of-the-art attacks.

Asynchronous Perception Machine for Efficient Test Time Training

26 September 2024·5559 words·27 mins

AI Generated Computer Vision Image Classification 🏢 University of Central Florida

APM: Asynchronous Perception Machine, a computationally efficient architecture for test-time training (TTT), processes image patches asynchronously, encoding semantic awareness without pre-training, a…

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

26 September 2024·2720 words·13 mins

Computer Vision Image Generation 🏢 National University of Singapore

AsyncDiff accelerates diffusion model inference by 2.8x using asynchronous denoising and model parallelism, maintaining near-perfect image quality.

Assembly Fuzzy Representation on Hypergraph for Open-Set 3D Object Retrieval

26 September 2024·2034 words·10 mins

Computer Vision 3D Vision 🏢 Tsinghua University

Hypergraph-Based Assembly Fuzzy Representation (HAFR) excels at open-set 3D object retrieval by using part-level shapes and fuzzy representations to overcome challenges posed by unseen object categori…

AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation

26 September 2024·2478 words·12 mins

Computer Vision Image Generation 🏢 Snap Inc.

AsCAN, a novel hybrid architecture, achieves superior efficiency and performance in image recognition and generation by asymmetrically combining convolutional and transformer blocks.

Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis

26 September 2024·3848 words·19 mins

AI Generated Computer Vision 3D Vision 🏢 University of Edinburgh

Unsupervised Articulated Object Modeling using Conditional View Synthesis learns pose and part segmentation from only two object observations, achieving significantly better performance than previous …

ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users

26 September 2024·3873 words·19 mins

AI Generated Computer Vision Image Generation 🏢 Nanyang Technological University

ART: A novel automatic red-teaming framework reveals safety vulnerabilities in popular text-to-image models by identifying unsafe outputs even from seemingly harmless prompts.

Are nuclear masks all you need for improved out-of-domain generalisation? A closer look at cancer classification in histopathology

26 September 2024·3878 words·19 mins

Computer Vision Image Classification 🏢 University of Oslo

Focusing on nuclear morphology improves out-of-domain generalization in cancer classification from histopathology images by leveraging nuclear segmentation masks during training.

Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation?

26 September 2024·3060 words·15 mins

Computer Vision Image Classification 🏢 Agency for Science, Technology and Research, Singapore

Large-scale dataset distillation can be achieved with significantly fewer soft labels by using class-wise supervision during image synthesis, enabling simple random label pruning and enhancing model ac…

Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models

26 September 2024·2501 words·12 mins

Computer Vision Image Generation 🏢 Aalto University

Boosting image generation: applying guidance only within a limited interval of diffusion model sampling, rather than at every step, drastically enhances image quality and inference speed, achieving state-of-the-art results.