Computer Vision

Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects
·9001 words·43 mins
AI Generated Computer Vision Visual Question Answering 🏢 Brown University
Vision transformers surprisingly struggle with visual relations; this study reveals ViTs use distinct perceptual and relational processing stages to solve same/different tasks, highlighting a previous…
Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection
·2466 words·12 mins
Computer Vision Video Understanding 🏢 Chongqing University of Posts and Telecommunications
Beyond Euclidean spaces, Dual-Space Representation Learning (DSRL) enhances weakly supervised video violence detection by cleverly integrating Euclidean and hyperbolic geometries for superior discrimi…
Beyond Accuracy: Tracking more like Human via Visual Search
·2966 words·14 mins
Computer Vision Video Understanding 🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
CPDTrack: human-like visual search boosts object tracking.
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation
·3663 words·18 mins
Computer Vision 3D Vision 🏢 ETH Zurich
BetterDepth: A plug-and-play diffusion refiner boosts zero-shot monocular depth estimation by adding fine details while preserving accurate geometry.
BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models
·3084 words·15 mins
AI Generated Computer Vision Image Generation 🏢 Zhejiang University
O-BELM, a novel diffusion model sampler, achieves mathematically exact inversion with superior sampling quality, offering a new gold standard for diffusion model applications.
Be Confident in What You Know: Bayesian Parameter Efficient Fine-Tuning of Vision Foundation Models
·3138 words·15 mins
Computer Vision Few-Shot Learning 🏢 Rochester Institute of Technology
Bayesian-PEFT boosts vision model accuracy and confidence in few-shot learning by integrating Bayesian components into PEFT, solving the underconfidence problem.
AverNet: All-in-one Video Restoration for Time-varying Unknown Degradations
·2558 words·13 mins
AI Generated Computer Vision Video Understanding 🏢 College of Computer Science, Sichuan University, China
AverNet: All-in-one video restoration defying time-varying unknown degradations.
Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs
·3190 words·15 mins
AI Generated Computer Vision Image Segmentation 🏢 Fudan University
GNNs automate multi-dataset semantic segmentation label unification, improving model training efficiency and performance by resolving conflicts across label spaces.
AUCSeg: AUC-oriented Pixel-level Long-tail Semantic Segmentation
·3221 words·16 mins
Computer Vision Image Segmentation 🏢 Key Lab. of Intelligent Information Processing, Institute of Computing Technology, CAS
AUCSeg tackles pixel-level long-tail semantic segmentation by introducing an AUC-oriented loss function and a Tail-Classes Memory Bank to efficiently manage memory and improve performance on imbalance…
Attention Temperature Matters in ViT-Based Cross-Domain Few-Shot Learning
·2330 words·11 mins
Computer Vision Few-Shot Learning 🏢 Huazhong University of Science and Technology
A simple yet effective method boosts Vision Transformers' transferability in cross-domain few-shot learning: adjusting the attention temperature to remedy ineffective target…
Attack-Resilient Image Watermarking Using Stable Diffusion
·3069 words·15 mins
Computer Vision Image Generation 🏢 University of Massachusetts Amherst
ZoDiac: a novel image watermarking framework leveraging pre-trained stable diffusion models for robust, invisible watermarks resistant to state-of-the-art attacks.
Asynchronous Perception Machine for Efficient Test Time Training
·5559 words·27 mins
AI Generated Computer Vision Image Classification 🏢 University of Central Florida
APM: Asynchronous Perception Machine, a computationally-efficient architecture for test-time training (TTT), processes image patches asynchronously, encoding semantic awareness without pre-training, a…
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
·2720 words·13 mins
Computer Vision Image Generation 🏢 National University of Singapore
AsyncDiff accelerates diffusion model inference by 2.8x using asynchronous denoising and model parallelism, maintaining near-perfect image quality.
Assembly Fuzzy Representation on Hypergraph for Open-Set 3D Object Retrieval
·2034 words·10 mins
Computer Vision 3D Vision 🏢 Tsinghua University
Hypergraph-Based Assembly Fuzzy Representation (HAFR) excels at open-set 3D object retrieval by using part-level shapes and fuzzy representations to overcome challenges posed by unseen object categori…
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation
·2478 words·12 mins
Computer Vision Image Generation 🏢 Snap Inc.
AsCAN, a novel hybrid architecture, achieves superior efficiency and performance in image recognition and generation by asymmetrically combining convolutional and transformer blocks.
Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis
·3848 words·19 mins
AI Generated Computer Vision 3D Vision 🏢 University of Edinburgh
Unsupervised Articulated Object Modeling using Conditional View Synthesis learns pose and part segmentation from only two object observations, achieving significantly better performance than previous …
ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users
·3873 words·19 mins
AI Generated Computer Vision Image Generation 🏢 Nanyang Technological University
ART: A novel automatic red-teaming framework reveals safety vulnerabilities in popular text-to-image models by identifying unsafe outputs even from seemingly harmless prompts.
Are nuclear masks all you need for improved out-of-domain generalisation? A closer look at cancer classification in histopathology
·3878 words·19 mins
Computer Vision Image Classification 🏢 University of Oslo
Focusing on nuclear morphology improves out-of-domain generalization in cancer classification from histopathology images by leveraging nuclear segmentation masks during training.
Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation?
·3060 words·15 mins
Computer Vision Image Classification 🏢 Agency for Science, Technology and Research, Singapore
Large-scale dataset distillation can be achieved with significantly fewer soft labels by using class-wise supervision during image synthesis, enabling simple random label pruning and enhancing model ac…
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
·2501 words·12 mins
Computer Vision Image Generation 🏢 Aalto University
Boosting image generation: Applying guidance selectively during diffusion model sampling drastically enhances image quality and inference speed, achieving state-of-the-art results.