Computer Vision

Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

26 September 2024·4960 words·24 mins

AI Generated Computer Vision Few-Shot Learning 🏢 Australian Institute for Machine Learning

aTLAS: a novel parameter-efficient fine-tuning method using learned anisotropic scaling of task vectors for enhanced knowledge composition and transfer.

Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features

26 September 2024·2426 words·12 mins

Computer Vision 3D Vision 🏢 Peking University

Key-Grid: An unsupervised 3D keypoint detector achieving state-of-the-art semantic consistency and accuracy for both rigid and deformable objects using novel grid heatmap features.

Just Add $100 More: Augmenting Pseudo-LiDAR Point Cloud for Resolving Class-imbalance Problem

26 September 2024·4026 words·19 mins

AI Generated Computer Vision Object Detection 🏢 Korea University

Boost 3D object detection accuracy by augmenting pseudo-LiDAR point clouds!

Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models.

26 September 2024·3743 words·18 mins

AI Generated Computer Vision Image Generation 🏢 University of Glasgow

Pixelsmith: Generate gigapixel images with a single GPU, surpassing limitations of existing methods through a cascading approach and innovative guidance mechanism.

Is Multiple Object Tracking a Matter of Specialization?

26 September 2024·2391 words·12 mins

Computer Vision Object Detection 🏢 University of Modena and Reggio Emilia

PASTA: A novel modular framework boosts MOT tracker generalization by using parameter-efficient fine-tuning and avoiding negative interference through specialized modules for various scenario attribut…

IR-CM: The Fast and Universal Image Restoration Method Based on Consistency Model

26 September 2024·2449 words·12 mins

Computer Vision Image Generation 🏢 Huazhong University of Science and Technology

IR-CM: One-step image restoration using a novel consistency model for fast and universal performance.

IODA: Instance-Guided One-shot Domain Adaptation for Super-Resolution

26 September 2024·2808 words·14 mins

AI Generated Computer Vision Image Generation 🏢 Nanjing University

IODA achieves efficient one-shot domain adaptation for super-resolution using a novel instance-guided strategy and image-level domain alignment, significantly improving performance with limited target…

Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps

26 September 2024·4066 words·20 mins

Computer Vision Image Generation 🏢 HSE University

Invertible Consistency Distillation (iCD) achieves high-quality image editing in ~7 steps by enabling both fast editing and strong generation using a generalized distillation framework and dynamic cla…

Interpreting the Weight Space of Customized Diffusion Models

26 September 2024·3822 words·18 mins

Computer Vision Image Generation 🏢 UC Berkeley

Researchers model a manifold of customized diffusion models as a subspace of weights, enabling controllable creation of new models via sampling, editing, and inversion from a single image.

Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors

26 September 2024·1664 words·8 mins

AI Generated Computer Vision Image Generation 🏢 York University

Interpretable lightweight transformers are built by unrolling graph smoothness priors, achieving high performance with significantly fewer parameters than conventional transformers.

Interpretable Image Classification with Adaptive Prototype-based Vision Transformers

26 September 2024·5008 words·24 mins

Computer Vision Image Classification 🏢 Dartmouth College

ProtoViT: a novel interpretable image classification method using Vision Transformers and adaptive prototypes, achieving higher accuracy and providing clear explanations.

Integrating Deep Metric Learning with Coreset for Active Learning in 3D Segmentation

26 September 2024·2210 words·11 mins

Computer Vision Image Segmentation 🏢 UC Los Angeles

Integrating deep metric learning with Coreset enables efficient slice-based active learning for 3D medical segmentation, surpassing existing methods in performance with low annotation budgets.

Initializing Variable-sized Vision Transformers from Learngene with Learnable Transformation

26 September 2024·2536 words·12 mins

AI Generated Computer Vision Image Classification 🏢 School of Computer Science and Engineering, Southeast University

LeTs: Learnable Transformation efficiently initializes variable-sized Vision Transformers by learning adaptable transformations from a compact learngene module, outperforming from-scratch training.

Infinite-Dimensional Feature Interaction

26 September 2024·1877 words·9 mins

Computer Vision Image Classification 🏢 Peking University

InfiNet achieves state-of-the-art results by enabling feature interaction in an infinite-dimensional space using RBF kernels, surpassing models limited to finite-dimensional interactions.

Inferring Neural Signed Distance Functions by Overfitting on Single Noisy Point Clouds through Finetuning Data-Driven based Priors

26 September 2024·3586 words·17 mins

Computer Vision 3D Vision 🏢 Tsinghua University

This research presents LocalN2NM, a novel method for inferring neural signed distance functions (SDFs) from single, noisy point clouds by finetuning data-driven priors, achieving faster inference and b…

Incorporating Test-Time Optimization into Training with Dual Networks for Human Mesh Recovery

26 September 2024·2718 words·13 mins

Computer Vision 3D Vision 🏢 South China University of Technology

Meta-learning enhances human mesh recovery by unifying training and test-time objectives, significantly improving accuracy and generalization.

In-N-Out: Lifting 2D Diffusion Prior for 3D Object Removal via Tuning-Free Latents Alignment

26 September 2024·2437 words·12 mins

Computer Vision 3D Vision 🏢 University of Melbourne

In-N-Out enhances 3D scene reconstruction by aligning 2D diffusion model latents for consistent multi-view inpainti…

In-Context Symmetries: Self-Supervised Learning through Contextual World Models

26 September 2024·3570 words·17 mins

Computer Vision Self-Supervised Learning 🏢 MIT CSAIL

CONTEXTSSL: A novel self-supervised learning algorithm that adapts to task-specific symmetries by using context, achieving significant performance gains over existing methods.

In Pursuit of Causal Label Correlations for Multi-label Image Recognition

26 September 2024·2377 words·12 mins

Computer Vision Image Classification 🏢 Wenzhou University

This research leverages causal intervention to identify and utilize genuine label correlations in multi-label image recognition, mitigating contextual bias for improved accuracy.

Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection

26 September 2024·2484 words·12 mins

Computer Vision Image Segmentation 🏢 School of Computer Science, Fudan University

Active Viewpoint Selection (AVS) significantly improves viewpoint-independent object-centric representations by actively selecting the most informative viewpoints for each scene, leading to better seg…