Computer Vision

Learning Low-Rank Feature for Thorax Disease Classification
·3584 words·17 mins
AI Generated Computer Vision Image Classification 🏢 School of Computing and Augmented Intelligence, Arizona State University
Low-Rank Feature Learning (LRFL) significantly boosts thorax disease classification accuracy by reducing noise and background interference in medical images.
Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars
·2288 words·11 mins
Computer Vision 3D Vision 🏢 Shenzhen Campus of Sun Yat-Sen University
Create animatable interacting hand avatars from a single image using a novel two-stage interaction-aware 3D Gaussian splatting framework!
Learning Image Priors Through Patch-Based Diffusion Models for Solving Inverse Problems
·3556 words·17 mins
Computer Vision Image Generation 🏢 University of Michigan
PaDIS: Patch-based diffusion inverse solver learns efficient image priors from image patches, enabling high-resolution inverse problem solutions with reduced computational costs and data needs.
Learning Group Actions on Latent Representations
·2124 words·10 mins
Computer Vision Image Generation 🏢 University of Virginia
This paper proposes a novel method to model group actions within autoencoders by learning these actions in the latent space, enhancing model versatility and improving performance in various real-world…
Learning from Pattern Completion: Self-supervised Controllable Generation
·3650 words·18 mins
AI Generated Computer Vision Image Generation 🏢 Peking University
Self-Supervised Controllable Generation (SCG) framework achieves brain-like associative generation by using a modular autoencoder with equivariance constraints and a self-supervised pattern completion…
Learning from Offline Foundation Features with Tensor Augmentations
·1797 words·9 mins
Computer Vision Image Classification 🏢 KTH Royal Institute of Technology
LOFF-TA leverages offline foundation model features and tensor augmentations for efficient, resource-light training, achieving up to 37x faster training and 26x less GPU memory usage.
Learning Frequency-Adapted Vision Foundation Model for Domain Generalized Semantic Segmentation
·2199 words·11 mins
Computer Vision Image Segmentation 🏢 Westlake University
FADA: a novel frequency-adapted learning scheme boosts domain-generalized semantic segmentation by decoupling style and content using Haar wavelets, achieving state-of-the-art results.
Learning Disentangled Representations for Perceptual Point Cloud Quality Assessment via Mutual Information Minimization
·1608 words·8 mins
AI Generated Computer Vision 3D Vision 🏢 Cooperative Medianet Innovation Center, Shanghai Jiao Tong University
DisPA: a novel disentangled representation learning framework for perceptual point cloud quality assessment achieves superior performance by minimizing mutual information between content and distortio…
Learning De-Biased Representations for Remote-Sensing Imagery
·2326 words·11 mins
Computer Vision Object Detection 🏢 Singapore Management University
DebLoRA: A novel unsupervised learning approach debiases LoRA for remote sensing imagery, boosting minor class performance without sacrificing major class accuracy.
Learning Commonality, Divergence and Variety for Unsupervised Visible-Infrared Person Re-identification
·1252 words·6 mins
Computer Vision Person Re-Identification 🏢 Institute of Artificial Intelligence, Xiamen University
Progressive Contrastive Learning with Hard & Dynamic Prototypes (PCLHD) revolutionizes unsupervised visible-infrared person re-identification by effectively capturing data commonality, divergence, and…
Learning Bregman Divergences with Application to Robustness
·2210 words·11 mins
Computer Vision Image Classification 🏢 ETH Zurich
Learned Bregman divergences significantly improve image corruption robustness in adversarial training.
Learning 3D Garment Animation from Trajectories of A Piece of Cloth
·2097 words·10 mins
Computer Vision 3D Vision 🏢 Nanyang Technological University
Animates diverse garments realistically from a single cloth’s trajectory using a disentangled learning approach and Energy Unit Network (EUNet).
Learning 3D Equivariant Implicit Function with Patch-Level Pose-Invariant Representation
·2788 words·14 mins
Computer Vision 3D Vision 🏢 Xi'an Jiaotong University
3D surface reconstruction revolutionized: PEIF leverages patch-level pose-invariant representations and 3D patch-level equivariance for state-of-the-art accuracy, even with varied poses and datasets!
LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling
·2913 words·14 mins
AI Generated Computer Vision 3D Vision 🏢 Tsinghua University
LCM: a novel, locally constrained, compact point cloud model surpasses Transformer-based methods by significantly improving performance and efficiency in various downstream tasks.
Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks
·3093 words·15 mins
Computer Vision Image Generation 🏢 Artificial and Natural Intelligence Toulouse Institute
AI now draws almost as well as humans, thanks to novel latent diffusion model regularizations that mimic human cognitive biases.
Large Spatial Model: End-to-end Unposed Images to Semantic 3D
·1766 words·9 mins
Computer Vision 3D Vision 🏢 NVIDIA Research
Large Spatial Model (LSM) achieves real-time semantic 3D reconstruction from just two unposed images, unifying multiple 3D vision tasks in a single feed-forward pass.
LAM3D: Large Image-Point Clouds Alignment Model for 3D Reconstruction from Single Image
·2617 words·13 mins
AI Generated Computer Vision 3D Vision 🏢 Australian National University
LAM3D: A novel framework uses point cloud data to boost single-image 3D mesh reconstruction accuracy, achieving state-of-the-art results in just 6 seconds.
L4GM: Large 4D Gaussian Reconstruction Model
·2618 words·13 mins
Computer Vision 3D Vision 🏢 University of Toronto
L4GM: The first 4D model generating high-quality animated 3D objects from single-view videos in a single feed-forward pass.
L-TTA: Lightweight Test-Time Adaptation Using a Versatile Stem Layer
·2871 words·14 mins
AI Generated Computer Vision Image Classification 🏢 Seoul National University of Science and Technology
L-TTA: A lightweight test-time adaptation method using a versatile stem layer minimizes channel-wise uncertainty for rapid and memory-efficient adaptation to new domains.
KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis
·5238 words·25 mins
Computer Vision Image Generation 🏢 Electronics and Telecommunications Research Institute
KOALA: New efficient text-to-image diffusion models that achieve a 4x speedup and a 69% size reduction relative to SDXL, generating 1024px images on consumer GPUs with 8GB VRAM.