Computer Vision
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models
·2662 words·13 mins·
AI Generated
Computer Vision
3D Vision
🏢 Tencent PCG
MeshXL: Autoregressively generating high-quality 3D meshes using a novel Neural Coordinate Field (NeurCF) representation and large language model approaches.
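As a rough illustration of what autoregressive mesh generation over coordinates can mean, the sketch below quantises vertex coordinates into a flat token sequence that a next-token model could be trained on. The function name, binning scheme, and face ordering are illustrative assumptions, not MeshXL's actual pipeline.

```python
# Hypothetical sketch: turning a mesh into a token sequence for
# next-token (autoregressive) modelling. The quantisation scheme and
# ordering are assumptions for illustration only.
import numpy as np

def coords_to_tokens(vertices, faces, num_bins=128):
    """Quantise each face's vertex coordinates into discrete tokens.

    vertices: (V, 3) float array with coordinates in [-1, 1]
    faces:    (F, 3) integer array of vertex indices (triangles)
    Returns a 1-D array of token ids, one per coordinate, ordered face by
    face so an autoregressive model can predict them left to right.
    """
    # Map continuous coordinates to integer bins 0 .. num_bins - 1.
    quantised = np.clip(((vertices + 1.0) / 2.0 * num_bins).astype(int), 0, num_bins - 1)
    # For every face, emit the 9 coordinate tokens of its 3 vertices.
    return quantised[faces].reshape(-1)

# Example: a single triangle.
verts = np.array([[-1.0, -1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
print(coords_to_tokens(verts, faces))  # 9 token ids, one per coordinate
```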
Measuring Per-Unit Interpretability at Scale Without Humans
·4136 words·20 mins·
Computer Vision
Interpretability
🏢 Tübingen AI Center
New scalable method measures per-unit interpretability in vision DNNs without human evaluation, revealing anti-correlation between model performance and interpretability.
Measuring Dejavu Memorization Efficiently
·2794 words·14 mins·
Computer Vision
Representation Learning
🏢 FAIR at Meta
New method efficiently measures how well AI models memorize training data, revealing that open-source models memorize less than expected.
MC-DiT: Contextual Enhancement via Clean-to-Clean Reconstruction for Masked Diffusion Models
·2494 words·12 mins·
Computer Vision
Image Generation
🏢 Shanghai Jiao Tong University
MC-DiT: A novel training paradigm for masked diffusion models achieving state-of-the-art image generation by leveraging clean-to-clean reconstruction.
MaskFactory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
·1960 words·10 mins·
Computer Vision
Image Segmentation
🏢 Zhejiang University
MaskFactory generates high-quality synthetic data for dichotomous image segmentation, improving model training efficiency and accuracy.
Masked Pre-training Enables Universal Zero-shot Denoiser
·4914 words·24 mins·
Computer Vision
Image Generation
🏢 University of Science and Technology of China
Masked Pre-training empowers a universal, fast zero-shot image denoiser!
ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation
·2484 words·12 mins·
Computer Vision
3D Vision
🏢 Valeo.ai
ManiPose: Manifold-constrained multi-hypothesis model solves 3D human pose estimation’s depth ambiguity, outperforming state-of-the-art models in pose consistency.
MambaSCI: Efficient Mamba-UNet for Quad-Bayer Patterned Video Snapshot Compressive Imaging
·3150 words·15 mins·
AI Generated
Computer Vision
Video Understanding
🏢 Harbin Institute of Technology (Shenzhen)
MambaSCI: Efficient, novel deep learning model reconstructs high-quality quad-Bayer video from compressed snapshots, surpassing existing methods.
MambaLLIE: Implicit Retinex-Aware Low Light Enhancement with Global-then-Local State Space
·2330 words·11 mins·
Computer Vision
Image Enhancement
🏢 Nanjing University of Science and Technology
MambaLLIE: a novel implicit Retinex-aware low-light enhancer using a global-then-local state space that significantly outperforms existing CNN- and Transformer-based methods.
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection
·3318 words·16 mins·
AI Generated
Computer Vision
Anomaly Detection
🏢 Zhejiang University Youtu Lab
MambaAD: Linear-complexity multi-class unsupervised anomaly detection using a novel Mamba-based decoder with Locality-Enhanced State Space modules.
LuSh-NeRF: Lighting up and Sharpening NeRFs for Low-light Scenes
·2414 words·12 mins·
Computer Vision
3D Vision
🏢 City University of Hong Kong
LuSh-NeRF: A novel model reconstructs sharp, bright NeRFs from hand-held low-light photos by sequentially modeling and removing noise and blur, outperforming existing methods.
LRM-Zero: Training Large Reconstruction Models with Synthesized Data
·3486 words·17 mins·
AI Generated
Computer Vision
3D Vision
🏢 Adobe Research
LRM-Zero: Training large reconstruction models solely on synthetic data, achieving quality comparable to real-data trained models.
LP-3DGS: Learning to Prune 3D Gaussian Splatting
·2308 words·11 mins·
Computer Vision
3D Vision
🏢 Johns Hopkins University
LP-3DGS learns to optimally prune 3D Gaussian splatting, achieving significant efficiency gains without compromising rendering quality via a trainable binary mask and the Gumbel-Sigmoid method.
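The summary above mentions a trainable binary mask learned with the Gumbel-Sigmoid relaxation. The minimal sketch below shows what such a mask can look like in PyTorch; the temperature, the straight-through trick, and the way the mask is applied are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of a trainable pruning mask via the Gumbel-Sigmoid
# relaxation; names and hyperparameters are assumptions for illustration.
import torch

def gumbel_sigmoid(logits, tau=1.0, hard=True):
    """Per-element, approximately binary gate with usable gradients."""
    # Logistic noise == difference of two independent Gumbel(0, 1) samples.
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log1p(-u)
    y_soft = torch.sigmoid((logits + noise) / tau)
    if not hard:
        return y_soft
    y_hard = (y_soft > 0.5).float()
    # Straight-through estimator: binary values forward, soft gradients backward.
    return y_hard - y_soft.detach() + y_soft

# One trainable logit per Gaussian; the mask would multiply each Gaussian's
# contribution, and a sparsity penalty encourages pruning (illustrative setup).
num_gaussians = 10_000
mask_logits = torch.nn.Parameter(torch.zeros(num_gaussians))
mask = gumbel_sigmoid(mask_logits, tau=0.5)   # ~binary keep/drop decisions
sparsity_loss = mask.mean()                   # pushes the mask toward pruning
```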
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
·4927 words·24 mins·
Computer Vision
Image Classification
🏢 Carleton University
LookHere: Vision Transformers excel at high-resolution image classification by using 2D attention masks to direct attention heads, improving generalization and extrapolation.
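As a rough illustration of what directing attention heads with 2D masks can mean, the snippet below builds per-head directional masks over a patch grid and applies them to attention logits. The four directions and the masking rule here are assumptions for illustration, not LookHere's exact formulation.

```python
# Illustrative sketch of direction-restricted attention over a patch grid.
import torch

def directional_mask(grid_h, grid_w, direction):
    """Boolean (N, N) mask; True = query patch may attend to key patch."""
    ys, xs = torch.meshgrid(torch.arange(grid_h), torch.arange(grid_w), indexing="ij")
    pos = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()  # (N, 2)
    dy = pos[None, :, 0] - pos[:, None, 0]   # key row minus query row
    dx = pos[None, :, 1] - pos[:, None, 1]   # key col minus query col
    if direction == "right":
        return dx >= 0
    if direction == "left":
        return dx <= 0
    if direction == "down":
        return dy >= 0
    return dy <= 0  # "up"

# Assign one direction per attention head by masking the logits.
grid = 4
heads = ["right", "left", "down", "up"]
masks = torch.stack([directional_mask(grid, grid, d) for d in heads])   # (H, N, N)
logits = torch.randn(len(heads), grid * grid, grid * grid)              # dummy attention logits
logits = logits.masked_fill(~masks, float("-inf"))
attn = logits.softmax(dim=-1)  # each head only looks in its assigned direction
```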
Long-Tailed Out-of-Distribution Detection via Normalized Outlier Distribution Adaptation
·2335 words·11 mins·
Computer Vision
Object Detection
🏢 Beihang University
AdaptOD: a novel approach for robust OOD detection in long-tailed recognition, dynamically adapting outlier distributions to true OOD distributions using a dual-normalized energy loss for improved accuracy.
Long-tailed Object Detection Pretraining: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction
·2225 words·11 mins·
Computer Vision
Object Detection
🏢 Nanjing University of Science and Technology
Dynamic Rebalancing Contrastive Learning with Dual Reconstruction (2DRCL) pre-training significantly boosts object detection accuracy, especially for underrepresented classes.
Long-Range Feedback Spiking Network Captures Dynamic and Static Representations of the Visual Cortex under Movie Stimuli
·2020 words·10 mins·
Computer Vision
Video Understanding
🏢 Peking University
Long-range feedback spiking network (LoRaFB-SNet) surpasses other models in capturing dynamic and static visual cortical representations under movie stimuli, advancing our understanding of visual systems.
LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment
·3487 words·17 mins·
Computer Vision
3D Vision
🏢 National University of Defense Technology
LoD-Loc: A novel aerial visual localization method uses lightweight LoD 3D maps & neural wireframe alignment for accurate and efficient 6-DoF pose estimation, surpassing state-of-the-art methods.
LoCo: Learning 3D Location-Consistent Image Features with a Memory-Efficient Ranking Loss
·1960 words·10 mins·
Computer Vision
3D Vision
🏢 University of Oxford
LoCo: Memory-efficient location-consistent image features learned via a novel ranking loss, enabling a three-orders-of-magnitude memory reduction and outperforming the state of the art.
Locating What You Need: Towards Adapting Diffusion Models to OOD Concepts In-the-Wild
·3829 words·18 mins·
AI Generated
Computer Vision
Image Generation
🏢 Zhejiang University
The CATOD framework improves text-to-image generation by actively curating high-quality training data so that out-of-distribution concepts are depicted accurately.