Computer Vision
MeshXL: Neural Coordinate Field for Generative 3D Foundation Models
·2662 words·13 mins·
AI Generated
Computer Vision
3D Vision
🏢 Tencent PCG
MeshXL: Autoregressively generating high-quality 3D meshes using a novel Neural Coordinate Field (NeurCF) representation and large language model approaches.
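As a rough illustration of what autoregressive mesh generation over coordinates can mean, the sketch below quantises vertex coordinates into a flat token sequence that a next-token model could be trained on. The function name, binning scheme, and face ordering are illustrative assumptions, not MeshXL's actual pipeline.

```python
# Hypothetical sketch: turning a mesh into a token sequence for
# next-token (autoregressive) modelling. The quantisation scheme and
# ordering are assumptions for illustration only.
import numpy as np

def coords_to_tokens(vertices, faces, num_bins=128):
    """Quantise each face's vertex coordinates into discrete tokens.

    vertices: (V, 3) float array with coordinates in [-1, 1]
    faces:    (F, 3) integer array of vertex indices (triangles)
    Returns a 1-D array of token ids, one per coordinate, ordered face by
    face so an autoregressive model can predict them left to right.
    """
    # Map continuous coordinates to integer bins 0 .. num_bins - 1.
    quantised = np.clip(((vertices + 1.0) / 2.0 * num_bins).astype(int), 0, num_bins - 1)
    # For every face, emit the 9 coordinate tokens of its 3 vertices.
    return quantised[faces].reshape(-1)

# Example: a single triangle.
verts = np.array([[-1.0, -1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
print(coords_to_tokens(verts, faces))  # 9 token ids, one per coordinate
```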
Measuring Per-Unit Interpretability at Scale Without Humans
·4136 words·20 mins·
Computer Vision
Interpretability
🏢 Tübingen AI Center
New scalable method measures per-unit interpretability in vision DNNs without human evaluation, revealing anti-correlation between model performance and interpretability.
Measuring Dejavu Memorization Efficiently
·2794 words·14 mins·
Computer Vision
Representation Learning
🏢 FAIR at Meta
New method efficiently measures how well AI models memorize training data, revealing that open-source models memorize less than expected.
MC-DiT: Contextual Enhancement via Clean-to-Clean Reconstruction for Masked Diffusion Models
·2494 words·12 mins·
Computer Vision
Image Generation
🏢 Shanghai Jiao Tong University
MC-DiT: A novel training paradigm for masked diffusion models achieving state-of-the-art image generation by leveraging clean-to-clean reconstruction.
MaskFactory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
·1960 words·10 mins·
Computer Vision
Image Segmentation
🏢 Zhejiang University
MaskFactory generates high-quality synthetic data for dichotomous image segmentation, improving model training efficiency and accuracy.
Masked Pre-training Enables Universal Zero-shot Denoiser
·4914 words·24 mins·
Computer Vision
Image Generation
🏢 University of Science and Technology of China
Masked Pre-training empowers a universal, fast zero-shot image denoiser!
ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation
·2484 words·12 mins·
Computer Vision
3D Vision
🏢 Valeo.ai
ManiPose: Manifold-constrained multi-hypothesis model solves 3D human pose estimation’s depth ambiguity, outperforming state-of-the-art models in pose consistency.
MambaSCI: Efficient Mamba-UNet for Quad-Bayer Patterned Video Snapshot Compressive Imaging
·3150 words·15 mins·
AI Generated
Computer Vision
Video Understanding
🏢 Harbin Institute of Technology (Shenzhen)
MambaSCI: Efficient, novel deep learning model reconstructs high-quality quad-Bayer video from compressed snapshots, surpassing existing methods.
MambaLLIE: Implicit Retinex-Aware Low Light Enhancement with Global-then-Local State Space
·2330 words·11 mins·
Computer Vision
Image Enhancement
🏢 Nanjing University of Science and Technology
MambaLLIE: a novel implicit Retinex-aware low-light enhancer using a global-then-local state space that significantly outperforms existing CNN- and Transformer-based methods.
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection
·3318 words·16 mins·
AI Generated
Computer Vision
Anomaly Detection
🏢 Zhejiang University Youtu Lab
MambaAD: Linear-complexity multi-class unsupervised anomaly detection using a novel Mamba-based decoder with Locality-Enhanced State Space modules.
LuSh-NeRF: Lighting up and Sharpening NeRFs for Low-light Scenes
·2414 words·12 mins·
Computer Vision
3D Vision
🏢 City University of Hong Kong
LuSh-NeRF: A novel model reconstructs sharp, bright NeRFs from hand-held low-light photos by sequentially modeling and removing noise and blur, outperforming existing methods.
LRM-Zero: Training Large Reconstruction Models with Synthesized Data
·3486 words·17 mins·
AI Generated
Computer Vision
3D Vision
🏢 Adobe Research
LRM-Zero: Training large reconstruction models solely on synthetic data, achieving quality comparable to real-data trained models.
LP-3DGS: Learning to Prune 3D Gaussian Splatting
·2308 words·11 mins·
Computer Vision
3D Vision
🏢 Johns Hopkins University
LP-3DGS learns to optimally prune 3D Gaussian splatting, achieving significant efficiency gains without compromising rendering quality via a trainable binary mask and the Gumbel-Sigmoid method.
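The summary above mentions a trainable binary mask learned with the Gumbel-Sigmoid relaxation. The minimal sketch below shows what such a mask can look like in PyTorch; the temperature, the straight-through trick, and the way the mask is applied are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of a trainable pruning mask via the Gumbel-Sigmoid
# relaxation; names and hyperparameters are assumptions for illustration.
import torch

def gumbel_sigmoid(logits, tau=1.0, hard=True):
    """Per-element, approximately binary gate with usable gradients."""
    # Logistic noise == difference of two independent Gumbel(0, 1) samples.
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log1p(-u)
    y_soft = torch.sigmoid((logits + noise) / tau)
    if not hard:
        return y_soft
    y_hard = (y_soft > 0.5).float()
    # Straight-through estimator: binary values forward, soft gradients backward.
    return y_hard - y_soft.detach() + y_soft

# One trainable logit per Gaussian; the mask would multiply each Gaussian's
# contribution, and a sparsity penalty encourages pruning (illustrative setup).
num_gaussians = 10_000
mask_logits = torch.nn.Parameter(torch.zeros(num_gaussians))
mask = gumbel_sigmoid(mask_logits, tau=0.5)   # ~binary keep/drop decisions
sparsity_loss = mask.mean()                   # pushes the mask toward pruning
```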
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
·4927 words·24 mins·
Computer Vision
Image Classification
🏢 Carleton University
LookHere: Vision Transformers excel at high-resolution image classification by using 2D attention masks to direct attention heads, improving generalization and extrapolation.
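As a rough illustration of what directing attention heads with 2D masks can mean, the snippet below builds per-head directional masks over a patch grid and applies them to attention logits. The four directions and the masking rule here are assumptions for illustration, not LookHere's exact formulation.

```python
# Illustrative sketch of direction-restricted attention over a patch grid.
import torch

def directional_mask(grid_h, grid_w, direction):
    """Boolean (N, N) mask; True = query patch may attend to key patch."""
    ys, xs = torch.meshgrid(torch.arange(grid_h), torch.arange(grid_w), indexing="ij")
    pos = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()  # (N, 2)
    dy = pos[None, :, 0] - pos[:, None, 0]   # key row minus query row
    dx = pos[None, :, 1] - pos[:, None, 1]   # key col minus query col
    if direction == "right":
        return dx >= 0
    if direction == "left":
        return dx <= 0
    if direction == "down":
        return dy >= 0
    return dy <= 0  # "up"

# Assign one direction per attention head by masking the logits.
grid = 4
heads = ["right", "left", "down", "up"]
masks = torch.stack([directional_mask(grid, grid, d) for d in heads])   # (H, N, N)
logits = torch.randn(len(heads), grid * grid, grid * grid)              # dummy attention logits
logits = logits.masked_fill(~masks, float("-inf"))
attn = logits.softmax(dim=-1)  # each head only looks in its assigned direction
```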
Long-Tailed Out-of-Distribution Detection via Normalized Outlier Distribution Adaptation
·2335 words·11 mins·
Computer Vision
Object Detection
🏢 Beihang University
AdaptOD: a novel approach for robust OOD detection in long-tailed recognition, dynamically adapting outlier distributions to true OOD distributions using a dual-normalized energy loss for improved accuracy.
Long-tailed Object Detection Pretraining: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction
·2225 words·11 mins·
Computer Vision
Object Detection
🏢 Nanjing University of Science and Technology
Dynamic Rebalancing Contrastive Learning with Dual Reconstruction (2DRCL) pre-training significantly boosts object detection accuracy, especially for underrepresented classes.
Long-Range Feedback Spiking Network Captures Dynamic and Static Representations of the Visual Cortex under Movie Stimuli
·2020 words·10 mins·
Computer Vision
Video Understanding
🏢 Peking University
Long-range feedback spiking network (LoRaFB-SNet) surpasses other models in capturing dynamic and static visual cortical representations under movie stimuli, advancing our understanding of visual systems.
LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment
·3487 words·17 mins·
Computer Vision
3D Vision
🏢 National University of Defense Technology
LoD-Loc: A novel aerial visual localization method uses lightweight LoD 3D maps & neural wireframe alignment for accurate and efficient 6-DoF pose estimation, surpassing state-of-the-art methods.
LoCo: Learning 3D Location-Consistent Image Features with a Memory-Efficient Ranking Loss
·1960 words·10 mins·
Computer Vision
3D Vision
🏢 University of Oxford
LoCo: Memory-efficient location-consistent image features learned via a novel ranking loss, enabling a three-orders-of-magnitude memory reduction and outperforming the state of the art.
Locating What You Need: Towards Adapting Diffusion Models to OOD Concepts In-the-Wild
·3829 words·18 mins·
AI Generated
Computer Vision
Image Generation
🏢 Zhejiang University
The CATOD framework improves text-to-image generation by actively curating high-quality training data so that out-of-distribution concepts are depicted accurately.