
Computer Vision

MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
·2539 words·12 mins
AI Generated Computer Vision Image Classification 🏢 Tsinghua University
MSPE empowers Vision Transformers to handle any image resolution by cleverly optimizing patch embedding, achieving superior performance on low-resolution images and comparable results on high-resolution images.
MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
·3095 words·15 mins
AI Generated Computer Vision 3D Vision 🏢 University of Science and Technology of China
MotionGS enhances deformable 3D Gaussian splatting for dynamic scenes by using motion flow to guide deformation, significantly improving reconstruction accuracy and outperforming state-of-the-art methods.
MotionCraft: Physics-Based Zero-Shot Video Generation
·2646 words·13 mins
Computer Vision Video Understanding 🏢 Politecnico di Torino
MotionCraft: Physics-based zero-shot video generation creates realistic videos with complex motion dynamics by cleverly warping the noise latent space of an image diffusion model using optical flow from physics simulations.
Motion Graph Unleashed: A Novel Approach to Video Prediction
·2948 words·14 mins
Computer Vision Video Understanding 🏢 Microsoft
Motion Graph unleashes efficient and accurate video prediction by transforming video frames into interconnected graph nodes, capturing complex motion patterns with minimal computational cost.
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
·2797 words·14 mins
AI Generated Computer Vision Video Understanding 🏢 Microsoft
Boosting video diffusion: Motion Consistency Model (MCM) disentangles motion and appearance learning for high-fidelity, fast video generation using few sampling steps.
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
·2123 words·10 mins
Computer Vision Video Understanding 🏢 Tongji University
MoTE: A novel framework harmonizes generalization and specialization for visual-language video knowledge transfer, achieving state-of-the-art results.
MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders
·1750 words·9 mins
Computer Vision Object Detection 🏢 UCAS-Terminus AI Lab, University of Chinese Academy of Sciences, China
MonoMAE enhances monocular 3D object detection by using depth-aware masked autoencoders to effectively handle object occlusions, achieving superior performance on both occluded and non-occluded objects.
MonkeySee: Space-time-resolved reconstructions of natural images from macaque multi-unit activity
·2882 words·14 mins
AI Generated Computer Vision Image Generation 🏢 Donders Institute for Brain, Cognition and Behaviour
MonkeySee reconstructs natural images from macaque brain signals with high accuracy using a novel CNN decoder, advancing neural decoding and offering insights into visual perception.
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
·2159 words·11 mins
Computer Vision Image Classification 🏢 Huazhong University of Science and Technology
MoE Jetpack efficiently transforms readily available dense checkpoints into high-performing MoE models, drastically accelerating convergence and improving accuracy.
Mixture of neural fields for heterogeneous reconstruction in cryo-EM
·4281 words·21 mins
AI Generated Computer Vision 3D Vision 🏢 Stanford University
Hydra: a novel cryo-EM reconstruction method resolves both conformational and compositional heterogeneity ab initio, enabling the analysis of complex, unpurified samples with state-of-the-art accuracy.
Mixture of Adversarial LoRAs: Boosting Robust Generalization in Meta-Tuning
·2777 words·14 mins
Computer Vision Few-Shot Learning 🏢 City University of Hong Kong
A mixture of adversarial LoRAs boosts robust generalization in few-shot meta-tuning, improving resilience to adversarial perturbations without sacrificing clean accuracy.
Mitigating Biases in Blackbox Feature Extractors for Image Classification Tasks
·2583 words·13 mins
Computer Vision Image Classification 🏢 Indian Institute of Science
Researchers propose a simple yet effective clustering-based adaptive margin loss to mitigate biases inherited by black-box feature extractors in image classification tasks.
Mining and Transferring Feature-Geometry Coherence for Unsupervised Point Cloud Registration
·2514 words·12 mins
AI Generated Computer Vision 3D Vision 🏢 Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China
INTEGER: a novel unsupervised point cloud registration method leveraging feature-geometry coherence for reliable pseudo-label mining and density-invariant feature learning, achieving state-of-the-art performance.
Mind the Gap Between Prototypes and Images in Cross-domain Finetuning
·3462 words·17 mins
Computer Vision Few-Shot Learning 🏢 Hong Kong Baptist University
CoPA improves cross-domain few-shot learning by adapting separate transformations for prototype and image embeddings, significantly enhancing performance and revealing better representation clusters.
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
·1847 words·9 mins
Computer Vision Image Generation 🏢 Zhejiang University
MimicTalk generates realistic, expressive talking videos in minutes using a pre-trained model adapted to individual identities.
MIDGArD: Modular Interpretable Diffusion over Graphs for Articulated Designs
·2491 words·12 mins
Computer Vision 3D Vision 🏢 Intel Labs
MIDGArD: Generate high-quality, simulatable 3D articulated assets with enhanced control and interpretability using a novel diffusion-based framework.
Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation
·4145 words·20 mins
AI Generated Computer Vision 3D Vision 🏢 Carnegie Mellon University
Humans as landmarks: A novel zero-shot monocular metric depth estimation method leverages generative models and human mesh recovery to transfer metric scale information, achieving superior generalization.
MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning
·1904 words·9 mins
Computer Vision Image Segmentation 🏢 Tencent Youtu Lab
MetaUAS achieves universal visual anomaly segmentation using only one normal image prompt via a pure vision model, surpassing previous zero-shot, few-shot, and full-shot methods.
Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning
·1996 words·10 mins
Computer Vision Few-Shot Learning 🏢 Northwestern Polytechnical University
Meta-Exploiting Frequency Prior enhances cross-domain few-shot learning by leveraging image frequency decomposition and consistency priors to improve model generalization and efficiency.
Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials
·2436 words·12 mins
Computer Vision 3D Vision 🏢 Meta AI
Meta 3D AssetGen: High-quality text-to-mesh generation with realistic PBR materials and lighting, exceeding prior methods in speed and accuracy.