
Computer Vision

MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
·2539 words·12 mins
AI Generated Computer Vision Image Classification 🏢 Tsinghua University
MSPE empowers Vision Transformers to handle any image resolution by cleverly optimizing patch embedding, achieving superior performance on low-resolution images and comparable results on high-resolution images.
MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
·3095 words·15 mins
AI Generated Computer Vision 3D Vision 🏢 University of Science and Technology of China
MotionGS enhances deformable 3D Gaussian splatting for dynamic scenes by using motion flow to guide deformation, significantly improving reconstruction accuracy and outperforming state-of-the-art methods.
MotionCraft: Physics-Based Zero-Shot Video Generation
·2646 words·13 mins
Computer Vision Video Understanding 🏢 Politecnico di Torino
MotionCraft: Physics-based zero-shot video generation creates realistic videos with complex motion dynamics by cleverly warping the noise latent space of an image diffusion model using optical flow from physics simulations.
Motion Graph Unleashed: A Novel Approach to Video Prediction
·2948 words·14 mins
Computer Vision Video Understanding 🏢 Microsoft
Motion Graph unleashes efficient and accurate video prediction by transforming video frames into interconnected graph nodes, capturing complex motion patterns with minimal computational cost.
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
·2797 words·14 mins
AI Generated Computer Vision Video Understanding 🏢 Microsoft
Boosting video diffusion: Motion Consistency Model (MCM) disentangles motion and appearance learning for high-fidelity, fast video generation using few sampling steps.
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
·2123 words·10 mins
Computer Vision Video Understanding 🏢 Tongji University
MoTE: A novel framework harmonizes generalization and specialization for visual-language video knowledge transfer, achieving state-of-the-art results.
MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders
·1750 words·9 mins
Computer Vision Object Detection 🏢 UCAS-Terminus AI Lab, University of Chinese Academy of Sciences, China
MonoMAE enhances monocular 3D object detection by using depth-aware masked autoencoders to effectively handle object occlusions, achieving superior performance on both occluded and non-occluded objects.
MonkeySee: Space-time-resolved reconstructions of natural images from macaque multi-unit activity
·2882 words·14 mins
AI Generated Computer Vision Image Generation 🏢 Donders Institute for Brain, Cognition and Behaviour
MonkeySee reconstructs natural images from macaque brain signals with high accuracy using a novel CNN decoder, advancing neural decoding and offering insights into visual perception.
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
·2159 words·11 mins
Computer Vision Image Classification 🏢 Huazhong University of Science and Technology
MoE Jetpack efficiently transforms readily available dense checkpoints into high-performing MoE models, drastically accelerating convergence and improving accuracy.
Mixture of neural fields for heterogeneous reconstruction in cryo-EM
·4281 words·21 mins
AI Generated Computer Vision 3D Vision 🏢 Stanford University
Hydra: a novel cryo-EM reconstruction method resolves both conformational and compositional heterogeneity ab initio, enabling the analysis of complex, unpurified samples with state-of-the-art accuracy.
Mixture of Adversarial LoRAs: Boosting Robust Generalization in Meta-Tuning
·2777 words·14 mins
Computer Vision Few-Shot Learning 🏢 City University of Hong Kong
A mixture of adversarial LoRAs boosts robust generalization in few-shot meta-tuning, improving resilience to adversarial perturbations without sacrificing clean accuracy.
Mitigating Biases in Blackbox Feature Extractors for Image Classification Tasks
·2583 words·13 mins
Computer Vision Image Classification 🏢 Indian Institute of Science
Researchers propose a simple yet effective clustering-based adaptive margin loss to mitigate biases inherited by black-box feature extractors in image classification tasks.
Mining and Transferring Feature-Geometry Coherence for Unsupervised Point Cloud Registration
·2514 words·12 mins
AI Generated Computer Vision 3D Vision 🏢 Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China
INTEGER: a novel unsupervised point cloud registration method leveraging feature-geometry coherence for reliable pseudo-label mining and density-invariant feature learning, achieving state-of-the-art performance.
Mind the Gap Between Prototypes and Images in Cross-domain Finetuning
·3462 words·17 mins
Computer Vision Few-Shot Learning 🏢 Hong Kong Baptist University
CoPA improves cross-domain few-shot learning by adapting separate transformations for prototype and image embeddings, significantly enhancing performance and revealing better representation clusters.
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
·1847 words·9 mins
Computer Vision Image Generation 🏢 Zhejiang University
MimicTalk generates realistic, expressive talking videos in minutes using a pre-trained model adapted to individual identities.
MIDGArD: Modular Interpretable Diffusion over Graphs for Articulated Designs
·2491 words·12 mins
Computer Vision 3D Vision 🏢 Intel Labs
MIDGArD: Generate high-quality, simulatable 3D articulated assets with enhanced control and interpretability using a novel diffusion-based framework.
Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation
·4145 words·20 mins
AI Generated Computer Vision 3D Vision 🏢 Carnegie Mellon University
Humans as landmarks: A novel zero-shot monocular metric depth estimation method leverages generative models and human mesh recovery to transfer metric scale information, achieving superior generalization.
MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning
·1904 words·9 mins
Computer Vision Image Segmentation 🏢 Tencent Youtu Lab
MetaUAS achieves universal visual anomaly segmentation using only one normal image prompt via a pure vision model, surpassing previous zero-shot, few-shot, and full-shot methods.
Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning
·1996 words·10 mins
Computer Vision Few-Shot Learning 🏢 Northwestern Polytechnical University
Meta-Exploiting Frequency Prior enhances cross-domain few-shot learning by leveraging image frequency decomposition and consistency priors to improve model generalization and efficiency.
Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials
·2436 words·12 mins
Computer Vision 3D Vision 🏢 Meta AI
Meta 3D AssetGen: High-quality text-to-mesh generation with realistic PBR materials and lighting, exceeding prior methods in speed and accuracy.