Multimodal Understanding

Probabilistic Conformal Distillation for Enhancing Missing Modality Robustness
·3353 words·16 mins
AI Generated Multimodal Learning Multimodal Understanding 🏢 Shanghai Jiao Tong University
Enhance multimodal model robustness against missing data with Probabilistic Conformal Distillation (PCD)! PCD models missing modalities probabilistically, achieving superior performance on multiple be…
MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities
·3642 words·18 mins
Multimodal Learning Multimodal Understanding 🏢 ETH Zurich
MultiOOD benchmark and novel A2D & NP-Mix algorithms drastically improve multimodal out-of-distribution detection.
FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion
·2650 words·13 mins
Multimodal Learning Multimodal Understanding 🏢 Department of Computer Science, Johns Hopkins University
FuseMoE, a novel mixture-of-experts transformer, efficiently fuses diverse and incomplete multimodal data, achieving superior predictive performance via a unique Laplace gating function.
Classifier-guided Gradient Modulation for Enhanced Multimodal Learning
·2128 words·10 mins
Multimodal Learning Multimodal Understanding 🏢 Shanghai AI Lab
Classifier-Guided Gradient Modulation (CGGM) enhances multimodal learning by balancing the training process, considering both gradient magnitude and direction, leading to consistent performance improv…
Animal-Bench: Benchmarking Multimodal Video Models for Animal-centric Video Understanding
·2713 words·13 mins
Multimodal Learning Multimodal Understanding 🏢 Beijing University of Posts and Telecommunications
Animal-Bench, a new benchmark, comprehensively evaluates multimodal video models for animal-centric video understanding, featuring 13 diverse tasks across 7 animal categories and 819 species.