
Posters

2024

Vision-Language Navigation with Energy-Based Policy
·1855 words·9 mins
Multimodal Learning Vision-Language Models 🏢 Zhejiang University
Energy-based Navigation Policy (ENP) revolutionizes Vision-Language Navigation by modeling joint state-action distributions, achieving superior performance across diverse benchmarks.
Vision-Language Models are Strong Noisy Label Detectors
·2173 words·11 mins
Multimodal Learning Vision-Language Models 🏢 School of Computer Science and Engineering, Southeast University
Vision-language models effectively detect noisy labels, improving image classification accuracy with DEFT.
Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights
·4915 words·24 mins
Computer Vision Image Classification 🏢 Singapore University of Technology and Design (SUTD)
OoD-ViT-NAS: a new benchmark reveals how ViT architecture impacts out-of-distribution generalization, highlighting the importance of embedding dimension and challenging the reliance on in-distribution…
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
·2294 words·11 mins
Multimodal Learning Vision-Language Models 🏢 Tsinghua University
Latent Compression Learning (LCL) revolutionizes vision model pre-training by effectively leveraging readily available interleaved image-text data, achieving performance comparable to models trained o…
Vision Mamba Mender
·2136 words·11 mins
AI Generated Computer Vision Face Recognition 🏢 College of Computer Science and Technology, Zhejiang University
Vision Mamba Mender systematically optimizes the Mamba model by identifying and repairing internal and external state flaws, significantly improving its performance in visual recognition tasks.
Vision Foundation Model Enables Generalizable Object Pose Estimation
·3435 words·17 mins
AI Generated Computer Vision 3D Vision 🏢 Chinese University of Hong Kong
VFM-6D: a novel framework achieving generalizable object pose estimation for unseen categories by leveraging vision-language models.
VISA: Variational Inference with Sequential Sample-Average Approximations
·1502 words·8 mins
Machine Learning Variational Inference 🏢 Amsterdam Machine Learning Lab
VISA, a new variational inference method, significantly speeds up approximate inference for complex models by reusing model evaluations across multiple gradient steps, achieving comparable accuracy wi…
Virtual Scanning: Unsupervised Non-line-of-sight Imaging from Irregularly Undersampled Transients
·2316 words·11 mins
Computer Vision Image Generation 🏢 Tianjin University
Unsupervised learning framework enables high-fidelity non-line-of-sight (NLOS) imaging from irregularly undersampled transients, surpassing state-of-the-art methods in speed and robustness.
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation
·3245 words·16 mins
AI Applications Robotics 🏢 Shenzhen Campus of Sun Yat-Sen University
VidMan: a novel framework that leverages video diffusion models and a two-stage training mechanism to significantly improve robot manipulation precision by effectively using robot trajectory data and impli…
VideoTetris: Towards Compositional Text-to-Video Generation
·2282 words·11 mins
Multimodal Learning Vision-Language Models 🏢 Peking University
VideoTetris: a novel framework enabling compositional text-to-video generation by precisely following complex textual semantics through spatio-temporal compositional diffusion, achieving impressive qu…
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
·2766 words·13 mins
AI Generated Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
VideoLLM-MoD boosts online video-language model efficiency by selectively skipping redundant vision token computations, achieving ~42% faster training and ~30% memory savings without sacrificing perfo…
Video Token Merging for Long Video Understanding
·2290 words·11 mins
Computer Vision Video Understanding 🏢 Korea University
Researchers boost long-form video understanding efficiency by 6.89x and reduce memory usage by 84% using a novel learnable video token merging algorithm.
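For intuition, here is a minimal sketch of similarity-based token merging. It is not the paper's learnable algorithm; it only illustrates the general idea of shrinking a long video token sequence by averaging its most similar token pairs, and every name and size below is an illustrative assumption.

```python
# Sketch only: generic similarity-based token merging for a single sequence.
import torch
import torch.nn.functional as F

def merge_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Reduce a (num_tokens, dim) sequence by merging r most-similar token pairs."""
    src, dst = x[0::2], x[1::2]                       # alternate tokens into two sets
    sim = F.normalize(src, dim=-1) @ F.normalize(dst, dim=-1).T
    best_sim, best_dst = sim.max(dim=-1)              # best partner in dst for each src token
    merge_idx = best_sim.topk(min(r, src.shape[0])).indices
    dst = dst.clone()
    for si in merge_idx.tolist():                     # average each chosen src into its partner
        di = best_dst[si].item()
        dst[di] = (dst[di] + src[si]) / 2
    keep = torch.ones(src.shape[0], dtype=torch.bool)
    keep[merge_idx] = False                           # drop the merged source tokens
    return torch.cat([src[keep], dst], dim=0)         # roughly n - r tokens remain

tokens = torch.randn(4096, 768)                       # e.g. tokens pooled from many frames
print(merge_tokens(tokens, r=1024).shape)             # torch.Size([3072, 768])
```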
Video Diffusion Models are Training-free Motion Interpreter and Controller
·2252 words·11 mins
Computer Vision Video Understanding 🏢 Peking University
Training-free video motion control achieved via novel Motion Feature (MOFT) extraction from existing video diffusion models, offering architecture-agnostic insights and high performance.
VFIMamba: Video Frame Interpolation with State Space Models
·2179 words·11 mins
Computer Vision Video Understanding 🏢 Tencent AI Lab
VFIMamba uses state-space models for efficient and dynamic video frame interpolation, achieving state-of-the-art results by introducing a novel Mixed-SSM Block and curriculum learning.
VeXKD: The Versatile Integration of Cross-Modal Fusion and Knowledge Distillation for 3D Perception
·3369 words·16 mins
AI Applications Autonomous Vehicles 🏢 Hong Kong University of Science and Technology
VeXKD: A versatile framework boosts 3D perception by cleverly combining cross-modal fusion and knowledge distillation, improving single-modal student model accuracy without extra inference time.
Verified Safe Reinforcement Learning for Neural Network Dynamic Models
·1254 words·6 mins
Machine Learning Reinforcement Learning 🏢 Washington University in St. Louis
Learning verified safe neural network controllers for complex nonlinear systems is now possible, achieving an order of magnitude longer safety horizons than state-of-the-art methods while maintaining …
Verified Code Transpilation with LLMs
·2009 words·10 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
LLMLIFT: An LLM-powered approach builds verified lifting tools for DSLs, outperforming prior symbolic methods in benchmark transpilation and requiring less development effort.
Verifiably Robust Conformal Prediction
·1918 words·10 mins
AI Generated Machine Learning Reinforcement Learning 🏢 King's College London
VRCP, a new framework, uses neural network verification to make conformal prediction robust against adversarial attacks, supporting various norms and regression tasks.
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
·1706 words·9 mins
Natural Language Processing Large Language Models 🏢 Huawei Noah's Ark Lab
VeLoRA: Train massive LLMs efficiently by compressing intermediate activations!
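As a rough illustration of the rank-1 sub-token idea (a sketch under our own assumptions, not the released VeLoRA implementation): each activation token is split into fixed-size sub-tokens, only their scalar projections onto a single direction are kept in memory, and the activation is reconstructed from those scalars when it is needed again, e.g. in the backward pass.

```python
# Sketch only: rank-1 compression/reconstruction of an activation tensor.
import torch
import torch.nn.functional as F

def compress(x: torch.Tensor, v: torch.Tensor, sub: int) -> torch.Tensor:
    """x: (tokens, dim), v: unit vector of length `sub`, dim divisible by `sub`."""
    t, d = x.shape
    subtokens = x.reshape(t, d // sub, sub)            # (tokens, n_sub, sub)
    return subtokens @ v                               # one scalar per sub-token: ~sub x smaller

def decompress(coeffs: torch.Tensor, v: torch.Tensor, sub: int) -> torch.Tensor:
    t, n_sub = coeffs.shape
    subtokens = coeffs.unsqueeze(-1) * v               # rank-1 reconstruction
    return subtokens.reshape(t, n_sub * sub)

x = torch.randn(1024, 4096)                            # a hidden-activation block
sub = 64
v = F.normalize(torch.randn(sub), dim=0)               # fixed projection direction
c = compress(x, v, sub)                                # store this instead of x
x_hat = decompress(c, v, sub)                          # approximate activation for backward
print(x.shape, c.shape, x_hat.shape)
```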
Vector Quantization Prompting for Continual Learning
·1795 words·9 mins
AI Generated Machine Learning Continual Learning 🏢 Communication University of China
VQ-Prompt uses vector quantization to optimize discrete prompts for continual learning, achieving state-of-the-art performance by effectively abstracting task knowledge and optimizing prompt selection…
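For readers unfamiliar with the mechanism, here is a minimal, hypothetical sketch of vector-quantized prompt selection with a straight-through estimator. It illustrates the general VQ technique rather than the authors' exact model; the class, parameter names, and sizes are our own assumptions.

```python
# Sketch only: quantize a continuous prompt to the nearest codebook prompt,
# keeping the step differentiable via the straight-through estimator.
import torch
import torch.nn as nn

class VQPromptSketch(nn.Module):
    def __init__(self, num_prompts: int = 10, prompt_len: int = 8, dim: int = 768):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_prompts, prompt_len, dim) * 0.02)

    def forward(self, query: torch.Tensor):
        """query: (batch, prompt_len, dim) continuous prompt predicted from the input."""
        flat_q = query.flatten(1)                     # (batch, prompt_len*dim)
        flat_c = self.codebook.flatten(1)             # (num_prompts, prompt_len*dim)
        dists = torch.cdist(flat_q, flat_c)           # L2 distance to every prompt
        idx = dists.argmin(dim=-1)                    # nearest discrete prompt per sample
        quantized = self.codebook[idx]                # (batch, prompt_len, dim)
        # Straight-through: forward uses the discrete prompt, gradients flow to `query`.
        quantized = query + (quantized - query).detach()
        commit_loss = (quantized.detach() - query).pow(2).mean()
        return quantized, idx, commit_loss

vq = VQPromptSketch()
prompt, idx, loss = vq(torch.randn(4, 8, 768))
print(prompt.shape, idx.tolist(), loss.item())
```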