
Posters

2024

Vision-Language Navigation with Energy-Based Policy
·1855 words·9 mins
Multimodal Learning Vision-Language Models 🏢 Zhejiang University
Energy-based Navigation Policy (ENP) revolutionizes Vision-Language Navigation by modeling joint state-action distributions, achieving superior performance across diverse benchmarks.
Vision-Language Models are Strong Noisy Label Detectors
·2173 words·11 mins
Multimodal Learning Vision-Language Models 🏢 School of Computer Science and Engineering, Southeast University
Vision-language models effectively detect noisy labels, improving image classification accuracy with DEFT.
Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights
·4915 words·24 mins
Computer Vision Image Classification 🏢 Singapore University of Technology and Design (SUTD)
OoD-ViT-NAS: a new benchmark reveals how ViT architecture impacts out-of-distribution generalization, highlighting the importance of embedding dimension and challenging the reliance on in-distribution…
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
·2294 words·11 mins
Multimodal Learning Vision-Language Models 🏢 Tsinghua University
Latent Compression Learning (LCL) revolutionizes vision model pre-training by effectively leveraging readily available interleaved image-text data, achieving performance comparable to models trained o…
Vision Mamba Mender
·2136 words·11 mins
AI Generated Computer Vision Face Recognition 🏢 College of Computer Science and Technology, Zhejiang University
Vision Mamba Mender systematically optimizes the Mamba model by identifying and repairing internal and external state flaws, significantly improving its performance in visual recognition tasks.
Vision Foundation Model Enables Generalizable Object Pose Estimation
·3435 words·17 mins
AI Generated Computer Vision 3D Vision 🏢 Chinese University of Hong Kong
VFM-6D: a novel framework achieving generalizable object pose estimation for unseen categories by leveraging vision-language models.
VISA: Variational Inference with Sequential Sample-Average Approximations
·1502 words·8 mins
Machine Learning Variational Inference 🏢 Amsterdam Machine Learning Lab
VISA, a new variational inference method, significantly speeds up approximate inference for complex models by reusing model evaluations across multiple gradient steps, achieving comparable accuracy wi…
Virtual Scanning: Unsupervised Non-line-of-sight Imaging from Irregularly Undersampled Transients
·2316 words·11 mins
Computer Vision Image Generation 🏢 Tianjin University
Unsupervised learning framework enables high-fidelity non-line-of-sight (NLOS) imaging from irregularly undersampled transients, surpassing state-of-the-art methods in speed and robustness.
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation
·3245 words·16 mins
AI Applications Robotics 🏢 Shenzhen Campus of Sun Yat-Sen University
VidMan: a novel framework that leverages video diffusion models and a two-stage training mechanism to significantly improve robot manipulation precision by effectively using robot trajectory data and impli…
VideoTetris: Towards Compositional Text-to-Video Generation
·2282 words·11 mins
Multimodal Learning Vision-Language Models 🏢 Peking University
VideoTetris: a novel framework enabling compositional text-to-video generation by precisely following complex textual semantics through spatio-temporal compositional diffusion, achieving impressive qu…
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
·2766 words·13 mins
AI Generated Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
VideoLLM-MoD boosts online video-language model efficiency by selectively skipping redundant vision token computations, achieving ~42% faster training and ~30% memory savings without sacrificing perfo…
Video Token Merging for Long Video Understanding
·2290 words·11 mins
Computer Vision Video Understanding 🏢 Korea University
Researchers boost long-form video understanding efficiency by 6.89x and reduce memory usage by 84% using a novel learnable video token merging algorithm.
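For intuition, here is a minimal sketch of similarity-based token merging. It is not the paper's learnable algorithm; it only illustrates the general idea of shrinking a long video token sequence by averaging its most similar token pairs, and every name and size below is an illustrative assumption.

```python
# Sketch only: generic similarity-based token merging for a single sequence.
import torch
import torch.nn.functional as F

def merge_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Reduce a (num_tokens, dim) sequence by merging r most-similar token pairs."""
    src, dst = x[0::2], x[1::2]                       # alternate tokens into two sets
    sim = F.normalize(src, dim=-1) @ F.normalize(dst, dim=-1).T
    best_sim, best_dst = sim.max(dim=-1)              # best partner in dst for each src token
    merge_idx = best_sim.topk(min(r, src.shape[0])).indices
    dst = dst.clone()
    for si in merge_idx.tolist():                     # average each chosen src into its partner
        di = best_dst[si].item()
        dst[di] = (dst[di] + src[si]) / 2
    keep = torch.ones(src.shape[0], dtype=torch.bool)
    keep[merge_idx] = False                           # drop the merged source tokens
    return torch.cat([src[keep], dst], dim=0)         # roughly n - r tokens remain

tokens = torch.randn(4096, 768)                       # e.g. tokens pooled from many frames
print(merge_tokens(tokens, r=1024).shape)             # torch.Size([3072, 768])
```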
Video Diffusion Models are Training-free Motion Interpreter and Controller
·2252 words·11 mins
Computer Vision Video Understanding 🏢 Peking University
Training-free video motion control achieved via novel Motion Feature (MOFT) extraction from existing video diffusion models, offering architecture-agnostic insights and high performance.
VFIMamba: Video Frame Interpolation with State Space Models
·2179 words·11 mins
Computer Vision Video Understanding 🏢 Tencent AI Lab
VFIMamba uses state-space models for efficient and dynamic video frame interpolation, achieving state-of-the-art results by introducing a novel Mixed-SSM Block and curriculum learning.
VeXKD: The Versatile Integration of Cross-Modal Fusion and Knowledge Distillation for 3D Perception
·3369 words·16 mins
AI Applications Autonomous Vehicles 🏢 Hong Kong University of Science and Technology
VeXKD: A versatile framework boosts 3D perception by cleverly combining cross-modal fusion and knowledge distillation, improving single-modal student model accuracy without extra inference time.
Verified Safe Reinforcement Learning for Neural Network Dynamic Models
·1254 words·6 mins
Machine Learning Reinforcement Learning 🏢 Washington University in St. Louis
Learning verified safe neural network controllers for complex nonlinear systems is now possible, achieving an order of magnitude longer safety horizons than state-of-the-art methods while maintaining …
Verified Code Transpilation with LLMs
·2009 words·10 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
LLMLIFT: An LLM-powered approach builds verified lifting tools for DSLs, outperforming prior symbolic methods in benchmark transpilation and requiring less development effort.
Verifiably Robust Conformal Prediction
·1918 words·10 mins
AI Generated Machine Learning Reinforcement Learning 🏢 King's College London
VRCP, a new framework, uses neural network verification to make conformal prediction robust against adversarial attacks, supporting various norms and regression tasks.
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
·1706 words·9 mins
Natural Language Processing Large Language Models 🏢 Huawei Noah's Ark Lab
VeLoRA: Train massive LLMs efficiently by compressing intermediate activations!
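As a rough illustration of the rank-1 sub-token idea (a sketch under our own assumptions, not the released VeLoRA implementation): each activation token is split into fixed-size sub-tokens, only their scalar projections onto a single direction are kept in memory, and the activation is reconstructed from those scalars when it is needed again, e.g. in the backward pass.

```python
# Sketch only: rank-1 compression/reconstruction of an activation tensor.
import torch
import torch.nn.functional as F

def compress(x: torch.Tensor, v: torch.Tensor, sub: int) -> torch.Tensor:
    """x: (tokens, dim), v: unit vector of length `sub`, dim divisible by `sub`."""
    t, d = x.shape
    subtokens = x.reshape(t, d // sub, sub)            # (tokens, n_sub, sub)
    return subtokens @ v                               # one scalar per sub-token: ~sub x smaller

def decompress(coeffs: torch.Tensor, v: torch.Tensor, sub: int) -> torch.Tensor:
    t, n_sub = coeffs.shape
    subtokens = coeffs.unsqueeze(-1) * v               # rank-1 reconstruction
    return subtokens.reshape(t, n_sub * sub)

x = torch.randn(1024, 4096)                            # a hidden-activation block
sub = 64
v = F.normalize(torch.randn(sub), dim=0)               # fixed projection direction
c = compress(x, v, sub)                                # store this instead of x
x_hat = decompress(c, v, sub)                          # approximate activation for backward
print(x.shape, c.shape, x_hat.shape)
```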
Vector Quantization Prompting for Continual Learning
·1795 words·9 mins
AI Generated Machine Learning Continual Learning 🏢 Communication University of China
VQ-Prompt uses vector quantization to optimize discrete prompts for continual learning, achieving state-of-the-art performance by effectively abstracting task knowledge and optimizing prompt selection…
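For readers unfamiliar with the mechanism, here is a minimal, hypothetical sketch of vector-quantized prompt selection with a straight-through estimator. It illustrates the general VQ technique rather than the authors' exact model; the class, parameter names, and sizes are our own assumptions.

```python
# Sketch only: quantize a continuous prompt to the nearest codebook prompt,
# keeping the step differentiable via the straight-through estimator.
import torch
import torch.nn as nn

class VQPromptSketch(nn.Module):
    def __init__(self, num_prompts: int = 10, prompt_len: int = 8, dim: int = 768):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_prompts, prompt_len, dim) * 0.02)

    def forward(self, query: torch.Tensor):
        """query: (batch, prompt_len, dim) continuous prompt predicted from the input."""
        flat_q = query.flatten(1)                     # (batch, prompt_len*dim)
        flat_c = self.codebook.flatten(1)             # (num_prompts, prompt_len*dim)
        dists = torch.cdist(flat_q, flat_c)           # L2 distance to every prompt
        idx = dists.argmin(dim=-1)                    # nearest discrete prompt per sample
        quantized = self.codebook[idx]                # (batch, prompt_len, dim)
        # Straight-through: forward uses the discrete prompt, gradients flow to `query`.
        quantized = query + (quantized - query).detach()
        commit_loss = (quantized.detach() - query).pow(2).mean()
        return quantized, idx, commit_loss

vq = VQPromptSketch()
prompt, idx, loss = vq(torch.randn(4, 8, 768))
print(prompt.shape, idx.tolist(), loss.item())
```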