Posters
2024
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
·193 words·1 min·
Machine Learning
Reinforcement Learning
Tel Aviv University
Warm-up-free policy optimization achieves rate-optimal regret in linear Markov decision processes, improving efficiency and dependence on problem parameters.
Warm-starting Push-Relabel
·1936 words·10 mins·
AI Theory
Optimization
UC Berkeley
This research introduces the first theoretical guarantees for warm-starting the celebrated Push-Relabel network flow algorithm, improving its speed using a predicted flow, while maintaining worst-case…
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
·2439 words·12 mins·
Natural Language Processing
Large Language Models
IBM Research
WAGLE: A novel weight attribution-guided LLM unlearning framework boosts unlearning performance by strategically identifying and manipulating influential model weights, achieving a better balance betw…
VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization
·2230 words·11 mins·
Computer Vision
3D Vision
State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), CASIA
VQ-Map leverages vector quantization to estimate bird’s-eye-view maps with unprecedented accuracy, setting new benchmarks.
Voxel Proposal Network via Multi-Frame Knowledge Distillation for Semantic Scene Completion
·2307 words·11 mins·
AI Generated
Computer Vision
3D Vision
Tianjin University
VPNet, a novel semantic scene completion network, uses multi-frame knowledge distillation and confident voxel proposals to improve accuracy and handle dynamic aspects of 3D scenes from point clouds, a…
VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions
·2504 words·12 mins·
AI Applications
Robotics
Peking University
VLMimic: Vision-Language Models enable robots to master intricate actions using only a few human video demonstrations, surpassing existing methods by a significant margin.
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
·3011 words·15 mins·
Computer Vision
Visual Question Answering
UC San Diego
VLG-CBM enhances concept bottleneck models with vision-language guidance for faithful interpretability and improved accuracy.
Vivid-ZOO: Multi-View Video Generation with Diffusion Model
·2634 words·13 mins·
Computer Vision
Image Generation
King Abdullah University of Science and Technology
Vivid-ZOO: Generating high-quality multi-view videos from text using a novel diffusion model.
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
·3475 words·17 mins·
Multimodal Learning
Vision-Language Models
Skywork AI
VITRON: a unified pixel-level Vision LLM excels in understanding, generating, segmenting, and editing images and videos.
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
·3551 words·17 mins·
AI Generated
Multimodal Learning
Vision-Language Models
University of Washington
Visual SKETCHPAD empowers multimodal language models (LLMs) with visual reasoning abilities by allowing them to generate intermediate sketches. This innovative framework substantially enhances LLM per…
Visual Prompt Tuning in Null Space for Continual Learning
·2254 words·11 mins·
AI Generated
Computer Vision
Visual Question Answering
School of Computer Science, Northwestern Polytechnical University
This paper presents NSP², a novel method for visual prompt tuning in continual learning that leverages orthogonal projection to prevent catastrophic forgetting by tuning prompts orthogonal to previous…
Visual Pinwheel Centers Act as Geometric Saliency Detectors
·2189 words·11 mins·
Computer Vision
Image Classification
Research Institute of Intelligent Complex Systems, Fudan University
Visual pinwheel centers in the cortex act as efficient geometric saliency detectors, responding faster and stronger to complex spatial textures than other structures.
Visual Perception by Large Language Model's Weights
·2070 words·10 mins·
Multimodal Learning
Vision-Language Models
Tencent AI Lab
VLORA: Boosting multimodal LLM efficiency by merging visual features into model weights instead of extending input sequences.
Visual Fourier Prompt Tuning
·4269 words·21 mins·
Computer Vision
Image Classification
Rochester Institute of Technology
Visual Fourier Prompt Tuning (VFPT) leverages the Fast Fourier Transform to seamlessly integrate spatial and frequency information for superior parameter-efficient vision model fine-tuning, even with …
Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion
·4160 words·20 mins·
Computer Vision
Image Generation
Department of Biomedical Engineering, Southern University of Science and Technology
Researchers developed a novel zero-shot EEG-based framework for visual reconstruction using a tailored brain encoder and a two-stage image generation strategy, achieving state-of-the-art performance i…
Visual Data Diagnosis and Debiasing with Concept Graphs
·2767 words·13 mins·
Computer Vision
Image Classification
Carnegie Mellon University
CONBIAS tackles dataset bias by representing visual data as concept graphs, diagnosing imbalances via clique analysis, and debiasing through targeted data augmentation for improved model generalizatio…
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
·2150 words·11 mins·
Multimodal Learning
Vision-Language Models
Chinese Academy of Sciences
AcFormer, a novel vision-language connector for MLLMs, leverages ‘visual anchors’ to reduce computation cost by ~66% while improving accuracy.
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
·4202 words·20 mins·
AI Generated
AI Applications
Autonomous Vehicles
Hong Kong University of Science and Technology
Vista: a novel driving world model achieving high-fidelity prediction and versatile controllability, outperforming state-of-the-art models in generalization and prediction accuracy.
VisMin: Visual Minimal-Change Understanding
·2710 words·13 mins·
Multimodal Learning
Vision-Language Models
Mila - Quebec AI Institute
VisMin benchmark evaluates visual-language models’ fine-grained understanding by identifying minimal image-text differences (object, attribute, count, spatial relation). Current VLMs struggle with sp…
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
·6701 words·32 mins·
Multimodal Learning
Vision-Language Models
Tsinghua University
VisionLLM v2 unifies visual perception, understanding, and generation, excelling in various vision tasks and achieving performance comparable to task-specific models.