Posters
2024
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
·193 words·1 min·
Machine Learning
Reinforcement Learning
Tel Aviv University
Warm-up-free policy optimization achieves rate-optimal regret in linear Markov decision processes, improving efficiency and dependence on problem parameters.
Warm-starting Push-Relabel
·1936 words·10 mins·
AI Theory
Optimization
UC Berkeley
This research introduces the first theoretical guarantees for warm-starting the celebrated Push-Relabel network flow algorithm, improving its speed using a predicted flow, while maintaining worst-case…
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
·2439 words·12 mins·
Natural Language Processing
Large Language Models
IBM Research
WAGLE: A novel weight attribution-guided LLM unlearning framework boosts unlearning performance by strategically identifying and manipulating influential model weights, achieving a better balance betw…
VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization
·2230 words·11 mins·
Computer Vision
3D Vision
State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), CASIA
VQ-Map leverages vector quantization to estimate bird’s-eye-view maps with unprecedented accuracy, setting new benchmarks.
Voxel Proposal Network via Multi-Frame Knowledge Distillation for Semantic Scene Completion
·2307 words·11 mins·
AI Generated
Computer Vision
3D Vision
Tianjin University
VPNet, a novel semantic scene completion network, uses multi-frame knowledge distillation and confident voxel proposals to improve accuracy and handle dynamic aspects of 3D scenes from point clouds, a…
VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions
·2504 words·12 mins·
AI Applications
Robotics
Peking University
VLMimic: Vision-Language Models enable robots to master intricate actions using only a few human video demonstrations, surpassing existing methods by a significant margin.
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
·3011 words·15 mins·
Computer Vision
Visual Question Answering
UC San Diego
VLG-CBM enhances concept bottleneck models with vision-language guidance for faithful interpretability and improved accuracy.
Vivid-ZOO: Multi-View Video Generation with Diffusion Model
·2634 words·13 mins·
Computer Vision
Image Generation
King Abdullah University of Science and Technology
Vivid-ZOO: Generating high-quality multi-view videos from text using a novel diffusion model.
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
·3475 words·17 mins·
Multimodal Learning
Vision-Language Models
Skywork AI
VITRON: a unified pixel-level Vision LLM excels in understanding, generating, segmenting, and editing images and videos.
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
·3551 words·17 mins·
AI Generated
Multimodal Learning
Vision-Language Models
University of Washington
Visual SKETCHPAD empowers multimodal language models (LLMs) with visual reasoning abilities by allowing them to generate intermediate sketches. This innovative framework substantially enhances LLM per…
Visual Prompt Tuning in Null Space for Continual Learning
·2254 words·11 mins·
AI Generated
Computer Vision
Visual Question Answering
School of Computer Science, Northwestern Polytechnical University
This paper presents NSP², a novel method for visual prompt tuning in continual learning that leverages orthogonal projection to prevent catastrophic forgetting by tuning prompts orthogonal to previous…
Visual Pinwheel Centers Act as Geometric Saliency Detectors
·2189 words·11 mins·
Computer Vision
Image Classification
Research Institute of Intelligent Complex Systems, Fudan University
Visual pinwheel centers in the cortex act as efficient geometric saliency detectors, responding faster and stronger to complex spatial textures than other structures.
Visual Perception by Large Language Model's Weights
·2070 words·10 mins·
Multimodal Learning
Vision-Language Models
Tencent AI Lab
VLORA: Boosting multimodal LLM efficiency by merging visual features into model weights instead of extending input sequences.
Visual Fourier Prompt Tuning
·4269 words·21 mins·
Computer Vision
Image Classification
Rochester Institute of Technology
Visual Fourier Prompt Tuning (VFPT) leverages the Fast Fourier Transform to seamlessly integrate spatial and frequency information for superior parameter-efficient vision model fine-tuning, even with …
Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion
·4160 words·20 mins·
Computer Vision
Image Generation
Department of Biomedical Engineering, Southern University of Science and Technology
Researchers developed a novel zero-shot EEG-based framework for visual reconstruction using a tailored brain encoder and a two-stage image generation strategy, achieving state-of-the-art performance i…
Visual Data Diagnosis and Debiasing with Concept Graphs
·2767 words·13 mins·
Computer Vision
Image Classification
Carnegie Mellon University
CONBIAS tackles dataset bias by representing visual data as concept graphs, diagnosing imbalances via clique analysis, and debiasing through targeted data augmentation for improved model generalizatio…
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
·2150 words·11 mins·
Multimodal Learning
Vision-Language Models
Chinese Academy of Sciences
AcFormer, a novel vision-language connector for MLLMs, leverages ‘visual anchors’ to reduce computation cost by ~66% while improving accuracy.
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
·4202 words·20 mins·
AI Generated
AI Applications
Autonomous Vehicles
Hong Kong University of Science and Technology
Vista: a novel driving world model achieving high-fidelity prediction and versatile controllability, outperforming state-of-the-art models in generalization and prediction accuracy.
VisMin: Visual Minimal-Change Understanding
·2710 words·13 mins·
Multimodal Learning
Vision-Language Models
Mila - Quebec AI Institute
VisMin benchmark evaluates visual-language models’ fine-grained understanding by identifying minimal image-text differences (object, attribute, count, spatial relation). Current VLMs struggle with sp…
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
·6701 words·32 mins·
Multimodal Learning
Vision-Language Models
Tsinghua University
VisionLLM v2 unifies visual perception, understanding, and generation, excelling in various vision tasks and achieving performance comparable to task-specific models.