
Posters

2024

Yo'LLaVA: Your Personalized Language and Vision Assistant
·4272 words·21 mins
Multimodal Learning Vision-Language Models 🏢 University of Wisconsin-Madison
Yo’LLaVA personalizes Large Multimodal Models (LMMs) to converse about specific subjects using just a few images, embedding concepts into latent tokens for efficient and effective personalized convers…
xMIL: Insightful Explanations for Multiple Instance Learning in Histopathology
·1618 words·8 mins
AI Applications Healthcare 🏢 Berlin Institute for the Foundations of Learning and Data
xMIL-LRP: Enhanced explainable AI for multiple instance learning in histopathology, boosting model transparency and enabling new knowledge discovery.
XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation
·2133 words·11 mins
Multimodal Learning Vision-Language Models 🏢 Tsinghua University
XMask3D uses cross-modal mask reasoning to achieve state-of-the-art open vocabulary 3D semantic segmentation by aligning 2D and 3D features at the mask level, resulting in precise segmentation boundar…
Worst-Case Offline Reinforcement Learning with Arbitrary Data Support
·450 words·3 mins
AI Generated Machine Learning Reinforcement Learning 🏢 IBM Research
Worst-case offline RL guarantees near-optimal policy performance without data support assumptions, achieving a sample complexity bound of O(ε⁻²).
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment
·3322 words·16 mins
Natural Language Processing Large Language Models 🏢 Cornell University
WorldCoder: an LLM agent builds world models via code generation and interaction, proving highly sample-efficient and enabling knowledge transfer.
WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena
·2352 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Microsoft Corporation
WizardArena simulates offline chatbot arena battles to efficiently post-train LLMs, dramatically reducing costs and improving model performance.
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
·3638 words·18 mins
AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University
WISE, a novel dual-memory architecture, solves the impossible triangle of reliability, generalization, and locality in lifelong LLM editing by employing a side memory for knowledge updates and a route…
Wings: Learning Multimodal LLMs without Text-only Forgetting
·1958 words·10 mins
Multimodal Learning Vision-Language Models 🏢 Alibaba International Digital Commerce
WINGS: A novel multimodal LLM combats ’text-only forgetting’ by using complementary visual and textual learners, achieving superior performance on text-only and visual tasks.
WildGaussians: 3D Gaussian Splatting In the Wild
·2601 words·13 mins
AI Generated Computer Vision 3D Vision 🏢 ETH Zurich
WildGaussians enhances 3D Gaussian splatting for real-time rendering of photorealistic 3D scenes from in-the-wild images featuring occlusions and appearance changes.
Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections
·1766 words·9 mins
Computer Vision 3D Vision 🏢 Johns Hopkins University
Wild-GS achieves real-time novel view synthesis from unconstrained photos by efficiently adapting 3D Gaussian Splatting, significantly improving speed and quality over existing methods.
Wide Two-Layer Networks can Learn from Adversarial Perturbations
·2045 words·10 mins
AI Theory Robustness 🏢 University of Tokyo
Wide two-layer neural networks can generalize well from mislabeled adversarial examples because adversarial perturbations surprisingly contain sufficient class-specific features.
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
·7149 words·34 mins
AI Generated AI Theory Optimization 🏢 University of Maryland
Learning rate warmup improves deep learning performance by enabling larger learning rates, pushing networks into better-conditioned regions of the loss landscape.
Why Transformers Need Adam: A Hessian Perspective
·2407 words·12 mins
AI Theory Optimization 🏢 Chinese University of Hong Kong, Shenzhen, China
Adam’s superiority over SGD in Transformer training is explained by the ‘block heterogeneity’ of the Hessian matrix, highlighting the need for adaptive learning rates.
Why the Metric Backbone Preserves Community Structure
·2073 words·10 mins
AI Theory Optimization 🏢 EPFL
Metric backbone graph sparsification surprisingly preserves community structure, offering an efficient and robust method for analyzing large networks.
Why Go Full? Elevating Federated Learning Through Partial Network Updates
·3064 words·15 mins
AI Generated Machine Learning Federated Learning 🏢 Beihang University
FedPart boosts federated learning by updating only parts of the network, solving the layer mismatch problem, and achieving faster convergence with higher accuracy.
Why Do We Need Weight Decay in Modern Deep Learning?
·3285 words·16 mins
AI Theory Optimization 🏢 EPFL
Weight decay’s role in modern deep learning is surprisingly multifaceted, impacting optimization dynamics rather than solely regularization, improving generalization and training stability.
Why are Visually-Grounded Language Models Bad at Image Classification?
·3661 words·18 mins
AI Generated Multimodal Learning Vision-Language Models 🏢 Stanford University
Visually-grounded Language Models (VLMs) surprisingly underperform in image classification. This study reveals that this is primarily due to a lack of sufficient classification data during VLM trainin…
Who’s Gaming the System? A Causally-Motivated Approach for Detecting Strategic Adaptation
·3724 words·18 mins
AI Applications Healthcare 🏢 University of Michigan
Researchers developed a causally-motivated approach for ranking agents based on their gaming propensity, addressing the challenge of identifying ‘worst offenders’ in strategic classification settings.
Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval
·2035 words·10 mins
Computer Vision Image Segmentation 🏢 NVIDIA Research
Unlocking personalized image retrieval and segmentation, a novel approach uses pre-trained text-to-image diffusion models to surpass supervised methods, addressing limitations of existing self-supervi…
Where does In-context Learning Happen in Large Language Models?
·2289 words·11 mins
Natural Language Processing Large Language Models 🏢 Johns Hopkins University
LLMs learn tasks via in-context learning, but the task recognition location is unknown. This paper reveals that LLMs transition from task recognition to task performance at specific layers, enabling s…