
Posters

2024

Yo'LLaVA: Your Personalized Language and Vision Assistant
·4272 words·21 mins
Multimodal Learning Vision-Language Models 🏢 University of Wisconsin-Madison
Yo’LLaVA personalizes Large Multimodal Models (LMMs) to converse about specific subjects using just a few images, embedding concepts into latent tokens for efficient and effective personalized convers…
xMIL: Insightful Explanations for Multiple Instance Learning in Histopathology
·1618 words·8 mins
AI Applications Healthcare 🏢 Berlin Institute for the Foundations of Learning and Data
xMIL-LRP: Enhanced explainable AI for multiple instance learning in histopathology, boosting model transparency and enabling new knowledge discovery.
XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation
·2133 words·11 mins
Multimodal Learning Vision-Language Models 🏢 Tsinghua University
XMask3D uses cross-modal mask reasoning to achieve state-of-the-art open vocabulary 3D semantic segmentation by aligning 2D and 3D features at the mask level, resulting in precise segmentation boundar…
Worst-Case Offline Reinforcement Learning with Arbitrary Data Support
·450 words·3 mins
AI Generated Machine Learning Reinforcement Learning 🏢 IBM Research
Worst-case offline RL guarantees near-optimal policy performance without data support assumptions, achieving a sample complexity bound of O(ε⁻²).
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment
·3322 words·16 mins
Natural Language Processing Large Language Models 🏢 Cornell University
WorldCoder: an LLM agent builds world models via code generation and interaction, proving highly sample-efficient and enabling knowledge transfer.
WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena
·2352 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Microsoft Corporation
WizardArena simulates offline chatbot arena battles to efficiently post-train LLMs, dramatically reducing costs and improving model performance.
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
·3638 words·18 mins
AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University
WISE, a novel dual-memory architecture, solves the impossible triangle of reliability, generalization, and locality in lifelong LLM editing by employing a side memory for knowledge updates and a route…
Wings: Learning Multimodal LLMs without Text-only Forgetting
·1958 words·10 mins
Multimodal Learning Vision-Language Models 🏢 Alibaba International Digital Commerce
WINGS: A novel multimodal LLM combats ’text-only forgetting’ by using complementary visual and textual learners, achieving superior performance on text-only and visual tasks.
WildGaussians: 3D Gaussian Splatting In the Wild
·2601 words·13 mins
AI Generated Computer Vision 3D Vision 🏢 ETH Zurich
WildGaussians enhances 3D Gaussian splatting for real-time rendering of photorealistic 3D scenes from in-the-wild images featuring occlusions and appearance changes.
Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections
·1766 words·9 mins
Computer Vision 3D Vision 🏢 Johns Hopkins University
Wild-GS achieves real-time novel view synthesis from unconstrained photos by efficiently adapting 3D Gaussian Splatting, significantly improving speed and quality over existing methods.
Wide Two-Layer Networks can Learn from Adversarial Perturbations
·2045 words·10 mins
AI Theory Robustness 🏢 University of Tokyo
Wide two-layer neural networks can generalize well from mislabeled adversarial examples because adversarial perturbations surprisingly contain sufficient class-specific features.
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
·7149 words·34 mins
AI Generated AI Theory Optimization 🏢 University of Maryland
Learning rate warmup improves deep learning performance by enabling larger learning rates, pushing networks into better-conditioned regions of the loss landscape.
Why Transformers Need Adam: A Hessian Perspective
·2407 words·12 mins
AI Theory Optimization 🏢 Chinese University of Hong Kong, Shenzhen, China
Adam’s superiority over SGD in Transformer training is explained by the ‘block heterogeneity’ of the Hessian matrix, highlighting the need for adaptive learning rates.
Why the Metric Backbone Preserves Community Structure
·2073 words·10 mins
AI Theory Optimization 🏢 EPFL
Metric backbone graph sparsification surprisingly preserves community structure, offering an efficient and robust method for analyzing large networks.
Why Go Full? Elevating Federated Learning Through Partial Network Updates
·3064 words·15 mins
AI Generated Machine Learning Federated Learning 🏢 Beihang University
FedPart boosts federated learning by updating only parts of the network, solving the layer mismatch problem, and achieving faster convergence with higher accuracy.
Why Do We Need Weight Decay in Modern Deep Learning?
·3285 words·16 mins
AI Theory Optimization 🏢 EPFL
Weight decay’s role in modern deep learning is surprisingly multifaceted, impacting optimization dynamics rather than solely regularization, improving generalization and training stability.
Why are Visually-Grounded Language Models Bad at Image Classification?
·3661 words·18 mins
AI Generated Multimodal Learning Vision-Language Models 🏢 Stanford University
Visually-grounded Language Models (VLMs) surprisingly underperform in image classification. This study reveals that this is primarily due to a lack of sufficient classification data during VLM trainin…
Who’s Gaming the System? A Causally-Motivated Approach for Detecting Strategic Adaptation
·3724 words·18 mins
AI Applications Healthcare 🏢 University of Michigan
Researchers developed a causally-motivated approach for ranking agents based on their gaming propensity, addressing the challenge of identifying ‘worst offenders’ in strategic classification settings.
Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval
·2035 words·10 mins
Computer Vision Image Segmentation 🏢 NVIDIA Research
Unlocking personalized image retrieval and segmentation, a novel approach uses pre-trained text-to-image diffusion models to surpass supervised methods, addressing limitations of existing self-supervi…
Where does In-context Learning Happen in Large Language Models?
·2289 words·11 mins
Natural Language Processing Large Language Models 🏢 Johns Hopkins University
LLMs learn tasks via in-context learning, but the task recognition location is unknown. This paper reveals that LLMs transition from task recognition to task performance at specific layers, enabling s…