Posters
2024
Once Read is Enough: Domain-specific Pretraining-free Language Models with Cluster-guided Sparse Experts for Long-tail Domain Knowledge
·2658 words·13 mins·
Natural Language Processing
Large Language Models
🏢 University of Oxford
This research introduces Cluster-guided Sparse Experts (CSE), enabling pretrained language models to effectively learn long-tail domain knowledge without domain-specific pretraining, thus achieving su…
On-Road Object Importance Estimation: A New Dataset and A Model with Multi-Fold Top-Down Guidance
·1952 words·10 mins·
AI Applications
Autonomous Vehicles
🏢 College of Computer Science, Chongqing University
New large-scale dataset and model boost on-road object importance estimation accuracy by 23.1%!
On Weak Regret Analysis for Dueling Bandits
·1775 words·9 mins·
AI Generated
AI Theory
Optimization
🏢 KAUST
New algorithms achieve optimal weak regret in K-armed dueling bandits by leveraging the full problem structure, improving upon state-of-the-art methods.
On Tractable $\Phi$-Equilibria in Non-Concave Games
·1428 words·7 mins·
AI Theory
Optimization
🏢 Yale University
This paper presents efficient algorithms for approximating equilibria in non-concave games, focusing on tractable Φ-equilibria and addressing computational challenges posed by infinite strategy sets.
On the Worst Prompt Performance of Large Language Models
·2797 words·14 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
LLMs’ performance varies drastically with prompt phrasing; this paper introduces ROBUSTALPACAEVAL to evaluate lower-bound performance via worst-case prompt analysis, revealing model inconsist…
On the Target-kernel Alignment: A Unified Analysis with Kernel Complexity
·2457 words·12 mins·
Machine Learning
Deep Learning
🏢 School of Statistics and Management, Shanghai University of Finance and Economics
Truncated kernel methods consistently outperform standard methods by eliminating the saturation effect, offering faster learning rates and enhanced theoretical guarantees.
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
·2971 words·14 mins·
Computer Vision
Image Classification
🏢 Carnegie Mellon University
Vision Transformers achieve surprisingly high accuracy by transferring only pre-training attention maps, challenging the conventional belief that feature learning is crucial.
On the Stability and Generalization of Meta-Learning
·1358 words·7 mins·
Machine Learning
Meta Learning
🏢 Johns Hopkins University
This paper introduces uniform meta-stability for meta-learning, providing tighter generalization bounds for convex and weakly-convex problems, addressing computational limitations of existing algorith…
On the Sparsity of the Strong Lottery Ticket Hypothesis
·1303 words·7 mins·
AI Theory
Optimization
🏢 Université Côte d'Azur
Researchers rigorously prove the Strong Lottery Ticket Hypothesis, offering the first theoretical guarantees on the sparsity of winning neural network subnetworks.
On the Scalability of GNNs for Molecular Graphs
·2680 words·13 mins·
Machine Learning
Deep Learning
🏢 Valence Labs
Giant leap in molecular GNNs! MolGPS, a new foundation model, achieves state-of-the-art performance on molecular property prediction by leveraging massive datasets and demonstrating the scalability o…
On the Scalability of Certified Adversarial Robustness with Generated Data
·2448 words·12 mins·
Machine Learning
Deep Learning
🏢 Machine Learning and Data Analytics Lab, FAU Erlangen-Nürnberg, Germany
Boosting certified robustness of machine learning models by 3-4% using generated data from diffusion models!
On the Saturation Effects of Spectral Algorithms in Large Dimensions
·1464 words·7 mins·
AI Theory
Generalization
🏢 Tsinghua University
High-dimensional spectral algorithms show saturation effects: Kernel Ridge Regression underperforms optimal algorithms like gradient flow when regression functions are very smooth.
On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games
·2014 words·10 mins·
Machine Learning
Reinforcement Learning
🏢 Yale University
New reinforcement learning model clarifies the role of information structure in partially-observable sequential decision-making problems, proving an upper bound on learning complexity.
On the Role of Attention Masks and LayerNorm in Transformers
·2522 words·12 mins·
AI Generated
AI Theory
Representation Learning
🏢 MIT
Transformers’ self-attention mechanism, while powerful, suffers from rank collapse with increasing depth. This paper reveals that while masked attention still leads to exponential collapse, sparse att…
On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models
·1629 words·8 mins·
AI Theory
Robustness
🏢 University of Utah
Spectral algorithms for graph bisection show surprising robustness to helpful adversaries in semirandom models, with unnormalized Laplacian consistently outperforming the normalized one.
On the Power of Small-size Graph Neural Networks for Linear Programming
·2361 words·12 mins·
AI Generated
AI Theory
Optimization
🏢 Peking University
Small-size Graph Neural Networks effectively solve Linear Programs!
On the Power of Decision Trees in Auto-Regressive Language Modeling
·2176 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Massachusetts Institute of Technology
Auto-Regressive Decision Trees (ARDTs) surprisingly outperform Transformers on language tasks!
On the Parameter Identifiability of Partially Observed Linear Causal Models
·3769 words·18 mins·
AI Generated
AI Theory
Causality
🏢 Carnegie Mellon University
Researchers achieve full parameter identifiability in partially observed linear causal models using novel graphical conditions and a likelihood-based estimation method, addressing previous limitations…
On the Optimality of Dilated Entropy and Lower Bounds for Online Learning in Extensive-Form Games
·1661 words·8 mins·
AI Generated
AI Theory
Optimization
🏢 MIT
Researchers discover Dilated Entropy is the optimal distance-generating function for solving extensive-form games using first-order methods, achieving near-optimal regret bounds.
On the Optimal Time Complexities in Decentralized Stochastic Asynchronous Optimization
·2010 words·10 mins·
Machine Learning
Optimization
🏢 KAUST, AIRI
Fragile SGD & Amelie SGD achieve near-optimal speed in decentralized asynchronous optimization, handling diverse worker & communication speeds.