Posters
2024
Once Read is Enough: Domain-specific Pretraining-free Language Models with Cluster-guided Sparse Experts for Long-tail Domain Knowledge
·2658 words·13 mins·
Natural Language Processing
Large Language Models
🏢 University of Oxford
This research introduces Cluster-guided Sparse Experts (CSE), enabling pretrained language models to effectively learn long-tail domain knowledge without domain-specific pretraining, thus achieving su…
On-Road Object Importance Estimation: A New Dataset and A Model with Multi-Fold Top-Down Guidance
·1952 words·10 mins·
AI Applications
Autonomous Vehicles
🏢 College of Computer Science, Chongqing University
New large-scale dataset and model boost on-road object importance estimation accuracy by 23.1%!
On Weak Regret Analysis for Dueling Bandits
·1775 words·9 mins·
AI Generated
AI Theory
Optimization
🏢 KAUST
New algorithms achieve optimal weak regret in K-armed dueling bandits by leveraging the full problem structure, improving upon state-of-the-art methods.
On Tractable $\Phi$-Equilibria in Non-Concave Games
·1428 words·7 mins·
AI Theory
Optimization
🏢 Yale University
This paper presents efficient algorithms for approximating equilibria in non-concave games, focusing on tractable Φ-equilibria and addressing computational challenges posed by infinite strategy sets.
On the Worst Prompt Performance of Large Language Models
·2797 words·14 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
LLMs’ performance varies drastically with prompt phrasing; this paper introduces ROBUSTALPACAEVAL to evaluate lower-bound performance via worst-case prompt analysis, revealing model inconsist…
On the Target-kernel Alignment: A Unified Analysis with Kernel Complexity
·2457 words·12 mins·
Machine Learning
Deep Learning
🏢 School of Statistics and Management, Shanghai University of Finance and Economics
Truncated kernel methods consistently outperform standard methods by eliminating the saturation effect, offering faster learning rates and enhanced theoretical guarantees.
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
·2971 words·14 mins·
Computer Vision
Image Classification
🏢 Carnegie Mellon University
Vision Transformers achieve surprisingly high accuracy by transferring only pre-training attention maps, challenging the conventional belief that feature learning is crucial.
On the Stability and Generalization of Meta-Learning
·1358 words·7 mins·
Machine Learning
Meta Learning
🏢 Johns Hopkins University
This paper introduces uniform meta-stability for meta-learning, providing tighter generalization bounds for convex and weakly-convex problems, addressing computational limitations of existing algorith…
On the Sparsity of the Strong Lottery Ticket Hypothesis
·1303 words·7 mins·
AI Theory
Optimization
🏢 Université Côte d'Azur
Researchers rigorously prove the Strong Lottery Ticket Hypothesis, offering the first theoretical guarantees on the sparsity of winning neural network subnetworks.
On the Scalability of GNNs for Molecular Graphs
·2680 words·13 mins·
Machine Learning
Deep Learning
🏢 Valence Labs
Giant leap in molecular GNNs! MolGPS, a new foundation model, achieves state-of-the-art performance on molecular property prediction by leveraging massive datasets and demonstrating the scalability o…
On the Scalability of Certified Adversarial Robustness with Generated Data
·2448 words·12 mins·
Machine Learning
Deep Learning
🏢 Machine Learning and Data Analytics Lab, FAU Erlangen-Nürnberg, Germany
Boosting certified robustness of machine learning models by 3-4% using generated data from diffusion models!
On the Saturation Effects of Spectral Algorithms in Large Dimensions
·1464 words·7 mins·
AI Theory
Generalization
🏢 Tsinghua University
High-dimensional spectral algorithms show saturation effects: Kernel Ridge Regression underperforms optimal algorithms like gradient flow when regression functions are very smooth.
On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games
·2014 words·10 mins·
Machine Learning
Reinforcement Learning
🏢 Yale University
New reinforcement learning model clarifies the role of information structure in partially-observable sequential decision-making problems, proving an upper bound on learning complexity.
On the Role of Attention Masks and LayerNorm in Transformers
·2522 words·12 mins·
AI Generated
AI Theory
Representation Learning
🏢 MIT
Transformers’ self-attention mechanism, while powerful, suffers from rank collapse with increasing depth. This paper reveals that while masked attention still leads to exponential collapse, sparse att…
On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models
·1629 words·8 mins·
AI Theory
Robustness
🏢 University of Utah
Spectral algorithms for graph bisection show surprising robustness to helpful adversaries in semirandom models, with unnormalized Laplacian consistently outperforming the normalized one.
On the Power of Small-size Graph Neural Networks for Linear Programming
·2361 words·12 mins·
AI Generated
AI Theory
Optimization
🏢 Peking University
Small-size Graph Neural Networks effectively solve Linear Programs!
On the Power of Decision Trees in Auto-Regressive Language Modeling
·2176 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Massachusetts Institute of Technology
Auto-Regressive Decision Trees (ARDTs) surprisingly outperform Transformers on language tasks!
On the Parameter Identifiability of Partially Observed Linear Causal Models
·3769 words·18 mins·
AI Generated
AI Theory
Causality
🏢 Carnegie Mellon University
Researchers achieve full parameter identifiability in partially observed linear causal models using novel graphical conditions and a likelihood-based estimation method, addressing previous limitations…
On the Optimality of Dilated Entropy and Lower Bounds for Online Learning in Extensive-Form Games
·1661 words·8 mins·
AI Generated
AI Theory
Optimization
🏢 MIT
Researchers discover Dilated Entropy is the optimal distance-generating function for solving extensive-form games using first-order methods, achieving near-optimal regret bounds.
On the Optimal Time Complexities in Decentralized Stochastic Asynchronous Optimization
·2010 words·10 mins·
Machine Learning
Optimization
🏢 KAUST, AIRI
Fragile SGD & Amelie SGD achieve near-optimal speed in decentralized asynchronous optimization, handling diverse worker & communication speeds.