🏢 Chinese University of Hong Kong, Shenzhen, China
Why Transformers Need Adam: A Hessian Perspective
2407 words · 12 mins
AI Theory
Optimization
🏢 Chinese University of Hong Kong, Shenzhen, China
Adam’s superiority over SGD in Transformer training is explained by the ‘block heterogeneity’ of the Hessian matrix, highlighting the need for adaptive learning rates.
Unsupervised Anomaly Detection in The Presence of Missing Values
3139 words · 15 mins
Machine Learning
Unsupervised Learning
🏢 Chinese University of Hong Kong, Shenzhen, China
ImAD is an end-to-end unsupervised anomaly detection method that handles missing values by integrating imputation and detection in a unified framework, achieving superior accuracy.