🏢 Chinese University of Hong Kong, Shenzhen, China

Why Transformers Need Adam: A Hessian Perspective
·2407 words·12 mins
AI Theory Optimization 🏢 Chinese University of Hong Kong, Shenzhen, China
Adam’s advantage over SGD in Transformer training is attributed to the ‘block heterogeneity’ of the Hessian matrix, where the Hessian spectrum differs markedly across parameter blocks, highlighting the need for adaptive learning rates.
Unsupervised Anomaly Detection in The Presence of Missing Values
·3139 words·15 mins
Machine Learning Unsupervised Learning 🏢 Chinese University of Hong Kong, Shenzhen, China
ImAD: An end-to-end unsupervised anomaly detection method that handles missing values by integrating imputation and detection in a unified framework, with superior reported accuracy.