🏢 Tencent Hunyuan

Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
2206 words · 11 mins
AI Generated · Machine Learning · Deep Learning · 🏢 Tencent Hunyuan
Adam-style optimizers in deep learning exhibit a surprising surge phenomenon: as batch size grows, the optimal learning rate first rises, then falls, and finally converges to a non-zero value.