🏢 Tencent Hunyuan
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
·2206 words·11 mins·
AI Generated
Machine Learning
Deep Learning
With Adam-style optimizers, the optimal learning rate shows a surprising surge as batch size grows: it first rises, then falls, and finally converges to a non-zero value.