🏢 Institute of Science and Technology Austria (ISTA)
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
2050 words · 10 mins
Natural Language Processing
Large Language Models
MicroAdam: A new variant of the Adam optimizer that substantially reduces memory overhead when training large language models, while preserving accuracy and retaining provable convergence guarantees.