Optimization
On the difficulty of training Recurrent Neural Networks (2012. 11)
Gradient Clipping, RNN
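A minimal sketch of the norm-based clipping rule the paper proposes against exploding gradients: when the global gradient norm exceeds a threshold, the whole gradient is rescaled onto the threshold sphere. The function name and default threshold are illustrative, not from the paper.

```python
import numpy as np

def clip_gradient_norm(grads, threshold=1.0):
    """Rescale the whole gradient when its global L2 norm exceeds `threshold`,
    bounding the step taken through steep regions of the loss surface."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > threshold:
        grads = [g * (threshold / total_norm) for g in grads]
    return grads
```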
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units (2015. 4)
Weight Initialization, RNN, Identity Matrix
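A sketch of the paper's IRNN recipe: initialize the recurrent matrix to the identity and the biases to zero, draw small Gaussian input weights, and pair this with ReLU activations. The 0.001 scale follows the paper; the function name is illustrative.

```python
import numpy as np

def irnn_init(hidden_size, input_size, input_scale=0.001):
    """IRNN initialization: identity recurrent weights, zero biases,
    small Gaussian input weights, intended for use with ReLU units."""
    W_hh = np.eye(hidden_size)                                     # recurrent weights = I
    W_xh = np.random.randn(hidden_size, input_size) * input_scale  # small Gaussian
    b_h = np.zeros(hidden_size)                                    # zero biases
    return W_hh, W_xh, b_h
```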
Cyclical Learning Rates for Training Neural Networks (2015. 6)
CLR, Triangular, ExpRange, Long-term Benefit
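A sketch of the paper's triangular schedule and its exp_range variant: the learning rate ramps linearly from a base value to a maximum and back over each cycle, with exp_range decaying the amplitude over time. Argument names and defaults are illustrative.

```python
import numpy as np

def cyclical_lr(it, base_lr=1e-4, max_lr=1e-2, step_size=2000,
                mode="triangular", gamma=0.9999):
    """Cyclical learning rate: ramp linearly from base_lr to max_lr and back
    over 2 * step_size iterations; exp_range shrinks the amplitude over time."""
    cycle = np.floor(1 + it / (2 * step_size))
    x = abs(it / step_size - 2 * cycle + 1)
    lr = base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
    if mode == "exp_range":
        lr = base_lr + (lr - base_lr) * (gamma ** it)
    return lr
```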
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (2016. 9)
Generalization, Sharpness of Minima
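The paper quantifies sharpness by maximizing the loss inside a small neighborhood of the minimum with a constrained solver; the sketch below is a much cruder random-perturbation stand-in for the same idea, with all names and defaults assumed.

```python
import numpy as np

def sharpness_probe(loss_fn, w, radius=1e-3, n_samples=20, seed=0):
    """Crude sharpness proxy: largest loss increase over random weight
    perturbations of fixed L2 norm (the paper solves a maximization instead)."""
    rng = np.random.default_rng(seed)
    base = loss_fn(w)
    worst = 0.0
    for _ in range(n_samples):
        d = rng.standard_normal(w.shape)
        d *= radius / np.linalg.norm(d)     # project onto the radius sphere
        worst = max(worst, loss_fn(w + d) - base)
    return worst
```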
Neural Optimizer Search with Reinforcement Learning (2017. 9)
Neural Optimizer Search (NOS), PowerSign, AddSign
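The two best update rules discovered by NOS have simple closed forms. A sketch with illustrative names, where m is an exponential moving average of past gradients and alpha defaults to e as in the paper.

```python
import numpy as np

def nos_step(w, g, m, lr=0.01, beta=0.9, alpha=np.e, rule="powersign"):
    """PowerSign / AddSign: scale the gradient up when its sign agrees with
    the sign of the running gradient average m, and down when it disagrees."""
    m = beta * m + (1 - beta) * g
    agree = np.sign(g) * np.sign(m)        # +1 agree, -1 disagree
    if rule == "powersign":
        step = (alpha ** agree) * g        # PowerSign: alpha^{±1} * g
    else:
        step = (1 + agree) * g             # AddSign: (1 ± 1) * g
    return w - lr * step, m
```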
On the Convergence of Adam and Beyond (2018. 2)
AMSGrad, Convex Optimization
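AMSGrad's fix to Adam is essentially one line: keep the running maximum of the second-moment estimate so the effective learning rate never increases, restoring the convergence guarantee in the convex setting. A sketch with bias correction omitted, as in the paper's formulation; names and defaults are illustrative.

```python
import numpy as np

def amsgrad_step(w, g, m, v, v_hat, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One AMSGrad step: Adam with a non-decreasing second-moment estimate."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    v_hat = np.maximum(v_hat, v)           # the key change vs. Adam
    w = w - lr * m / (np.sqrt(v_hat) + eps)
    return w, m, v, v_hat
```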
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (2018. 4)
Adafactor, Adaptive Method, Update Clipping
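A simplified sketch of the two listed ideas for a matrix parameter: the second-moment matrix is stored only as per-row and per-column moving averages (sublinear memory) and rebuilt as a rank-1 outer product, and the update is clipped whenever its RMS exceeds a threshold d. The paper's relative step sizes and increasing beta2 schedule are omitted here; names are illustrative.

```python
import numpy as np

def adafactor_step(W, G, R, C, lr=0.01, beta2=0.999, eps=1e-30, d=1.0):
    """Simplified Adafactor step for a matrix parameter W (shape m x n).
    R (m,) and C (n,) hold the factored second-moment statistics."""
    G2 = G ** 2 + eps
    R = beta2 * R + (1 - beta2) * G2.mean(axis=1)   # per-row statistics
    C = beta2 * C + (1 - beta2) * G2.mean(axis=0)   # per-column statistics
    V = np.outer(R, C) / R.mean()                   # rank-1 reconstruction
    U = G / np.sqrt(V)
    U /= max(1.0, np.sqrt(np.mean(U ** 2)) / d)     # update clipping
    return W - lr * U, R, C
```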
Revisiting Small Batch Training for Deep Neural Networks (2018. 4)
Generalization Performance, Training Stability