(2016. 7) Layer Normalization

  • published in July 2016

  • Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

Simple summary

  • Similar to batch normalization, but the statistics are computed per sample over the features rather than over the mini-batch (see the sketch after this list).

  • Works well for recurrent networks, where applying batch normalization across mini-batches and time steps is difficult.

  • Independent of the batch size.

  • Robust (invariant) to re-scaling of the input data.

  • Robust (invariant) to re-scaling and re-centering of the weight matrix.

  • Weight updates are naturally scaled down as the weight norms grow during training, giving an implicit early-stopping effect.

  • However, batch normalization still performs better than layer normalization for CNNs.
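A minimal NumPy sketch (function names and shapes are illustrative, not from the paper) of the key difference: layer normalization computes the mean and variance over the features of each sample, so it does not depend on the batch size, whereas batch normalization computes them over the mini-batch for each feature.

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    """x: (batch, features). Statistics are per sample, over the features."""
    mu = x.mean(axis=1, keepdims=True)      # one mean per sample
    sigma = x.std(axis=1, keepdims=True)    # one std per sample
    return gain * (x - mu) / (sigma + eps) + bias

def batch_norm_train(x, gain, bias, eps=1e-5):
    """Statistics are per feature, over the mini-batch (training mode)."""
    mu = x.mean(axis=0, keepdims=True)
    sigma = x.std(axis=0, keepdims=True)
    return gain * (x - mu) / (sigma + eps) + bias

x = np.random.randn(4, 8)                   # batch of 4 samples, 8 features
g, b = np.ones((1, 8)), np.zeros((1, 8))
print(layer_norm(x, g, b).shape)            # (4, 8); works even with a batch of 1
```

Because each sample is normalized on its own, the same computation is used at training and test time, and it can be applied independently at every time step of an RNN.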
