Model

  • Dropout (2012, 2014)

  • Regularization of Neural Networks using DropConnect (2013)

  • Recurrent Neural Network Regularization (2014. 9)

    • RNN, Dropout Applied Only to Non-Recurrent Connections

  • Batch Normalization (2015. 2)

    • Regularizer, Accelerates Training, CNN

  • Training Very Deep Networks (2015. 7)

  • A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (2015. 12)

    • Variational RNN, Dropout for RNNs, Bayesian Interpretation

  • Deep Networks with Stochastic Depth (2016. 3)

    • Dropout, Ensemble, Beyond 1000 Layers

  • Adaptive Computation Time for Recurrent Neural Networks (2016. 3)

    • ACT, Dynamic Number of Computation Steps, Logic Tasks

  • Layer Normalization (2016. 7)

    • Regularizer, Accelerates Training, RNN

  • Recurrent Highway Networks (2016. 7)

  • Using Fast Weights to Attend to the Recent Past (2016. 10)

  • Professor Forcing: A New Algorithm for Training Recurrent Networks (2016. 10)

    • Professor Forcing, RNN, Train/Inference Mismatch, Training with GAN

  • Equality of Opportunity in Supervised Learning (2016. 10)

  • Categorical Reparameterization with Gumbel-Softmax (2016. 11)

    • Gumbel-Softmax Distribution, Reparameterization, Smooth Relaxation (sketch below)

  • Understanding deep learning requires rethinking generalization (2016. 11)

    • Generalization Error, Role of Regularization

  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017. 1)

    • MoE Layer, Sparsely-Gated, Capacity, Google Brain

  • A simple neural network module for relational reasoning (2017. 6)

  • On Calibration of Modern Neural Networks (2017. 6)

    • Confidence calibration, Maximum Calibration Error (MCE)

  • When is a Convolutional Filter Easy To Learn? (2017. 9)

  • mixup: Beyond Empirical Risk Minimization (2017. 10)

  • Measuring the tendency of CNNs to Learn Surface Statistical Regularities (2017. 11)

  • MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels (2017. 12)

    • MentorNet - StudentNet, Curriculum Learning, Outputs Sample Weights

  • Deep Learning Scaling is Predictable, Empirically (2017. 12)

  • Sensitivity and Generalization in Neural Networks: an Empirical Study (2018. 2)

  • Can recurrent neural networks warp time? (2018. 2)

  • Spectral Normalization for Generative Adversarial Networks (2018. 2)

    • GAN, Discriminator Training, Lipschitz Constraint, Power Iteration (sketch below)

  • On the importance of single directions for generalization (2018. 3)

  • Group Normalization (2018. 3)

    • Group Normalization (GN), Batch (BN), Layer (LN), Instance (IN), Independent of Batch Size (sketch below)

  • Fast Decoding in Sequence Models using Discrete Latent Variables (2018. 3)

    • Autoregressive, Latent Transformer, Discretization

  • Delayed Impact of Fair Machine Learning (2018. 3)

  • How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) (2018. 5)

    • Loss Landscape Smoothing, BatchNorm’s Reparametrization

  • When Recurrent Models Don't Need To Be Recurrent (2018. 5)

  • Relational inductive biases, deep learning, and graph networks (2018. 6)

    • Survey, Relational Inductive Biases, Graph Networks

  • Universal Transformers (2018. 7)

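The three entries marked "(sketch below)" are illustrated with minimal NumPy sketches; all function and variable names are assumptions for illustration, not taken from the papers' reference code. First, the Gumbel-Softmax relaxation: perturb the class logits with Gumbel(0, 1) noise and apply a temperature-controlled softmax, giving a differentiable surrogate for sampling a one-hot category.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """One relaxed (continuous) one-hot sample from a categorical distribution."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise via the inverse-CDF trick: g = -log(-log(u)), u ~ Uniform(0, 1)
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    # Temperature-controlled softmax over the perturbed logits; low tau -> nearly one-hot
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    return y / y.sum()
```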
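For Spectral Normalization, a rough sketch of the power-iteration estimate of a weight matrix's largest singular value, which is used to keep the discriminator approximately 1-Lipschitz; the persistent vector `u` would be carried across training steps.

```python
import numpy as np

def spectral_normalize(W, u, n_power_iters=1):
    """Divide W by an estimate of its largest singular value (spectral norm).

    W : 2-D weight matrix of shape (out_dim, in_dim)
    u : persistent estimate of the top left singular vector, shape (out_dim,)
    """
    for _ in range(n_power_iters):
        v = W.T @ u
        v = v / (np.linalg.norm(v) + 1e-12)
        u = W @ v
        u = u / (np.linalg.norm(u) + 1e-12)
    sigma = u @ W @ v  # u^T W v approximates the top singular value
    return W / sigma, u
```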
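For Group Normalization, a minimal sketch of normalizing within channel groups so the statistics are independent of batch size; the learned per-channel scale and shift are omitted.

```python
import numpy as np

def group_norm(x, num_groups=32, eps=1e-5):
    """Normalize an NCHW activation tensor within channel groups.

    Statistics are computed per sample and per group, so the result does not
    depend on batch size.
    """
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channel count must be divisible by num_groups"
    x = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    return x.reshape(n, c, h, w)
```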