Model
Dropout (2012, 2014)
Regularizer, Ensemble
Regularization of Neural Networks using DropConnect (2013)
Regularizer, Ensemble
Recurrent Neural Network Regularization (2014. 9)
RNN, Dropout on Non-Recurrent Connections
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (2015. 12)
Variational RNN, Dropout for RNNs, Bayesian Interpretation
Adaptive Computation Time for Recurrent Neural Networks (2016. 3)
ACT, Dynamic Computation Steps, Logic Tasks
Equality of Opportunity in Supervised Learning (2016. 10)
Equalized Odds, Demographic Parity, Bias
Categorical Reparameterization with Gumbel-Softmax (2016. 11)
Gumbel-Softmax Distribution, Reparameterization, Smooth Relaxation
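A minimal numpy sketch of the smooth relaxation this entry tags: add Gumbel(0, 1) noise to the logits and take a temperature-scaled softmax (the temperature value and example logits below are illustrative, not from the paper).

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Sample from the Gumbel-Softmax (Concrete) distribution.

    y = softmax((logits + g) / tau) with g ~ Gumbel(0, 1).
    As tau -> 0 samples approach one-hot vectors; larger tau
    gives a smoother, differentiable relaxation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via inverse transform sampling
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    z = (logits + g) / tau
    z = z - z.max()           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Example: a relaxed sample over three categories
sample = gumbel_softmax(np.log(np.array([0.1, 0.3, 0.6])), tau=0.5)
```

Because the sample is a smooth function of the logits, gradients can flow through it, which is the point of the reparameterization.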
Understanding deep learning requires rethinking generalization (2016. 11)
Generalization Error, Role of Regularization
On Calibration of Modern Neural Networks (2017. 6)
Confidence Calibration, Maximum Calibration Error (MCE)
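The calibration errors tagged here can be sketched in a few lines: bin predictions by confidence and compare each bin's average confidence to its empirical accuracy (the bin count below is an illustrative choice).

```python
import numpy as np

def calibration_errors(confidences, correct, n_bins=10):
    """Expected and Maximum Calibration Error over equal-width bins.

    ECE is the bin-size-weighted average |accuracy - confidence| gap;
    MCE is the largest gap over any non-empty bin.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece, mce = 0.0, 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += (mask.sum() / n) * gap
        mce = max(mce, gap)
    return ece, mce
```

A perfectly calibrated model (e.g. 80% confidence, 80% accuracy) yields zero for both quantities.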
When is a Convolutional Filter Easy To Learn? (2017. 9)
Conv + ReLU, Non-Gaussian Case, Polynomial Time
mixup: Beyond Empirical Risk Minimization (2017. 10)
Data Augmentation, Vicinal Risk Minimization, Generalization
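The augmentation this entry tags is simple enough to sketch directly: blend pairs of inputs and their one-hot labels with a Beta-distributed weight (alpha below is an illustrative hyperparameter).

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """mixup: form a convex combination of two training examples.

    lam ~ Beta(alpha, alpha); inputs and one-hot labels are
    interpolated with the same weight, training the model on
    the vicinity of the data rather than the raw points.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

With small alpha, lam concentrates near 0 or 1, so most mixed examples stay close to a real one.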
Measuring the tendency of CNNs to Learn Surface Statistical Regularities (2017. 11)
Learns Surface Statistical Regularities, Not High-Level Semantics
MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels (2017. 12)
MentorNet - StudentNet, Curriculum Learning, Outputs Sample Weights
Deep Learning Scaling is Predictable, Empirically (2017. 12)
Power-Law Exponents, Growing Training Sets
Sensitivity and Generalization in Neural Networks: an Empirical Study (2018. 2)
Robustness, Data Perturbations, Survey
Can recurrent neural networks warp time? (2018. 2)
RNN, Learnable Gates, Chrono Initialization
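The chrono initialization this entry tags can be sketched as follows, assuming the paper's recipe of drawing forget-gate biases as log(U(1, T_max - 1)) and setting input-gate biases to their negation (function and argument names are illustrative).

```python
import numpy as np

def chrono_init(hidden_size, t_max, rng=None):
    """Chrono initialization for LSTM gate biases (sketch).

    b_f ~ log(U(1, t_max - 1)) spreads the gates' effective
    memory timescales up to roughly t_max steps; b_i = -b_f
    balances the input gate against the forget gate.
    """
    rng = np.random.default_rng() if rng is None else rng
    b_f = np.log(rng.uniform(1.0, t_max - 1.0, size=hidden_size))
    b_i = -b_f
    return b_f, b_i
```

t_max is chosen to match the longest dependency range expected in the task.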
Spectral Normalization for Generative Adversarial Networks (2018. 2)
GAN, Discriminator Training, Lipschitz Constraint, Power Method
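The power method tagged here estimates a weight matrix's largest singular value, which is what spectral normalization divides the weights by to constrain the Lipschitz constant; a minimal numpy sketch (iteration count is an illustrative choice):

```python
import numpy as np

def spectral_norm(W, n_iters=30, rng=None):
    """Estimate the spectral norm (largest singular value) of W
    by power iteration on W and W^T, as used to normalize
    discriminator weights W / spectral_norm(W)."""
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return u @ W @ v
```

In practice a single iteration per training step suffices, since the weights change slowly and u can be carried over between steps.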
On the importance of single directions for generalization (2018. 3)
Importance, Confusing Neurons, Selective Neurons, DeepMind
Group Normalization (2018. 3)
Group Normalization (GN), Batch (BN), Layer (LN), Instance (IN), Independent of Batch Size
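The batch-size independence tagged here comes from normalizing over channel groups within each sample rather than over the batch; a minimal numpy sketch without the learned scale and shift parameters:

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Group Normalization for an (N, C, H, W) tensor.

    Channels are split into num_groups groups; mean and variance
    are computed per sample and per group, so the result does not
    depend on the batch size (unlike BatchNorm).
    """
    n, c, h, w = x.shape
    g = num_groups
    xg = x.reshape(n, g, c // g, h, w)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    xg = (xg - mean) / np.sqrt(var + eps)
    return xg.reshape(n, c, h, w)
```

With num_groups = 1 this reduces to Layer Norm over (C, H, W); with num_groups = C it reduces to Instance Norm.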
Fast Decoding in Sequence Models using Discrete Latent Variables (2018. 3)
Autoregressive, Latent Transformer, Discretization
Delayed Impact of Fair Machine Learning (2018. 3)
Outcome Curve, Max Profit, Demographic Parity, Equal Opportunity
How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) (2018. 5)
Smoothing Effect, BatchNorm's Reparametrization
Relational inductive biases, deep learning, and graph networks (2018. 6)
Survey, Relations, Graph Networks
Universal Transformers (2018. 7)
Transformer, Weight Sharing, Adaptive Computation Time (ACT)