Model

  • Dropout (2012, 2014)

  • Regularization of Neural Networks using DropConnect (2013)

  • Recurrent Neural Network Regularization (2014. 9)

    • RNN, Dropout Applied Only to Non-Recurrent Connections

  • Batch Normalization (2015. 2)

    • Regularizer, Accelerates Training, CNN

  • Training Very Deep Networks (2015. 7)

  • A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (2015. 12)

    • Variational RNN, Dropout for RNNs, Bayesian Interpretation

  • Deep Networks with Stochastic Depth (2016. 3)

    • Dropout, Ensemble, Beyond 1000 Layers

  • Adaptive Computation Time for Recurrent Neural Networks (2016. 3)

    • ACT, Dynamic Number of Computation Steps, Logic Tasks

  • Layer Normalization (2016. 7)

    • Regularizer, Accelerates Training, RNN

  • Recurrent Highway Networks (2016. 7)

  • Using Fast Weights to Attend to the Recent Past (2016. 10)

  • Professor Forcing: A New Algorithm for Training Recurrent Networks (2016. 10)

    • Professor Forcing, RNN, Train/Inference Mismatch, Training with GAN

  • Equality of Opportunity in Supervised Learning (2016. 10)

  • Categorical Reparameterization with Gumbel-Softmax (2016. 11)

    • Gumbel-Softmax Distribution, Reparameterization, Smooth Relaxation (sketch below)

  • Understanding deep learning requires rethinking generalization (2016. 11)

    • Generalization Error, Role of Regularization

  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017. 1)

    • MoE Layer, Sparsely-Gated, Capacity, Google Brain

  • A simple neural network module for relational reasoning (2017. 6)

  • On Calibration of Modern Neural Networks (2017. 6)

    • Confidence calibration, Maximum Calibration Error (MCE)

  • When is a Convolutional Filter Easy To Learn? (2017. 9)

  • mixup: Beyond Empirical Risk Minimization (2017. 10)

  • Measuring the tendency of CNNs to Learn Surface Statistical Regularities (2017. 11)

  • MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels (2017. 12)

    • MentorNet - StudentNet, Curriculum Learning, Outputs Sample Weights

  • Deep Learning Scaling is Predictable, Empirically (2017. 12)

  • Sensitivity and Generalization in Neural Networks: an Empirical Study (2018. 2)

  • Can recurrent neural networks warp time? (2018. 2)

  • Spectral Normalization for Generative Adversarial Networks (2018. 2)

    • GAN, Discriminator Training, Lipschitz Constraint, Power Iteration (sketch below)

  • On the importance of single directions for generalization (2018. 3)

  • Group Normalization (2018. 3)

    • Group Normalization (GN), Batch (BN), Layer (LN), Instance (IN), Independent of Batch Size (sketch below)

  • Fast Decoding in Sequence Models using Discrete Latent Variables (2018. 3)

    • Autoregressive, Latent Transformer, Discretization

  • Delayed Impact of Fair Machine Learning (2018. 3)

  • How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) (2018. 5)

    • Loss Landscape Smoothing, BatchNorm’s Reparametrization

  • When Recurrent Models Don't Need To Be Recurrent (2018. 5)

  • Relational inductive biases, deep learning, and graph networks (2018. 6)

    • Survey, Relational Inductive Biases, Graph Networks

  • Universal Transformers (2018. 7)

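The three entries marked "(sketch below)" are illustrated with minimal NumPy sketches; all function and variable names are assumptions for illustration, not taken from the papers' reference code. First, the Gumbel-Softmax relaxation: perturb the class logits with Gumbel(0, 1) noise and apply a temperature-controlled softmax, giving a differentiable surrogate for sampling a one-hot category.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """One relaxed (continuous) one-hot sample from a categorical distribution."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise via the inverse-CDF trick: g = -log(-log(u)), u ~ Uniform(0, 1)
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    # Temperature-controlled softmax over the perturbed logits; low tau -> nearly one-hot
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    return y / y.sum()
```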
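For Spectral Normalization, a rough sketch of the power-iteration estimate of a weight matrix's largest singular value, which is used to keep the discriminator approximately 1-Lipschitz; the persistent vector `u` would be carried across training steps.

```python
import numpy as np

def spectral_normalize(W, u, n_power_iters=1):
    """Divide W by an estimate of its largest singular value (spectral norm).

    W : 2-D weight matrix of shape (out_dim, in_dim)
    u : persistent estimate of the top left singular vector, shape (out_dim,)
    """
    for _ in range(n_power_iters):
        v = W.T @ u
        v = v / (np.linalg.norm(v) + 1e-12)
        u = W @ v
        u = u / (np.linalg.norm(u) + 1e-12)
    sigma = u @ W @ v  # u^T W v approximates the top singular value
    return W / sigma, u
```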
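For Group Normalization, a minimal sketch of normalizing within channel groups so the statistics are independent of batch size; the learned per-channel scale and shift are omitted.

```python
import numpy as np

def group_norm(x, num_groups=32, eps=1e-5):
    """Normalize an NCHW activation tensor within channel groups.

    Statistics are computed per sample and per group, so the result does not
    depend on batch size.
    """
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channel count must be divisible by num_groups"
    x = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    return x.reshape(n, c, h, w)
```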