Papers
To sync papers, run:
python scripts/sync_papers --sync_path {path_name}
Category
Description
bold : important
tag : keyword
paper, article, note and code
Background knowledge
Gaussian Process
Supervised
,Regression
Importance Sampling
Approximate
Information Theory: A Tutorial Introduction (2018. 2)
Shannon's Theory
Research Paper
Deep Learning (2015) Review
Adversarial Example
Explaining and Harnessing Adversarial Examples (2014. 12)
FGSM (Fast Gradient Sign Method)
,Adversarial Training
The Limitations of Deep Learning in Adversarial Settings (2015. 11)
JSMA (Jacobian-based Saliency Map Approach)
,Adversarial Training
Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization (2015. 11)
Adversarial Training (generated adversarial examples)
,Proactive Defense
Practical Black-Box Attacks against Machine Learning (2016. 2)
Black-Box (No Access to Gradient)
,Generate Synthetic
Adversarial Patch (2017. 12)
Patch
,White Box
,Black Box
AI
Machine Theory of Mind (2018. 2)
ToMnet
,Meta-Learning
,General Model
,Agent
Cognitive
Building Machines That Learn and Think Like People (2016. 4)
Human-Like
,Learn
,Think
Computer Vision
Spherical CNNs (2018. 1)
Spherical Correlation
,3D Model
,Fast Fourier Transform (FFT)
Taskonomy: Disentangling Task Transfer Learning (2018. 4)
Taskonomy
,Transfer Learning
,Computational modeling of task relations
AutoAugment: Learning Augmentation Policies from Data (2018. 5)
Search Algorithm (RL)
,Sub-Policy
Exploring Randomly Wired Neural Networks for Image Recognition (2019. 4)
Randomly wired neural networks
,Random Graph Models (ER, BA and WS)
MixMatch: A Holistic Approach to Semi-Supervised Learning (2019. 5)
MixMatch
,Semi-Supervised
,Augmentation -> Label Guessing -> Average -> Sharpening
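The label-guessing, averaging and sharpening steps tagged for MixMatch can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names `sharpen` and `guess_label`, the plain-list inputs, and the default temperature of 0.5 are assumptions made for this example. It assumes the model's class probabilities for each augmented copy of an unlabeled example are already available.

```python
def sharpen(probs, temperature=0.5):
    """Raise each probability to the power 1/T and renormalize.

    Lower temperatures push the distribution toward a one-hot vector.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def guess_label(predictions_per_augmentation, temperature=0.5):
    """Average the predictions over all augmentations, then sharpen.

    predictions_per_augmentation: list of probability vectors, one per
    augmented copy of the same unlabeled example (hypothetical input format).
    """
    n_aug = len(predictions_per_augmentation)
    n_classes = len(predictions_per_augmentation[0])
    averaged = [
        sum(pred[c] for pred in predictions_per_augmentation) / n_aug
        for c in range(n_classes)
    ]
    return sharpen(averaged, temperature)


if __name__ == "__main__":
    # Two augmentations of one unlabeled example, two classes.
    print(guess_label([[0.6, 0.4], [0.8, 0.2]]))
```

With a temperature of 1 the averaged distribution is returned unchanged, so the temperature controls how confident the guessed label becomes.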
Framework & System
Snorkel: Rapid Training Data Creation with Weak Supervision (2017. 11)
Labelling Functions
,Data Programming
Training classifiers with natural language explanations (2018. 5)
Babble Labble
,Data Programming
Model
Dropout (2012, 2014)
Regularizer
,Ensemble
Regularization of Neural Networks using DropConnect (2013)
Regularizer
,Ensemble
Recurrent Neural Network Regularization (2014. 9)
RNN
,Dropout to Non-Recurrent Connections
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (2015. 12)
Variational RNN
,Dropout - RNN
,Bayesian interpretation
Adaptive Computation Time for Recurrent Neural Networks (2016. 3)
ACT
,Dynamically
,Logic Task
Equality of Opportunity in Supervised Learning (2016. 10)
Equalized Odds
,Demographic Parity
,Bias
Categorical Reparameterization with Gumbel-Softmax (2016. 11)
Gumbel-Softmax distribution
,Reparameterization
,Smooth relaxation
Understanding deep learning requires rethinking generalization (2016. 11)
Generalization Error
,Role of Regularization
On Calibration of Modern Neural Networks (2017. 6)
Confidence calibration
,Maximum Calibration Error (MCE)
When is a Convolutional Filter Easy To Learn? (2017. 9)
Conv + ReLU
,Non-Gaussian Case
,Polynomial Time
mixup: Beyond Empirical Risk Minimization (2017. 10)
Data Augmentation
,Vicinal Risk Minimization
,Generalization
Measuring the tendency of CNNs to Learn Surface Statistical Regularities (2017. 11)
not learn High Level Semantics
,learn Surface Statistical Regularities
MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels (2017. 12)
MentorNet - StudentNet
,Curriculum Learning
,Output is Weight
Deep Learning Scaling is Predictable, Empirically (2017. 12)
Power-Law Exponents
,Grow Training Sets
Sensitivity and Generalization in Neural Networks: an Empirical Study (2018. 2)
Robustness
,Data Perturbations
,Survey
Can recurrent neural networks warp time? (2018. 2)
RNN
,Learnable Gate
,Chrono Initialization
Spectral Normalization for Generative Adversarial Networks (2018. 2)
GAN
,Training Discriminator
,Constrain Lipschitz
,Power Method
On the importance of single directions for generalization (2018. 3)
Importance
,Confusing Neurons
,Selective Neuron
,DeepMind
Group Normalization (2018. 3)
Group Normalization (GN)
,Batch (BN)
,Layer (LN)
,Instance (IN)
,Independent Batch Size
Fast Decoding in Sequence Models using Discrete Latent Variables (2018. 3)
Autoregressive
,Latent Transformer
,Discretization
Delayed Impact of Fair Machine Learning (2018. 3)
Outcome Curve
,Max Profit, Demographic Parity, Equal Opportunity
How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) (2018. 5)
Smoothing Effect
,BatchNorm’s Reparametrization
Relational inductive biases, deep learning, and graph networks (2018. 6)
Survey
,Relation
,Graph
Universal Transformers (2018. 7)
Transformer
,Weight Sharing
,Adaptive Computation Time (ACT)
Identifying Generalization Properties in Neural Networks (2018. 9)
Generalization
,PAC-Bayes
,Hessian
,Perturbation
No Multiplication? No Floating Point? No Problem! Training Networks for Efficient Inference (2018. 9)
Quantization
,Store Multiplication Table
,Memory/Power Resources
Natural Language Processing
Distributed Representations of Words and Phrases and their Compositionality (2013. 10)
Word2Vec
,CBOW
,Skip-gram
GloVe: Global Vectors for Word Representation (2014)
Word2Vec
,GloVe
,Co-Occurrence
Text Understanding from Scratch (2015. 2)
CNN
,Character-level
A Neural Conversational Model (2015. 6)
Seq2Seq
,Conversation
Character-Aware Neural Language Models (2015. 8)
CNN
,Character-level
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model (2016. 1)
Seq2Seq
,Attention with Structural Biases
,Translation
Long Short-Term Memory-Networks for Machine Reading (2016. 1)
LSTMN
,Intra-Attention
,RNN
Recurrent Memory Networks for Language Modeling (2016. 1)
RMN
,Memory Bank
Swivel: Improving Embeddings by Noticing What's Missing (2016. 2)
Word2Vec
,Swivel
,Co-Occurrence
Recurrent Neural Machine Translation (2016. 7)
Translation
,Attention (RNN)
Multiplicative LSTM for sequence modelling (2016. 10)
mLSTM
,Language Modeling
,Character-Level
Dynamic Coattention Networks For Question Answering (2016. 11)
QA
,DCN
,Coattention Encoder
,Machine Comprehension
A recurrent neural network without chaos (2016. 12)
RNN
,CFN
,Dynamic
,Chaos
Comparative Study of CNN and RNN for Natural Language Processing (2017. 2)
Systematic Comparison
,CNN vs RNN
Dynamic Word Embeddings for Evolving Semantic Discovery (2017. 3)
Word Embedding
,Temporal
,Alignment
Learning to Generate Reviews and Discovering Sentiment (2017. 4)
Sentiment
,Unsupervised
,OpenAI
Ask the Right Questions: Active Question Reformulation with Reinforcement Learning (2017. 5)
QA
,Active Question Answering
,RL
,Agent (Reformulate, Aggregate)
Reinforced Mnemonic Reader for Machine Reading Comprehension (2017. 5)
QA
,Mnemonic (Syntactic, Lexical)
,RL
,Machine Comprehension
Depthwise Separable Convolutions for Neural Machine Translation (2017. 6)
SliceNet
,Super-Separable Conv
,Depthwise + Conv 1x1
MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension (2017. 7)
MEMEN
,QA(MC)
,Embedding(skip-gram)
,Full-Orientation Matching
On the State of the Art of Evaluation in Neural Language Models (2017. 7)
Standard LSTM
,Regularisation
,Hyperparameter
Adversarial Examples for Evaluating Reading Comprehension Systems (2017. 7)
Concatenative Adversaries(AddSent, AddOneSent)
,SQuAD
Learned in Translation: Contextualized Word Vectors (2017. 8)
Word Embedding
,CoVe
,Context Vector
Unsupervised Neural Machine Translation (2017. 10)
Train in both directions (tandem)
,Shared Encoder
,Denoising Auto-Encoder
Word Translation Without Parallel Data (2017. 10)
Unsupervised
,Multilingual Embedding
,Parallel Dictionary Induction
Unsupervised Machine Translation Using Monolingual Corpora Only (2017. 11)
Unsupervised
,Adversarial
,Monolingual Corpora
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model (2017. 11)
MoS (Mixture of Softmaxes)
,Softmax Bottleneck
Neural Speed Reading via Skim-RNN (2017. 11)
Skim-RNN
,Speed Reading
,Big(Read)-Small(Skim)
,Dynamic
Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks (2017. 11)
SCAN
,Compositional
,Mix-and-Match
Hierarchical Text Generation and Planning for Strategic Dialogue (2017. 12)
End2End Strategic Dialogue
,Latent Sentence Representations
,Planning + RL
Recent Advances in Recurrent Neural Networks (2018. 1)
RNN
,Recent Advances
,Review
Personalizing Dialogue Agents: I have a dog, do you have pets too? (2018. 1)
Chit-chat
,Profile Memory
,Persona-Chat Dataset
,ParlAI
Generating Wikipedia by Summarizing Long Sequences (2018. 1)
Multi-Document Summarization
,Extractive-Abstractive Stage
,T-DMCA
,WikiSum
,Google Brain
MaskGAN: Better Text Generation via Filling in the__ (2018. 1)
MaskGAN
,Neural Text Generation
,RL Approach
Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs (2018. 1)
Contextual Decomposition (CD)
,Disambiguate interactions between Gates
Universal Language Model Fine-tuning for Text Classification (2018. 1)
ULMFiT
,Pre-trained
,Transfer Learning
DeepType: Multilingual Entity Linking by Neural Type System Evolution (2018. 2)
DeepType
,Symbolic Information
,Type System
,OpenAI
Ranking Sentences for Extractive Summarization with Reinforcement Learning (2018. 2)
Document-Summarization
,Cross-Entropy vs RL
,Extractive
code2vec: Learning Distributed Representations of Code (2018. 3)
code2vec
,Code Embedding
,Predicting method name
Universal Sentence Encoder (2018. 3)
Transformer
,Deep Averaging Network (DAN)
,Transfer
An efficient framework for learning sentence representations (2018. 3)
Sentence Representation
,True Context
,Unsupervised
An Analysis of Neural Language Modeling at Multiple Scales (2018. 3)
LSTM vs QRNN
,Hyperparameter
,AWD-QRNN
Analyzing Uncertainty in Neural Machine Translation (2018. 3)
Uncertainty
,Beam Search Degradation
,Copy Mode
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (2018. 3)
Temporal Convolutional Network (TCN)
,CNN vs RNN
Training Tips for the Transformer Model (2018. 4)
Transformer
,Hyperparameter
,Multiple GPU
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (2018. 4)
QA
,Conv - Self-Attention
,Backtranslation (Data Augmentation)
SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach (2018. 4)
Top-K Subject Recognition
,Relation Classification
Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer (2018. 4)
Sentiment Transfer
,Disentangle Attribute
,Unsupervised
Parsing Tweets into Universal Dependencies (2018. 4)
Universal Dependencies (UD)
,TWEEBANK v2
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates (2018. 4)
SR
,Subword Sampling + Hyperparameter
,Segmentation (BPE, Unigram)
Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension (2018. 4)
PI-SQuAD
,Challenge
,Document Encoder
,Scalability
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (2018. 4)
GLUE
,Benchmark
,Understanding
On the Practical Computational Power of Finite Precision RNNs for Language Recognition (2018. 5)
Unbounded counting
,IBFP-LSTM
Paper Abstract Writing through Editing Mechanism (2018. 5)
Writing-editing Network
,Attentive Revision Gate
A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings (2018. 5)
Unsupervised initialization scheme
,Robust self-learning
Global-Locally Self-Attentive Dialogue State Tracker (2018. 5)
GLAD
,WoZ and DSTC2 Dataset
Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information (2018. 5)
Dataset
,EVPI
,ACL 2018 Best Paper
Know What You Don't Know: Unanswerable Questions for SQuAD (2018. 6)
SQuAD 2.0
,Negative Example
,ACL 2018 Best Paper
The Natural Language Decathlon: Multitask Learning as Question Answering (2018. 6)
decaNLP
,Multitask Question Answering Network (MQAN)
,Transfer Learning
GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations (2018. 6)
Transfer Learning Framework
,Structured Graphical Representations
Improving Language Understanding by Generative Pre-Training (2018. 6)
Transformer
,Generative Pre-Training
,Discriminative Fine-Tuning
Finding Syntax in Human Encephalography with Beam Search (2018. 6)
RNNG+beam search
,ACL 2018 Best Paper
Let's do it "again": A First Computational Approach to Detecting Adverbial Presupposition Triggers (2018. 6)
Task
,Dataset
,Weighted-Pooling (WP)
,ACL 2018 Best Paper
QuAC: Question Answering in Context (2018. 8)
Information-Seeking dialog
,Challenge
,Without Evidence
CoQA: A Conversational Question Answering Challenge (2018. 8)
Abstractive with Extractive Rationale
,Challenge
,Coreference and Pragmatic Reasoning
Contextual Parameter Generation for Universal Neural Machine Translation (2018. 8)
Parameter Generation
,Language Embedding
,EMNLP 2018
Evaluating Theory of Mind in Question Answering (2018. 8)
Dataset
,Higher-order Beliefs
,EMNLP 2018
Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text (2018. 9)
GRAFT-Net
,KB+Text Fusion
,EMNLP 2018
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (2018. 9)
Dataset
,Multi-hop
,Sentence-level Supporting Fact
,EMNLP 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018. 10)
BERT
,Discriminative
,Pre-trained
,Transfer Learning
,NAACL 2019 Best
Trellis Networks for Sequence Modeling (2018. 10)
TrellisNet
,Structural bridge between TCN and RNN
,NAACL 2019
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge (2018. 11)
CommonsenseQA
,Dataset
,Multiple-Choice
,NAACL 2019 Best
Cross-lingual Language Model Pretraining (2019. 1)
XLM
,MLM + TLM
,Cross-lingual Pre-trained
,Low-Resource
Parameter-Efficient Transfer Learning for NLP (2019. 2)
Adapter tuning
,Bottleneck
,BERT
To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (2019. 3)
Fine-tuning vs Feature
,BERT and ELMo
,Empirically analyze
Linguistic Knowledge and Transferability of Contextual Representations (2019. 3)
Analysis of CWRs
,LSTM, Transformer
,Transferable
,NAACL 2019
ERNIE: Enhanced Representation through Knowledge Integration (2019. 4)
ERNIE
,Masking Strategies
,Dialog Language Model
,Pre-trained
,Transfer Learning
CNM: An Interpretable Complex-valued Network for Matching (2019. 4)
CNM
,Quantum Physics
,Interpretable
,NAACL 2019 Best
Unsupervised Recurrent Neural Network Grammars (2019. 4)
RNNG
,Syntax Tree
,Variational Inference
The Curious Case of Neural Text Degeneration (2019. 4)
Nucleus Sampling
,Decoding Method
,Generation
Unified Language Model Pre-training for Natural Language Understanding and Generation (2019. 5)
UniLM
,Uni + Bi + S2S
,Generation
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (2019. 5)
SuperGLUE
,Benchmark
,Understanding
SpanBERT: Improving Pre-training by Representing and Predicting Spans (2019. 7)
SpanBERT
,Span Boundary Objective (SBO)
,Pre-train
,Transformer
RoBERTa: A Robustly Optimized BERT Pretraining Approach (2019. 7)
RoBERTa
,Data-BatchSize
,Pre-train
,Transformer
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (2019. 7)
ERNIE
,Continual Pre-training
,Word-Struct-Semantic
,Transformer
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (2019. 8)
StructBERT (ALICE)
,Language Structure
,Pre-train
,Transformer
One-Shot/Few-Shot/Meta Learning
Matching Networks for One Shot Learning (2016. 6)
Matching Nets
,Non-Parametric
,DeepMind
SMASH: One-Shot Model Architecture Search through HyperNetworks (2017. 8)
SMASH
,HyperNet
,Prior Knowledge
Reptile: a Scalable Metalearning Algorithm (2018. 3)
Reptile
,Meta-Learning
,Few-Shot
,OpenAI
Optimization
On the difficulty of training Recurrent Neural Networks (2012. 11)
Gradient Clipping
,RNN
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units (2015. 4)
Weight Initialization
,RNN
,Identity Matrix
Cyclical Learning Rates for Training Neural Networks (2015. 6)
CLR
,Triangular, ExpRange
,Long-term Benefit
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (2016. 9)
Generalization
,Sharpness of Minima
Neural Optimizer Search with Reinforcement Learning (2017. 9)
Neural Optimizer Search (NOS)
,PowerSign
,AddSign
On the Convergence of Adam and Beyond (2018. 2)
AMSGrad
,Convex optimization
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (2018. 4)
Adafactor
,Adaptive Method
,Update Clipping
Revisiting Small Batch Training for Deep Neural Networks (2018. 4)
Generalization Performance
,Training Stability
Reconciling modern machine learning and the bias-variance trade-off (2018. 12)
Double Descent Risk Curve
,Highly Complex Models
Reinforcement Learning
Progressive Neural Networks (2016. 6)
ProgNN
,Incorporate Prior Knowledge
Neural Architecture Search with Reinforcement Learning (2016. 11)
NAS
,Google AutoML
,Google Brain
Third-Person Imitation Learning (2017. 3)
Imitation Learning
,Unsupervised (Third-Person)
,GAN + Domain Confusion
Efficient Neural Architecture Search via Parameter Sharing (2018. 2)
ENAS
,Google AutoML
,Google Brain
Investigating Human Priors for Playing Video Games (2018. 2)
prior knowledge
,key factor
World Models (2018. 3)
Generative + RL
,VAE (V)
,MDN-RNN (M)
,Controller (C)
Unsupervised Predictive Memory in a Goal-Directed Agent (2018. 3)
MERLIN
,Memory + RL + Inference
,Partial Observability
Transfer Learning
...
Unsupervised & Generative
Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data (2016. 5)
DVBF
,Variational Inference
,SGVB
Structured Inference Networks for Nonlinear State Space Models (2016. 9)
Structured Variational Approximation
,SGVB
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework (2016. 11)
Beta-VAE
,Disentangled
A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning (2017. 10)
Kalman VAE
,LGSSM
Self-Attention Generative Adversarial Networks (2018. 5)
SAGAN
,Attention-Driven
,Spectral Normalization
Unsupervised Data Augmentation (2019. 4)
UDA
,TSA Schedule
,Semi-Supervised