(2017.03) A Structured Self-Attentive Sentence Embedding
Submitted in March 2017
Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou and Yoshua Bengio
Simple Summary
Proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of a single vector, a 2-D matrix represents the embedding, with each row of the matrix attending to a different part of the sentence. The authors also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing which specific parts of the sentence are encoded into the embedding.
- A self-attention mechanism on top of the sequential model replaces the max-pooling or averaging step.
- Allows extracting different aspects of the sentence into multiple vector representations (one per attention row).
- Interpreting the extracted embedding becomes easy and explicit via the attention weights.
Model
bidirectional LSTM -> concat -> self-attention
This allows the final sentence embedding to directly access previous LSTM hidden states via the attention summation.
A = softmax(W_s2 tanh(W_s1 H^T))
M = AH
where M is the r-by-2u embedding matrix, A the r-by-n annotation matrix, and H the n-by-2u matrix of LSTM hidden states.

Penalization term (encourages the r attention rows to focus on different parts of the sentence):
P = ||AA^T - I||_F^2
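A minimal PyTorch sketch of the layer described above, assuming the paper's reported hyperparameters (u = 300, d_a = 350, r = 30); the module name `StructuredSelfAttention` and the toy dimensions in the usage snippet are illustrative, not from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructuredSelfAttention(nn.Module):
    """Computes A = softmax(W_s2 tanh(W_s1 H^T)) and M = AH."""
    def __init__(self, hidden_dim=300, d_a=350, r=30):
        super().__init__()
        # W_s1: d_a x 2u, W_s2: r x d_a; no bias terms, as in the paper
        self.W_s1 = nn.Linear(2 * hidden_dim, d_a, bias=False)
        self.W_s2 = nn.Linear(d_a, r, bias=False)

    def forward(self, H):
        # H: (batch, n, 2u) -- concatenated BiLSTM hidden states
        scores = self.W_s2(torch.tanh(self.W_s1(H)))  # (batch, n, r)
        A = F.softmax(scores, dim=1).transpose(1, 2)  # (batch, r, n), softmax over tokens
        M = torch.bmm(A, H)                           # (batch, r, 2u) embedding matrix
        return M, A

def penalization(A):
    # P = ||A A^T - I||_F^2, pushing the r attention rows toward
    # different parts of the sentence
    I = torch.eye(A.size(1), device=A.device).unsqueeze(0)  # (1, r, r)
    AAt = torch.bmm(A, A.transpose(1, 2))                   # (batch, r, r)
    return ((AAt - I) ** 2).sum(dim=(1, 2))                 # squared Frobenius norm

# Usage: BiLSTM over word embeddings, then attention
lstm = nn.LSTM(input_size=100, hidden_size=300, bidirectional=True, batch_first=True)
attn = StructuredSelfAttention()
x = torch.randn(8, 40, 100)        # 8 sentences, 40 tokens, 100-dim embeddings
H, _ = lstm(x)                     # H: (8, 40, 600)
M, A = attn(H)                     # M: (8, 30, 600) matrix sentence embedding
loss_pen = penalization(A).mean()  # added to the task loss with a weighting coefficient
```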
Experiments
Author Profiling: the Age dataset (predicting the age range of a tweet's author)
Sentiment Analysis: the Yelp reviews dataset
Textual Entailment: the SNLI corpus
Pros: able to encode a sequence of variable length into a fixed-size representation without suffering from long-term dependency problems.
Cons: not able to train the model in an unsupervised way, since the attention mechanism relies on supervision from a downstream task.