(2017. 3) A Structured Self-Attentive Sentence Embedding

  • Submitted in March 2017

  • Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou and Yoshua Bengio

Simple Summary

Proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of a single vector, the embedding is a 2-D matrix, with each row of the matrix attending to a different part of the sentence. A special regularization term keeps the rows from attending to the same content. As a side effect, the embedding comes with an easy way of visualizing which specific parts of the sentence are encoded into it.

  • A self-attention mechanism on top of sequential models (here, a bidirectional LSTM) that replaces the max-pooling or averaging step.

    • allows extracting different aspects of the sentence into multiple vector representations.

    • interpreting the extracted embedding becomes very easy and explicit.

  • Model

    • bidirectional LSTM -> concat -> self-attention

    • the attention summation lets the final sentence embedding directly access all previous LSTM hidden states.

    • A = softmax(W_s2 tanh(W_s1 H^T)), with the softmax applied over the n time steps

    • M = AH, where M is the r-by-2u sentence embedding matrix, A is the r-by-n annotation matrix from above, and H is the n-by-2u matrix of bidirectional LSTM hidden states (see the sketch after this list)

    • Penalization term: P = ||AA^T - I||_F^2, which pushes the r attention rows to focus on different parts of the sentence

  • Experiments

    • Author profiling: the Age dataset (predicting the age range of Twitter users from their tweets)

    • Sentiment analysis: the Yelp reviews dataset

    • Textual entailment: the SNLI corpus

  • able to encode any variable-length sequence into a fixed-size representation without suffering from long-term dependency problems (demonstrated in the usage example below).

  • not able to be trained in an unsupervised way; the model relies on a downstream supervised task.
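Below is a minimal PyTorch sketch of the pipeline above, not the authors' implementation: the class and function names are my own, and the hyperparameter defaults (hidden size 150, d_a = 350, r = 30) are illustrative values in the range the paper reports.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructuredSelfAttention(nn.Module):
    """Bi-LSTM -> concat -> self-attention, producing an r-by-2u embedding matrix M."""

    def __init__(self, vocab_size, embed_dim=100, hidden_size=150, d_a=350, r=30):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # bidirectional LSTM; concatenated hidden states have size 2u = 2 * hidden_size
        self.lstm = nn.LSTM(embed_dim, hidden_size,
                            batch_first=True, bidirectional=True)
        self.W_s1 = nn.Linear(2 * hidden_size, d_a, bias=False)  # W_s1: d_a x 2u
        self.W_s2 = nn.Linear(d_a, r, bias=False)                # W_s2: r x d_a

    def forward(self, tokens):
        H, _ = self.lstm(self.embed(tokens))          # H: (batch, n, 2u)
        # A = softmax(W_s2 tanh(W_s1 H^T)); softmax over the n time steps
        scores = self.W_s2(torch.tanh(self.W_s1(H)))  # (batch, n, r)
        A = F.softmax(scores, dim=1).transpose(1, 2)  # (batch, r, n)
        M = A @ H                                     # M = AH: (batch, r, 2u)
        return M, A

def penalization(A):
    """P = ||A A^T - I||_F^2, averaged over the batch."""
    r = A.size(1)
    I = torch.eye(r, device=A.device)
    return ((A @ A.transpose(1, 2) - I) ** 2).sum(dim=(1, 2)).mean()
```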
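Because M is always r-by-2u, sentences of different lengths map to the same fixed-size representation. A quick check of the sketch above (the vocabulary size and random token IDs are placeholders):

```python
model = StructuredSelfAttention(vocab_size=10000)
short_sent = torch.randint(0, 10000, (1, 8))   # an 8-token sentence
long_sent = torch.randint(0, 10000, (1, 40))   # a 40-token sentence
M_short, A_short = model(short_sent)
M_long, _ = model(long_sent)
print(M_short.shape, M_long.shape)  # both torch.Size([1, 30, 300]): r x 2u
print(penalization(A_short))        # scalar penalty, added to the task loss with a weight
```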
