(2017. 10) Multi Paragraph RC
Submitted on 2017. 10
Christopher Clark, Matt Gardner
Simple Summary
Consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well calibrated confidence scores for their results on individual paragraphs. We sample multiple paragraphs from the documents during training, and use a shared-normalization training objective that encourages the model to produce globally correct output. We combine this method with a state-of-the-art pipeline for training models on document QA data.
requires adapting a paragraph-level model to process document-level input.
entire documents as input is extremely computationally expensive
Pipelined Method
select a single paragraph from the input documents (the smallest TF-IDF cosine distance with the question)
selected paragraph containing the correct answer: 83.% -> 85.1% on TriviaQA
handling Noisy Labels
use a summed objective function that optimizes the sum of the proba- bilities of all answer spans.
Model
Embeding: word-level + char-level
Pre-Process: bi-directional GRU
Attention: BiDAF -> Self-Attention
Prediction, variational dropout
Confidence based methods
the boundary-based models we use here, a span’s score is the sum of the start and end score given to its start and end token. At test time we run the model on each paragraph and select the answer span with the highest confidence.
hypothesize that there are two key reasons a model’s confidence scores might not be well calibrated.
for models trained with the softmax objective, the pre-softmax scores for all spans can be arbitrarily increased or decreased by a constant value without changing the resulting softmax probability distribution.
if the model only sees paragraphs that contain answers, it might become too confident in heuristics or patterns that are only effective when it is known a priori that an answer exists.
four approaches to training models to produce comparable confidence scores
Shared-Normalization: a modified objective function is used where the normalization fac- tor in the softmax operation is shared between all paragraphs from the same context. (this will force the model to produce scores that are comparable between paragraphs)
Merge: concatenating all paragraphs sam- pled from the same context together
No-Answer Option
Sigmoid
Experiments
the shared-norm approach to be the most effective way
found that models that have only been trained on answer-containing paragraphs can perform very poorly in the multi-paragraph setting. (SQuAD)
the paragraphs are shorter, so the model had less exposure to irrelevant text.
Last updated