Commit 92a6c548 authored by Kartikay Khandelwal, committed by Facebook Github Bot

Refactor BERTDataset to the more general MaskedLMDataset

Summary: The current BERTDataset has many of the components needed for generic MaskedLM training, but it is too restrictive in the assumptions it makes: two blocks being masked, the special tokens used for the sentence embedding and the separator, etc. In this diff I refactor this dataset into the more general MaskedLMDataset and make some of the parameters configurable, including the probabilities associated with masking.
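
For illustration only, here is a minimal sketch of what masking with configurable probabilities can look like. It is not the code from this diff: the function name `mask_tokens` and all of its parameters are hypothetical, and the defaults (mask 15% of tokens; of those, 80% become the mask token, 10% a random token, 10% stay unchanged) are assumed from the original BERT recipe, not taken from MaskedLMDataset.

```python
import random
from typing import FrozenSet, List, Optional, Tuple


def mask_tokens(
    tokens: List[int],
    vocab_size: int,
    mask_idx: int,
    special_ids: FrozenSet[int] = frozenset(),
    mask_prob: float = 0.15,                # assumed default (BERT recipe)
    replace_with_mask_prob: float = 0.8,    # assumed default (BERT recipe)
    replace_with_random_prob: float = 0.1,  # assumed default (BERT recipe)
    ignore_idx: int = -100,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: return (masked inputs, prediction targets)."""
    rng = rng or random.Random()
    inputs = list(tokens)
    # Positions that are not selected carry ignore_idx so a loss can skip them.
    targets = [ignore_idx] * len(tokens)
    for i, tok in enumerate(tokens):
        # Never mask special tokens; select other tokens with prob. mask_prob.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        targets[i] = tok
        r = rng.random()
        if r < replace_with_mask_prob:
            inputs[i] = mask_idx
        elif r < replace_with_mask_prob + replace_with_random_prob:
            inputs[i] = rng.randrange(vocab_size)
        # Otherwise the original token is kept but still predicted.
    return inputs, targets
```

Exposing the three probabilities as arguments, rather than hard-coding them, is the kind of configurability the summary describes; the set of special tokens is passed in for the same reason.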

Reviewed By: rutyrinott

Differential Revision: D14222467

fbshipit-source-id: e9f78788dfe7f56646ba09c62967c4c0bd30aed8
parent 4d59517f