Skip to content
Commit 2b13f3c0 authored by Liezl Puzon's avatar Liezl Puzon Committed by Facebook Github Bot
Browse files

Support BPE end of word marker suffix in fairseq noising module

Summary:
There are 2 ways to implement BPE:
1. use a continuation marker suffix to indicate that there is at least one more subtoken left in the word
2. use a end of word marker suffix to indicate that there is no more subtokens left in the word

This adds some logic to account for either kind of BPE marker suffix. This diff adds a corresponding test. I also refactored the test setup to reduce the number of boolean args when setting up test data.

Reviewed By: xianxl

Differential Revision: D12919428

fbshipit-source-id: 405e9f346dce6e736c1305288721dfc7b63e872a
parent b1521f96
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment