Skip to content
Commit ff74ca94 authored by Ning Dong's avatar Ning Dong Committed by Facebook Github Bot
Browse files

Support dataset upsampling / relative ratio in PytorchTranslateTask (#494)

Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/494

Pull Request resolved: https://github.com/pytorch/fairseq/pull/657

Library side change split from D14924942

Added 2 arguments for load_dataset in PytorchTranslateTask
1. dataset_upsampling. A nested dictionary {direction:{dataset: upsampling_ratio}}. Upsampling_ratio larger than one mean that the bitext is ob- served more often than actually present in the combined bitext and synthetic training corpus.

2. dataset_relative_ratio. A tuple (dataset, ratio). The ratio represents the frequency certain dataset gets sampled to the rest of corpora map.

At most one of them could be specified.

Reviewed By: liezl200

Differential Revision: D15041293

fbshipit-source-id: 92daad29895c234e26d1b19f121106118a3957ad
parent da9e493e
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment