- May 06, 2019
Simon Will authored
- May 04, 2019
Kritika Singh authored
Summary: See comment
Reviewed By: jay-mahadeokar
Differential Revision: D15070187
fbshipit-source-id: ffefca0effb2cc866ce6fa22a59d5419b592fb7b
- May 03, 2019
Yongqiang Wang authored
Summary:
Pull Request resolved: https://github.com/fairinternal/fairspeq/pull/2
Pull Request resolved: https://github.com/pytorch/fairseq/pull/689
We found that not raising OOM during trainer.train_step causes various issues, including NCCL hangs / gloo sync errors, because gradients are not synced properly. Until we find the root cause, let's give users an option to raise OOMs.
Reviewed By: jmp84
Differential Revision: D15170357
fbshipit-source-id: 3e15e4e111a8380612157955509c39821a216ec4
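The failure mode described above can be sketched as follows. This is a minimal illustration of the pattern, not fairseq's actual trainer code; the function and argument names are hypothetical:

```python
def train_step_with_oom_handling(step_fn, batch, raise_oom=False):
    """Run one training step, optionally re-raising CUDA OOM errors.

    Silently skipping an OOM batch inside a distributed train step can
    leave gradients unsynced across workers (NCCL hangs / gloo sync
    errors), so `raise_oom` lets users surface the error instead.
    """
    try:
        return step_fn(batch)
    except RuntimeError as e:
        if "out of memory" in str(e):
            if raise_oom:
                raise  # fail loudly so workers don't desynchronize
            print("| WARNING: ran out of memory, skipping batch")
            return None
        raise  # unrelated RuntimeErrors always propagate
```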
Naman Goyal authored
Summary: Added bert_large architecture
Pull Request resolved: https://github.com/pytorch/fairseq/pull/698
Differential Revision: D15198698
Pulled By: myleott
fbshipit-source-id: 1dc9e8d4c8c877d15afffe5fe581b4b93eefbc66
- May 02, 2019
Peng-Jen Chen authored
Summary:
- Add learned positional embedding binary flag to masked LM model.
- Add base arch config for masked LM model which sets all the binary parameters to False. Otherwise some of the binary flag parameters will always be overridden by the config in `xlm_architecture` (e.g. encoder_learned_pos).
Reviewed By: liezl200
Differential Revision: D15054487
fbshipit-source-id: d78827f352b9160a89c9dc4f45b9fce15a2f234d
Myle Ott authored
Summary: This should make rendezvous happen as lazily as possible.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/687
Differential Revision: D15151145
Pulled By: myleott
fbshipit-source-id: d70816a85414c5d509a6b12e2b339b4736db2c88
Myle Ott authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/693
Differential Revision: D15174831
fbshipit-source-id: 98688b1269ead5694e5116659ff64507d3c0d1c0
Myle Ott authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/692
Differential Revision: D15174954
fbshipit-source-id: 1a7bff9aeed3e2cc658577be9d79e8c9f72314c2
Kritika Singh authored
Summary: Changes include:
1. Added get_normalized_probabilities to the encoder-only base class FairseqEncoderModel
2. Made CTCCriterion work for both batch_first (LSTMSubsampleEncoderModel) and batch_second (LSTMEncoderOnly) encoder types
3. Added tests for different encoder and CTC combinations.
TODO: CTC still doesn't work for VGGLSTMEncoderModel, so I have disabled that. Will debug and send out a fix in another diff.
Reviewed By: jay-mahadeokar
Differential Revision: D15158818
fbshipit-source-id: acb484bad705c937d676d2c3dcde3e3562d68ed9
- May 01, 2019
Myle Ott authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/691
Differential Revision: D15172543
Pulled By: myleott
fbshipit-source-id: f2b626ff7f5e95f0ddc83c105af7ab9d092a135e
taineleau authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/684
Differential Revision: D15154631
Pulled By: myleott
fbshipit-source-id: 5e7dd9651d9ed239b60c51b9a11d08c80307d3ba
Ning Dong authored
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/494
Pull Request resolved: https://github.com/pytorch/fairseq/pull/657
Library-side change split from D14924942. Added 2 arguments for load_dataset in PytorchTranslateTask:
1. dataset_upsampling: a nested dictionary {direction: {dataset: upsampling_ratio}}. An upsampling_ratio larger than one means that the bitext is observed more often than it is actually present in the combined bitext and synthetic training corpus.
2. dataset_relative_ratio: a tuple (dataset, ratio). The ratio represents how frequently the given dataset is sampled relative to the rest of the corpora.
At most one of them can be specified.
Reviewed By: liezl200
Differential Revision: D15041293
fbshipit-source-id: 92daad29895c234e26d1b19f121106118a3957ad
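The upsampling semantics above can be sketched as follows. This is an illustrative reduction (function and parameter names are hypothetical, and the real task works over directions and indexed datasets, not plain dicts):

```python
def upsample_indices(dataset_sizes, upsampling):
    """Repeat each dataset's example indices by its upsampling ratio.

    dataset_sizes: {dataset_name: num_examples}
    upsampling:    {dataset_name: ratio}, defaulting to 1.0

    A ratio of 2.0 makes a dataset's examples appear twice as often in
    the epoch ordering as they do in the raw combined corpus.
    """
    order = []
    for name, size in dataset_sizes.items():
        ratio = upsampling.get(name, 1.0)
        repeats = int(size * ratio)
        # wrap around with modulo so fractional ratios still work
        order.extend((name, i % size) for i in range(repeats))
    return order
```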
Myle Ott authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/685
Differential Revision: D15154647
Pulled By: myleott
fbshipit-source-id: 36c72359755192a4a53367e19f8dd006791d483c
Ning Dong authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/664
Previously, the arguments for noising (dropout_prob for WordDropout and max_shuffle_distance for WordShuffle) were only passed to noising(), so they could not be customized in NoisingDataset. Now default arguments are added to the initializer so the values can be specified at construction.
Reviewed By: liezl200
Differential Revision: D15071632
fbshipit-source-id: 59a9bf5a5e6d03c1e74f1b31c1927e221cb11dfa
- Apr 30, 2019
Naman Goyal authored
Summary: Co-authored-by: jingfeidu <jingfeidu@fb.com>
The implementation is by Jingfei Du from branch "bigbert". Copied over to this CR to get it merged in isolation, since other changes seem to be already in master.
**Small changes from original:** Added the following line in `__init__`, as discovered by myleott:
```
self.optimizer.set_lr(self.warmup_factor * self.lr)
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/683
Reviewed By: myleott
Differential Revision: D15149628
Pulled By: myleott
fbshipit-source-id: 5f715611182cdd111e636c66d5f24aa88fa03e29
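A minimal sketch of why the `__init__` fix above matters: if the warmup-scaled learning rate is only applied on `step_update`, the very first optimizer steps run at the full peak rate. The class below is an illustrative stand-in for a warmup + polynomial-decay schedule, not fairseq's actual scheduler:

```python
class PolynomialDecaySchedule:
    """Warmup then polynomial decay of the learning rate (sketch)."""

    def __init__(self, lr, warmup_updates, total_updates, power=1.0):
        self.peak_lr = lr
        self.warmup_updates = warmup_updates
        self.total_updates = total_updates
        self.power = power
        # Apply the warmup-scaled lr immediately at construction (the
        # fix described above), instead of starting at the peak lr.
        self.lr = self.step_update(0)

    def step_update(self, num_updates):
        if self.warmup_updates and num_updates < self.warmup_updates:
            # linear warmup from 0 to peak_lr
            self.lr = (num_updates / self.warmup_updates) * self.peak_lr
        else:
            # polynomial decay from peak_lr down to 0
            pct_remaining = 1 - (num_updates - self.warmup_updates) / (
                self.total_updates - self.warmup_updates)
            self.lr = self.peak_lr * pct_remaining ** self.power
        return self.lr
```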
Myle Ott authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/682
Differential Revision: D15147735
Pulled By: myleott
fbshipit-source-id: 4a5f12c0b24591f964fe1f465be3775a67578e79
Myle Ott authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/681
Differential Revision: D15147107
fbshipit-source-id: 4452c98059586a4d748868a7659329285a76d5ef
Myle Ott authored
Summary:
- Add --add-bos-token option to LM task
- Cleanup utils.py and options.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/654
Differential Revision: D15041794
Pulled By: myleott
fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/672
title
Reviewed By: jmp84, pipibjc
Differential Revision: D15094977
fbshipit-source-id: c24e4ec9355b53e1585ac4da32809f1c339c7364
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/680
Some embedding names were renamed, but this one was missed. So far I've only seen this affect our runs during continued training. If you encountered any errors when continuing training from an XLM save_dir, rebasing past this diff (or patching this and canarying) should fix the problem.
Reviewed By: pipibjc
Differential Revision: D15137463
fbshipit-source-id: c72067f16aaf1ba2b8286938bd25a19b70ae8712
- Apr 29, 2019
Myle Ott authored
Summary: Add missing backslash.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/679
Differential Revision: D15122270
Pulled By: myleott
fbshipit-source-id: fbdfde648051294eaa9f7a4e0c4cfbc57491a718
Myle Ott authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/676
Differential Revision: D15114128
Pulled By: myleott
fbshipit-source-id: b11dde77b2f2610d33649101aea03fb5a3eeb56a
- Apr 27, 2019
Noe Casas authored
Summary: Log fairseq's `args` and `sys.argv` in tensorboard to easily identify run hyperparameters from within tensorboard. The idea was suggested in https://twitter.com/Thom_Wolf/status/1106300583835766786
Pull Request resolved: https://github.com/pytorch/fairseq/pull/673
Differential Revision: D15114159
Pulled By: myleott
fbshipit-source-id: d48133a7f629dffe984836712390c317916cf413
Myle Ott authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/669
Differential Revision: D15114160
Pulled By: myleott
fbshipit-source-id: 64f4a8154c8931ddbbe459d4d4a54c46680ad6b6
- Apr 26, 2019
Mohammad Sadegh Rasooli authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/670
The pytorch-translate task needs to use extra arguments (such as vocabulary objects). By passing kwargs, we can accept extra arguments in setup_task.
Reviewed By: akinh, pipibjc
Differential Revision: D15086810
fbshipit-source-id: 555f7976020eaac1febb8226f5a0055af0407ea6
- Apr 25, 2019
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/667
Use smaller models so that unittests won't time out.
Reviewed By: pipibjc
Differential Revision: D15056894
fbshipit-source-id: af9fbda6ea6e56cf82d52555620121b189e2f013
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/666
Option to load the XLM weights into only the encoder or the decoder
Reviewed By: pipibjc
Differential Revision: D14881004
fbshipit-source-id: 6d0d598ea9c445ec468f71b8e855712de89a5dac
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/629
Use GeLU as an alternate activation layer to ReLU.
Reviewed By: lematt1991
Differential Revision: D14689851
fbshipit-source-id: 7ec81fa34bc7bd0e1e43b337847ae932dcbf8b15
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/653
After this diff, you can train a transformer model with --activation-fn 'relu', 'gelu', or 'gelu_fast'.
gelu_fast is the default implementation in https://github.com/hendrycks/GELUs/blob/master/mnist_fcn.py#L72-L77
gelu is the alternate implementation in https://github.com/hendrycks/GELUs/blob/master/mnist_fcn.py#L72-L77 and the default implementation in https://github.com/facebookresearch/XLM
Reviewed By: pipibjc
Differential Revision: D14966006
fbshipit-source-id: 94e95fb99bd548ba47cf23b4999265c7b6833fc1
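For reference, the two GELU variants mentioned above can be written in scalar form as below (Hendrycks & Gimpel's formulas; the actual fairseq code operates on tensors, and its exact function names may differ):

```python
import math

def gelu(x):
    # exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_fast(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```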
ankur6ue authored
Summary: Added a link to a blog post about the incremental decoder in the FairseqIncrementalDecoder class description.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/662
Differential Revision: D15077845
Pulled By: myleott
fbshipit-source-id: f23294721739600e14feb2cca4ece95f2b968f44
Angela Fan authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/665
Differential Revision: D15077853
Pulled By: huihuifan
fbshipit-source-id: 2a0d3f6236ae002579f1ee72735d6d8000b8e6b6
- Apr 24, 2019
Myle Ott authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/661
Differential Revision: D15068312
Pulled By: myleott
fbshipit-source-id: 1216835fd4c7f83ea5e350bff83901c93ac57447
- Apr 22, 2019
Max Ryabinin authored
Summary: Because the size of `unfinalized_scores` is equal to the current `bsz` and not the initial batch size, we need to index it by `unfin_idx` instead of `sent` in `is_finished`. Fixes #588.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/627
Differential Revision: D15034641
Pulled By: myleott
fbshipit-source-id: 2638e68e877ae01256cac7d8e69b5b7fec8f7017
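The index mapping behind this fix can be illustrated as follows. During beam search, sentences that finish are removed from the batch, so tensors sized by the *current* batch must be indexed by a sentence's rank among still-unfinished sentences, not by its original position. The helper below is a hypothetical illustration, not the actual fairseq code:

```python
def unfin_index(sent, finished):
    """Map a sentence's original batch position `sent` to its index
    among the still-unfinished sentences.

    `finished` is a list of booleans over the *original* batch; the
    rank is the original position minus the finished sentences that
    precede it.
    """
    assert not finished[sent], "finished sentences have no unfin index"
    return sent - sum(finished[:sent])
```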
Yongqiang Wang authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/647
The current implementation of average_checkpoints requires loading all the model parameters into memory before averaging them. Averaging large models (e.g., transformer) over a large number of checkpoints (e.g., >50) may therefore require over 100 GB of memory. Loading all the parameters at once is not necessary, since we know the number of models in advance.
Reviewed By: skritika
Differential Revision: D15027513
fbshipit-source-id: 0afe37c9a031a9ab0f1e78844a37be49ec5f76f1
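The memory-saving idea above can be sketched as a running sum over one checkpoint at a time, so only one extra state dict is ever resident. This is a simplified illustration with plain dicts of floats, not the actual fairseq implementation (which works on tensor state dicts loaded from disk):

```python
def average_checkpoints_streaming(param_streams):
    """Average checkpoints without holding them all in memory.

    `param_streams` yields one checkpoint's parameter dict at a time;
    only a single running-sum dict is kept resident.
    """
    total = None
    n = 0
    for params in param_streams:
        n += 1
        if total is None:
            total = dict(params)  # copy the first checkpoint's values
        else:
            for k, v in params.items():
                total[k] = total[k] + v  # fold in, then discard `params`
    return {k: v / n for k, v in total.items()}
```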
- Apr 17, 2019
Kartikay Khandelwal authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/641
Fix breaking import
Reviewed By: pipibjc
Differential Revision: D14978454
fbshipit-source-id: 7b43152cb30100881e9991ead871531ee3f60e07
Ning Dong authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/639
Add an argument sampling_func to the constructor to enable custom sampling over a list of dataset keys. The default strategy is to sample uniformly, as it did previously.
Reviewed By: liezl200
Differential Revision: D14965774
fbshipit-source-id: f3285688a9ae3729c0ba12c22254c1144d0eea9e
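The constructor hook described above amounts to the following pattern (names are illustrative, not fairseq's exact API): a callable maps the list of dataset keys to the chosen key, and falls back to uniform sampling when none is given.

```python
import random

def sample_dataset_key(keys, sampling_func=None):
    """Pick a dataset key via `sampling_func`; uniform by default."""
    if sampling_func is None:
        sampling_func = random.choice  # uniform over keys, as before
    return sampling_func(keys)
```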
Ning Dong authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/638
RT
Reviewed By: liezl200
Differential Revision: D14967268
fbshipit-source-id: 2da361497743d90a841fdbf2a50085136c70b468
- Apr 16, 2019
Kartikay Khandelwal authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/635
Adding a task and the relevant models, datasets, and criteria needed for training Cross-lingual Language Models, similar to the Masked Language Model used in XLM (Lample and Conneau, 2019 - https://arxiv.org/abs/1901.07291).
Reviewed By: liezl200
Differential Revision: D14943776
fbshipit-source-id: 3e416a730303d1dd4f5b92550c78db989be27073
- Apr 15, 2019
Myle Ott authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/615
Differential Revision: D14933742
Pulled By: myleott
fbshipit-source-id: c2c20425875743c89bbc2ac564a2fbb6ff4958b2
freewym authored
Summary: If args.keep_interval_updates or args.keep_last_epochs > 0, `checkpoints` refers to a list of checkpoint files to be removed, which can be empty. So the logging code was moved to the right position.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/634
Differential Revision: D14933655
Pulled By: myleott
fbshipit-source-id: 68182ee99d9701e1536833d31e0a7c5d2eb2d679