Denoising autoencoder task (#251) (c9c660c0) · Commits · Simon Will / fairseq

Commit c9c660c0 authored Nov 01, 2018 by Liezl Puzon Committed by Facebook Github Bot Nov 01, 2018

Denoising autoencoder task (#251)

Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/251

We should use a shared encoder and separate decoders, as in:

https://fb.facebook.com/groups/2156114531381111/permalink/2169028113423086/
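As a rough illustration of the shared-encoder, separate-decoder layout mentioned above, here is a minimal sketch in plain Python. It is not the code in this diff: `SharedEncoderModel`, `toy_encoder`, `make_toy_decoder`, and the `"src-tgt"` / `"src-src"` keys are hypothetical names chosen for the example, and the toy functions stand in for real neural modules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical toy stand-ins for real encoder/decoder modules.
Encoder = Callable[[List[str]], List[str]]
Decoder = Callable[[List[str]], List[str]]


def toy_encoder(tokens: List[str]) -> List[str]:
    """Pretend 'encoding': lowercase every token."""
    return [t.lower() for t in tokens]


def make_toy_decoder(tag: str) -> Decoder:
    """Build a decoder that prefixes each encoder state with its own tag."""
    def decode(states: List[str]) -> List[str]:
        return [f"{tag}:{s}" for s in states]
    return decode


@dataclass
class SharedEncoderModel:
    """One encoder instance reused by every task; one decoder per task."""
    encoder: Encoder
    decoders: Dict[str, Decoder] = field(default_factory=dict)

    def forward(self, pair: str, tokens: List[str]) -> List[str]:
        states = self.encoder(tokens)        # shared across all pairs
        return self.decoders[pair](states)   # pair-specific decoder


model = SharedEncoderModel(
    encoder=toy_encoder,
    decoders={
        "src-tgt": make_toy_decoder("tgt"),  # translation
        "src-src": make_toy_decoder("src"),  # denoising autoencoder
    },
)
```

Calling `model.forward("src-src", tokens)` and `model.forward("src-tgt", tokens)` runs the same encoder but different decoders, which is the property the summary asks for.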

Generation is currently a hack. Ideally, the net input should carry the lang pair info so that, when we pass the sample to the model, the model can select the correct encoder/decoder pair.
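The idea of routing on lang pair info could look roughly like the sketch below. This is an assumption about one possible design, not what the diff implements: the `"lang_pair"` key inside `net_input`, the `select_and_run` helper, and the callables in `models` are all hypothetical.

```python
from typing import Any, Callable, Dict


def select_and_run(models: Dict[str, Callable[..., Any]],
                   sample: Dict[str, Any]) -> Any:
    """Pick the encoder/decoder pair named by the sample and run it.

    Assumes (hypothetically) that sample["net_input"] carries a
    "lang_pair" key alongside the usual model inputs.
    """
    net_input = dict(sample["net_input"])    # copy so the sample is not mutated
    lang_pair = net_input.pop("lang_pair")   # hypothetical routing key
    if lang_pair not in models:
        raise KeyError(f"no encoder/decoder pair registered for {lang_pair!r}")
    return models[lang_pair](**net_input)


# Toy callables standing in for real encoder/decoder pairs.
models = {
    "src-tgt": lambda src_tokens: ("translate", src_tokens),
    "src-src": lambda src_tokens: ("denoise", src_tokens),
}

sample = {"net_input": {"src_tokens": [4, 5, 6], "lang_pair": "src-src"}}
result = select_and_run(models, sample)
```

With the routing key stored in the sample itself, the caller would not need to know which pair a batch belongs to at generation time.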

Diff [2/2] will add flow integration for basic experimentation.

TODO in a future diff: figure out how to generalize this so that export will work.

This works with vocab reduction, but we only support vocab reduction for the src-tgt model, not the src-src model. A future (lowpri) task could be to add word prediction vocab reduction for the src-src model to speed up training.

Reviewed By: xianxl

Differential Revision: D10512576

fbshipit-source-id: 545d96cad8e814b9da7be102a48cc5cac358b758

parent 5bbd148e
