multihead_attention: pre-transpose incremental state (#232)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/232

Though transpose operations are essentially free during PyTorch execution, they can become costly when exported to Caffe2 inference nets via ONNX tracing, especially when applied repeatedly to large tensors. For this reason, we update `MultiheadAttention` to store its incremental state with shape (bsz, num_heads, seq_len, head_dim), i.e., after transposing the projected input. This should yield non-trivially faster exported models without changing the semantics or speed of PyTorch execution.

Reviewed By: myleott

Differential Revision: D10186506

fbshipit-source-id: 8a42712423ee767ea49ed88d2a4653f900d14fba
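A minimal sketch of the idea (the function and variable names here are illustrative, not the actual `MultiheadAttention` internals): the projected key/value step is transposed once into (bsz, num_heads, seq_len, head_dim) before caching, so each decoding step appends to the cache with a cheap concat instead of re-transposing the whole accumulated state.

```python
import torch

bsz, num_heads, head_dim = 2, 4, 8

def update_incremental_state(cache, new_step):
    """Hypothetical cache update; new_step has shape
    (seq_len, bsz, num_heads * head_dim), as produced by the k/v projections."""
    seq_len = new_step.size(0)
    # Transpose ONCE into the cached layout (bsz, num_heads, seq_len, head_dim).
    x = new_step.contiguous().view(seq_len, bsz * num_heads, head_dim).transpose(0, 1)
    x = x.view(bsz, num_heads, seq_len, head_dim)
    if cache is None:
        return x
    # The cached state is already pre-transposed, so appending along the
    # time axis needs no further transpose of the (growing) cache tensor.
    return torch.cat([cache, x], dim=2)

cache = None
for _ in range(3):
    step = torch.randn(1, bsz, num_heads * head_dim)  # one decoding step
    cache = update_incremental_state(cache, step)

print(tuple(cache.shape))  # (2, 4, 3, 8)
```

Storing the cache in the post-transpose layout moves the transpose off the per-step hot path, which is what makes the ONNX-traced Caffe2 graph cheaper.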