Related Works. RNN-based Encoder-decoder. LongT GAN (2018).
Motivation. Vaswani, Ashish, et al. "Attention is all you need." NeurIPS 2017. Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv 2018.