Exploiting Monolingual Data at Scale for Neural Machine Translation
Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks
How Contextual are Contextualized Word Representations?
Corpora Generation for Grammatical Error Correction
Understanding Back-Translation at Scale
On the Limitations of Unsupervised Bilingual Dictionary Induction
Guiding neural machine translation with retrieved translation pieces
Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling
Memory-augmented Neural Machine Translation
Lexically constrained decoding for sequence generation using grid beam search
Deep Neural Machine Translation with Linear Associative Unit
A convolutional encoder model for neural machine translation