tl;drRNNS work great for text but convolutions can do it fasterAny part of a sentence can influence the semantics of a word. For that reason we want our network to see the entire input at onceGetting that big a receptive can make gradients vanish and our networks failWe can solve the vanishing gradient problem with DenseNets or Dilated ConvolutionsSometimes we need to generate text. We can use “de
