deep learning language modeling machine learning seminar #natural language processing bert nlp paper multi-class classification xlnet transformer-xl ai2 gan pytorch classification reverse kl divergence probability parameter regularization parameter of distribution image-to-image transformation face recognition face verification transformer adaptive softmax input representations position embeddings relative position embeddings back propagation learning rate local gradient gradient update optimization neural network fully connected layer bayes' theorem bayesian inference cross entropy curse of dimensionality entropy exponential family forward kl divergence information theory jensen-shannon divergence kullback-leibler divergence logistic sigmoid map maximum entropy distribution mle mode collapsing fever dense retrieval temporal reasoning temporal dataset implicit temporal events ai language language models dimensionality fine-tuning code generation alphacode acl text generation encoder decoder model long context conference emnlp fine tuning fast adaptation efficient fine-tuning llms roberta reformer electra replaced token detection pretrained model commonsense reasoning abductive commonsense reasoning nli nlg abductive dataset iclr iclr 2020 gpt #zero-shot learning #multi task #gpt3 #unified question answering multi-hop qa hotpotqa