Guillaume Lample

Paris, Île-de-France, France
9K followers · 500+ connections

Experience

  • Mistral AI

  • -

    Paris Area, France

  • -

  • -

    Menlo Park

  • -

San Francisco Bay Area, United States

  • -

London, United Kingdom

Education

Publications

  • Phrase-Based & Neural Unsupervised Machine Translation

    EMNLP

    Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model variants, a neural and a phrase-based model. Both versions leverage a careful initialization of the parameters, the denoising effect of language models and automatic generation of parallel data by iterative back-translation. These models are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters. On the widely used WMT'14 English-French and WMT'16 German-English benchmarks, our models respectively obtain 28.1 and 25.2 BLEU points without using a single parallel sentence, outperforming the state of the art by more than 11 BLEU points. On low-resource languages like English-Urdu and English-Romanian, our methods achieve even better results than semi-supervised and supervised approaches leveraging the paucity of available bitexts. Our code for NMT and PBSMT is publicly available.
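The iterative back-translation described in the abstract can be illustrated with a toy round-trip: word-for-word dictionary "models" stand in for the paper's NMT/PBSMT systems, and all names here are illustrative rather than taken from the released code.

```python
def translate(sentences, model):
    # Word-by-word translation; unknown words pass through unchanged.
    return [" ".join(model.get(w, w) for w in s.split()) for s in sentences]

def back_translation_round(mono_src, mono_tgt, src2tgt, tgt2src):
    """One round of back-translation: each monolingual side is
    translated to produce synthetic parallel (source, target) pairs.
    A real system would then retrain src2tgt / tgt2src on these
    pairs and iterate; here we only collect them."""
    pairs = []
    # target monolingual data -> synthetic sources, paired with real targets
    pairs += list(zip(translate(mono_tgt, tgt2src), mono_tgt))
    # source monolingual data -> real sources, paired with synthetic targets
    pairs += list(zip(mono_src, translate(mono_src, src2tgt)))
    return pairs

src2tgt = {"hello": "bonjour", "world": "monde"}
tgt2src = {v: k for k, v in src2tgt.items()}
pairs = back_translation_round(["hello world"], ["bonjour monde"],
                               src2tgt, tgt2src)
```

The key property the sketch shows is that every synthetic pair has one genuine side, so the models can improve each other without ever seeing real bitext.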

    See publication
  • Unsupervised Machine Translation Using Monolingual Corpora Only

    ICLR

    Machine translation has recently achieved impressive performance thanks to recent advances in deep learning and the availability of large-scale parallel corpora. There have been numerous attempts to extend these successes to low-resource language pairs, yet requiring tens of thousands of parallel sentences. In this work, we take this research direction to the extreme and investigate whether it is possible to learn to translate even without any parallel data. We propose a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space. By learning to reconstruct in both languages from this shared feature space, the model effectively learns to translate without using any labeled data. We demonstrate our model on two widely used datasets and two language pairs, reporting BLEU scores up to 32.8, without using even a single parallel sentence at training time.
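The "learning to reconstruct" step relies on a noise model that corrupts input sentences before the autoencoder restores them; the paper drops words at random and locally shuffles the rest. A minimal sketch of such a noise function, with illustrative parameter values:

```python
import random

def add_noise(sentence, drop_prob=0.1, k=3, rng=None):
    """Corrupt a sentence for denoising training: drop each word with
    probability drop_prob, then locally shuffle the survivors so no
    word strays far from its original position. Parameter values and
    the function name are assumptions for this sketch."""
    rng = rng or random.Random(0)
    # word dropout
    words = [w for w in sentence.split() if rng.random() >= drop_prob]
    # local shuffle: sort positions perturbed by uniform noise in [0, k)
    keys = [i + rng.uniform(0, k) for i in range(len(words))]
    order = sorted(range(len(words)), key=keys.__getitem__)
    return " ".join(words[i] for i in order)
```

Because reconstruction from such corrupted input is non-trivial, the shared encoder is pushed to learn representations that capture sentence content rather than surface order.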

    See publication
  • Word Translation Without Parallel Data

    ICLR

    State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on cross-lingual tasks for some language pairs. Our experiments demonstrate that our method works very well also for distant language pairs, like English-Russian or English-Chinese. We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation. Our code, embeddings and dictionaries are publicly available.
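The alignment of two monolingual embedding spaces has a closed-form refinement step, orthogonal Procrustes: given (even approximately) matched word pairs, the orthogonal map W minimizing ||XW − Y||_F is recovered from an SVD. A small self-contained sketch on synthetic embeddings:

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F: with U S V^T = SVD(X^T Y),
    the solution is W = U V^T. This is the refinement step used to
    align two word-embedding spaces once a seed dictionary exists."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Sanity check on synthetic "embeddings": recover a random rotation.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))                 # source word vectors
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # ground-truth rotation
Y = X @ Q                                         # target word vectors
W = procrustes(X, Y)
```

In the fully unsupervised setting of the paper, the seed dictionary itself is induced (adversarially) rather than given, and Procrustes is then applied iteratively.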

    See publication
  • Fader Networks: Manipulating Images by Sliding Attributes

    NIPS

    This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space. As a result, after training, our model can generate different realistic versions of an input image by varying the attribute values. By using continuous attribute values, we can choose how much a specific attribute is perceivable in the generated image. This property could allow for applications where users can modify an image using sliding knobs, like faders on a mixing console, to change the facial expression of a portrait, or to update the color of some objects. Compared to the state-of-the-art which mostly relies on training adversarial networks in pixel space by altering attribute values at train time, our approach results in much simpler training schemes and nicely scales to multiple attributes. We present evidence that our model can significantly change the perceived value of the attributes while preserving the naturalness of images.
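The "sliding" interface described above comes from how the decoder is wired: it receives the latent code concatenated with a continuous attribute value, so moving that value acts like a fader knob. A toy sketch with random linear maps (a stand-in interface, not the trained adversarial model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_z = 16, 4
E = rng.standard_normal((d_in, d_z))      # encoder weights (random stand-in)
D = rng.standard_normal((d_z + 1, d_in))  # decoder weights over [z, attribute]

def encode(x):
    # Map the input to a latent code that should not carry the attribute.
    return x @ E

def decode(z, attr):
    # Concatenate a continuous attribute value to the latent code,
    # so varying attr "slides" the reconstruction.
    return np.concatenate([z, [attr]]) @ D

x = rng.standard_normal(d_in)
z = encode(x)
out_off = decode(z, 0.0)   # attribute "off"
out_on = decode(z, 1.0)    # attribute fully "on"
```

In the actual model an adversarial discriminator is trained on the latent code to remove attribute information from z, which is what makes the knob controllable.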

    See publication
  • Playing FPS Games with Deep Reinforcement Learning

    AAAI

    Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present the first architecture to tackle 3D environments in first-person shooter games, that involve partially observable states. Typically, deep reinforcement learning methods only utilize visual input for training. We present a method to augment these models to exploit game feature information such as the presence of enemies or items, during the training phase. Our model is trained to simultaneously learn these features along with minimizing a Q-learning objective, which is shown to dramatically improve the training speed and performance of our agent. Our architecture is also modularized to allow different models to be independently trained for different phases of the game. We show that the proposed architecture substantially outperforms built-in AI agents of the game as well as humans in deathmatch scenarios.
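The co-training idea in the abstract — a shared representation feeding both a Q-value head and a head predicting game features such as "enemy visible" — amounts to minimizing a joint loss L = L_Q + λ·L_feat. A toy gradient-descent sketch with linear heads and synthetic data (all shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, n_actions = 256, 12, 8, 4
X = rng.standard_normal((n, d))                 # stand-in "frames"
q_target = rng.standard_normal((n, n_actions))  # stand-in Q-learning targets
feat_target = (X[:, 0] > 0).astype(float)       # synthetic game feature

W = rng.standard_normal((d, h)) * 0.1           # shared representation
Wq = rng.standard_normal((h, n_actions)) * 0.1  # Q head
Wf = rng.standard_normal((h, 1)) * 0.1          # game-feature head

def losses():
    Z = X @ W
    lq = np.mean((Z @ Wq - q_target) ** 2)
    lf = np.mean((Z @ Wf - feat_target[:, None]) ** 2)
    return lq, lf

lr, lam = 0.01, 1.0
lq0, lf0 = losses()
for _ in range(300):
    Z = X @ W
    dq = 2 * (Z @ Wq - q_target) / n          # grad of L_Q w.r.t. Z @ Wq
    df = 2 * (Z @ Wf - feat_target[:, None]) / n
    gW = X.T @ (dq @ Wq.T + lam * df @ Wf.T)  # shared layer gets both signals
    Wq -= lr * Z.T @ dq
    Wf -= lr * lam * Z.T @ df
    W -= lr * gW
lq1, lf1 = losses()
```

The point of the sketch is that the shared layer receives gradient from both objectives, which is the mechanism the paper credits for faster, better training.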

    See publication
  • Neural Architectures for Named Entity Recognition

    NAACL

    State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available. In this paper, we introduce two new neural architectures---one based on bidirectional LSTMs and conditional random fields, and the other that constructs and labels segments using a transition-based approach inspired by shift-reduce parsers. Our models rely on two sources of information about words: character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora. Our models obtain state-of-the-art performance in NER in four languages without resorting to any language-specific knowledge or resources such as gazetteers.
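At decoding time, the LSTM-CRF architecture picks the tag sequence maximizing the sum of per-token emission scores and tag-to-tag transition scores, which is done with Viterbi dynamic programming. A compact sketch (array names are illustrative):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Most likely tag sequence under a linear-chain CRF score.
    emissions: (seq_len, n_tags) per-token tag scores,
    transitions: (n_tags, n_tags) score of moving from tag i to tag j."""
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        # total[i, j]: best path ending in tag i at t-1, then tag j at t
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = np.argmax(total, axis=0)
        score = np.max(total, axis=0)
    # follow back-pointers from the best final tag
    best = [int(np.argmax(score))]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]
```

The transition matrix is what lets the model reject invalid label patterns (e.g. an I- tag following O in IOB schemes) even when per-token scores favor them.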

    See publication
