Guillaume Lample
Paris, Île-de-France, France
9K followers
500+ connections
Activity
-
#PAISS2025: To conclude the summer school, Timothee Lacroix is sharing his experience at Mistral AI
Liked by Guillaume Lample
-
Mistral AI raises €1.7B to accelerate technological progress with AI! Exciting times ahead, please consider applying to solve some of the toughest…
Liked by Guillaume Lample
-
Mistral Medium 3.1 just landed on the LMArena leaderboard, punching way above its weight! 🏆 #1 in English (no Style Control) 🏆 2nd overall (no Style…
Liked by Guillaume Lample
Experience
-
Mistral AI
Education
Publications
-
Phrase-Based & Neural Unsupervised Machine Translation
EMNLP
Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model variants, a neural and a phrase-based model. Both versions leverage a careful initialization of the parameters, the denoising effect of language models and automatic generation of parallel data by iterative back-translation. These models are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters. On the widely used WMT'14 English-French and WMT'16 German-English benchmarks, our models respectively obtain 28.1 and 25.2 BLEU points without using a single parallel sentence, outperforming the state of the art by more than 11 BLEU points. On low-resource languages like English-Urdu and English-Romanian, our methods achieve even better results than semi-supervised and supervised approaches leveraging the paucity of available bitexts. Our code for NMT and PBSMT is publicly available.
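The iterative back-translation loop at the heart of this abstract can be caricatured in a few lines: each round, the current target-to-source model back-translates monolingual target text into synthetic source sentences, and those synthetic pairs retrain the source-to-target model, then the roles swap. This is a toy word-level stand-in, not the paper's neural or phrase-based systems; all function names and data are illustrative.

```python
# Toy illustration of iterative back-translation. "Models" are word-level
# dictionaries and "training" just memorizes the aligned word pairs seen
# in the synthetic corpus; unknown words pass through unchanged.

def translate(sentence, table):
    # Word-by-word translation; unknown words are copied as-is.
    return [table.get(w, w) for w in sentence]

def train(pairs):
    # "Train" a model by absorbing aligned word pairs from synthetic data.
    table = {}
    for src, tgt in pairs:
        for s, t in zip(src, tgt):
            table[s] = t
    return table

def back_translation_rounds(mono_src, mono_tgt, init_s2t, init_t2s, rounds=3):
    s2t, t2s = dict(init_s2t), dict(init_t2s)
    for _ in range(rounds):
        # Back-translate target monolingual data into synthetic source
        # inputs, then retrain s2t on (synthetic source, real target) pairs.
        synthetic = [(translate(t, t2s), t) for t in mono_tgt]
        s2t = train(synthetic)
        # Symmetrically refresh the target-to-source model.
        synthetic = [(translate(s, s2t), s) for s in mono_src]
        t2s = train(synthetic)
    return s2t, t2s
```

Even with only a partial seed dictionary in one direction, a few rounds propagate translations into both directions, which is the intuition behind bootstrapping from a careful initialization.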
-
Unsupervised Machine Translation Using Monolingual Corpora Only
ICLR
Machine translation has recently achieved impressive performance thanks to recent advances in deep learning and the availability of large-scale parallel corpora. There have been numerous attempts to extend these successes to low-resource language pairs, yet requiring tens of thousands of parallel sentences. In this work, we take this research direction to the extreme and investigate whether it is possible to learn to translate even without any parallel data. We propose a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space. By learning to reconstruct in both languages from this shared feature space, the model effectively learns to translate without using any labeled data. We demonstrate our model on two widely used datasets and two language pairs, reporting BLEU scores up to 32.8, without using even a single parallel sentence at training time.
-
Word Translation Without Parallel Data
ICLR
State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on cross-lingual tasks for some language pairs. Our experiments demonstrate that our method works very well also for distant language pairs, like English-Russian or English-Chinese. We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation. Our code, embeddings and dictionaries are publicly available.
-
Fader Networks: Manipulating Images by Sliding Attributes
NIPS
This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space. As a result, after training, our model can generate different realistic versions of an input image by varying the attribute values. By using continuous attribute values, we can choose how much a specific attribute is perceivable in the generated image. This property could allow for applications where users can modify an image using sliding knobs, like faders on a mixing console, to change the facial expression of a portrait, or to update the color of some objects. Compared to the state-of-the-art which mostly relies on training adversarial networks in pixel space by altering attribute values at train time, our approach results in much simpler training schemes and nicely scales to multiple attributes. We present evidence that our model can significantly change the perceived value of the attributes while preserving the naturalness of images.
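The fader mechanism can be sketched in a few lines: the decoder consumes a latent code plus an explicit attribute value, so sliding the attribute at generation time interpolates the output while the latent content stays fixed. This is a toy linear stand-in, not the paper's convolutional encoder-decoder; all names and numbers are illustrative.

```python
# Toy fader sketch: a "decoder" that adds an attribute value, scaled,
# along a fixed direction in output space. Sliding the attribute changes
# one perceivable property while the latent content is preserved.

def decode(latent, attribute, attr_direction):
    # Linear decoder: output = latent + attribute * direction.
    return [z + attribute * d for z, d in zip(latent, attr_direction)]

def slide(latent, attr_direction, values):
    # One variant per attribute setting, like moving a fader on a console.
    return [decode(latent, a, attr_direction) for a in values]
```

Because the attribute is a continuous input rather than a binary label, intermediate values yield intermediate outputs, which is what makes the "sliding knob" interface possible.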
-
Playing FPS Games with Deep Reinforcement Learning
AAAI
Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present the first architecture to tackle 3D environments in first-person shooter games, that involve partially observable states. Typically, deep reinforcement learning methods only utilize visual input for training. We present a method to augment these models to exploit game feature information such as the presence of enemies or items, during the training phase. Our model is trained to simultaneously learn these features along with minimizing a Q-learning objective, which is shown to dramatically improve the training speed and performance of our agent. Our architecture is also modularized to allow different models to be independently trained for different phases of the game. We show that the proposed architecture substantially outperforms built-in AI agents of the game as well as humans in deathmatch scenarios.
-
Neural Architectures for Named Entity Recognition
NAACL
State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available. In this paper, we introduce two new neural architectures---one based on bidirectional LSTMs and conditional random fields, and the other that constructs and labels segments using a transition-based approach inspired by shift-reduce parsers. Our models rely on two sources of information about words: character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora. Our models obtain state-of-the-art performance in NER in four languages without resorting to any language-specific knowledge or resources such as gazetteers.
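The inference step of the LSTM-CRF variant, Viterbi decoding over per-token emission scores plus a tag-transition matrix, can be sketched compactly. The emission scores here stand in for BiLSTM outputs; the tag set, scores, and data layout are toy assumptions, not the paper's implementation.

```python
# Minimal Viterbi decoder: find the highest-scoring tag sequence given
# per-token emission scores and pairwise tag-transition scores.

def viterbi(emissions, transitions, tags):
    # emissions: list of {tag: score} per token;
    # transitions: {(prev_tag, cur_tag): score}.
    best = [{t: (emissions[0][t], [t]) for t in tags}]
    for scores in emissions[1:]:
        layer = {}
        for cur in tags:
            # Pick the best previous tag for each current tag.
            prev_tag, (prev_score, path) = max(
                best[-1].items(),
                key=lambda kv: kv[1][0] + transitions[(kv[0], cur)],
            )
            layer[cur] = (
                prev_score + transitions[(prev_tag, cur)] + scores[cur],
                path + [cur],
            )
        best.append(layer)
    # Return the path with the highest total score at the last token.
    return max(best[-1].values(), key=lambda v: v[0])[1]
```

The transition term is what lets the CRF layer forbid or penalize invalid tag bigrams (e.g. an inside tag without a preceding begin tag), which per-token classification alone cannot express.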
More activity by Guillaume
-
Meet the new Codestral 25.08! It offers significant upgrades, and now excels at prioritizing the most helpful suggestions, reducing distractions to…
Liked by Guillaume Lample
-
🚀 Exciting news! The Mistral AI Legal team is expanding again (this time in the US): we are looking for a Commercial Legal Counsel to join our team!…
Liked by Guillaume Lample
-
🎤 Discover Voxtral, our first audio models: cutting-edge, open-source speech recognition models under Apache 2.0, perfect for transcription…
Liked by Guillaume Lample
-
Introducing AI for Citizens, an initiative designed to empower public institutions to harness the benefits of AI for their citizens. Building on…
Liked by Guillaume Lample
-
We started Genesis AI to solve general-purpose robotics and unlock unlimited physical labor. We’re backed by $105M from Eclipse, Khosla Ventures…
Liked by Guillaume Lample