Lip Reading Datasets
LRW, LRS2, LRS3

LRW, LRS2 and LRS3 are audio-visual speech recognition datasets collected from in-the-wild videos.

6M+ word instances
800+ hours
5,000+ identities


Download

The data is released as the three datasets listed below, LRW, LRS2 and LRS3, each with its own train/test split. For each we provide cropped face tracks and the corresponding subtitles. There is no overlap between the datasets.


LRW (BBC)

Up to 1000 utterances of 500 different words

LRS2 (BBC)

1000s of natural sentences from British television

LRS3 (TED)

1000s of natural sentences from TED and TEDx videos
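
As a rough starting point, the sketch below shows one way to pair the cropped face tracks with their subtitles once a dataset has been downloaded. The lrs2/main path and the assumption that each <id>.mp4 clip sits next to an <id>.txt transcript file are illustrative only; the actual directory layout is described in the README that accompanies each download.

from pathlib import Path

def index_clips(root):
    """Pair each cropped face-track video with its transcript.

    Assumes a hypothetical layout in which every <id>.mp4 clip has a
    sibling <id>.txt file holding the subtitle text; adapt this to the
    layout documented with the downloaded dataset.
    """
    pairs = []
    for video in sorted(Path(root).rglob("*.mp4")):
        transcript = video.with_suffix(".txt")
        if transcript.exists():
            text = transcript.read_text(encoding="utf-8").strip()
            pairs.append({"video": str(video), "text": text})
    return pairs

if __name__ == "__main__":
    samples = index_clips("lrs2/main")  # hypothetical download location
    print(f"indexed {len(samples)} clips")
    if samples:
        print(samples[0]["video"], "->", samples[0]["text"][:60])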

Publications

Please cite the following if you make use of the datasets.

[1] J. S. Chung, A. Zisserman
Lip Reading in the Wild
Asian Conference on Computer Vision, 2016

[2] J. S. Chung, A. Senior, O. Vinyals, A. Zisserman
Lip Reading Sentences in the Wild
IEEE Conference on Computer Vision and Pattern Recognition, 2017

[3] J. S. Chung, A. Zisserman
Lip Reading in Profile
British Machine Vision Conference, 2017

Applications

The audio-visual datasets can be used for a number of applications, including lip reading and audio-visual speech recognition.

Acknowledgements

This work is supported by the EPSRC programme grant Seebibyte EP/M013774/1: Visual Search for the Era of Big Data. We are very grateful to Rob Cooper and Matt Haynes at BBC Research & Development for help in providing the LRW and the LRS2 datasets.

Copyright © Visual Geometry Group.