The Oxford-BBC Lip Reading in the Wild (LRW) Dataset


Overview

This page contains the download links to the Lip Reading in the Wild (LRW) dataset, described in [1].

The dataset consists of up to 1000 utterances for each of 500 different words, spoken by hundreds of different speakers. All videos are 29 frames (1.16 seconds) in length, and the word occurs in the middle of the video. The word duration is given in the metadata, from which you can determine the start and end frames. The dataset statistics are given in the table below. The full list of classes in the dataset is given here.
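As a rough illustration of how the start and end frames might be derived from the word duration, here is a minimal Python sketch. It assumes a frame rate of 25 fps (which follows from 29 frames spanning 1.16 seconds) and that the word is centred exactly on the middle frame. The function name word_frame_span is hypothetical, and the duration value must be read from the metadata yourself, since the metadata format is not described on this page.

```python
import math


def word_frame_span(duration_s, total_frames=29, fps=25.0):
    """Estimate the zero-based, inclusive (start, end) frames of the word.

    Assumes the word is centred on the middle frame of the clip, which is
    an approximation of "the word occurs in the middle of the video".
    """
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    centre = (total_frames - 1) / 2      # middle frame index (14 for 29 frames)
    half_span = duration_s * fps / 2     # half the word length, in frames
    # Round outwards so partially covered frames are kept, then clamp to the clip.
    start = max(0, math.floor(centre - half_span))
    end = min(total_frames - 1, math.ceil(centre + half_span))
    return start, end
```

For a word lasting 0.4 seconds this gives frames 9 to 19. Rounding outwards is a conservative choice; rounding to the nearest frame would be an equally reasonable alternative.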

Set         Dates                     # classes   # per class
Train       01/01/2010 - 31/08/2015   500         800-1000
Validation  01/09/2015 - 24/12/2015   500         50
Test        01/01/2016 - 30/09/2016   500         50



Example and visualisation

An example video and the corresponding metadata can be found in the link below. Please note that your web browser may not play the mp4 file correctly.

Example mp4 video
Example metadata

Visualisation of video clips for selected words
(with thanks to Donglai Wei for providing these)

Downloads


The package including the videos and the metadata is available for non-commercial, academic research. You will need to sign a Data Sharing agreement with BBC Research & Development before getting access. To download a copy of the agreement please go to the BBC Lip Reading in the Wild and Lip Reading Sentences in the Wild Datasets page. Once approved, you will be supplied with a password, and the package can then be downloaded below. Please cite [1] below if you make use of the dataset.

For all technical questions, please contact the author of [1].


File                MD5 Checksum
Part A   Download   474f255cdf6da35f41824d2b8a00d076
Part B   Download   ef03d6ab52d14de38db23365e2e09308
Part C   Download   532343bbb5f14ab14623c5cce5c8b930
Part D   Download   78709823e18c3906e49b99536c5343de
Part E   Download   abb5fcf3480f2899d09d0171b716026f
Part F   Download   b311feea9705533350a030811501f859
Part G   Download   37e525220e8d47bc7b8bee4753131390


Each part is 10 GB. Download all parts, concatenate them using the command cat lrw-v1* > lrw-v1.tar, and then extract the archive by typing tar -xvf lrw-v1.tar. The train, validation and test sets are all contained in the package.
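Before concatenating, it may be worth checking each downloaded part against the MD5 checksums listed above. The following Python sketch shows one way to do this using only the standard library. The names md5_of_file, find_mismatches and CHECKSUMS are hypothetical, and the local file names of the parts are not given on this page, so you need to supply your own paths.

```python
import hashlib

# MD5 checksums copied from the table above, keyed by part letter.
CHECKSUMS = {
    "A": "474f255cdf6da35f41824d2b8a00d076",
    "B": "ef03d6ab52d14de38db23365e2e09308",
    "C": "532343bbb5f14ab14623c5cce5c8b930",
    "D": "78709823e18c3906e49b99536c5343de",
    "E": "abb5fcf3480f2899d09d0171b716026f",
    "F": "b311feea9705533350a030811501f859",
    "G": "37e525220e8d47bc7b8bee4753131390",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected_by_path):
    """Return the paths whose MD5 digest differs from the expected value."""
    return [
        path
        for path, expected in expected_by_path.items()
        if md5_of_file(path) != expected.lower()
    ]


# Example call (the path is a placeholder, not the real file name):
# find_mismatches({"path/to/your/part_a_file": CHECKSUMS["A"]})
```

An empty list means every file matched. Any path returned should be downloaded again before running the cat and tar commands.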

Publications


[1] J. S. Chung, A. Zisserman
Lip Reading in the Wild
Asian Conference on Computer Vision, 2016