TVSeries Dataset


The TVSeries Dataset is a realistic, large-scale dataset for action detection. It consists of 16 hours of video from six recent TV series. Thirty action classes are defined and all occurrences are marked with their start and end times. For every action instance, we provide some metadata (single person, occluded, part of the action missing...) that can be used to analyze the performance of a method on specific difficult cases.

Example video clips of the TVSeries Dataset.

Download

The annotations of the TVSeries Dataset can be downloaded here. The provided files contain detailed information on all actions of the 30 defined action classes in the selected TV series episodes: their start and end time, as well as the metadata. The split of the dataset over training, validation and test set is also included. If you encounter any problem with the annotations, do not hesitate to send us an e-mail.
The dataset consists of the first few episodes of six recent TV series. The names of the series can be found in the paper. We encourage everyone to buy the first season of the series on DVD and rip the annotated episodes to obtain the video material. Our annotations can then be used with your video files. Please note that the exact content of a DVD can depend on the region. The position of episode six of Modern Family in particular can vary. We encourage everyone to check a few annotations for every episode to make sure no problems exist.
If it is not possible to buy the DVDs, however, please print, fill out and sign this form (in which you promise to use the video data only for research purposes). Send us an e-mail with a scan of the completed form as an attachment.
The CNN and LSTM models used for online action detection on this dataset can be found on GitHub.
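
When checking a few annotations per episode against your own ripped video files, as recommended above, it can help to convert an annotated time into a frame index and inspect that frame. The following is a minimal sketch, not part of the released tools. It assumes that times are given in seconds and that the video has a constant frame rate; the function names and the frame rate in the example are hypothetical and do not come from the annotation files.

```python
import math


def time_to_frame(seconds: float, fps: float) -> int:
    """Map a timestamp in seconds to the nearest 0-based frame index.

    Assumes a constant frame rate (an assumption; check your own rip).
    """
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    return math.floor(seconds * fps + 0.5)


def instance_frames(start_s: float, end_s: float, fps: float) -> tuple[int, int]:
    """Return the (start_frame, end_frame) pair for one annotated action."""
    if end_s < start_s:
        raise ValueError("end time must not precede start time")
    return time_to_frame(start_s, fps), time_to_frame(end_s, fps)


# Hypothetical example: an action annotated from 10.0 s to 12.4 s at 25 fps.
start_frame, end_frame = instance_frames(10.0, 12.4, 25.0)
print(start_frame, end_frame)
```

If the frames found this way do not show the annotated action, the episode order or content of your DVD region may differ from the annotated version.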

The TVSeries dataset is used in our paper on Online Action Detection. If you use this dataset or the models, please refer to our ECCV paper (read it on arXiv):
De Geest, R., Gavves, E., Ghodrati, A., Li, Z., Snoek, C. & Tuytelaars, T. (2016). Online Action Detection. ECCV 2016.

Action classes

We defined 30 action classes (see Table 2). All episodes are manually annotated; in total, they contain 6231 action instances. The start and end frames of all instances are annotated, but not their spatial positions. Actions can overlap in time.
Table 2: Action classes of the TVSeries Dataset and their number of occurrences.
Action               #instances    Action               #instances
Pick something up    937           Go up stairway       119
Point                557           Throw something      119
Drink                440           Get in/out of car    112
Stand up             411           Hang up phone        105
Run                  395           Eat                  98
Sit down             314           Answer phone         96
Read                 302           Clap                 95
Smoke                290           Dress up             95
Drive car            248           Undress              95
Open door            237           Kiss                 79
Give something       211           Fall/trip            77
Use computer         169           Wave                 71
Write                149           Pour                 62
Go down stairway     124           Punch                53
Close door           121           Fire weapon          50
Total                                                   6231
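
Because actions can overlap in time, a single frame may belong to more than one action instance. The sketch below illustrates one way to represent instances and find temporally overlapping pairs. The `ActionInstance` type, its field names, the inclusive end frame and the example values are all assumptions made for illustration; they do not reflect the layout of the annotation files.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ActionInstance:
    """Hypothetical in-memory form of one annotated action."""
    label: str
    start_frame: int
    end_frame: int  # assumed inclusive


def overlaps(a: ActionInstance, b: ActionInstance) -> bool:
    """True if the two frame intervals share at least one frame."""
    return a.start_frame <= b.end_frame and b.start_frame <= a.end_frame


def overlapping_pairs(instances: list[ActionInstance]):
    """Return every pair of instances whose intervals overlap."""
    return [(a, b) for a, b in combinations(instances, 2) if overlaps(a, b)]


# Made-up example values, not taken from the dataset.
example = [
    ActionInstance("Sit down", 100, 160),
    ActionInstance("Drink", 150, 220),
    ActionInstance("Read", 300, 400),
]
pairs = overlapping_pairs(example)
for a, b in pairs:
    print(f"{a.label} overlaps {b.label}")
```

The pairwise comparison is quadratic in the number of instances, which is acceptable per episode; for larger collections, sorting by start frame and sweeping would scale better.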

Metadata

Every action instance is annotated with some extra metadata. This metadata can be used to compare the performance of different methods on specific difficult cases. We provide the following metadata.