Online Action Detection

Roeland De Geest¹, Efstratios Gavves², Amir Ghodrati¹, Zhenyang Li², Cees Snoek², Tinne Tuytelaars¹

ECCV 2016

¹ PSI, ESAT, KU Leuven

² QUVA-Lab, University of Amsterdam

The goal of online action detection is to detect an action as it happens, ideally even before the action is fully completed. The decision must be made early, without having seen the complete video, whereas traditional action detection has access to the whole video. Being able to detect an action at the time of its occurrence can be useful in many practical applications, e.g.,

a pro-active robot offering a helping hand,
a surveillance camera raising an alarm not just after the fact but in time to allow for intervention,
a smart active camera system zooming in on the action scene and recording it from the optimal perspective,
an autonomous car stopping for a child chasing a ball.

We introduce the online action detection problem in our ECCV 2016 paper (read it on arXiv). Essentially, an online action detection method must answer the following question: based on all frames seen up to now, what action (if any) is happening in the current frame? Because this question is asked for every frame, we use the per-frame average precision for evaluation. We collected the TVSeries dataset, a new dataset that can be used to evaluate online (as well as traditional) action detection methods. We evaluate three popular video interpretation methods on this dataset, in both an online and an offline action detection setting: Fisher vectors with an SVM, a frame-based CNN, and an LSTM that takes the output of this CNN as input. None of the methods performs well, indicating that more research on this relevant, challenging problem is needed.
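To make the online constraint and the evaluation measure concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code used in the paper: the names `online_scores`, `per_frame_average_precision`, and `score_fn` are hypothetical, a single action class is assumed, and the paper's exact protocol (for example, how ties or background frames are handled) may differ.

```python
from typing import Callable, Sequence

import numpy as np


def online_scores(frames: Sequence, score_fn: Callable[[Sequence], float]) -> np.ndarray:
    """Score every frame using only the frames seen so far.

    `score_fn` is a hypothetical stand-in for any model (SVM, CNN, LSTM, ...);
    it receives frames[0..t] and returns a confidence for frame t.
    """
    return np.array([score_fn(frames[: t + 1]) for t in range(len(frames))], dtype=float)


def per_frame_average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision for one action class, with each frame as one sample.

    scores: per-frame confidence that the action is happening
    labels: per-frame ground truth (1 = action, 0 = anything else)
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_positive = labels.sum()
    if n_positive == 0:
        return float("nan")  # AP is undefined without positive frames

    # Rank frames from most to least confident.
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]

    # Precision at each rank, averaged over the ranks that hold a positive frame.
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / n_positive)
```

For example, `per_frame_average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])` averages the precision at ranks 1 and 3, i.e. (1 + 2/3) / 2. In a multi-class setting, one would typically compute this value per class and report the mean over classes.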

Citation:
De Geest, R., Gavves, E., Ghodrati, A., Li, Z., Snoek, C. & Tuytelaars, T. (2016). Online Action Detection. ECCV 2016.