Temporally coherent interpretations for long videos using pattern theory

Fillipe Souza, Sudeep Sarkar, Anuj Srivastava, Jingyong Su

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Scopus citations

Abstract

Graph-theoretical methods have successfully provided semantic and structural interpretations of images and videos. A recent paper introduced a pattern-theoretic approach that allows the construction of flexible graphs for representing interactions of actors with objects, with inference accomplished by an efficient annealing algorithm. Actions and objects are termed generators and their interactions are termed bonds; together they form high-probability configurations, or interpretations, of observed scenes. This work and other structural methods have generally been limited to analyzing short videos involving isolated actions. Here we provide an extension that uses additional temporal bonds across individual actions to enable semantic interpretations of longer videos. Longer temporal connections improve scene interpretations because they help discard (temporally) local solutions in favor of globally superior ones. Using this extension, we demonstrate improvements in understanding longer videos, compared to individual interpretations of non-overlapping time segments. We verified the success of our approach by generating interpretations for more than 700 video segments from the YouCook data set, which contains intricate videos that exhibit cluttered backgrounds, occlusion, viewpoint variations and changing illumination conditions. Interpretations for long video segments yielded performance increases of about 70% and, in addition, proved to be more robust to various severe classification-error scenarios.
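
The idea of temporal bonds can be illustrated with a minimal sketch. This is not the authors' implementation: it collapses each video segment to a single label, and the label names, cost values, cooling schedule and function names (`energy`, `independent_interpretation`, `anneal`) are all hypothetical. It shows only how adding pairwise costs between consecutive segments lets an annealing search replace a locally preferred label with one that is better for the whole sequence.

```python
import math
import random

# Hypothetical per-segment classifier costs (lower is better). Each segment
# is reduced to one label here; the paper's configurations are richer graphs.
SEGMENT_COSTS = [
    {"crack egg": 0.2, "pour milk": 0.9},
    {"stir": 0.6, "cut": 0.5},  # taken alone, "cut" looks slightly better
    {"pour into pan": 0.3, "peel": 0.8},
]

# Hypothetical temporal bond costs between labels of consecutive segments.
# Pairs not listed receive DEFAULT_BOND_COST.
TEMPORAL_COSTS = {
    ("crack egg", "stir"): 0.1,
    ("stir", "pour into pan"): 0.1,
}
DEFAULT_BOND_COST = 1.0


def energy(labels):
    """Total cost: per-segment terms plus temporal bonds between neighbours."""
    data = sum(SEGMENT_COSTS[i][lab] for i, lab in enumerate(labels))
    bonds = sum(
        TEMPORAL_COSTS.get(pair, DEFAULT_BOND_COST)
        for pair in zip(labels, labels[1:])
    )
    return data + bonds


def independent_interpretation():
    """Baseline: pick the cheapest label for each segment in isolation."""
    return [min(costs, key=costs.get) for costs in SEGMENT_COSTS]


def anneal(steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing over label sequences; returns the best one seen."""
    rng = random.Random(seed)
    current = independent_interpretation()
    current_e = energy(current)
    best, best_e = list(current), current_e
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Propose a new label for one randomly chosen segment.
        i = rng.randrange(len(current))
        proposal = list(current)
        proposal[i] = rng.choice(list(SEGMENT_COSTS[i]))
        proposal_e = energy(proposal)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        accept = proposal_e <= current_e or rng.random() < math.exp(
            (current_e - proposal_e) / t
        )
        if accept:
            current, current_e = proposal, proposal_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best


if __name__ == "__main__":
    baseline = independent_interpretation()
    coherent = anneal()
    print("independent:", baseline, energy(baseline))
    print("with temporal bonds:", coherent, energy(coherent))
```

Because the search starts from the per-segment baseline and returns the best configuration it visits, its result is never worse than the baseline under this toy energy. Whether it reaches the global optimum depends on the schedule and the seed.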

Original language: English
Title of host publication: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Publisher: IEEE Computer Society
Pages: 1229-1237
Number of pages: 9
ISBN (Electronic): 9781467369640
DOI: https://doi.org/10.1109/CVPR.2015.7298727
State: Published - Oct 14 2015
Event: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015 - Boston, United States
Duration: Jun 7 2015 to Jun 12 2015

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 07-12-June-2015
ISSN (Print): 1063-6919

Conference

Conference: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Country: United States
City: Boston
Period: 06/7/15 to 06/12/15

Cite this

Souza, F., Sarkar, S., Srivastava, A., & Su, J. (2015). Temporally coherent interpretations for long videos using pattern theory. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015 (pp. 1229-1237). [7298727] (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 07-12-June-2015). IEEE Computer Society. https://doi.org/10.1109/CVPR.2015.7298727