We present a novel framework, based on Grenander's pattern-theoretic concepts, for high-level interpretation of video activities. This framework allows us to integrate ontological constraints and machine learning classifiers in one formalism to construct high-level semantic interpretations that describe video activity. The unit of analysis is a generator, which can represent either an ontological label or a group of features from a video. Generators are linked to one another by bonds that carry different constraints. An interpretation of a video is a configuration of these connected generators, which yields a graph structure that is richer than the conventional graphs used in computer vision. The quality of an interpretation is quantified by an energy function, which is optimized using Markov chain Monte Carlo-based simulated annealing. We demonstrate the superiority of our approach over a purely machine-learning-based approach (SVM) using more than 650 video shots from the YouCook dataset. This dataset is very challenging in terms of background complexity, camera motion, object occlusion, clutter, and actor variability. We find significantly improved performance in nearly all cases. Our results show that the pattern theory inference process is able to construct the correct interpretation by leveraging the ontological constraints, even when the machine learning classifier performs poorly and its most confident labels are wrong.
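To make the inference idea concrete, the following is a minimal toy sketch, not the implementation used in this work. It assumes two feature generators, a hypothetical table of classifier confidences (`CLASSIFIER_SCORES`), a hypothetical two-entry ontology (`COMPATIBLE`), and an assumed penalty (`BOND_PENALTY`) for a bond that violates the ontology. The energy form, the proposal move, and the cooling schedule are likewise illustrative assumptions.

```python
import math
import random

# Hypothetical classifier confidences for two feature generators.
CLASSIFIER_SCORES = {
    "action_feature": {"cut": 0.55, "stir": 0.45},
    "object_feature": {"spoon": 0.90, "knife": 0.10},
}

# Hypothetical ontology: (action, object) label pairs that may be bonded.
COMPATIBLE = {("stir", "spoon"), ("cut", "knife")}
BOND_PENALTY = 5.0  # assumed cost of an ontologically inconsistent bond


def energy(config):
    """Lower is better: classifier support terms plus one bond term."""
    support = sum(
        -math.log(CLASSIFIER_SCORES[feature][label])
        for feature, label in config.items()
    )
    pair = (config["action_feature"], config["object_feature"])
    return support + (0.0 if pair in COMPATIBLE else BOND_PENALTY)


def anneal(steps=3000, t0=10.0, cooling=0.995, seed=0):
    """Simulated annealing with Metropolis acceptance over label assignments."""
    rng = random.Random(seed)
    # Start from the labels the classifier is most confident about.
    current = {f: max(s, key=s.get) for f, s in CLASSIFIER_SCORES.items()}
    best = dict(current)
    temperature = t0
    for _ in range(steps):
        # Symmetric proposal: relabel one randomly chosen feature generator.
        proposal = dict(current)
        feature = rng.choice(sorted(CLASSIFIER_SCORES))
        proposal[feature] = rng.choice(sorted(CLASSIFIER_SCORES[feature]))
        delta = energy(proposal) - energy(current)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposal
        if energy(current) < energy(best):
            best = dict(current)
        temperature *= cooling  # geometric cooling schedule
    return best


if __name__ == "__main__":
    print(anneal())
```

In this toy setup the classifier's most confident labels are "cut" and "spoon", which the assumed ontology does not allow to bond. The lowest-energy configuration is instead "stir" with "spoon", which illustrates how an ontological constraint can override a wrong top-scoring label.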