Research on attacks which exploit video-based side-channels to decode text typed on a smartphone has traditionally assumed that the adversary is able to leverage some information from the screen display (say, a reflection of the screen or a low resolution video of the content typed on the screen). This paper introduces a new breed of side-channel attack on the PIN entry process on a smartphone which entirely relies on the spatio-temporal dynamics of the hands during typing to decode the typed text. Implemented on a dataset of 200 videos of the PIN entry process on an HTC One phone, we show, that the attack breaks an average of over 50% of the PINs on the first attempt and an average of over 85% of the PINs in ten attempts. Because the attack can be conducted in such a way not to raise suspicion (i.e., since the adversary does not have to direct the camera at the screen), we believe that it is very likely to be adopted by adversaries who seek to stealthily steal sensitive private information. As users conduct more and more of their computing transactions on mobile devices in the open, the paper calls for the community to take a closer look at the risks posed by the now ubiquitous camera-enabled devices. Copyright is held by the owner/author(s).