Smartwatch-Based Keystroke Inference Attacks and Context ...

Smartwatch-Based Keystroke Inference Attacks andContext-Aware Protection Mechanisms

Anindya MaitiX, Oscar ArmbrusterX, Murtuza JadliwalaX, and Jibo HeO

XElectrical Engineering and Computer Science DepartmentOPsychology Department

Wichita State University, USA{axmaiti, oxarmbruster, murtuza.jadliwala, jibo.he}@wichita.edu

ABSTRACTWearable devices, such as smartwatches, are furnished withstate-of-the-art sensors that enable a range of context-awareapplications. However, malicious applications can misusethese sensors, if access is left unaudited. In this paper, wedemonstrate how applications that have access to motion orinertial sensor data on a modern smartwatch can recovertext typed on an external QWERTY keyboard. Due tothe distinct nature of the perceptible motion sensor data,earlier research efforts on emanation based keystroke infer-ence attacks are not readily applicable in this scenario. Theproposed novel attack framework characterizes wrist move-ments (captured by the inertial sensors of the smartwatchworn on the wrist) observed during typing, based on therelative physical position of keys and the direction of tran-sition between pairs of keys. Eavesdropped keystroke char-acteristics are then matched to candidate words in a dictio-nary. Multiple evaluations show that our keystroke infer-ence framework has an alarmingly high classification accu-racy and word recovery rate. With the information recov-ered from the wrist movements perceptible by a smartwatch,we exemplify the risks associated with unaudited access toseemingly innocuous sensors (e.g., accelerometers and gyro-scopes) of wearable devices. As part of our efforts towardspreventing such side-channel attacks, we also develop andevaluate a novel context-aware protection framework whichcan be used to automatically disable (or downgrade) accessto motion sensors, whenever typing activity is detected.

CCS Concepts•Security and privacy → Privacy-preserving proto-cols; Side-channel analysis and countermeasures; •Human-centered computing → Ubiquitous and mobile com-puting;

KeywordsSmartwatch, keystroke, sensor, wearable, privacy.

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full cita-tion on the first page. Copyrights for components of this work owned by others thanACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re-publish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected].

ASIA CCS ’16, May 30-June 03, 2016, Xi’an, China© 2016 ACM. ISBN 978-1-4503-4233-9/16/05. . . $15.00

DOI: http://dx.doi.org/10.1145/2897845.2897905

1. INTRODUCTIONTechnologies that fueled the rapid growth of modern elec-

tronic computers are also catalyzing development in wear-able computing. Most modern wearable devices, such assmartwatches, are capable of a range of context-aware ap-plications, including personal assistance, health and well-ness monitoring, personal safety, and corporate solutions,to name a few. Behind the scenes, a wide range of highlyprecise sensors provide the contextual information requiredby these applications to provide a seamless and personalizeduser experience.

Unfortunately, applications with malicious intents can alsocovertly use these sensors to collect private information,without user-consent. Many of the side-channel attacks im-plemented for smartphones can be readily applied to smart-watches. For example, a malicious application with appro-priate hardware support can stealthily capture photos orvideo of the user and their surroundings with camera[11],record ambient sounds with microphone [21], and track andpredict location activities using GPS [18]. As modern smart-phone operating systems recognized open access to thesesensors as apparent privacy risk, applications’ permission toaccess these sensors were made user-manageable. Addition-ally, use or access to certain sensors also include explicituser-notifications, for example, a notification icon appearson Android and iOS when the GPS sensor is being accessedby an application. Yet, malicious applications found novelside-channel attacks to infer private information through in-direct means. For example, gyroscope, a sensor usually usedto detect changes in device orientation, can potentially beused to detect and recover audio speeches [17]. Anotherexample of a distressing side-channel attack is inferring key-strokes using emanations captured by motion sensors suchas accelerometer and gyroscope [16, 19].

Although many of the side-channel attacks designed forsmartphones are not directly applicable in wearable devicessuch as smartwatches, we anticipate that innovative attackscan hugely benefit by the way these wearable devices areused. In practice, smartphone usage is highly intermittent(for example, it has been shown that on an average a smart-phone is used only 58 minutes per day [2]), and these de-vices spend a majority of their time in a constrained (e.g., inusers’ pockets) or in an activity-less (e.g., on a table) settingwhere most on-board sensors are partially (or completely)non-functional, thereby limiting the inferential capabilitiesof malicious applications that attempt to take advantage ofdata from these sensors. On the contrary, wearable device

795

http://dx.doi.org/10.1145/2897845.2897905

usage is much more persistent as they are always carried byusers on their body in the same natural position as theirtraditional counterparts and users are much more naturallyhabituated to these devices. Our hypothesis is that, as wear-able smart devices are persistently and uniquely used, sen-sors on them are able to capture a continuous stream of user-specific contextual data, access to which if not controlledappropriately, can be potentially exploited by malicious ap-plications to infer sensitive user information.

In this paper, we show that unaudited access to motionsensors featured on most smartwatches can inadvertentlylead to significant leakage of information relating to usersand their surrounding. We demonstrate that a malicious ap-plication, with access to motion sensor readings of a smart-watch, can decode the keystrokes made on a QWERTYkeyboard while wearing the smartwatch on one hand. Weachieve this based on the observed relative physical positionof keystrokes and direction of transition between pairs ofkeystrokes. We then recover the typed words by mappingthe captured ‘motion’ of each word to pre-formed motionprofiles of words in an English language dictionary. Due tothe distinctive nature of perceptible sensor data on smart-watches, straightforward adaptation of earlier side-channelkeystrokes attacks based on emanation of electromagnetic,acoustic or vibration pulses generated by a keystroke, is notbefitting. A comprehensive empirical evaluation of our key-stroke inference framework show significantly high word re-covery rates.

As evident from our experimental results, the threat toprivacy posed by side-channel attacks using wearable de-vices is substantial. However, there have been very limitedefforts from the research community to effectively defendagainst such side-channel attacks in a user-friendly fashion.We propose and implement a new context-aware protectionframework which can automatically activate various protec-tion mechanisms whenever typing activity is detected. Wealso empirically evaluate the protection framework in real-life usage scenario.

2. RELATED WORKEmanation based side-channel inference attacks date back

to the World War II era [12]. The primary types of ema-nations include electromagnetic signals, sounds, and vibra-tions. Previous studies demonstrated the use of electro-magnetic emanation to eavesdrop on contents displayed ona CRT or LCD screen [24, 14] from a distance and withopaque obstacles in between. Similar attacks using electro-magnetic emanations have also been shown to work againstCPU chips [3], smart cards [20], data carrying cables [22],and keyboards (wired or wireless) [25]. Optical emanation,contained in the band of electromagnetic spectrum percep-tible to human eyes, present a different form of leakage fordisplay devices. The light released from display devices mayreflect off various surfaces in front of the screen, and reachan eavesdropper. Successful reconstruction of the displayedinformation has been demonstrated based on reflection suchas from walls [13], shiny objects [7], and even from viewer’seyes [5]. While electromagnetic emanation based attacks arecertainly effective, the need of specialized equipment and it’sconcealed placement near to the target poses difficulty. Sim-ilarly for side-channel attacks based on optical emanations,the eavesdropping equipment must be placed in line of sightof the target.

Side-channel attacks based on acoustic or sound emana-tions are much more feasible because of the popularity ofpersonal devices featuring microphones. Microphones areinexpensive, and can be easily concealed because of theircompact form factor. Furthermore, if a target’s microphoneenabled device (such as smartphones, tablets, etc.) is hi-jacked, it can act as a disguised eavesdropping equipment.As much as 90% of English text printed by a dot-matrixprinter can be successfully recovered, by learning the acous-tic emanations released by the printer [6]. The other majoruse of acoustic emanation has been in keystroke inferenceattacks, which targets to recover key presses on a nearbycomputer keyboard [4, 9]. Similar keystroke inference at-tacks can be carried out using surface vibration emanationgenerated during keystrokes [16, 8]. Vibrations of nearbysurfaces caused by human voice can also be recorded, andused to decode speeches [17]. While systems to record vi-brations may be difficult to conceal, Marquardt et al. [16]proposed the use of a smartphone’s accelerometer to recordvibrations near keyboards. If an adversary is able to infecttheir target’s smartphone with a malicious application whichcan record and transmit sensor data stealthily, it can serveas a very effective eavesdropping tool.

However, a critical requirement of learning based side-channel attacks using electromagnetic, acoustic, or vibrationemanation, is that the target and eavesdropping equipmentmust not be disturbed. Change in either’s position or orien-tation will render the training data futile, thereby makingrecovery of target information impossible. This also meansthat training must be performed in the same setting as theattack, which may not always be feasible. For example, incase of [16], if the target person puts his/her smartphoneone day on the left side of the keyboard and another day onthe right side, the vibrations captured by the accelerome-ter will be significantly different, resulting in failed recoveryof typed text. Our attack setting, which uses motion datafrom a wearable device to infer keystrokes, is largely unaf-fected due to similar constraints as most people wear and usethese devices in a very standard fashion (for example, smart-watches are almost always worn on the left wrist by mostpeople). Moreover, our attack mechanism and wrist motioncharacterization framework is very general and can be easilyextended to work in scenarios comprising of non-traditionalusage of these wearable devices (for example, users wearingthe watch on the right hand instead).

During the final phase of completing this work, we cameacross recently published works which demonstrate the abil-ity to infer keystrokes using smartwatch. Maiti et al. [15]used machine learning to train classifiers based on the slightdifferences in wrist movements observed while tapping nu-meric keys on a handheld smartphone keypad, depending onthe location of the key on the screen. The trained classifiersare then used on test data to perform multiclass classifica-tion between the ten keys. Similar to our work, Wang et al.[26] demonstrate the feasibility of keystroke inference attackusing a smartwatch, on a QWERTY keyboard. However,their attack framework is very different from ours. We alsoconduct a comprehensive evaluation of our attack frameworkand preliminary results indicate that our approach leads tobetter inference accuracy compared to [26]. However, [26]has a different experimental setup, due to which we are un-able to make a comprehensive comparison like we do with[16] and [9].

796

Interestingly, none of the recent works on side-channelkeystroke inference attacks propose or implement a practi-cal protection mechanism. Some of the previous work usingsmartphone sensors as side-channels, briefly suggest oper-ating system developers to provide users with fine-grainedcontrol over application’s permissions to every sensors [10].But without knowing which application is malicious, theuser may have to toggle sensor access back and forth forall the installed applications. Other research efforts vaguelysuggest to restrict the precision at which applications are al-lowed to access the sensors [19, 17, 15]. However, regulatingsensor precision will result in poor application performance,for example, gaming applications will have slow controls andresponse, mapping applications will be delayed/inaccurate,etc. Moreover, some sensors (such as camera and micro-phone) will be rendered unusable at very low sampling rates.In the more recent work using smartwatches as a side-channel[26], Wang et al. completely overlooked the necessity ofhaving protection mechanisms. In this paper, we not onlydemonstrate the feasibility of keystroke inference attacksusing smartwatches as a side-channel, but we also design,implement, and evaluate a new context-aware protectionframework to defend against such attacks.

3. ATTACK DESCRIPTIONIn this paper, we demonstrate the feasibility of a key-

stroke inference attack against a user typing on an externalQWERTY keyboard by using smartwatch motion sensors.Because of limitations faced by emanation-based keystrokeinference attacks, and multiple technical challenges in imple-menting them on a smartwatch, we pursue a slightly differ-ent approach for our attack where we focus on capturing andusing keystrokes related wrist motion or movement charac-teristics.

We observed that the wrist movements made while typinga fixed sequence of letters on a keyboard are highly similarand consistent across multiple trials involving a single typer.This gave us the intuition that an adversary can create adictionary of commonly used words (words are nothing butfixed sequence of letters), along with their correspondingwrist movement patterns. During the attack, the adver-sary can simply match the eavesdropped wrist movementpattern to the closest matching pattern in the dictionary.Intuitively, the recovery can be highly accurate if the dictio-nary is carefully created and comprises of all words that thetarget is expected to type. However, the recovery rate alsodepends on how the wrist movement patterns are character-ized (which we will explain in Section 4.1) and the granular-ity of the captured wrist movement data (which is generallylimited by the eavesdropping sensor’s maximum samplingfrequency).

For carrying out the proposed inference attack, an adver-sary requires an eavesdropping device that is capable of con-tinuously recording wrist movements, while avoiding detec-tion. A modern commercial-off-the-shelf smartwatch, whichis generally equipped with a range of sensors (especially mo-tion sensors), can easily serve as such an eavesdropping de-vice. Other forms of wrist wearable devices such as activ-ity trackers and fitness bands, are typically also equippedwith motion or inertial sensors, and can also be used as aneavesdropping device for the proposed attack. In this work,without loss of generality, let’s assume that the adversaryexploits the smartwatch as an eavesdropping device. How-

ever, a bigger challenge is how does an adversary gain accessto the motion data captured on the smartwatch. This can beachieved by an adversary installing a malicious applicationthat has access to the motion sensors on the target’s smart-watch such that the application is able to stealthily captureand transfer the captured motion data to the adversary.

This is feasible because, even though an application’s ac-cess to the some sensors (e.g., GPS and camera) is generallyuser-managed or restricted on most modern mobile operat-ing systems such as Android and iOS, access to motion sen-sors (such as, accelerometer and gyroscope) remains highlyunregulated. An adversary can easily install the maliciousapplication on the target smartwatch by various means, forexample, by gaining physical access to the device or throughsocial engineering (e.g. masquerading as a legitimate appli-cation, pretexting, baiting, phishing etc.). The maliciousapplication can then stealthily collect and transfer motiondata by masquerading as, or piggy backing on, useful ap-plication data and network traffic. In other words, the in-fected smartwatch now acts as an eavesdropping device thatthe targets’ themselves place on their wrist, and unsuspect-ingly have it on their wrist while typing on a keyboard, asdepicted in Figure 1.

The malicious adversarial application on the smartwatchrecords the linear accelerometer data (linear accelerometermeasures the acceleration experience by the device, exclud-ing the force of gravity) and microphone data. In the pro-posed attack, the acoustic data recorded by the microphoneis not used for keystroke inference, but rather just to iden-tify keystroke events (as explained in detail in Section 4.2.2).Due to the impracticality of an on-screen keyboard on thesmall smartwatch screens, an adversarial smartwatch appli-cation can seek access to the microphone in order to supportvoice commands or dictation, which is common. Alterna-tively, keystroke events can also be recognized by solely us-ing the motion sensors, as accomplished in Marquardt et al.[16]. As mentioned earlier, the recorded sensor data is thentransmitted by the malicious application to the adversarydirectly over the Internet by masquerading as useful com-munication or by piggyback on communications from otherapplications. In an effort to save battery power (necessaryfor avoiding detection), the recording and communicationprocess may be initiated remotely by the adversary or basedon periodic activity tracking.

Figure 1: An exemplary setup where a person istyping on a QWERTY keyboard, while wearing aSamsumg Gear Live smartwatch on left hand. Asimilar setup is used in our experiments.

797

4. THE ATTACK FRAMEWORKIn this section, we present our model for identifying key-

press events from raw motion sensor data. We then discussour attack framework, and an experimental setup for evalu-ating the framework.

4.1 Modeling Key Press EventsWith the maximum supported linear accelerometer sam-

pling rate (∼50-70Hz) being much lower than that of smart-phones (∼200-300Hz), the difficulty in recognizing individ-ual keys is greatly increased when using a smartwatch. Toovercome this shortcoming, we attempt to identify pairs ofkey presses or keystrokes by learning the relationship be-tween them. While typing a word, there will be one keypress for each character or letter in the word. Let Ki,Kj betwo consecutive key press events, signifying two consecutivecharacters or letters of a word. We characterize the relation,rel(Ki,Kj), between any two consecutive key press eventsKi,Kj as follows:

• Horizontal Position: The location loc(Ki) of each key-stroke event relative to a ‘central-line’ dividing the key-board into left (L) and right (R) halves. The rationalebehind this classification is that the wrist movement willbe more pronounced for typing a key on the same side asthe watch-wearing hand.

• Transitional Direction Between Consecutive KeyPresses: The direction dir(Ki,Kj) represents the direc-tion of wrist movement between consecutive key pressesKi and Kj on watch-wearing side of the keyboard. Thepossible directions (or values for dir(Ki,Kj)) are N, E, S,and W, representing geographical north, east, south, andwest, movement respectively. An additional classificationis O, if Ki = Kj . The rationale behind this classification isthat the direction of transition between a pair of keystrokeswill be reflected in the wrist movement.

With the above classification, the relationship betweentwo consecutive key press events is defined as follows:

• When either Ki, Kj , or both, occur on the non-watchwearing side of the keyboard, rel(Ki,Kj) = loc(Ki) ||X || loc(Kj), where ‘X ′ implies that direction cannot bedetermined. The intuition behind such an assignment isthat it is not possible to determine the direction of tran-sition when at least one of the pressed key is not on thewatch-wearing side of the keyboard.

• When both Ki and Kj occur on the watch-wearing sideof the keyboard, rel(Ki,Kj) = loc(Ki) || dir(Ki,Kj) ||loc(Kj).

A word-profile for a word can then be derived by concate-nating the relation between every consecutive pair of lettersin the word. For example, the word “boards” can be brokendown in to five pairs of keystrokes {bo, oa, ar, rd, ds}, i.e.,word-profile for the word “boards” is rel(bo).rel(oa).rel(ar).rel(rd).rel(ds). With the setup for a QWERTY keyboard,as shown in Figure 2, and the entire L/R and N/E/S/W/Oclassification listed in Table 1, the word-profile of “boards”will be:

RXR . RXL . LEL . LSL . LWL

Figure 2: The keyboard is divided in to left (L) andright (R) halves, shown by the solid red line. Exam-ples of N, E, S, and W classification are also shown.Each direction has 90° field of view from center ofthe key. Keys that fall on the boundary are catego-rized in the direction where greater area of the keylies.

Table 1: L/R classification of individual keys andN/E/S/W/O classification of character-pairs, as-suming smartwatch is worn on left hand.

L q, w, e, r, t, a, s, d, f, g, z, x, c, v

R y, u, i, o, p, h, j, k, l, b, n, m

N aq, aw, sw, se, de, dr, fr, ft, gt, zq, zw, ze, za, zs,xw, xe, xr, xs, xd, ce, cr, ct, cd, cf, vr, vt, vf, vg

E qw, qe, qr, qt, qs, qd, qf, qg, qx, qc, qv, we, wr,wt, wd, wf, wg, wc, wv, er, et, ef, eg, ev, rt, rg, ae,ar, at, as, ad, af, ag, ax, ac, av, sr, st, sd, sf, sg,sc, sv, dt, df, dg, dv, fg, zr, zt, zd, zf, zg, zx, zc,zv, xt, xf, xg, xc, xv, cg, cv

S qa, qz, wa, ws, wz, wx, es, ed, ez, ex, ec, rd, rf, rx,rc, rv, tf, tg, tc, tv, az, sz, sx, dx, dc, fc, fv, gv

W wq, eq, ew, ea, rq, rw, re, ra, rs, rz, tq, tw, te, tr,ta, ts, td, tz, tx, sq, sa, dq, dw, da, ds, dz, fq, fw,fe, fa, fs, fd, fz, fx, gq, gw, ge, gr, ga, gs, gd, gf,gz, gx, gc, xq, xa, xz, cq, cw, ca, cs, cz, cx, vq, vw,ve, va, vs, vd, vz, vx, vc

O qq, ww, ee, rr, tt, aa, ss, dd, ff, gg, zz, xx, cc, vv

The main idea behind our attack is that the adversarywill have a pre-processed dictionary of well-known (or tar-geted) words and their corresponding word-profiles (formedas discussed before). These word-profiles are used in dis-tinguishing between candidate words from the dictionary.Given the motion data, the adversary will attempt to inferword-profiles from the motion data and then use the pre-processed dictionary to determine the typed word by com-paring the inferred word-profile to the word-profile in thedictionary. However if the dictionary is large, more thanone word may have the same word-profile. Such collisionsmay result in incorrect predictions, and thus, reduce the ac-curacy of the inference attack by the adversary. In suchcases, a frequency-based selection (as discussed in Section5) could yield better word recovery results. Similarly, defin-ing word-profiles by using additional fine-grained directionaldata (e.g., NE, SW, etc.) could reduce the number of col-lisions and improve inference accuracy, however it will alsoincrease the attack execution time for the adversary.

4.2 Keystroke Inference AttackBroadly, our proposed inference attack comprises of a

learning phase (Figure 3) that is followed by the attack

798

phase (Figure 4). However, before initiating the learningphase, the adversary must define the classification param-eters. This includes deciding the keys in L and R halves,determining the hand on which the smartwatch is worn, andaccordingly form all perceptible transitions. In our experi-ments, we suppose that the target is wearing the smartwatchon his/her left hand and the keyboard is divided in L andR halves as shown in Figure 2. Accordingly, all 196 possibletransitions with the watch-wearing hand are listed in Table1. However, the proposed attack could easily be modified(with little effort) for the watch worn on the right hand orfor other forms of L/R division of the keyboard. After theseparameters are determined, the learning phase can begin.

4.2.1 Learning PhaseThe purpose of this phase is to construct trained clas-

sification and prediction models for use during the attackphase. Training of these models comprises of the followingfour steps: (i) data collection, (ii) feature extraction, (iii)word labeling, and (iv) supervised learning. To ensure uni-formity in the learning models, the training data is chosensuch that it has equally distributed features. This can beachieved by using a large set of randomly generated words,uniformly covering all keys and apprehensible transitions.

Data Collection: There are two types of data recordedby our custom Android Wear attack (or data collection) ap-plication. First, is the motion data just before and immedi-ately after a keystroke. Second, is the entire transition databetween two keystrokes that occur on the watch-wearing sideof the keyboard. Both types of recorded data are the lin-ear accelerations experienced by the smartwatch, as sensedby it’s linear accelerometer sensor. The sampled linear ac-celerometer readings are composed of instantaneous threedimensional linear acceleration along the X, Y, and Z axes.One of the authors (pretending to be the adversary) typed aset of 1000 random English words which uniformly coveredall 26 keys and 196 transitions, without any fixed ordering ortiming. Note that the number of possible transitions will be144 if the target wears the smartwatch on the right hand andthe keyboard is divided into the same L and R halves. Thedata collection application also clocks and tags the groundtruth of the typed keys, which helps simplify the feature ex-traction and labeling process later, which in turn, ensureserror-free training.

Feature Extraction: Feature extraction aids in dimen-sionality reduction by eliminating redundant measurements.For (L/R) keystrokes we compute a comprehensive set of 24type of features such as mean, median, variance, standarddeviation, skewness (measure of any asymmetry) and kurto-sis (to measure any peakedness). We use multiple inter andintra-axis time domain features to capture the correlationbetween movement on the three axis, and frequency domainfeatures to identify the different rebounding (or oscillatory)motion of the wrist. However, in case of (N/E/S/W/O) la-beling, we observed that the transition period was varyingwidely based on typing speed and word composition. As aresult, it is impossible to represent the entire transition in afixed length time-domain feature vector (as used in featurevectors with (L/R) labels). As a solution, we use frequency-domain features such as Fast Fourier Transformation (FFT)of the transition data.

Labeling: Each training word is broken down into itsconstituent characters and character-pairs. As a result, a

word of length n letters would be broken into n charac-ters and n − 1 character-pairs. Feature vector of each key-stroke is labeled (L/R) using the ground truth charactersrecorded during data collection. Feature vectors of direc-tions are labeled (N/E/S/W/O) by calculating the direc-tion between character-pairs obtained from the same groundtruth. An additional processing is performed to select a setof character-pairs with even distribution of L and R and N,E, S, W and O labels. Note that the number of N, E, S, Wand O labels will be approximately one-fourth of the num-ber of L L, L R, R L and R R pairs because thedirection is determined only in case of L L transition (or R

R if the target wears the smartwatch on right hand). ForL R, R L and R R character-pairs, the direction fortransition cannot be determined (which is denoted by X inthe word-profile), and thus, they are not used in the trainingphase.

Supervised Learning: We created two separate trainingmodels that will be used during the attack phase to classifykeystrokes and keystroke-pairs. The two trained models areL-R and N-E-S-W-O neural networks for classifying (L/R)and (N/E/S/W/O) feature vectors, respectively. Because ofthe complex interactions possible between consecutive key-strokes, we train our classifiers using neural networks. Neu-ral networks are specifically useful in discovering these com-plex interactions between the corresponding feature vectors,and improving the classification model based on it. OurL-R neural network uses a back-propagation algorithm forlearning at a rate of 0.01 and with a momentum of 0.99.This neural network has 30 hidden layers and training wasperformed for 2000 epochs. Our N-E-S-W-O neural networkalso uses a back-propagation algorithm for learning at a rateof 0.001 and with a momentum of 0.99. This neural net-work has 100 hidden layers and the training was performedfor 1000 epochs. These parameters for our neural networkbased classifiers were chosen heuristically. Training of theseneural networks completes the learning phase.

4.2.2 Attack PhaseThe attack phase follows a similar procedure as the learn-

ing phase, with the exception that the goal here is to recovertest words using the trained neural networks-based classifi-cation models from the learning phase. In order to do so,the adversary must first create a dictionary of words, andtheir corresponding word-profiles, that the target is mostlikely of typing. The dictionary size can vary from a fewwords to thousands of words, depending on the target andhis/her context. If the adversary is unaware of the target’scontext, he could also create a large dictionary of most pop-ular or all words in the English language. The dictionarycreation involves a preprocessing step to obtain equivalentword-profiles of each word in the L/R and X/N/E/S/W/Orepresentation (as discussed before). The attack phase isthen executed in sequential steps of: (i) data collection, (ii)feature extraction, (iii) keystroke classification, and (iv) wordmatching.

Data Collection: The same properties and operationsfrom the data collection operation of the learning phase alsoapplies to the data collection during the attack phase. Theonly exception is that the malicious Android Wear applica-tion does not have the ability to clock and tag ground truthcharacters. As the smartwatch can only detect motion causeby one (watch-wearing) hand, a significant challenge of the

799

Typing

Raw Linear Accelerometer

Data

Feature Extraction

Module

Position Features

L/R Labeler

Labeled L/R Features

L-R Neural Network

N/E/S/W/O Labeler

Labeled N/E/S/W/O

Features

N-E-S-W-O Neural Network

Transition Features

Random Training Words

Supervised Learning Module

Figure 3: Learning Phase: A high level overview of the data processing architecture used to train the neuralnetworks.

Typing

Raw Linear Accelerometer

Data

Feature Extraction

Module

Position Features

Transition Features

Contextual Dictionary

Classification and Word Matching Module

Acoustic Keystroke Detection

Keystroke Timestamps

Feature Matching

Predicted Words

N-E-S-W-O Neural Network

Classifier

L-R Neural Network Classifier

Figure 4: Attack Phase: A high level overview of the data processing architecture used to analyze keyboardinput using the trained neural networks.

proposed inference attack is due to the inability to detectkeystrokes made by the non-watch-wearing hand. However,our attack framework requires to know the number of typedcharacters. To solve this problem, here we assume that theadversary can employ an alternate source or sensor on thesmartwatch that can detect keystroke events typed by eitherhand. The intention for using such an auxiliary sensor is notto classify the keystrokes using it, but to clock the time whena keystroke occurs in the stream of raw linear accelerationdata. A microphone can perfectly serve this purpose. Eventhough a smartwatch’s microphone is not effective for recov-ering keystrokes (due to the aforementioned reasons), it cancertainly be used to detect the occurrence of a keystroke it-self, made by either hand. This is where the naturally closepositioning of the smartwatch near the keyboard is benefi-cial. Thus, the malicious application uses the microphoneto detect keystroke acoustics, and in the case a keystrokeevent is detected from the acoustic signal, it logs a key-stroke event in the linear accelerometer data stream. Spacekey press events are also clocked or logged because they actas word separators. Fortunately, as we empirically deter-mined, space keys are easy to identify in an audio recordingbecause of the key’s distinctive sound and frequency of use.

Feature Extraction: During the attack phase, the samefeatures as in the learning phase are extracted from the rawlinear acceleration data recorded by the malicious applica-tion. The feature vectors are then used to create two sets ofdata, one for classifying L vs. R, and one for classifying Nvs. E vs. S vs. W vs. O.

Keystroke Classification: The adversary initiates theclassification process after extracting all feature vectors. Thetrained L-R neural network is used to predict the (L/R) labelfor each individual keystroke. Only when a L L key-pairis detected in the data stream by the L-R neural network,the N-E-S-W-O neural network classification is conducted topredict the transition direction label (N/E/S/W/O). Oth-erwise, the transition direction is labeled as X. Using thepredicted labels (and the recognized space keys as describedearlier), a word-profile is constructed for each word in thekeystroke stream. All the constructed word-profiles are thenpassed as input to a word matching algorithm describednext.

Word Matching: Word matching is the final step of theattack phase, where each predicted word-profile of length mis matched with all words of length m + 1 in the prepro-cessed dictionary by the adversary. For each matched word

800

in the dictionary, a similarity score is computed based onthe number of matching labels between the predicted word-profile and the corresponding word-profile in the dictionary(see details in Algorithm 1). The dictionary word with thehighest similarity score is then output as the word corre-sponding to the predicted word-profile. For some evaluationexperiments, we also use a ‘similarity list’ made of dictionarywords with descending order of similarity scores.

Algorithm 1 Word Matching Algorithm

1: similarityScore = 02: for all words of len(m) ∈ dic do3: for pair = 1 to m− 1 do4: for label = 1 to 3 do5: if dic.word.profile[pair][label] =

predicted.profile[pair][label] then6: similarityScore++7: end if8: end for9: end for

10: end for11: return similarityScore

4.3 Experimental SetupIn our experimental evaluation of the proposed inference

attack and keystroke characterization framework, we use asetup similar to the one shown in Figure 1. We recruit 25participants1 who wear the smartwatch on their left wristand type test words on an external QWERTY keyboard. Alldata recorded by the smartwatch was transferred to a remoteserver. Both the training and attack phases are executed onthis remote server which is assumed to possess enough com-putational and storage resources in order to carry out theseoperations. The specifications of important hardware andsoftware components used in our experiments are outlinedbelow:

1. Smartwatch and sensor hardware: We used the Sam-sung Gear Live smartwatch running Android Wear build1.1.1.1944630. The Gear Live is equipped with an In-venSense ICS-43430 microphone and an InvenSense MP-92M 9-axis Gyro + Accelerometer + Compass sensor.The maximum average linear accelerometer sampling rateachieved in our experiments was 50 Hz. Our data collec-tion application can be readily used on any Android Wearsmartwatch, which makes the attack framework compat-ible with a diverse set of smartwatches.

2. Keyboard hardware: We chose to use the Anker A7726121bluetooth keyboard because of its generic design. Thebluetooth connectivity aided in accurate labeling of sen-sor data, by allowing us to aggregate recorded sensor dataand corresponding typed character on the smartwatch invery close to real time.

3. Signal processing tool: Most of the features are calculatedusing MatLab 2015a libraries.

4. Supervised machine learning tool: We used PyBrain v0.31to train and test the neural networks in our framework.

1Our experiments have been approved by Wichita State Uni-versity’s Institutional Review Board (IRB).

PyBrain is an open-source modular machine learning li-brary for Python, supporting easy integration with un-derlying environment.

5. EVALUATIONWe first perform two preliminary experiments (involving

only one participant) in order to evaluate (i) the base accu-racy of the L-R and N-S-E-W-O classifiers by analyzing a setof test sentences from the set of Harvard sentences [1] and(ii) the word recovery accuracy of the proposed inference at-tack strategy by using a dictionary of ten Harvard sentencesand attempting to recover each of the same ten sentences astest data. We choose Harvard sentences because they arephonetically-balanced. In the two preliminary experimentswe make the assumption of ‘perfect typing’, i.e. the par-ticipant follows our L/R separation. After the preliminaryexperiments, we conduct more realistic experiments involv-ing all 25 participants, real-life sentences, larger dictionaries,and without the assumption of perfect typing.

5.1 Feature AccuracyIn the first experiment, we examine the base accuracy of

both L-R and N-S-E-W-O classifiers in correctly distinguish-ing between L/R region for individual letters and N/S/E/W/O transition between pairs of letters. We evaluate ourtrained classifiers using all the ten sentences in List 6 ofHarvard sentences. Interestingly, without any typing errors,the L-R classifier was able to correctly identify 100% of theindividual key press events as left or right. However, theN-E-S-W-O classifier had two mis-classifications, resultingin 95% accuracy.

5.2 Basic Text RecoveryOur next experiment examines the percentage of text (in

terms of words) correctly matched by the word matcher.In this preliminary experimental results, we observed thatthe overall percentage of words correctly matched notice-ably dropped due to mismatched two and three letter wordsin the analyzed text. The smaller number of features inthese words results in several of these ‘small’ words havingthe same word-profile, thus causing more collisions duringmatching. We also observed that most of these words aregenerally articles and conjunctions (e.g. an, the, and, or),which can be easily interpolated by analyzing the languagesemantics of the recovered text. As a result, we opted toconsider only ‘long’ words of four letters or more in all fi-nal percentages of recovered words in our remaining experi-ments. The ‘short’ words are instead denoted with asterisks(“*”) in the recovered sentences.

In this experiment, we used the same ten sentences inList 6 of Harvard sentences, as from the first experiment.Among the 48 words of length four or more, only threewere erroneous (93.75% successful recovery). Out of the

Typed Text: The show was a flop from the very start.

Recovered: *** sums *** * flop from *** very start.

Colliding Word-Profiles:

Show: LXR . RXR . RXL , Sums: LXR . RXR . RXL

Figure 5: Sentence 4 from List 6 of Harvard Sen-tences. The words ‘show’ and ‘sums’ have the sameword-profile resulting in a collision in the dictionary.

801

0

10

20

30

40

50

60

70

80

90

P7

P5

P1

1

P2

4

P2

0

P1

6

P1

9

P1

8

P9 A P3

P1

7

P2

3

P8

P4

P2

5

P1

4

P1

3

P1

0

P1

P2

P6

P1

5

P2

1

P2

2

P1

2

Wo

rd R

eco

very

Pe

rce

nta

ge (

Top

1)

Increasing Order of Time Taken by Participant to Type All Words

Among All 40 Words Typed Among 27 Words in Dictionary

Participants who Typed Faster than Adversary

Participants who Typed Slower than Adversary

Figure 6: Contextual Dictionary: Percentage ofwords recovered per participant, presented in de-scending order of typing speed of the participants.

three, two had incorrect N/E/S/W/O classification, whilethe other was due to collisions in the word-profiles. Fig-ure 5 shows the sentence where the collision occurred. Theproblem of collision will increase with increasing size of thedictionary. However, if we also take in to account secondand third ranked similar word-profiles during word match-ing, this problem can be moderated. Errors in word recovery(especially, due to collisions) can be further diminished byanalyzing language semantics, and then selecting the word(from the multiple colliding words in the dictionary) that issemantically best fit.

5.3 Contextual DictionaryThis experiment evaluates how our attack performs when

the adversary has some knowledge about what their targetsare typing. All the 25 participants typed a paragraph of 40words (of length four or more) that appear in a NationalPublic Radio (NPR) news article on Greece debt crisis, andthis experiment simulates eavesdropping on a reporter typ-ing the NPR news article. The dictionary is formed withwords that appear in six other news articles related to Greecedebt crisis, that were published a week before the target arti-cle. The dictionary is also sorted based on frequency of wordappearances in the six chosen news articles, which improvesour chances of successfully solving a word-profile collision.Figure 6 shows the percentage of words recovered per par-ticipant. As one should expect in a real-life attack, out ofthe 40 words in the target paragraph, only 27 were presentin the contextual dictionary. Even so, our framework wasable to recover as many as 21 words (for 3 participants), bymatching with just the first ranked word in the sorted list ofsimilarity scores (Figure 6). In other words, 21 words wereuniquely identified without any ambiguity, for the 3 partic-ipants. On the lower end, only 4 words were recovered for3 participants, but the recovery can be improved by con-sidering words with lower rank in the sorted similarity list.The mean word recovery using only the first ranked wordwas 31.2% (or 46.2% if we consider only the words presentin the dictionary).

5.4 Typing Behavior and SpeedDuring data collection, we observed that in many instances

participants did not follow our assumed layout. Some of theparticipants frequently used their left hand to press a keyon the right side of the keyboard, and vice versa. Upon fur-

0

10

20

30

40

50

60

70

80

90

100

Top 10 Top 25 Top 50 Top 100 Top 200

Per

cen

tage

Wo

rd R

eco

very

Our Attack Marquardt et al. Berger et al.

Figure 7: A comparison of accuracy of our attackwith Marquardt et al. [16] and Berger et al. [9].Note that in spite of not having wrist movementinformation available from the non-watch-wearinghand, our results are roughly comparable for a verylarge (60,000 words) dictionary.

ther investigation we also found that participant who typedslower, were less likely to follow the left and right division ofthe keyboard. This phenomenon explains why participantswho took longer to type all the 40 words saw lesser wordrecovery rate in Section 5.3 experiments. Figure 6 showsthe time taken by the adversary (whose typing was used asthe training data) to type the 40 words as A on the hor-izontal axis. Participants on the right of A typed slowerthan the adversary, and we can see a trend that the recov-ery rate drops with slower typing. Interestingly, we see asimilar trend on the left of A as well, indicating that recov-ery rate drops with faster typing. Our speculation is thatdue to fast typing there may occur overlapping feature re-gions leading to poorly performing L/R classification, andincorrect L/R classification can significantly affect recoveryof words. Combining both the trends we arrive at a con-clusion that participants who typed at a similar speed asthe adversary were more vulnerable to the attack. For anadversary, the take-home message from this conclusion isthat the attack framework can be optimized by training itwith a typing speed and style expected from the potentialvictim(s).

5.5 Comparison to Previous WorkFrom the above experiments, we saw that relying on ex-

act match with first ranked words may not always resultin the best inference accuracy. As pointed out by earlieremanation based keystroke inference attacks [16, 9], moreintelligent adversaries may be able to form target sentenceswith lower ranked words from the sorted similarity list. So,we re-create the experiments conducted by Marquardt et al.[16] and Berger et al. [9] in order to be able to compareour attack framework directly with theirs. We use a simi-lar sized English dictionary of 60,000 words (of length 4 ormore), sorted based on frequency of usage in English liter-ature. We reuse 38 of the 40 words typed by participantsin Section 5.3 experiment, while remaining 2 (first and lastname of former Greek finance minister) are not contained inthe 60,000 word dictionary. Figure 7 shows the comparison.

Our attack framework demonstrates comparable accura-cies to that of Marquardt et al. and Berger et al. It wasable to correctly map test words to the top 10 words in the

802

sorted similarity list 50.5% of the time, which happen to besignificantly higher than the earlier works using smartphonesensors. The word recovery steadily improves as we increasethe size of selection from the sorted similarity list, but ourattack trails behind the other two in case of very large selec-tions. Note that the complexity of forming sentences withambiguously recovered words grow exponentially with theselection size. Therefore, achieving a better recovery ratewith just top 10 words is more significant than having a bet-ter recovery rate using top 500 words. It is also importantto remember the distinct challenge faced by our techniquewhere no wrist movement information is available from thenon-watch-wearing hand.

We are unable to compare equitably with Wang et al.[26] because of their different experimental setup. However,using a smaller dictionary of only 5,000 words, they wereable to narrow down a typed word to 24 possibilities with a50% chance. In contrast, we use a much larger dictionary of60,000 words, and our attack is still able to narrow down atyped word to only 25 possibilities with about 52.5% chance.

6. LIMITATIONSOur proposed movement-based keystroke inference attack

using smartwatches circumvents some of the limitations ofemanation-based attacks, but it faces new challenges. Inthis section, we discuss some of them.

• Ambient Wrist Movement: In case the target par-ticipates in some other activity (for example, having aperiodic sip of drink) in between typing, the introducednoise can lead to incorrectly predicted words. However,since each word is treated separately, the error will notpropagate.

• Left and Right Handedness: Although the same at-tack framework is applicable independent of the hand onwhich the smartwatch is worn, classifiers trained usingdata with the smartwatch worn on the left hand cannotbe used to predict words typed while wearing the smart-watch on the other hand, and vice-versa.

• Inferring Non-Dictionary Text: Our attack performswell for dictionary words, but is incapable of recoveringnumeric keys and special characters. As a result, if the ad-versary is interested in learning data with numbers and/orspecial characters (such a credit card numbers, strongpasswords, etc.), the presented framework and attack willnot be directly applicable. However, wrist movements canstill be useful in determining approximate position of keyspressed, which may significantly reduce the search space.

7. SMART MITIGATIONOur proposed attack demonstrates the need for reforms on

how sensors on smartwatches, and other wearable devices,are accessed by applications. Even innocuous sensors can beused as side-channels to indirectly infer private information.However, there is no straightforward remedy to such privacythreats. In this section, we present a smart countermeasureto prevent such attacks in future.

The simplest way to protect against the presented attackwould be to remove the smartwatch from wrist while typing.But repetitive removal of the watch (and remembering whento remove) can become a burden for the user, as a result of

MSAC

Sensors

Linear Accelerometer

Accelerometer Magnetometer Pedometer

rTAD

Ener

gy

Ste

p C

ou

nt

Mag

net

ic F

ield

C

han

ge

Dir

ect

ion

of

Gra

vity

Turn

aro

un

ds

Other Motion Sensors

Yes No

User Typing?

Untrusted 3rd Party Applications

Figure 8: The protection framework against key-stroke inference attacks. Third party applicationsget unrestricted access to motion sensors only whenrTAD reports that the user is not typing at the mo-ment.

which, the user may choose to ignore the threat altogether.To draw a favorable balance between utility, usability andprivacy while using wearable devices, we need smarter sen-sor access controls. We feel that sensor access controls needto be context-aware in order to automatically manage anapplication’s sensor permissions, without having the user tomanually change these settings repetitively. As part of ourefforts to prevent smartwatch based side-channel inferenceattacks demonstrated earlier in this paper, we design, imple-ment and evaluate a context-aware access control frameworkfor smartwatch sensors. The framework (Figure 8) consistsof two key components: (i) a real-time typing activity de-tection (rTAD) and (ii) a motion sensor access-controller(MSAC). Preliminary evaluations of the framework lead usto very promising results.

7.1 Typing Activity RecognitionDetecting when a smartwatch user is typing on a key-

board is not as straightforward as detecting contexts such aslocation or temperature. Running complex machine learn-ing based classification on very limited processors of smart-watches is not a practical solution. Moreover, rTAD mustbe real-time so that protection measures can be activatedproactively. The second bottleneck is the limited batterycapacity. Sampling sensors at high frequency and perform-ing complex computations discharges the smartwatch bat-tery rapidly, requiring frequent recharge of the device. Forexample, continuous sampling of the accelerometer and gy-roscope at 50 Hz on our Samsung Gear Live smartwatchcompletely drains the battery in less than an hour of use,which will severely affect the usability. From these observa-tions it is evident that we have to identify features which areeasy to compute and compatible with low sensor samplingrates. However, reducing sampling frequency also meanscompromising the accuracy of classification. To compen-sate the reduction in sampling frequency, we design featuresusing a assorted set of motion sensors (sampled at approxi-mately 15 Hz) in order to make a highly perceptive decision.Following are the five feature we incorporate in our proposedrTAD component:

• Energy: Activity measured in terms of cumulative lin-ear accelerometer readings. An unworn watch lying on atable has zero energy, while an athlete’s watch has highenergy. Typing activity typically results in low but non-zero energy. We apply a low pass filter over the linear

803

1

2

3 Ground Truth TypingTyping Activity Ground

Truth

Recognition RecognitionTyping Activity Recognition

(1 detection)

Typing Activity Detection

15 minutes time segment

Classification Result FN TP TN FP

(a) N=1

1

2

3 Ground Truth TypingTyping Activity Ground

Truth

Recognition RecognitionTyping Activity Recognition

(2 detections within a minute)

Typing Activity Detection

15 minutes time segment

Classification Result FN TP TN FP

(b) N=2

Figure 9: From bottom to top, (1) the 10 second detection windows where typing was detected are markedin red vertical lines, (2) when N detections occurs within a minute, typing activity is recognized for that 15minute time segment, and (3) the ground truth collect by prompting the participant.

accelerometer to eliminate high-frequency noise caused byenvironmental factors.

• Turnarounds: Major positive to negative (or vice versa)changes on linear accelerometer readings signify the turn-arounds adjoining transitional movements between keypresses. Multiple turnarounds in close time proximity canbe associated with many activities, such as brushing teeth,eating, playing drums, etc. As a result, we need additionalfeatures to distinguish typing from other similar activities.

• Magnetic Field Change: Wrists are not rotated sig-nificantly when a user types on a QWERTY keyboard,while sitting in front of a stationary desk. Rapid changein north, east and nadir vectors implies non-typing activ-ity.

• Direction of Gravity: Gravity generally remains domi-nant on z-axis of accelerometer while typing on a horizon-tally placed keyboard. Any major fluctuations or gravityon x-axis or y-axis implies other activities.

• Step Count: We assume that the user will be stationarywhile typing on a computer keyboard. Thus, wheneverstep count increases, we rule out typing activity.

At the end of every 10 seconds, rTAD conducts a binaryclassification of weather the user typed in the last 10 sec-onds or not. All features for the binary classification resetsat the starting of the next 10 second window. The cutoffparameters for Energy and Turnarounds features are calcu-lated using the test data collected in Section 5, whereas cut-off parameters for Magnetic Field Change and Direction ofGravity features are calculated heuristically. Cutoff param-eter for Step Count is straightforward, because any increasein the pedometer count indicates walking. The exact cutoffparameters of each feature used in our evaluation of rTADcan be found in Table 2.

Like many other activity recognition problems, there isan inverse relationship between precision (number of actualtyping instances divided by number of all identified typinginstances) and recall (number of identified actual typing in-stances divided by number of actual typing instances), whereit is possible to increase one at the cost of reducing the other.A common approach to draw a favorable balance betweenfalse positives and false negatives is to ‘recognize’ an activityonly when multiple instances of the activity are detected inclose time proximity [23]. However, the use of rTAD is verydifferent than most informative activity detection applica-tions. The purpose of rTAD is to enable countermeasures

Table 2: The rTAD’s binary classification uses thefollowing parameters. At the end of each 10 secondwindows, if any of the features are outside theseparameter ranges, then non-typing activity is iden-tified, and vice versa.

Feature Parameters Ranges

Energy >= 10 and <= 200, after applying low-pass filter

Turnarounds >= 6

MagneticField Change

<= 2 samples with change in north di-rection

Direction ofGravity

>= 5 samples with fluctuations, or grav-ity on x-axis or y-axis

Step Count <= PreviousStepCount

against keystroke inference attacks as soon as typing activ-ity is identified. In other words, rTAD’s goal is to maximizerecall, but not to an extent where high false positives startaffecting the utility of other non-malicious applications in-stalled on the smartwatch. We evaluate rTAD in two differ-ent settings (visually explained in Figure 9):

• N=1: Typing activity is recognized whenever a 10 sec-ond window is classified as a typing window. As a result,countermeasures against keystroke inference attacks canbe initiated as early as 10 seconds from when the userstarts typing.

• N=2: Typing activity is recognized when two or more 10second windows are classified as typing windows withina minute. Countermeasures against keystroke inferenceattacks can be initiated no sooner than 20 seconds fromwhen the user starts typing.

7.2 ProtectionOnce rTAD identifies that the user is typing, countermea-

sures against keystroke inference attacks can be activatedautomatically in an non-intrusive fashion. And since theprotection mechanism is activated only when the user isidentified to be typing on a keyboard, the utility of the mo-tion sensors is not affected when user is actively using otherapplications on the smartwatch (such as playing games thatuse motion sensors). Such smart protection measures can beundertaken by the MSAC implemented in the operating sys-tem itself, or as a trusted middle-ware. For the framework towork, we assume that all third party applications get access

804

to motion sensor data only via the MSAC and the MSAChas the ability to modify or restrict the flow of motion sensordata. Although this assumption requires change in operat-ing system architecture, it should be a rudimentary task foroperating system developers. Also, it should be noted thatthis assumption does not require changes in existing thirdparty application, as long as the APIs to access motion sen-sors remain unchanged. Since MSAC requires a change inthe operating system architecture, we are unable to imple-ment a working MSAC. However, below we list out some ofthe strategies that the MSAC can adopt when rTAD reportsthat the user is typing:

• Complete Blocking: This strategy is the safest as it willcompletely block the side-channel, but it can also harmthe utility of other non-malicious applications that maywant to perform passive computing with motion data.

• Reduced Sampling Rate: When a user types for sig-nificant amount of time in a day, complete blocking ofthe motion sensor data from third party applications cangreatly harm the utility of other non-malicious applica-tions. In order to preserve some of the utility, MSAC canprovide third party applications access to motion sensorsat a reduced sampling rate. Restricting the precision atwhich applications are allowed to access the sensors re-duces the efficiency of side-channel attacks [19, 17].

• Random Out of Order Blocks: A smarter MSAC cansend out of order blocks of sensor readings to third partyapplications. Random out of order blocks of sensor datacan greatly lower the inference accuracy of side-channelattacks, but may still preserve utility for certain non-malicious application. For example, a daily calorie countermay not be significantly affected by out of order blocks ofsensor readings. That is because the calorie count willbe accurate as long as all the motions are captured bythe application, even if out of order. Size of block andrandomization algorithm will play a signification role indetermining how much an adversary can recover versusthe utility of randomly ordered blocks.

There can be other strategies that the MSAC can adopt aswell. We think that it will be best if users are allowed tochoose among the MSAC protection strategies, suitable fortheir personal lifestyles.

7.3 EvaluationWe implement and evaluate our proposed rTAD, because

the effectiveness of the entire protection mechanism relieson rTAD. To evaluate rTAD, we use the same smartwatchsetup detailed in Section 4.3. Preliminary evaluation in-volved 4 participants with varied lifestyles wearing the watchfor long durations. If the rTAD application does not rec-ognize typing activity, it prompts the participant every 15minutes to collect ground truth. If the rTAD applicationdoes recognize typing activity, it prompts the user immedi-ately for ground truth. In case the user continues to type forlong period of time, the rTAD application does not ask theuser for ground truth for 15 minutes after the initial detec-tion. This avoids annoyance to the participants and resultsin equitable ground truth collection. In real usage, the userwill not be prompted for ground truth, instead the MSACwill automatically start acting as soon as typing activity isreported by rTAD.

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

TP TN FP FN TP TN FP FN

N=1 N=2

Precision = 0.585106383 Recall = 0.948275862

Precision = 0.867088608Recall = 0.85625

Figure 10: Normalized true positive (TP), true neg-ative (TN), false positive (FP), and false negative(FN), along with precision and recall values.

One problem that we encountered when evaluating rTADwas that in certain cases the Magnetic Field Change featureacted unexpectedly, introducing a lot of error in classifica-tion. We observed that the unexpected behavior occurredonly while the participant typed on a laptop. Further inves-tigation revealed that the magnet inside the laptop’s harddrive (which are normally installed directly under the key-board) was responsible for this unexpected behavior. Sincedesktop keyboards are generally placed away from the harddrives, the Magnetic Field Change feature performed in anexpected fashion in that case. For the remainder of theevaluation we do not use the Magnetic Field Change featurebecause it will be hard for the participants to rememberand distinguish between laptop and desktop typing. How-ever, as laptops featuring non-magnetic solid state drives arebecoming popular, the Magnetic Field Change feature mayeventually become useful in future.

The combined true positives (TP), true negatives (TN),false positives (FP), and false negatives (FN) results fromthe 4 participants are shown in Figure 10. To better visu-alize the difference between the two settings, the values inFigure 10 are normalized with respect to the total numberof ground truth collected in each setting. As explained withexamples in Figure 9, TP signifies that the user was typingand rTAD correctly identified that the user was typing, andif rTAD failed to recognize that the user was typing, it wasrecorded as FN. Similarly, TN signifies that the user was nottyping and rTAD correctly identified that the user was nottyping, and if rTAD identified that the user was typing, itwas recorded as FP.

In case of N=1, we observe lesser FN and higher TP, butat the cost of higher FP. In case of N=2, we observe lowerFP, but at the cost of lower TP and higher FN. In otherwords, rTAD can gain recall by trading-off precision, andvice versa. Nevertheless, in both settings rTAD achievedhigh recall values, which asserts it’s effectiveness in the pro-tection framework.

7.4 Discussions

• Left or Right: Our attack framework, presented ear-lier in this paper, requires the adversary to have a dif-ferently trained framework for targets wearing watch ontheir right hand. However, the design of rTAD (and thusthe whole protection framework) makes it independent ofwhich hand the smartwatch is worn on. As a result, rTADcan start working out of the box, without manual setup.

805

• Usability: Our primary focus while designing the pro-tection framework was usability. We work towards a lowprocessor intensive design, which in turn consumes lowbattery power. We identify activities similar to typing onkeyboard, and try to minimize false positives. The pro-tection mechanism works in an non-intrusive fashion aswell, and we envision that the entire setup process in areal-life implementation will be very simple.

8. CONCLUSIONThis paper presents a novel keystroke inference attack

which utilizes wrist-motion data gathered from a smart-watch as side-channel information. In order to harvest theinformation masked in wrist movements for inferring key-strokes, we designed and validated a novel learning-basedattack framework which is specifically targeted towards re-covering text typed by a smartwatch wearing user on anexternal QWERTY keyboard. By showing the feasibility ofthe proposed classification and prediction mechanisms, wevalidate our hypothesis that wearable devices such as smart-watches can leak sensitive personal information if access tosensors (on these devices) is not appropriately regulated. Wealso present a smart protection framework to automaticallyregulate sensor access, aimed to improve privacy withoutdegrading utility of the device.

9. ACKNOWLEDGMENTSResearch reported in this publication was partially sup-

ported by the Division of Computer and Network Systems(CNS) of the National Science Foundation (NSF) under aw-ard number 1523960 and by the Information Institute of theU.S. Air Force Research Lab (AFRL) under the summerfaculty fellowship extension grant. The content is solely theresponsibility of the authors and does not necessarily repre-sent the official views of the NSF or the AFRL. The authorswould also like to thank Dr. Kevin Kwiat and Dr. CharlesKamhoua for their valuable inputs and suggestions.

10. REFERENCES[1] IEEE Recommended Practices for Speech Quality

Measurements. IEEE Transactions on Audio andElectroacoustics, 1969.

[2] Experian Marketing Services - Simmons Connect.http://tinyurl.com/experiansmartphones, May 2013.[Online; accessed 8-June-2015].

[3] D. Agrawal, B. Archambeault, J. R. Rao, andP. Rohatgi. The EM Side-channel(s). In CryptographicHardware and Embedded Systems, 2002.

[4] D. Asonov and R. Agrawal. Keyboard AcousticEmanations. In IEEE S&P, 2004.

[5] M. Backes, T. Chen, M. Duermuth, H. Lensch, andM. Welk. Tempest in a Teapot: CompromisingReflections Revisited. In IEEE S&P, 2009.

[6] M. Backes, M. Durmuth, S. Gerling, M. Pinkal, andC. Sporleder. Acoustic Side-Channel Attacks onPrinters. In USENIX Security, 2010.

[7] M. Backes, M. Durmuth, and D. Unruh.Compromising Reflections-or-How to Read LCDMonitors Around the Corner. In IEEE S&P, 2008.

[8] A. Barisani and D. Bianco. Sniffing Keystrokes withLasers/Voltmeters. Black Hat USA, 2009.

[9] Y. Berger, A. Wool, and A. Yeredor. DictionaryAttacks using Keyboard Acoustic Emanations. InACM CCS, 2006.

[10] J. Cappos, L. Wang, R. Weiss, Y. Yang, andY. Zhuang. BlurSense: Dynamic Fine-Grained AccessControl for Smartphone Privacy. In IEEE SensorsApplications Symposium, 2014.

[11] T. Fiebig, J. Krissler, and R. Hansch. Security Impactof High Resolution Smartphone Cameras. In USENIXWOOT, 2014.

[12] J. Friedman. Tempest: A Signal Problem. NSACryptologic Spectrum, 1972.

[13] M. G. Kuhn. Optical Time-Domain EavesdroppingRisks of CRT Displays. In IEEE S&P, 2002.

[14] M. G. Kuhn and R. J. Anderson. Soft Tempest:Hidden Data Transmission Using ElectromagneticEmanations. In Information Hiding, Lecture Notes inComputer Science, 1998.

[15] A. Maiti, M. Jadliwala, J. He, and I. Bilogrevic.(Smart)Watch Your Taps: Side-channel KeystrokeInference Attacks Using Smartwatches. In ACMISWC, 2015.

[16] P. Marquardt, A. Verma, H. Carter, and P. Traynor.(sp)iPhone: Decoding Vibrations From NearbyKeyboards Using Mobile Phone Accelerometers. InACM CCS, 2011.

[17] Y. Michalevsky, D. Boneh, and G. Nakibly.Gyrophone: Recognizing Speech from GyroscopeSignals. In USENIX Security, 2014.

[18] L. T. Nguyen, H.-T. Cheng, P. Wu, S. Buthpitiya, andY. Zhang. PnLUM: System for Prediction of NextLocation for Users with Mobility. In Nokia MobileData Challenge Workshop, 2012.

[19] E. Owusu, J. Han, S. Das, A. Perrig, and J. Zhang.ACCessory: Password Inference using Accelerometerson Smartphones. In ACM HotMobile, 2012.

[20] J.-J. Quisquater and D. Samyde. ElectroMagneticAnalysis (EMA): Measures and Countermeasures forSmart Cards. In Smart Card Programming andSecurity, Lecture Notes in Computer Science, 2001.

[21] R. Schlegel, K. Zhang, X.-y. Zhou, M. Intwala,A. Kapadia, and X. Wang. Soundcomber: A Stealthyand Context-Aware Sound Trojan for Smartphones. InISOC NDSS, 2011.

[22] P. Smulders. The Threat of Information Theft byReception of Electromagnetic Radiation from RS-232Cables. Computers & Security, 9(1), 1990.

[23] E. Thomaz, I. Essa, and G. D. Abowd. A PracticalApproach for Recognizing Eating Moments withWrist-mounted Inertial Sensing. In ACM UbiComp,2015.

[24] W. Van Eck. Electromagnetic Radiation from VideoDisplay Units: An Eavesdropping Risk? Computers &Security, 4(4), 1985.

[25] M. Vuagnoux and S. Pasini. CompromisingElectromagnetic Emanations of Wired and WirelessKeyboards. In USENIX Security, 2009.

[26] H. Wang, T. T.-T. Lai, and R. Roy Choudhury. Mole:Motion leaks through smartwatch sensors. In ACMMobiCom, 2015.

806

http://tinyurl.com/experiansmartphones

Smartwatch-Based Keystroke Inference Attacks and Context ...

Documents

Transcript of Smartwatch-Based Keystroke Inference Attacks and Context ...