Enabling Reconstruction of Attacks on Users via Efﬁcient ... · social engineering attack and...

Enabling Reconstruction of Attacks on Usersvia Efficient Browsing Snapshots

Phani Vadrevulowast Jienan Liulowast Bo Lilowast Babak Rahbariniadagger Kyu Hyung Leelowast and Roberto Perdiscilowastlowast Department of Computer Science University of Georgia USA

dagger Department of Math and Computer Science Auburn University Montgomery Alabama USAvadrevujienanbokyuhleeperdiscicsugaedu brahbariaumedu

AbstractmdashIn this paper we present ChromePic a web browserequipped with a novel forensic engine that aims to greatly enhancethe browserrsquos logging capabilities ChromePicrsquos main goal is toenable a fine-grained post-mortem reconstruction and trace-backof web attacks without incurring the high overhead of record-and-replay systems In particular we aim to enable the reconstructionof attacks that target users and have a significant visual componentsuch as social engineering and phishing attacks To this endChromePic records a detailed snapshot of the state of a webpage including a screenshot of how the page is rendered and aldquodeeprdquo DOM snapshot at every significant interaction betweenthe user and the page If an attack is later suspected these fine-grained logs can be used to reconstruct the attack and trace backthe sequence of steps the user followed to reach the attack page

We develop ChromePic by implementing several carefulmodifications and optimizations to the Chromium code base tominimize overhead and make always-on logging practical Wethen demonstrate that ChromePic can successfully capture andaid the reconstruction of attacks on users Our evaluation includesthe analysis of an in-the-wild social engineering download attackon Android a phishing attack and two different clickjackingattacks as well as a user study aimed at accurately measuring theoverhead introduced by our forensic engine The experimental re-sults show that browsing snapshots can be logged very efficientlymaking the logging events practically unnoticeable to users

I INTRODUCTION

Web browsers have unfortunately become the preferred entrypoint for a large variety of attacks For example through thebrowser a user may be exposed to malware infections via socialengineering attacks [31] or drive-by downloads [14] phishingattacks [9] cross-site scripting [48] cross-site request forgery [4]clickjacking [15] etc

While the mechanics of these attacks (ie how they aretypically executed within the browser) are well understood it isoften challenging to determine how users arrived to a given attackpage in the first place At the same time tracing back the stepsthrough which an attack unfolds can be critical to fully recoverfrom an intrusion [18] and prevent future compromises Forinstance security analysts and forensic investigators often try not

only to understand how a specific attack instance was executed(eg find the URL from which malware was downloaded) butalso attempt to put the attack into context by reconstructingthe steps that preceded it [30] (eg whether the user fell for asocial engineering attack and inadvertently triggered the malwaredownload) While existing browser and system logs may assist inreconstructing a partial picture of how an attack page was reachedthese logs are often sparse and do not provide sufficient detailsto precisely reconstruct the events preceding the user landing onthe attack page and what exactly happened afterwards

Quoting [21] ldquowe tend to lack detailed information [aboutan attack] just when we need it the mostrdquo Therefore to enablea detailed reconstruction and trace-back of web attacks weneed enhanced logging capabilities [19] [21] [28] For instancesystems such as ClickMiner [29] and WebWitness [30] rely onfull network traffic logs (or traces) and deep packet inspectionto reconstruct the sequence of pages visited by users before theyreach an attack page (eg a malware download URL) Howevereven by using full traffic traces these systems are sometimesunable to precisely reconstruct all steps that brought a user toencounter an attack page due to the complexity of modern webtechnologies and the consequent discrepancies between systemevents (eg user-browser interactions) and the network traffic theygenerate [29] [30] Furthermore encrypted (eg HTTPS) trafficwould all but prevent these systems from inferring the completepath to the attack page Other approaches such as ReVirt [11]and WebCapsule [28] go beyond network traffic logging andanalysis and instead focus on recording fine-grained details atthe system level to enable full attack replay However whole-system record-and-replay [10] [11] is computationally expensiveand is especially difficult to deploy on resource constrainedmobile devices On the other hand while in-browser record-and-replay [28] can be more easily ported to mobile devices it ishindered by difficulties introduced by OS-level non-determinism(eg due to thread scheduling) and can result in an inaccuratereplay of browsing sessions [28] thus preventing reliable attackreconstruction Furthermore to enable replay record-and-replaysystems typically require storing large amounts of informationincluding all network traffic generated by the browser

In this paper we present ChromePic a web browser equippedwith a novel forensic engine aimed at greatly enhancing Chromersquoslogging capabilities ChromePicrsquos main goal is to enable a fine-grained reconstruction and trace-back of web attacks withoutincurring the high overhead typically associated with record-and-replay systems such as [11] [28] In particular we aim toenable the reconstruction of attacks that target users and havea significant visual component such as social engineering and

Permission to freely reproduce all or part of this paper for noncommercialpurposes is granted provided that copies bear this notice and the full citationon the first page Reproduction for commercial purposes is strictly prohibitedwithout the prior written consent of the Internet Society the first-named author(for reproduction of an entire paper only) and the authorrsquos employer if the paperwas prepared within the scope of employmentNDSS rsquo17 26 February - 1 March 2017 San Diego CA USACopyright 2017 Internet Society ISBN 1-1891562-46-0httpdxdoiorg1014722ndss201723100

phishing attacks To this end we focus on instrumenting GoogleChromium [40] (the open source project on which the Chromebrowser is based) to efficiently record a browsing snapshot atevery meaningful interaction between the user and the browserFor example every time the user clicks on a page or presses akey (eg Enter) we record the input information (eg mousecoordinates key code etc) and the page URL shown in thebrowser bar Furthermore we take a screenshot of the renderedpage and a ldquodeeprdquo snapshot of the related DOM tree andembedded resources (eg iframes images etc) We refer tothis type of detailed browsing snapshots as webshots Intuitivelysuch rich logs would allow a security team or forensic analyst totravel back in time and effectively reconstruct a userrsquos browsingactions over a desired time window In fact we can consider thescreenshot contained in each webshot as a ldquovideo framerdquo Thesescreenshots can then be stitched back together to reconstructprecisely what the user saw during every significant interactionwith the browser Furthermore each screenshot is associated withthe related full state of the DOM (including embedded objectsand JavaScript source code) recorded at the very same instant intime Namely we record exactly how a specific page DOM wasstructured how it was rendered at the time of a user-browserinteraction and how the user interacted with it thus enabling ananalysis of how the attack was implemented Our ChromePicbrowser addresses the following challenges

bull Forensic Rigor Our main goal (and challenge) is to enablewebshots to be taken synchronously with user-browserinteractions Namely let u(t0) represent a user-browserinteraction u (eg a mouse click or key press) that occursat time t0 Because we aim to prevent any (potentiallymalicious) JavaScript code that listens on u from altering thepage before the webshot is taken our goal is to ldquofreezerdquo theprocessing of u until we take both a screenshot of the pagecurrently displayed by the browser as well as a full DOMsnapshot Only after the snapshot is completed the event uwill be released and processed by the browser The needfor this synchronous snapshots constraint is motivated bythe fact that we intend to prevent any discrepancy betweenwhat is logged in the webshot and what the user saw (andthe DOM of the page heshe interacted with) at the veryinstant of time t0 when the event u occurred

bull Efficiency As attacks cannot be easily predicted ChromePicaims to be always-on and continuously log webshotsThis allows us to record undetected (and unexpected)attacks in accordance with the compromised recordingdesign principle [33] However to make always-on loggingpractical efficiency is critical and logging overhead must bereduced to a minimum In particular because webshots aretaken synchronously with each user input we effectivelyintroduce a processing overhead that increases the naturalprocessing of input events Therefore the challenge we faceis to make sure that no negative effect (eg latency) will beperceived by the user Based on previous studies on human-computer interaction [47] we target a logging time budgetof around 150ms which would make the logging eventspractically unnoticeable to users To this end we implementa number of careful system-level browser modifications andoptimizations which we describe in detail in Section IV

bull Transparency We require webshots to be taken in atransparent way wrt the web pages that are being logged

For instance there should be no easy way for (malicious)javascript code running on a page to detect whether theinteractions (inputs) between the user and the page arebeing logged or not In addition webshots should also betransparent to users in that once they are enabled the usershould not notice any difference in the behavior of thebrowser when webshots are being recorded

In Sections IV and V we discuss why the existing snapshot-taking capabilities currently implemented by Chromium do notsatisfy the above requirements For instance we describe theimplementation of a browser extension that attempts to meet thesame requirements described above using the existing extensionAPI and demonstrate why such a solution is not viable

The reader may notice that because ChromePic continuouslyrecords detailed information about the state of the browserincluding visual screenshots our system may produce numerouslogs some of which may include sensitive information Whileprotecting the security and privacy of the logs recorded byChromePic is outside the scope of this paper it is important tonotice that existing solutions could be used to mitigate theseconcerns For example sensitive URL whitelisting and logencryption using a key escrow as proposed in [28] could also beused in ChromePic We discuss these solutions in more details inSection VIII Also while a typical browsing session may resultin numerous webshots often the changes to a page betweentwo consecutive user-browser interactions are minimal thusresulting in few changes between snapshots This provides anopportunity for storing only the difference between snapshotsIn addition the visual screenshots can be reduced in size usinglossy compression and the overall storage requirements for thelogs of each browsing session could be further reduced usingstandard archiving tools We discuss storage requirements inmore details in Section VII

In summary we make the following contributions

bull We propose ChromePic a web browser equipped with anovel forensic engine that aims to enable the reconstructionand trace-back of web browser attacks especially for attacksthat directly target users and have a significant visualcomponent such as social engineering and phishing

bull We develop ChromePic by implementing careful modifi-cations and optimizations to the Chromium code base tominimize overhead and make always-on logging practicalIn addition we discuss why implementing ChromePic usingexisting facilities such as Chromersquos extension API is nota viable option

bull We demonstrate that ChromePic can successfully captureand aid the reconstruction of attacks on users Specificallywe report the analysis of an in-the-wild social engineeringdownload attack on Android a phishing attack and twodifferent clickjacking attacks

bull We evaluate the efficiency of our solution via a user studyinvolving 22 different users who produced more than 165hours of browsing activities on hundreds of websites Weprovide precise measurements about the overhead introducedby ChromePic on multiple devices including desktop andlaptop Linux systems as well as Android tablet devicesOur results show that the vast majority of webshots can betaken very efficiently making them practically unnoticeableto users

2

II WEBSHOTS

As mentioned in Section I we aim to enable the reconstructionand trace-back of web attacks that target users with particularfocus on attacks that have a significant visual component such associal engineering and phishing attacks To this end we designChromePic to embed an always-on forensic engine Specificallywe instrument the Chromium browser to record rich logs calledwebshots that aim to capture the state of the rendered web pagesat every significant interaction between the user and the browser

A What is a WebShot

A webshot consists of the following components (i) atimestamp and other available details about the user input eventthat triggered the webshot (eg mouse event type and relatedscreen coordinates keypress code etc) (ii) the full URL of thepage with which the user interacted (iii) a screenshot of thecurrently visible portion of the web page (the viewport) renderedby the browser (iv) a ldquodeeprdquo DOM snapshot that consist of thepagersquos DOM structure all embedded objects (eg the content ofall images) the DOM and embedded objects of all iframesthe JavaScript code running on the page etc

To satisfy the forensic rigor requirement mentioned inSection I webshots must be taken synchronously with thetriggering user input This requirement along with the always-onoperational goal for our ChromePic system has a significantimpact on the amount of overhead we can afford for producingeach webshot In Section IV we describe a set of very carefulcode instrumentations and optimizations that make efficientwebshots feasible

B Input Events that Trigger a WebShot

WebShots are triggered by user interactions with web pagesIn theory we could take a screenshot for every single ldquorawrdquo userinput event including every mouse movement every key-downevent every tap or gesture on a touchscreen etc However manyuser input events (eg most mouse movements) have no realchanging effect on the underlying web page Furthermore toreduce overhead it is desirable to minimize the type and numberof events that actually trigger a webshot At the same time ourgoal is to capture enough webshots to allow for the reconstructionand trace-back of possible attacks To balance these conflictinggoals we trigger a webshot for each of the following events

bull Mouse Down Mouse clicks are a common interactionbetween users and web pages Clicks often have importantconsequences such as initiating the navigation to a newpage submitting a form selecting a page element etc Aseach click starts with a mouse down event we trigger awebshot for each such eventbull Tap On touchscreen devices taps are the initial event for a

variety of gestures including ldquoclickingrdquo on a link or buttonTherefore a tap often (though not always) has an effectsimilar to a mouse down event Therefore we trigger awebshot at every tap eventbull Enter Keypress In many cases pressing Enter has

the same effect as a mouse click such as submitting aform navigating to a new link etc Therefore we trigger awebshot at every keydown event for the Enter keybull Special Keys We also trigger a screenshot every time a

special key is pressed For example pressing tab while

entering data in a form usually allows to transition froman input field to another thus indicating that the previousfield has been fully entered Other keys (eg the space bar)may be used to scroll a page or pausestart a video or tonavigate to the previous page (eg using the backspace)We have selected a total of five special keys whose rawkeydown event triggers a webshot

bull Generic Input Events All other input events such as mousemovements mouse wheel key presses etc that do not fallwithin the above categories are also considered Specificallywe trigger a webshot for each ldquogenericrdquo event but impose atime constraint if the previous webshot has been taken morethan a predefined number of seconds ago (eg 5 seconds)we take another webshot otherwise we skip this event (ieno webshot is taken) Notice that this time constraint onlyapplies to ldquogenericrdquo input events and to the case when a keyis kept pressed For all other single events mentioned earlier(eg mouse down tap etc) we always take a webshotregardless of the time

Webshots are logged synchronously with the triggering inputevent as required by the forensic rigor property introduced inSection I Effectively the user input will be held from processinguntil a full webshot is taken In Section IV we will explainthat because user inputs are processed on the render thread ofChromiumrsquos renderer process this has the effect of preventingthe DOM of the page from changing before the webshot isrecorded Hence each webshot reflects what the user saw at themoment of her interaction with the page This has the effectof preventing attempts from the attacker to hide the attack byaltering the log for example by rapidly changing the DOMand appearance of the page immediately after a user-browserinteraction

III USE CASES

In this section we discuss a representative use case scenarioto highlight how our system could be used in practice In generalwe envision ChromePic to be particularly useful in aiding thereconstruction and trace-back of attacks that involve user actionssuch as social engineering and phishing attacks In these casesreconstructing what the user saw or what exact informationwas entered on a phishing page is critical to understand howthe attack unfolded We argue that these types of attack aredifficult to reconstruct without a visual account of what the userexperienced ChromePic would ideally be deployed in corporateand government network environments where web-based attacksmay represent the first step of larger incidents (eg targetedattacks) At the same time we believe ChromePic may also beuseful in other scenarios such as in web application debugging

Example Use Case Meet Bob a corporate employee who whileusing the browser at work falls victim to a social engineeringmalware download attack [31] by clicking on a misleadingadvertisement Once installed the malware opens a backdoor tothe corporate network which is later used by the malware ownersto gain access to and exfiltrate sensitive information triggeringa data breach detection (eg due to side effects such as sellingof information in the underground markets) Then a forensicanalysis team is hired to investigate how the data was leaked Byanalyzing network logs such as web proxy logs that report allURLs visited by the corporate network users the forensic analystsnotice something anomalous (eg a particularly suspicious set

3

of URLs) in Bobrsquos web logs recorded a week earlier Thereforethe analysts ask for authorization to explore Bobrsquos ChromePiclogs Finally by exploring the webshots produced by our systemthe analysts are able to reconstruct the social engineering attackthat tricked Bob into installing the initial malicious software

By learning how Bob fell for the attack including obtaininga precise reconstruction of the visual tricks used for the socialengineering attack the corporate network security team couldthen develop a user training session on social engineering tobetter educate corporate employees on how to decrease thelikelihood of becoming a victim [16] In addition by having boththe screenshot taken at the very moment when the user clicked onthe misleading ad as well as the related full DOM snapshot thisinformation could be used to enhance browser-based defensesagainst social engineering [3]

Notice that ChromePic enables the reconstruction not only ofthe exact moment in which the attack is triggered (eg a clickon a social engineering malware ad) but also of the sequenceof pages with which the user interacted before falling for theattack In addition in case of phishing attacks ChromePic wouldalso provide an account of the exact information the user leakedon the phishing site Knowing what information was ldquophishedrdquomay be important to decide what actions to take to mitigatethe damage to both the user and to the corporate network (egthe user may have leaked access credentials related to sensitivecorporate assets)

IV SYSTEM DETAILS

A Background

Before we present the details of ChromePic we first providea brief overview on the Chromium browser architecture AsChromiumrsquos architecture is fairly complex we will limit thefollowing description to highlight only those components thatare needed to understand how our code modifications andoptimizations work

Chromium uses a multi-process architecture [40] whichincludes a main browser process called Browser and onerendering process called Renderer per each open browser tab1The Browser runs multiple threads [45] including a UI Threadwhich handles UI events among other things and an IO Threadwhich handles the IPC communications [39] between the Browserand all Renderers Each Renderer is also multithreaded [38] TheRendererrsquos Main Thread is responsible for communicating viaIPC with the Browser whereas the Rendererrsquos Render Threadis responsible for rendering web content including executingJavaScript code

As shown in Figure 1 user inputs to a web page are firstreceived by the Browserrsquos UI Thread and then asynchronouslycommunicated via IPC (by the IO Thread) to the Renderer thatis responsible for the tab where the page is rendered The IPCmessage will first be processed by the Rendererrsquos Main Threadand then forwarded to the Rendererrsquos Render Thread [38] Forinstance a click on a hyperlink will be processed by the RenderThread to decide whether a navigation event should be triggered

1In practice a Renderer may in some cases be responsible for rendering morethan one tab [40] To simplify our description in the following we will assumeone tab per Renderer Also we will not consider out-of-process-iframes [41]which are a recent ongoing project

Browser IO Thread

IPCSend(input)

User

Renderer Main Thread

input

Renderer Render Thread

Notify(input)Input Processing

Browser UI Thread

Send(input)

Fig 1 Overview of how user inputs to a web page are passed to the RendererThread Dashed arrows indicate asynchronous calls Notice that the functionnames are intentionally simplified and do not exactly reflect the (long chainsof) function calls that exists in the source code

Furthermore JavaScript code execution (eg initiated due to alistener registered on the input event) is also executed in thecontext of the Render Thread

B ChromePic Overview

Figure 2 provides a simplified overview of how ourChromePic browser generates a webshot Notice that all dashedarrows in the figure represent asynchronous calls

In response to a user input ChromePic takes the followingmain actions (1) on the Browser process it calls Chromiumrsquoscode for taking a screenshot of the current visible tab (see detailsin Section IV-D) to which the user input is destined (2) it opensa file that will be used to save the DOM snapshot and passes itsfile descriptor fd to the Renderer along with the user input(3) as the input and fd are received by the Renderer it saves thecurrent entire DOM including embedded objects and JS codein MHTML format (4) once the DOM snapshot has been savedthe Renderer waits for confirmation from the Browser that thescreenshot taking process has terminated only then (5) the userinput is processed using the original Rendererrsquos workflow In thisprocess notice that if the screenshot finishes before the DOMsnapshot is saved there will simply be no delay between steps(3) and (5)

The high-level steps described above allow us to guaranteethat each webshot is taken synchronously with the user input andno DOM modification due to the current input is allowed beforethe webshot is logged in accordance with the forensic rigorrequirement stated in Section I Moreover our webshot eventsare transparent to the (possibly malicious) page ChromePicrsquoscode is designed so that after logging the input can continueits ldquonaturalrdquo processing path and no information regarding thewebshot events is transferred to the page (notice that whileside-channel attacks cannot be excluded user input timings arenot easily predictable thus making detecting the existence ofChromePic a laborious non-deterministic endeavor)

Challenges While the process of taking synchronous screenshotsmay appear straightforward at first our design of ChromePicfaces two main challenges First the limited documentationfor many of the modules we instrumented forced us to a greatdeal of ldquoreverse engineeringrdquo of the source code In fact ourcode modifications had to span not only multiple processes butalso multiple threads per process (UI IO Renderer GPU etc)In addition while we strived to limit the number of changesto existing code as much as possible to meet our efficiencyrequirements we had to engineer a number of optimizations soto minimize the webshot overhead shown in Figure 2

4

Browser IO Thread

IPCSend(input fd)

take tabscreenshot

User


IPCSend(screen_taken)

input


Notify(input fd)

Notify(screen_taken)

regular inputprocessing

take DOMsnapshot

TakeScreenshot()

Browser UI Thread

Send(input fd)

wait for screenshotBrowser

File ThreadSave(screen)

Send(screen_taken) webshotoverheadsave to file (fd)

Fig 2 Simplified view of how ChromePic processes user inputs that trigger a webshot Dashed arrows indicate asynchronous calls

C Identifying the Target Renderer Process

Every Renderer Process has a correspondingRenderProcessHost object in the Browser processwhich is used to send and receive IPC messages betweenthe two processes Effectively the RenderProcessHostrepresents the Browser side of a single Browser-RendererIPC connection [40] A RendererProcessHost objectcommunicates with multiple RenderWidgetHost instanceseach one representing one tab in the browser [38] Forevery RenderWidgetHost object we create a customSnapshotHandler object whose responsibility is tocoordinate the process of taking webshots for a giventab When an input event is received the responsibleRenderWidgetHost object is identified by the Browser andrepresents the last point in the Browser after which the event ispassed on to the correct Renderer via IPC message We takecontrol of the input event just before it is passed on to theRenderer and handle the event via our SnapshotHandlerinstead By doing so we are able to identify the correctRoutingID for the IPC messages [39] and therefore we cancoordinate the process of taking a snapshot with the appropriateRender Thread

D Taking Screenshots Efficiently

One way to implement the TakeScreenshot func-tion shown in Figure 2 would be to call ChromiumrsquosCopyFromCompositingSurface and simply wait forthe CopyFromCompositingSurfaceFinished callback(see Figure 3) However we empirically found that this processsometimes takes a large amount of time to finish (eg severalhundred milliseconds depending on the web page) Obviouslya large latency would be unsustainable for our purposes asit violates our efficiency requirements Therefore we hadto break down and study the details of the process usedby Chromium to satisfy CopyFromCompositingSurfaceWhile documentation such as [37] [42] helped this was not aneasy task as it required a much deeper understanding of theinternals of Chromiumrsquos compositing process than found in thesparse Chromium project documents

We then discovered that to efficiently take synchronousscreenshots we could safely use the process depicted in Figure 3Specifically to satisfy CopyFromCompositingSurfacethe Browser relies on a graphics library (GL) API and assistancefrom the GPU (with code running on the GPU process or GPUthread in Android [37]) The GLGPU module in Figure 3 isrepresented separately from the Browser UI thread for presen-tation convenience (to be more precise the DrawFrame andGetFrameBufferPixels functions are actually executed

GL GPU

User

inputCopyFromCompositingSurface()

Browser UI Thread

setNeedsCommmitDrawFrame()

GetFrameBufferPixels()

RequestCopyOfOuput()

Browser IO Thread

Send(screen_taken)PrepareTextureCopyOutputResult()

Result

CropScaleReadBackCopyFromCompositingSurfaceFinished

IPC messageto Renderer

Browser File Thread

Save

TakeScreenshot()

Fig 3 Overview of how screenshots are taken and the Renderer is notified(notice that some of the function call names have been shortened and mademore readable for presentation purposes compared to the source code)

asynchronously within the context of the Browserrsquos UI threadOnly the ReadBack part of the screenshot taking process isexecuted on the GPU processthread)

In simplified terms we can break down the screenshot-takingprocess into five main steps (1) draw (ie composite the layersof) the web page (2) copy the pixels (3) cropscale (4) readback the final bitmap (5) save to file (we execute the filesaving process within the Browserrsquos File Thread [45]) Howeverwe found that once step (2) is completed the screenshot haseffectively been taken and do not need to wait for the cropscaleoperation before we can ldquoreleaserdquo the user input for furtherprocessing Namely after step (2) the screenshot content is notgoing to be influenced by the processing of the input even ifthe input causes the DOM to change

The DrawFrame operation is controlled by the compositorscheduler ccscheduler which takes into account factorssuch as the devicersquos v-sync and dynamically establishes a targetrate at which frames are drawn [36] For instance on devices witha v-sync frequency of 60Hz a frame would be ideally drawnevery sim16ms Thanks to these properties the time betweenthe arrival of the user input and our Send(screen_taken)message in Figure 3 is typically on the order of only few tensof milliseconds (see Section VII)

E Taking ldquoDeeprdquo DOM Snapshots Efficiently

Along with each screenshot we also take a ldquodeeprdquo DOMsnapshot that not only includes the current structure of the DOM(at the time of the input) but also the content of all framesembedded objects (eg images) and javascript code To enablethese rich DOM snapshots we apply several important changes toChromiumrsquos code for saving web pages in MHTML format [32]Specifically we enhance Chromiumrsquos code to include javascript

5

code into the DOM snapshots and importantly to significantlyimprove efficiency Below we focus on detailing these lattercode optimizations

To save a page in MHTML format Chromium implements aGenerateMHTML function which can be called in the Browserprocess from the UI Thread Given a specific tab this functionis responsible for serializing the tabrsquos web page content intoMHTML format and to save it into a file However the Browserdoes not have direct access to the DOM of the page in each tabTherefore to save the MHTML content the Browser must rely onthe Renderer process But because the Renderer executes withina sandbox it cannot directly open a file to save the MHTMLcontent Chromiumrsquos solution is to (1) open a file in the Browserprocess (2) pass the file descriptor of the already opened file tothe Renderer and (3) ask the Renderer (via IPC message) toproduce the MHTML content of the main page and each frameit embeds and to save it into this file

Unfortunately instead of sending only one IPC message tothe Renderer for the entire process Chromiumrsquos code results intosending one IPC message to request to save the main page aswell as one separate IPC message to the Renderer to request thesaving of each frame embedded in the page2 Because moderncomplex web pages contain a potentially large number of frames(eg an iframe for each ad embedded in the page) the fullprocess of serializing and saving the MHTML content can bequite expensive lasting from hundreds of milliseconds to a fewseconds Therefore using the existing code to take a DOMsnapshot synchronously with each user input would violate ourefficiency requirements

One of the reasons why the above process is highly inefficientis that for each IPC that is received the Renderer creates a Taskwhich will (asynchronously) run on the Render Thread at a timedecided by the Renderer Scheduler [35] As mentioned in [36]ldquothe render thread is a pretty scary placerdquo due to its complexityIts execution ldquoroutinely stalls for tens to hundreds of milliseconds[] on ARM stalls can be seconds longrdquo [36] The reason is thatthere are many different types of tasks that share the same RenderThread processing time For example execution commonly stallsdue to the execution of ldquolongrdquo javascript code [44] Thereforeeach task related to a framersquos MHTML seralization could easilyfind itself starving for CPU time thus bloating the overall timeneeded to complete the full DOM snapshot

To address the above challenges and dramatically reducethe overhead related to taking DOM snapshots we use thefollowing approach Instead of calling GenerateMHTML thusgenerating multiple IPC messages to the Renderer specificallydedicated to MHTML serialization we send only one IPCmessage to the Renderer In fact we piggyback the ldquotake DOMsnapshotrdquo message from the Browser onto the input-passing IPCmessage that the Browser already must send to the Rendererto communicate the user input (see Figure 2) To this endwe modify the IPCSend(input) IPC message to also carrya file descriptor parameter fd which is related to a file weexplicitly open to allow the Renderer to save the DOM snapshotIn addition we instrument the input-processing task that wouldnormally only process the user input so that when its Task

2This approach will be useful in the future once the out-of-process-iframesproject is completed and the functionality is turned on by default as discussedlater in the paper

is executed it will first serialize the page into MHTML formatand then simply continue with the regular user input processingas shown on the right side of Figure 2 This guarantees that theDOM snapshot is taken synchronously with the related user inputevent and before any input-driven DOM changes can occur

There is one remaining question how can we serializeboth the main page and all embedded frames considering thatthe original GenerateMHTML code needed to send multipleIPC messages To solve this problem we write new MHTMLserialization code to explicitly traverse the entire frame tree fromthe Render Thread sequentially serialize each frame and savethe entire DOM snapshot into the file previously opened by theBrowser (in Section VIII we discuss how this process could befurther adapted in the future once out-of-process-iframes [41]become enabled by default)

V ALTERNATIVE IMPLEMENTATION USING EXTENSIONS

In this section we discuss whether webshots could becaptured using Chromersquos extension API [7] An extension-based implementation would be appealing because it does notrequire any browser instrumentation and can easily be addedto existing browser releases However we will show that anextension-based implementation of ChromePic is not a viablealternative in practice due to the constraints imposed by thebrowserrsquos extension API itself and to the higher average overheadassociated with webshots produced via the extension In thefollowing we refer to the extension-based version of ChromePicas ChromePicExt

A ChromePicExt Overview

Because our forensic rigor requirement (see Section I) dictatesthat webshots must be taken synchronously with the user inputthe content javascript component of ChromePicExtmust intercept user inputs before any page javascript code Thisgoal can be achieved in two steps (1) by setting the run_atproperty in the extensionrsquos manifest file to document_startand (2) by registering an event listener for user input events(eg mousedown keydown etc) on the window object assoon as the content javascript starts being executedAll ChromePicExtrsquos event listeners are registered with theuseCapture option set to true to guarantee that thecontent javascript will be the first to capture and handlethe event before any page javascript has a chance to receive thesame event In the following we will use cntjs to refer tothe extensionrsquos content javascript

During the interaction between the user and the browser ifa particular event that ChromePicExt is listening on is firedour listener captures it first and passes the event object tothe handler function implemented by the extension At thispoint ChromePicExtrsquos cntjs needs to temporarily stop thepropagation of the event object to any other listeners includinglisteners registered by the web page with which the user isinteracting until a full browsing snapshot is taken This is crucialto ensure that the other event listeners will not have an opportunityto change the appearance of the web page before the screenshotand DOM snapshot are recorded Notice also that because theexecution of javascript code within each renderer process issingle-threaded3 [43] the input event cannot be processed by

3Notice that WebWorkers cannot directly change the DOM

6

any other listener until cntjs ldquoyieldsrdquo Furthermore becausecntjs runs in an isolated world4 the page javascript cannotobserve or interfere with the extensionrsquos event processing

Once a triggering event is received to take a screen-shot of the rendered content cntjs needs to call thecaptureVisibleTab API accessible via the backgroundextension component At the same time cntjs also needsto produce a snapshot of the current pagersquos DOM tree whichcould be achieved for example by asking the backgroundcomponent to call the pageCapturesaveAsMHTML APIOnce both the screenshot and DOM snapshot are taken the eventcan be released so that the browser can propagate it to otherlisteners

Challenges In Chrome the content javascript compo-nent of an extension has limited direct access to the extension APIThe full extension API can be accessed via the backgroundcomponent In the following we refer to the backgroundextension component as bgndjs for short The cntjsand bgndjs components can communicate via messagepassing For instance immediately after a user input is capturedcntjs can use sendMessage and ask bgndjs to callcaptureVisibleTab thus producing a screenshot of thecurrent page with which the user is interacting Unfortunatelythe simple approach described above does not satisfy theforensic rigor requirement for webshots because bgndjsruns in the extension process [6] rather than the rendererprocess where cntjs runs and the screenshot is thereforetaken asynchronously Similarly to take a full DOM snapshotbgndjs can make use of saveAsMHTML but this also causesthe DOM snapshot to be taken asynchronously wrt the user inputIn other words if the cntjs simply captures a user event asksbgndjs to take a webshot (using captureVisibleTaband saveAsMHTML) and then immediately ldquoyieldsrdquo the eventcan be propagated by the browser to other listeners Thereforethere is no guarantee that the page javascript will not changethe page (DOM and rendering) before the webshot is actuallylogged

Possible Solutions One possible approach to make the webshottaking functionality synchronous may be for cntjs to activelywait (eg loop) until bgndjs communicates that the webshotrequest has been processed via a callback function Howeverbecause JavaScript execution within each process is single-threaded5 this would prevent cntjs from yielding to thecallback function because it would need to run in the rendererprocess where cntjs is actively waiting Therefore this wouldstall the renderer process and thus the web page (we haveempirically verified all observations) Another solution could beto ldquosleeprdquo instead of actively waiting for example by leveragingsetTimeout or setInterval However this would notsolve the problem because while cntjs ldquosleepsrdquo it effectivelyldquoyieldsrdquo and the captured user event will trickle down to thenext listeners thus again violating the requirement that webshotsmust be taken synchronously

One may think that cntjs could simply capture an eventobject say e and (1) make a deep copy of the object thuscreating eprime = e (2) cancel the propagation of e to the remaining

4httpsdeveloperchromecomextensionscontent scripts5WebWorkers cannot be used in the scenario we are considering

1 Save shallow DOM snapshot 2 var domSnapshot = documentheadouterHTML + documentbodyouterHTML 3 chromeruntimesendMessage(command save_dom dom domSnapshot) 4 5 Take screenshot 6 var md_time = Datenow() 7 var filename = snapshots+md_time+png 8 var xhr_request = new XMLHttpRequest() 910 chromeruntimesendMessage(command take_screenshot file filename)11 while(true) 12 try 13 xhr_requestopen(GET chromeextensiongetURL(filename) false)14 xhr_requestsend(null) send synchronous request15 break16 catch (err) 17 Synchronous XMLHttpRequest has failed18 19

Fig 4 Simplified cntjs source code

listeners6 (3) wait until the callback from bgndjs indicatesthat the webshot has been taken and (4) re-dispatch the eventby injecting eprime so that the browser will propagate the user eventto the remaining listeners Unfortunately this will cause theisTrusted property of eprime to be set to false thus potentiallypreventing some listeners from correctly processing the event Inaddition the value of isTrusted would allow an attack pageto infer the presence of ChromePicExt and perhaps stop theattack to prevent it from being loggedanalyzed thus violatingthe transparency requirement (see Section I)

B Our Approach

To solve the above problems we proceed as follows Firstwe will focus only on how to synchronously take a screenshotand then discuss how to take a DOM snapshot

Once a message has been sent to bgndjs to ask for ascreenshot to be taken cntjs actively waits for the screenshotto be completed However as mentioned earlier cntjs cannotactively wait for a callback from bgndjs as this wouldbring cntjs to stall Instead what cntjs can do is (1)explicitly choose the name of the file where the screenshotshould be stored (2) pass this information to bgndjs (viasendMessage) and at the same time ask it to concretely startthe screenshot capturing process (3) actively probe the filesystem using a synchronous XMLHttpRequest to the localURL 7 to test whether the screenshot file has been saved (viacaptureVisibleTab)

Figure 4 shows a simplified code snippet that implementsthe approach outlined above The synchronous XMLHttpRequestwill raise an exception if the file does not exist In this casecntjs will try again until the file can be found on disk(or a maximum number of attempts have been exhausted asa safeguard to avoid waiting indefinitely in case of failure atthe extension process side) After cntjs exits the active waitloop the user input event will effectively be ldquoreleasedrdquo andpassed by the browser to the remaining listeners thus allowingthe processing of the event to continue (eg this could triggersome DOM modification by the underlying page javascript)

Unfortunately the approach described above cannot be usedto synchronously take a DOM snapshot using saveAsMHTMLThe reason is that while the call to saveAsMHTML happens

6Using EventstopPropagation()7Notice that this can be enabled in the extensionrsquos manifest file via the

web_accessible_resources parameter

7

(a) (b) (c)

Fig 5 Some of the screenshots captured by ChromePic during an in-the-wild social engineering ldquofake-AV-likerdquo attack on Android

asynchronously via bgndjs which runs within the extensionprocess ultimately saveAsMHTML will delegate the respon-sibility of producing and saving the mhtml representation ofthe DOM to the same Renderer process where cntjs alsoruns within the Render Thread (see Section IV-E) Thereforeif cntjs actively waits for the mhtml file to be saved itwill simply wait indefinitely as the mhtml file cannot beproduced until cntjs ldquoreleases controlrdquo of execution on therendererrsquos main thread One way to avoid this problem is tosimply program cntjs to save the DOM structure as shownat the top of Figure 4 However this is a much more limitedldquoshallowrdquo representation of the DOM compared to what can beobtained with saveAsMHTML because embedded objects (egthe content of images or iframes) are not saved

The cntjs could be extended to produce a result that ismore similar to saveAsMHTML For instance the content ofimages can be accessed by first loading them into a canvas andthen reading the content of the canvas [27] But this is a quitecumbersome and inefficient operation Also while cumbersomeit would be possible to communicate (eg via postMessage)to the cntjs running in the context of the embedded frames8

to produce a DOM snapshot which could then be combined tothe DOM of the main page to produce a more comprehensiveldquodeeprdquo snapshot of the page

It should be apparent by now that the extension-basedapproach to taking synchronous webshots is sort of a ldquohackrdquo inthat it bypasses some of the restrictions imposed by the browseron cntjs and its inability to directly access the extensionAPIs Furthermore screenshots cannot be made fully transparentto the user because every time a screenshot is taken the browservisually indicates that a file is being downloaded (on the bottomof the browser window) In Section VII we will also showthat the extension-based implementation of ChromePic imposesa higher overhead compared to the browser instrumentationapproach described in Section IV Overall this demonstratesthat extensions are not suitable for meeting all of ChromePicrsquosdesign requirements

VI RECONSTRUCTING ATTACKS ON USERS

In this section we report on a number of experiments thatdemonstrate how ChromePic can capture attacks on users andenable their post-mortem reconstruction Specifically we willdiscuss three attacks an in-the-wild social engineering downloadattack on Android a phishing attack and two clickjacking attacksproposed in [1]

8Assuming the all_frames option is set to true in the manifest file

A Social Engineering Download Attack

During our user study (see Section VII-B) we encounteredan in-the-wild social engineering download attack Here is how auser arrived to this attack (1) The user visits wwwgooglecomand searches for ldquowolf of wall streetrdquo (2) after scrolling theresults the user modifies the search terms by adding the letterldquofrdquo to the search string (see Figure 5a) (3) the search enginesuggests ldquowolf of wall street full movierdquo as the top searchsuggestion which is clicked (with a touch screen tap) by theuser (4) the user then clicks on the top search result whichredirects the browser to a site called fmovies[]to (5) as thesite loads with no interaction from the user an advertisementembedded in the page forces the browser to open a new tabwhere a page is loaded from usintellectual-82[]xyz (6) an alertpopup window is immediately shown which warns the user thatthe device is infected by multiple viruses (7) clicking on theOK button makes the alert window disappear but the user nowsees the usintellectual-82[]xyz page (which was previously inthe background) claiming that the Android device is ldquo281DAMAGED because of 4 harmful virusesrdquo (see Figure 5c) andrecommends the user to download an application called ldquoDUCleanerrdquo (8) clicking on a ldquoREPAIR FAST NOWrdquo button theuser is redirected to the Google Play store and specifically toinformation about an app called GO Speed9 (not DU Cleaneras stated on the attack page)

Using ChromePic we were able to record all main steps ofthe attack In fact the screenshots in Figure 5 were all takenby ChromePic and confirm that using the recorded webshotsthe social engineering attack described above can indeed bereconstructed by tracing back the user-browser interactionsincluding tapping on the ldquoREPAIR FAST NOWrdquo button onthe attack page Naturally after the user clicks on this downloadbutton and control is passed to the Google Play app ChromePiccould not follow the next user actions (eg whether the app wasinstalled or not on the device) This is expected as ChromePicis meant to reconstruct all steps of web-based attacks that unfoldwithin the browser The GO Speed app that the user is asked toinstall seems to be benign as it has been installed by a largeuser base (more than 10M users according to Google Play)and an analysis of VirusTotalcom reports no anti-virus labels10After analyzing the DOM snapshots taken by ChromePic wesuspect that the attackers are trying to monetize an advertisementcampaign that pays for every new ldquoreferredrdquo installation ofthe app For instance the attack page contains a link toclickinfo-apps[]xyz and another to trackinglenzmx[]com witha URL query parameter mb_campid=du_cleaner_tier2

9httpsplaygooglecomstoreappsdetailsid=comgtozerozboostamphl=en10sha1 811b367c4901642ae41b4b8f0167eac2d3ac4039

8

(a) (b) (c)

Fig 6 Some of the screenshots captured by ChromePic during a phishing attack (the attack URL was first reported in PhishTank submission 4359181)

After a mouse click the browser is redirected (via HTTP 302redirections) through those two sites to the final marketURL referring to the GO Speed app It is likely that the FakeAV-like advertising tactics employed in this social engineering attackare simply a way to convince more users to install the app and(illicitly) increase revenue in a way similar to how pay-per-installnetworks [20] [46] monetize third-party software installations

There is a small exception to be noted ChromePic did nottake a screenshot of the alert popup window which should havebeen triggered by the user clicking on the OK button to closethe popup The reason is that alert windows are rendered ldquoout ofcontextrdquo wrt to browser tabs and our current implementationof ChromePic does not support taking a snapshot when usersinteract with such alert windows (we plan to add support foralert windows in future releases of ChromePic) However it isworth noting that by analyzing the DOM snapshots taken as theuser interacted with the attack page (at usintellectual-82[]xyz)we were able to also reconstruct the content of the alert popup

WARNING This Google Pixel C is infected with viruses andyour browser is seriously damaged You need to remove virusesand make corrections immediately It is necessary to remove andfix now Donrsquot close this window If you leave you will beat risk

B Phishing Attack

Besides tracing-back the steps followed by users who reachan attack page ChromePic can also assist in understanding howthe user interacted with the attack itself For instance in the caseof phishing attacks our webshots capture a wealth of informationabout what data was leaked by the user To demonstrate thiswe present an example of a recent phishing attack posted onPhishTank (submission 435918111)

After using ChromePic to visit the phishing URL whichimpersonates a Brazilian banking website we simulated theactions of a user who falls for the attack by providing fakeinformation (due to format-checking javascript we had to figureout how to provide fake but syntax-compliant data) Figure 6shows some of the snapshots taken by ChromePic as we interactedwith the attack website Unlike other phishing attacks whichare often limited to stealing the victimrsquos login credentials thisattack is fairly sophisticated as it attempts to reproduce the entirebanking site Once the user logs in (by providing hisher CPF12

code) the site claims the balance of the userrsquos bank account hasbeen hidden (presumably for security purposes) and must be

11httpwwwphishtankcomphish detailphpphish id=435918112httpsenwikipediaorgwikiCadastro de Pessoas FC3ADsicas

recovered As the user clicks on a menu bar link the site requiresthe victim to fill in a set of security codes as shown in Figure 6bNotice that here the attacker is attempting to essentially steal theuserrsquos entire security code card13 By doing so the attackers willsubsequently be able to perform any bank transaction operationwithout being blocked by the real bankrsquos security mechanismsFinally after the user provides the security code card informationthe phishing site also requests the userrsquos telephone number andpassword (Figure 6c) Once this information is provided thesite shows a ldquoloadingrdquo animation that makes the user believehisher data is being verified (not shown in Figure 6 due to spaceconstraints) But at this point the attack has already succeeded

C ClickJacking Attacks

To demonstrate how ChromePic is able to also captureclickjacking attacks we reproduced two attacks described in [1]the Destabilizing Pointer Perception attack and the PeripheralVision attack The (simulated) attacks which we adapted frompublicly available code14 by the authors of [1] are available athttpschromepicgithubioclickjacking 15

Destabilizing Pointer Perception The attack is shown in Figure 7In this attack the user intends to click on a ldquohererdquo hyperlinkHowever as the mouse pointer approaches the link the followingevents occur (1) a fake pointer is drawn that has a left-sidedisplacement error compared to the real pointer (which is hidden)(2) as the user brings the fake pointer on top of the link the realpointer is actually located on the Facebook Like button (3)because the Like button is rendered within a third-party framethe attack javascript cannot hide the mouse pointer at this timetherefore the attack instead draws other random mouse pointersto confuse the user and effectively prevent the user from noticingthat the real mouse pointer is over the Like button (4) as theuser attempts to click on ldquohererdquo the real click actually occurson the Like button thus completing the clickjacking attack

Figure 7 shows the screenshot taken by ChromePic at themouse down event The center of the red circle is the exactlocation where the user input event occurred Notice that thefake mouse pointers are captured by the screenshot includingthe pointer located over ldquohererdquo At the same time the real mousepointer is not captured in the screenshot because it is drawn by

13An English language explanation of how security code cards are used infinancial applications can be found at this link httpswwwinteractivebrokerscomenf=2Fen2Fgeneral2FbingoHelpphp

14httpwh0githubiosafeclick-blastlisthtml15The original attack code is currently broken due to a missing remote file

after analyzing the code we found an easy fix and we were able to recreate theattacks

9

Fig 7 Destabilizing pointer perception clickjacking attack

function distract() var img = documentcreateElement(img) imgclassName = random imgsrc = httpiimgurcomEWmYMN2png imgstyletop = Mathrandom() 160 + 160 + px imgstyleleft = Mathrandom() 160 + 240 + px playareaappendChild(img) var dummy = imgclientHeight imgstyletop = Mathrandom() 160 + 160 + px imgstyleleft = Mathrandom() 160 + 240 + px setTimeout(function () playarearemoveChild(img) RANDOM_MOVE_TIME)

Fig 8 Reconstruction of code for generating fake pointers from ChromePicrsquosDOM snapshots

the OS not rendered by the browser (only the fake pointers arerendered by the browser) Nonetheless the coordinates of thereal input event are recorded in our webshot and it is thereforestraightforward to find the correct location of where the realmouse pointer was located and draw the red circle accordinglyover the screenshot Also by analyzing the DOM snapshotsproduced by ChromePic it is easy to recover the fact that themouse is hidden via CSS (using cursornone) and to alsoget the full source code for the javascript functions that enablethe attack including the creation of fake mouse pointers (seeFigure 8)

Peripheral Vision In this attack the objective is to attract theuserrsquos attention towards an area of the screen that is far fromwhere the mouse clicks actually occur To this end a game issetup as shown in Figure 9 In this game the user needs toclick on the Play button on the bottom left of the screen soto catch the moving L or R blocks within the purple box on theright side of the screen Because the userrsquos attention is drawn tothe right side while the clicks occur on the bottom left the usermay not notice that at some random convenient time the attackermay replace the Play button with a Facebook Like button Ifthe mouse click occurs when the Like button is displayed theclickjacking attack succeeds

Figure 9 shows two screenshots taken at two differentmousedown events In the screenshot on the left the user clickson the real Play button The screenshot on the right shows thatat the second mousedown event the Play button had temporarily(for only one second) been replaced with the Like button whichreceived the userrsquos click As can be seen ChromePic correctlycaptured the two events (the center of the red circle representsthe exact location where the user clicked) Notice that this attackagain has a significant visual component that would be difficultto reconstruct by analyzing only the page DOM and that wewere able to correctly capture it thanks to ChromePicrsquos abilityto take screenshots synchronously with the user inputs

VII PERFORMANCE EVALUATION

In this section we present a set of experiments dedicated tomeasuring the overhead introduced by webshots

Fig 9 Two screenshots that capture the peripheral vision clickjacking attack

A Experimental Setup

Our ChromePic browser is built upon Chromiumrsquos codebaseversion 50026262 Our source code modifications amount toapproximately 2000 lines of C++ and will be available athttpschromepicgithubiochromepic-browser

We evaluate ChromePic on both Android 60 on a GooglePixel-C tablet with an Nvidia X1 quad-core CPU and 3GB ofRAM as well as on two machines running Linux Ubuntu 1404a Dell Optiplex 980 desktop machine with a quad-core IntelCore-i7 processor and 8GB of RAM and a Dell Inspiron 15laptop with a Core-i7 CPU and 8GB of RAM

B User Study Setup

User Study 1 To evaluate the overhead imposed by our codechanges to Chromium we perform a user study involving 22distinct users (with IRB approval) Specifically we compile ourChromePic browser for both Linux and Android and ask thestudy participants to use the devices described earlier for genericInternet browsing activities Users were allowed to freely browseany site of their choosing The only restriction we imposed wasto avoid visiting any website containing personal data such asonline banking sites their Facebook page etc to avoid recordingany sensitive information Each user was asked to perform one ormore browsing sessions on different devices with each sessionlasting approximately 15 minutes Each user completed no morethan two separate browsing sessions per device (a few usersused only the Android and Linux laptop devices and did notbrowse on the desktop Linux machine) Overall we collected363 minutes of browsing activity on the Android tablet from16 different users 346 minutes on the Linux laptop from 15users and 286 minutes on the Linux desktop from 11 users(more than 165 hours of browsing overall) which includedseveral thousands input events per device The users visited morethan 1600 different web pages (ie URLs) on 204 distinct websites (ie different effective second-level domains includinggooglecom youtubecom amazoncom and several other highlypopular sites) producing close to 6000 webshots overall Table Ireports a summary of the data we collected

For this study the browser was setup so that webshots areactive only on randomly selected pages Namely every time theuser navigates to a new page the browser ldquoflips a coinrdquo anddecides if the webshot logging capabilities should be activatedor not (other experiments described later had the webshot logsalways on) The reason for this is that we wanted to measureand compare the time needed by the browser to process inputevents with and without our code changes to demonstrate thatour webshots do not impose any other input processing delay

10

TABLE I DATA COLLECTED DURING User Study 1

Platform Users Browsing time Sites visited Pages visited Pages visited WebShot events(minutes) (webshots on) (webshots off)

Android 16 363 92 480 479 2428Linux laptop 15 346 80 777 746 2145

Linux desktop 11 286 65 369 404 1376Total 22 (unique) 995 204 (unique) 1626 1629 5949

besides the actual time to record the logs We comment on theresults of this experiment in Section VII-C (see also Figure 11)

User Study 2 We also performed a smaller targeted user studyinvolving 4 different users browsing on the Linux laptop device(with webshots always on) In this study we asked the usersto login into sites such as Facebook Gmail Twitter GoogleDrive etc using a ldquotemporaryrdquo account we created only for thisstudy which therefore contains no true personal information Thisexperiment aimed at evaluating ChromePicrsquos overhead duringactivities such as writing emails writing FacebookTwitter postswriting a GoogleDoc text document etc Overall we collected 53minutes of browsing time The experimental results are discussedin Section VII-C

User Study 3 Finally we performed a separate small user studyinvolving 6 users to evaluate the performance of ChromePicExtthe extension-based implementation that attempts to record brows-ing snapshots similar to the webshots recorded by ChromePic(see Section V) We discuss the related results in Section VII-C

C ChromePic Performance Measurements

User Study 1 In Table II we report a breakdown of the overheadmeasurement results performed on browsing traces collectedduring our User Study 1 Specifically we report the 50thpercentile (ie the median) and 98th percentile of the timerequired for taking screenshots ldquodeeprdquo DOM snapshots and forthe total webshots time All numbers are in milliseconds

To better explain how the measurements in Table II areobtained let u(t0) be a user input event that occurs attime t0 which triggers a webshot Also let tsn be thetime at which the screen_taken notification is sent inFigure 3 from the GL module to the Browser IO ThreadNamely this is the time when the screenshot has actuallybeen captured and the user input can be processed (seeSection IV-D) On the other hand let tsc be the time whenthe CopyFromCompositingSurfaceFinished callbackis called We define the screenshot notification time as (tsnminus t0)the screenshot callback time as (tsc minus t0)

Similarly let td be the time at which the DOM snapshothas been saved and the user input can be further processed asdiscussed in Section IV-E (see also Figure 2) and δf be thetime taken to save the DOM snapshot to file using the MHTMLformat The DOM snapshot time with file write is computed as(td minus t0) whereas DOM snapshot time wo file write is equalto (td minus δf minus t0) which therefore excludes the time neededto copy the snapshot to file The reason why we measure thislatter quantity is that with some more engineering effort theDOM snapshot file saving process could be moved to a separateRenderer process thread thus effectively decreasing the overheadimposed by the DOM snapshot logging

The total webshot time is computed as (maxtsn td minus t0)according to the discussion provided in Section IV This time

0 50 100 150 200 250 300

Event Duration (ms) - mouse key press deltas and webshot overheads00

02

04

06

08

10

Mouse click deltas (1278)Key press deltas (1089)Laptop webshots (2117)Tablet webshots (2428)Desktop webshots (1361)

Fig 10 Time needed to take webshots and comparison with mouse-downupand key-downup time deltas (the number of events on which the CDFs arecomputed are in parenthesis)

could be further reduced to (maxtsn (td minus δf ) minus t0) byoffloading the DOM file saving process to a separate Rendererprocess thread (we leave this implementation task to futurereleases of ChromePic)

Figure 10 shows the distribution of the total time needed tolog the webshots on different devices while Table II reports abreakdown of the webshot times into their components (50th- and98th-percentiles) On both Linux devices (laptop and desktop)98 of all webshots are logged in less than 120ms This is avery good result because any latency below 150ms is practicallyunnoticeable to users [47] On Android 98 of webshots arelogged in less than 264ms with a median time of around 78msWhile the 98th-percentile time is higher than our 150ms targetit is still a low overhead that is on the very low end of theldquonoticeablerdquo latency classification provided in [47] Also ourresults indicate that 8202 of all webshots on Android canbe taken in less than 150ms Furthermore notice that the totaloverhead is driven by the DOM snapshot time including savingthe DOM to file rather than the screenshot notification timeFrom Table II we can see that the 98th-percentile of the totalwebshot logging time for Android could be reduced to roughly203ms if file saving was delegated to a separate thread in theRenderer process In addition in this setting 8933 of thewebshots on Android would take less than 150 ms

To further put our results into perspective we also comparedthe time needed to take webshots to the time in between mouse-downup and key-downup events In other words we measurethe time that it takes for a user to lift her finger from the mousebutton or from a key The mouse-downup and key-downup timedeltas are measured on the Linux desktop with webshots turnedoff As we can see from Figure 10 the distribution (CDF) ofwebshot overhead times on the Linux laptop and desktop arealways to the left of the mouse-downup and key-downup timedeltas curves Because we start the webshot log at the downevent this means that in the vast majority of cases when a mouseclick or a key press occurs the webshot will be fully logged

11

TABLE II User Study 1 - PERFORMANCE OVERHEAD (50TH- AND 98TH-PERCENTILE)

Platform Total Total Screenshot Screenshot DOM snapshot time DOM snapshot timewith file write (ms) wo file write (ms) notification time (ms) callback time (ms) with file write (ms) wo file write (ms)

Android 7805 26306 5953 20301 1302 2588 6567 10997 7755 26181 5886 20238Linux laptop 3916 11843 3332 10955 538 2768 3617 7105 3895 11826 3312 10938

Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400

Event Duration (ms) - mouse and key events00

02

04

06

08

10

Webshots enabled (1369)Webshots disabled (1412)

Fig 11 Comparison of ldquonaturalrdquo input event processing time with and withoutwebshots enabled (the number of events on which the CDFs are computed arein parenthesis)

by the time the user raises her finger Also the Android tabletcurve is almost entirely to the left of the mouse-downup andkey-downup curves showing that even on Android the webshotscan be taken efficiently

Another result worth noting is that our screenshot codeoptimizations described in Section IV-D yield a very significantoverhead improvement as can be seen by comparing thenotification time and callback time in Table II

To verify that our webshots do not negatively impact thesubsequent ldquonaturalrdquo input processing times in Figure 11 wealso compare the amount of time taken by the browser to processa user input in two different cases when webshots are disabled(dashed line) and when the input is processed right after awebshot has been logged (solid line) Specifically let t0 be thetime when the Browser sends a user input u to the Renderer andti be the time when the Renderer confirms to the Browser that theinput has been processed (we use Chromiumrsquos LatencyInfoobjects to measure this) Also let tprimei be the ldquoinput processedrdquoconfirmation time related to events that triggered a webshot andδw be the time delta needed to log a webshot The first (dashed)curve measures (ti minus t0) which represents the ldquonaturalrdquo inputprocessing time Similarly the second (solid) curve measures(tprimei minus δw minus t0) which represents the time needed by the browserto process the input after a synchronous webshot has been taken(see Figure 2) As can be seen the two curves are very similarindicating no unexpected delay to the natural input processingtime due to webshot events In other words the webshots donot cause any other delays besides the actual time to take thewebshots δw

User Study 2 As discussed in Section VII-B we separatelymeasured the overhead for user activities on popular web sitessuch as Facebook Twitter Gmail Google Drive etc We recordedthousands of user input events 1910 of which triggered awebshot Of these webshots 50 were processed in less than66ms and 98 took less than 240ms Furthermore 80 of allthe webshot took less than 150ms After closely analyzing the

measurements we found that the slight increase in overheadcompared to User Study 1 was due to DOM snapshots on Gmaildue to how the page is structured (eg Gmail pages embeddeda larger number of iframersquos and had a larger DOM size)Specifically the 98th-percentile for the total webshot time onGmail was around 245ms The times on all other popular sites(Facebook Twitter Google Docs etc) were in line or even lowerthan those obtained in User Study 1 For instance on Facebookthe 98th-percentile was less than 120ms Overall if we excludeGmail from this experiment 98 of the webshots can be takenin 108ms

User Study 3 For comparison purposes we also measuredthe performance of taking screenshots using the extension-based approach discussed in Section V These have been doneon the desktop machine and the results should therefore becompared to the third row of Table II Also notice that inthis experiment we are only considering the screenshot time(as explained in Section V it is not easy to take synchronousldquodeeprdquo DOM snapshots via the extension API) We found that50 of screenshots require at least 140ms and 98 of themrequire 243ms This is in contrast with the 274ms and 2380msrespectively that are required by the browser-based version ofChromePic Furthermore the extension times are much largerthan the total time needed to take a full webshot (including theDOM) on the desktop machine using the instrumented browsersolution (see Table II) This reinforces our conclusion that anextension-based solution is not only cumbersome as discussedin Section V but also much less efficient

D Storage Requirements

Table III shows the storage requirements for archiving thewebshots produced during User Study 1 After a straightforwardcompression process (converting screenshots to JPG and usinglossless compression for DOM snapshots) the webshots takea maximum of 103MB per minute of browsing on the Linuxlaptop Android logs required only 085MBminute of storage and092MBminute on the desktop machine This space requirementscould be further reduced by using lossy compression on theDOM-embedded images for instance by converting them to alow- or medium-quality JPG

Letrsquos now consider a scenario in which ChromePic is deployedin a corporate network setting Assume that in average usersspend half of their working time (4 hoursday) browsing whilethe other half is spent on other tasks (meetings developmentdesign data analysis etc) If we assume 22 business days permonth and 103MB of storage needed per minute of browsing(ie the maximum amount we observed) a single user wouldproduce less than 6GB of webshot logs per month In a corporatenetwork with 1000 users this would result in less than 6TB ofstorage for an entire month of browsing logs for the network or72TB for an entire year of logs Considering that a multi-TBhard drive currently costs only a few hundreds US dollars anentire year of webshot logs could be archived for only a few

12

TABLE III AVERAGE STORAGE REQUIREMENTS (MBSMINUTE)

Platform Uncompressed CompressedScreenshots DOM Screenshots DOM

Android 680 1162 031 054Linux laptop 466 1133 015 088

Linux desktop 231 807 009 083

thousand US dollars In alternative considering that business-grade cloud-based storage services are currently priced at lessthan $003GB per month archiving one entire year worth ofwebshots for the entire corporate network in the cloud wouldcost less than $2200 per month16

VIII DISCUSSION

There exist some corner cases in which it is not possibleto ldquofreezerdquo the state of the DOMrendering immediately aftera user input arrives For instance if a user input arrives whilethe Render Thread is already executing another task such asa long-running javascript program that affects the DOM theprocessing of the user input will have to wait until javascriptterminates and until its own Task is scheduled for execution(see Section IV-E) The net effect is that the DOM snapshotwill reflect the state of the DOM after the already runningjavascript code terminates Notice however that this is also truefor ldquonaturalrdquo input processing Namely the input will apply tothe modified DOM regardless of whether a webshot is taken ornot Therefore our snapshots correctly reflect the state of theDOM at the time when the input becomes effective Similarlybecause screenshots need to wait for the compositor to redrawthe exact instant in time in which the screenshot is taken isdetermined by the ccscheduler (see Section IV-D) If ananimation is in progress on the page it may be possible for thescreenshot to be one (or a very small number of) frame(s) ldquooffrdquowrt the user input Again this also holds for ldquonaturalrdquo inputprocessing (ie even if webshots were disabled) because theinput may become effective after a redraw

In Section IV-E we mentioned that once the out-of-process-iframes (OOPIFs) [41] project is completed and becomes activeby default we will need to slightly adapt our code for takingDOM snapshots In fact we believe that OOPIFs would allowus to further decrease the time needed to take a snapshot Thereason is as follows Assume the user interacts (eg clickson a link) with a page that embeds several iframes (eg todisplay different ads) In the current implementation both themain page and iframes are processed in the same Rendererprocess Therefore the DOM of the main page and all iframeshas to be produced at once synchronously with the input (seeSection IV-E) But with OOPIFs we could produce all thesepartial DOM snapshots in parallel by simply sending a ldquotakeDOM snapshotrdquo IPC message to the main page and all iframesat the same time

Because ChromePic continuously logs user-browser inter-actions and takes screenshots the recorded logs may containsensitive user information To mitigate privacy concerns the logscould be securely stored using methods similar to previouslyproposed approaches [12] [34] Furthermore the solutionsproposed in [28] could also be readily applied to ChromePicrsquosoutput For instance ChromePic could employ a customizable

16Estimated using httpcalculators3amazonawscomindexhtml (with ColdHDD)

whitelist of sites on which webshots should be turned off Tobe more strict ChromePic could be prevented from logging anyevents on pages loaded via HTTPS that have a valid (not self-signed) TLS certificate Furthermore a ldquohelperrdquo application (ora separate browser thread) could be responsible for continuouslygathering and encrypting the browser logs This applicationwould encrypt logs related to different tabs separately usingunique keys that could be stored in a key escrow [8] The keyescrow could be owned by the user or in enterprise environmentsby the machinersquos administrator and the keys released only whena security investigation is called for In addition because eachtab can be stored separately and encrypted with a different keyinvestigators could be selectively given access only to some tabsrather than the entire browsing history The decision on whetherto authorize the decryption of a tab would depend on the specificinvestigation but could for instance be based on the time framein which the attack is suspected to have happened and on thelist of domains that have been visited within the tab which canbe recorded as meta-data and encrypted with a ldquoglobalrdquo keyIn addition audit logs could be protected from tampering byusing existing file system features such as append-only filesand immutability [24] We leave the engineering of this keyescrow-based system to future work

IX RELATED WORK

The analysis of security incidents is often hindered by thelack of necessary logs As mentioned in [21] ldquoit is all too oftenthe case that we tend to lack detailed information just when weneed it the mostrdquo The existing logging functionalities providedby modern operating systems and browsers are often insufficientto precisely reconstruct an attack Below we discuss previousworks that aim to enhance logging and improve the ability toinvestigate security incidents

Enhanced Logging To enable the analysis of security incidentsKornexl et al [19] propose a network ldquoTime Machinerdquo whosegoal is to efficiently record detailed information extracted fromnetwork traffic The purpose of this system is to support forensicanalysis and network troubleshooting To increase efficiency andallow for storing network traffic information for long periods oftime Time Machine only records the first portion of each networkconnection Even with partial recordings [19] demonstrates thatthis approach enables the analysis of security incidents Krishnanet al [21] propose a virtualization-based forensic engine tokeep track and record access to data objects read from diskThe proposed system follows the chain of access operations onthe objects as they are copied into memory and accessed bydifferent processes The output is an audit log that enables thereconstruction of the sequence of data changes Ma et al [25]develop a low cost audit logging system for Windows whichaims to enable accurate attack investigation and significant logreduction

The instrumentation of Chrome has been proposed in thepast in different security contexts For example Bauer et al [5]propose an information-flow tracking system that allows forenforcing fine-grained browser security policies Excision [2] isan instrumentation of Chrome that aims at detecting and blockingthe inclusion of malicious third-party content into web pagesTo this end Excision keeps track of the origin of third-partycontent to be loaded as part of the page

13

Our ChromePic system is different in that it ams to introducefine-grained logging in Chromium to enable the recording andpost-mortem investigation of web-based attacks with particularfocus on attacks on users that have a significant visual component

Record-and-Repaly Systems ReVirtrsquos main goal is to enablewhole-system record-and-reply [11] To this end it uses avirtualization-based approach to log detailed information abouta VMrsquos guest system execution instruction-by-instruction Thisenables deterministic replay of the entire system thus alsoallowing an exact replay of previously recorded intrusions Otherwhole-system record-and-replay engines such as PANDA [10]share similar goals Whole-system record-and-replay is expensiveand difficult to deploy on resource-constrained mobile devices Toobviate these problems Neasbitt et al propose WebCapsule [28]which aims to enable browser-level record-and-replay WebCap-sule is implemented by instrumenting Blink Chromersquos renderingengine Because recording occurs at a higher level comparedto [11] WebCapsule does not allow for fully deterministic replayOn the other hand WebCapsule is portable to multiple platformsincluding mobile devices

ChromePic is different from the above systems in that it doesnot aim to enable replay Rather our system aims to introducevery low overhead and to record enough detailed informationabout the state of the browser to enable an accurate reconstructionof web-based attacks towards users such as social engineeringand phishing

Automated Incident Investigation WebWitness [30] is an incidentinvestigation system that leverages deep packet inspectionto reconstruct the steps followed by users who reach socialengineering or drive-by malware download pages The systemrelies on full network packet traces to performs a (network-based)analysis of both the content of web pages and the way in whichthe content is requested (eg by analyzing referrers and HTTPredirections) and is able to reconstruct the web browsing paththat brought the user to the final attack page Unlike WebWitnesswhich is purely based on an analysis of network traces using aset of heuristics and inference methods ClickMiner [29] aimsto reconstruct the path to an attack page by replaying networktraces into an instrumented browser

BackTracker [18] is a system for automatically reconstructingthe sequence of steps followed by an attacker to compromise amachine Given an initial detection point such as a malicious fileidentified by a security analyst BackTracker traces back processesand files that have a causal relation to the detection pointby leveraging OS-level logs The final result is a dependencygraph that explains what system objects affected (or caused)the presence of the malicious file on disk thus potentiallyrevealing the attackerrsquos entry point into the system Taser [13]and RETRO [17] use OS-level logs and perform forward trackingto identify and recover form intrusions whereas other recentworks [22] [23] [26] have focused on improving accuracy inbackward- and forward-tracking of intrusions and on reducingthe space for OS logs

Our work is different from the systems discussed abovein that ChromePicrsquos main goal is to produce highly efficientfine-grained browser logs In the future these logs could beused to enable and improve the accuracy of automated incidentinvestigation systems

X CONCLUSION

In this paper we presented ChromePic a web browserequipped with a novel forensic engine whose goal is to greatlyenhance the browserrsquos logging capabilities ChromePic enables afine-grained post-mortem reconstruction and trace-back of webattacks without incurring the high overhead of record-and-replaysystems ChromePic works by recording a detailed snapshot ofthe state of a web page including a screenshot of how the pageis rendered and a ldquodeeprdquo DOM snapshot at every significantinteraction between the user and web pages If an attack is latersuspected these fine-grained logs can be processed to reconstructthe attack and trace back the sequence of steps the user followedto reach the attack page

We developed ChromePic by implementing several carefulmodifications and optimizations to the Chromium code base tominimize overhead and make always-on logging practical Usingboth real-world and simulated web attacks we demonstrated thatChromePic can successfully capture and aid the reconstructionof attacks on users Our evaluation included the analysis of anin-the-wild social engineering download attack on Android aphishing attack and two different clickjacking attacks as wellas a user study aimed at accurately measuring the overheadintroduced by our forensic engine The experimental resultsshowed that browsing snapshots can be logged very efficientlymaking snapshot logging events practically unnoticeable to users

ACKNOWLEDGMENT

This material is based in part upon work supported by theNational Science Foundation under grant No CNS-1149051 andby the United States Air Force and Defense Advanced ResearchAgency (DARPA) under Contract No FA8650-15-C-7562Any opinions findings and conclusions or recommendationsexpressed in this material are those of the authors and do notnecessarily reflect the views of the National Science Foundationor DARPA

REFERENCES

[1] D Akhawe W He Z Li R Moazzezi and D Song ldquoClickjackingrevisited A perceptual view of ui securityrdquo in 8th USENIX Workshop onOffensive Technologies (WOOT 14) Aug 2014

[2] S Arshad A Kharraz and W Robertson ldquoInclude me out In-browserdetection of malicious third-party content inclusionsrdquo in Proceedings ofthe 20th International Conference on Financial Cryptography and DataSecurity (FC) 2 2016

[3] L Ballard ldquoNo more deceptive download buttonsrdquo 2016 httpssecuritygoogleblogcom201602no-more-deceptive-download-buttonshtml

[4] A Barth C Jackson and J C Mitchell ldquoRobust defenses for cross-siterequest forgeryrdquo in Proceedings of the 15th ACM Conference on Computerand Communications Security ser CCS rsquo08 2008

[5] L Bauer S Cai L Jia T Passaro M Stroucken and Y Tian ldquoRun-timemonitoring and formal analysis of information flows in Chromiumrdquo inProceedings of the 22nd Annual Network and Distributed System SecuritySymposium Feb 2015

[6] Chrome ldquoBackground pagesrdquo httpsdeveloperchromecomextensionsbackground pages

[7] mdashmdash ldquoExtensionsrdquo httpsdeveloperchromecomextensions[8] D E Denning and D K Branstad ldquoA taxonomy for key escrow encryption

systemsrdquo Commun ACM vol 39 no 3 pp 34ndash40 1996[9] R Dhamija J D Tygar and M Hearst ldquoWhy phishing worksrdquo in

Proceedings of the SIGCHI Conference on Human Factors in ComputingSystems ser CHI rsquo06 2006

14

[10] B Dolan-Gavitt J Hodosh P Hulin T Leek and R Whelan ldquoRepeatablereverse engineering with pandardquo in Proceedings of the 5th ProgramProtection and Reverse Engineering Workshop ser PPREW-5 2015

[11] G W Dunlap S T King S Cinar M A Basrai and P M Chen ldquoRevirtEnabling intrusion analysis through virtual-machine logging and replayrdquoSIGOPS Oper Syst Rev vol 36 no SI Dec 2002

[12] R Geambasu J P John S D Gribble T Kohno and H M LevyldquoKeypad An auditing file system for theft-prone devicesrdquo in Proceedingsof the Sixth Conference on Computer Systems ser EuroSys rsquo11 ACM2011 pp 1ndash16

[13] A Goel K Po K Farhadi Z Li and E de Lara ldquoThe taser intrusionrecovery systemrdquo in Proceedings of the Twentieth ACM Symposium onOperating Systems Principles ser SOSP rsquo05 2005

[14] C Grier L Ballard J Caballero N Chachra C J Dietrich K LevchenkoP Mavrommatis D McCoy A Nappa A Pitsillidis N Provos M ZRafique M A Rajab C Rossow K Thomas V Paxson S Savage andG M Voelker ldquoManufacturing compromise The emergence of exploit-as-a-servicerdquo in ACM Conference on Computer and CommunicationsSecurity ser CCS rsquo12 2012

[15] L-S Huang A Moshchuk H J Wang S Schecter and C JacksonldquoClickjacking Attacks and defensesrdquo in Presented as part of the 21stUSENIX Security Symposium (USENIX Security 12) 2012

[16] S Institute ldquoA multi-level defense against social engineeringrdquo2003 httpswwwsansorgreading-roomwhitepapersengineeringmulti-level-defense-social-engineering-920

[17] T Kim X Wang N Zeldovich and M F Kaashoek ldquoIntrusion recoveryusing selective re-executionrdquo in Proceedings of the 9th USENIX Conferenceon Operating Systems Design and Implementation ser OSDIrsquo10 2010

[18] S T King and P M Chen ldquoBacktracking intrusionsrdquo in Proceedings ofthe Nineteenth ACM Symposium on Operating Systems Principles serSOSP rsquo03 2003

[19] S Kornexl V Paxson H Dreger A Feldmann and R Sommer ldquoBuildinga time machine for efficient recording and retrieval of high-volume networktrafficrdquo in Proceedings of the 5th ACM SIGCOMM Conference on InternetMeasurement ser IMC rsquo05 2005

[20] P Kotzias L Bilge and J Caballero ldquoMeasuring pup prevalence and pupdistribution through pay-per-install servicesrdquo in 25th USENIX SecuritySymposium (USENIX Security 16) Aug 2016

[21] S Krishnan K Z Snow and F Monrose ldquoTrail of bytes Efficient supportfor forensic analysisrdquo in Proceedings of the 17th ACM Conference onComputer and Communications Security ser CCS rsquo10 2010

[22] K H Lee X Zhang and D Xu ldquoHigh accuracy attack provenance viabinary-based execution partitionrdquo in NDSS 2013

[23] mdashmdash ldquoLoggc garbage collecting audit logrdquo in Proceedings of the 2013ACM SIGSAC conference on Computer amp38 communications securityser CCS rsquo13 2013

[24] Linux Man Pages ldquoChattrrdquo httpman7orglinuxman-pagesman1chattr1html

[25] S Ma K H Lee C H Kim J Rhee X Zhang and D Xu ldquoAccuratelow cost and instrumentation-free security audit logging for windowsrdquo inProceedings of the 31st Annual Computer Security Applications Conferenceser ACSAC 2015 2015

[26] S Ma X Zhang and D Xu ldquoProtracer Towards practical provenancetracing by alternating between logging and taintingrdquo in NDSS 2016

[27] Mozilla Developers Network ldquoUsing imagesrdquo httpsdevelopermozillaorgen-USdocsWebAPICanvas APITutorialUsing images

[28] C Neasbitt B Li R Perdisci L Lu K Singh and K Li ldquoWebcapsuleTowards a lightweight forensic engine for web browsersrdquo in Proceedingsof the 22Nd ACM SIGSAC Conference on Computer and CommunicationsSecurity ser CCS rsquo15 2015

[29] C Neasbitt R Perdisci K Li and T Nelms ldquoClickminer Towardsforensic reconstruction of user-browser interactions from network tracesrdquoin Proceedings of the 2014 ACM SIGSAC Conference on Computer andCommunications Security ser CCS rsquo14 2014

[30] T Nelms R Perdisci M Antonakakis and M Ahamad ldquoWebwitnessInvestigating categorizing and mitigating malware download pathsrdquo inProceedings of the 24th USENIX Conference on Security Symposium serSECrsquo15 2015

[31] mdashmdash ldquoTowards measuring and mitigating social engineering softwaredownload attacksrdquo in Proceedings of the 25th USENIX Conference onSecurity Symposium ser SECrsquo16 2016

[32] J Palme A Hopmann and N Shelness ldquoMime encapsulation of aggregatedocuments such as html (mhtml)rdquo 1999 httpstoolsietforghtmlrfc2557

[33] J Saltzer and M Schroeder ldquoThe protection of information in computersystemsrdquo httpwebmiteduSaltzerwwwpublicationsprotection

[34] B Schneier and J Kelsey ldquoSecure audit logs to support computer forensicsrdquoACM Trans Inf Syst Secur vol 2 no 2 pp 159ndash176 May 1999

[35] The Chromium Project ldquoBlick schedulerrdquo httpsdocsgooglecomdocumentd16f RIhZa47uEKOdtTgzWdRU0RFMTQWMpEWyWXIpXUoeditheading=hsrz53flt1rrp

[36] mdashmdash ldquoCompositor thread architecturerdquo httpswwwchromiumorgdevelopersdesign-documentscompositor-thread-architecture

[37] mdashmdash ldquoGPU accelerated compositing in Chromerdquohttpswwwchromiumorgdevelopersdesign-documentsgpu-accelerated-compositing-in-chrome

[38] mdashmdash ldquoHow chromium displays web pagesrdquo httpswwwchromiumorgdevelopersdesign-documentsdisplaying-a-web-page-in-chrome

[39] mdashmdash ldquoInter-process communicationrdquo httpswwwchromiumorgdevelopersdesign-documentsinter-process-communication

[40] mdashmdash ldquoMulti-process architecturerdquo httpswwwchromiumorgdevelopersdesign-documentsmulti-process-architecture

[41] mdashmdash ldquoOut-of-process iframesrdquo httpwwwchromiumorgdevelopersdesign-documentsoop-iframes

[42] mdashmdash ldquoProposal for frame capture con-tent APIrdquo httpsdocsgooglecomdocumentd1gRndVmVn7gWJ-rbIHaaOMNsCjSIBn4CAJbZuwLM2ROEedit

[43] mdashmdash ldquoThe rendering critical pathrdquo httpswwwchromiumorgdevelopersthe-rendering-critical-path

[44] mdashmdash ldquoScheduling js timer executionrdquo httpsdocsgooglecomdocumentd163ow-1wjd6L0rAN3V U6t12eqVkq4mXDDjVaA4OuvCAedit

[45] mdashmdash ldquoThreadingrdquo httpswwwchromiumorgdevelopersdesign-documentsthreading

[46] K Thomas J A E Crespo R Rasti J-M Picod C Phillips M-ADecoste C Sharp F Tirelo A Tofigh M-A Courteau L BallardR Shield N Jagpal M A Rajab P Mavrommatis N ProvosE Bursztein and D McCoy ldquoInvestigating commercial pay-per-installand the distribution of unwanted softwarerdquo in 25th USENIX SecuritySymposium (USENIX Security 16) Aug 2016

[47] N Tolia D G Andersen and M Satyanarayanan ldquoQuantifying interactiveuser experience on thin clientsrdquo Computer vol 39 no 3 pp 46ndash52March 2006

[48] P Vogt F Nentwich N Jovanovic E Kirda C Kruegel and G VignaldquoCross site scripting prevention with dynamic data tainting and staticanalysisrdquo in NDSS vol 2007 2007 p 12

15

Introduction

WebShots

What is a WebShot

Input Events that Trigger a WebShot

Use Cases

System Details

Background

ChromePic Overview

Identifying the Target Renderer Process

Taking Screenshots Efficiently

Taking ``Deep DOM Snapshots Efficiently

Alternative Implementation using Extensions

ChromePicExt Overview

Our Approach

Reconstructing Attacks on Users

Social Engineering Download Attack

Phishing Attack

ClickJacking Attacks

Performance Evaluation

Experimental Setup

User Study Setup

ChromePic Performance Measurements

Storage Requirements

Discussion

Related Work

Conclusion

References

phishing attacks To this end we focus on instrumenting GoogleChromium [40] (the open source project on which the Chromebrowser is based) to efficiently record a browsing snapshot atevery meaningful interaction between the user and the browserFor example every time the user clicks on a page or presses akey (eg Enter) we record the input information (eg mousecoordinates key code etc) and the page URL shown in thebrowser bar Furthermore we take a screenshot of the renderedpage and a ldquodeeprdquo snapshot of the related DOM tree andembedded resources (eg iframes images etc) We refer tothis type of detailed browsing snapshots as webshots Intuitivelysuch rich logs would allow a security team or forensic analyst totravel back in time and effectively reconstruct a userrsquos browsingactions over a desired time window In fact we can consider thescreenshot contained in each webshot as a ldquovideo framerdquo Thesescreenshots can then be stitched back together to reconstructprecisely what the user saw during every significant interactionwith the browser Furthermore each screenshot is associated withthe related full state of the DOM (including embedded objectsand JavaScript source code) recorded at the very same instant intime Namely we record exactly how a specific page DOM wasstructured how it was rendered at the time of a user-browserinteraction and how the user interacted with it thus enabling ananalysis of how the attack was implemented Our ChromePicbrowser addresses the following challenges

bull Forensic Rigor Our main goal (and challenge) is to enablewebshots to be taken synchronously with user-browserinteractions Namely let u(t0) represent a user-browserinteraction u (eg a mouse click or key press) that occursat time t0 Because we aim to prevent any (potentiallymalicious) JavaScript code that listens on u from altering thepage before the webshot is taken our goal is to ldquofreezerdquo theprocessing of u until we take both a screenshot of the pagecurrently displayed by the browser as well as a full DOMsnapshot Only after the snapshot is completed the event uwill be released and processed by the browser The needfor this synchronous snapshots constraint is motivated bythe fact that we intend to prevent any discrepancy betweenwhat is logged in the webshot and what the user saw (andthe DOM of the page heshe interacted with) at the veryinstant of time t0 when the event u occurred

bull Efficiency As attacks cannot be easily predicted ChromePicaims to be always-on and continuously log webshotsThis allows us to record undetected (and unexpected)attacks in accordance with the compromised recordingdesign principle [33] However to make always-on loggingpractical efficiency is critical and logging overhead must bereduced to a minimum In particular because webshots aretaken synchronously with each user input we effectivelyintroduce a processing overhead that increases the naturalprocessing of input events Therefore the challenge we faceis to make sure that no negative effect (eg latency) will beperceived by the user Based on previous studies on human-computer interaction [47] we target a logging time budgetof around 150ms which would make the logging eventspractically unnoticeable to users To this end we implementa number of careful system-level browser modifications andoptimizations which we describe in detail in Section IV

bull Transparency We require webshots to be taken in atransparent way wrt the web pages that are being logged

For instance there should be no easy way for (malicious)javascript code running on a page to detect whether theinteractions (inputs) between the user and the page arebeing logged or not In addition webshots should also betransparent to users in that once they are enabled the usershould not notice any difference in the behavior of thebrowser when webshots are being recorded

In Sections IV and V we discuss why the existing snapshot-taking capabilities currently implemented by Chromium do notsatisfy the above requirements For instance we describe theimplementation of a browser extension that attempts to meet thesame requirements described above using the existing extensionAPI and demonstrate why such a solution is not viable

The reader may notice that because ChromePic continuouslyrecords detailed information about the state of the browserincluding visual screenshots our system may produce numerouslogs some of which may include sensitive information Whileprotecting the security and privacy of the logs recorded byChromePic is outside the scope of this paper it is important tonotice that existing solutions could be used to mitigate theseconcerns For example sensitive URL whitelisting and logencryption using a key escrow as proposed in [28] could also beused in ChromePic We discuss these solutions in more details inSection VIII Also while a typical browsing session may resultin numerous webshots often the changes to a page betweentwo consecutive user-browser interactions are minimal thusresulting in few changes between snapshots This provides anopportunity for storing only the difference between snapshotsIn addition the visual screenshots can be reduced in size usinglossy compression and the overall storage requirements for thelogs of each browsing session could be further reduced usingstandard archiving tools We discuss storage requirements inmore details in Section VII

In summary we make the following contributions

bull We propose ChromePic a web browser equipped with anovel forensic engine that aims to enable the reconstructionand trace-back of web browser attacks especially for attacksthat directly target users and have a significant visualcomponent such as social engineering and phishing

bull We develop ChromePic by implementing careful modifi-cations and optimizations to the Chromium code base tominimize overhead and make always-on logging practicalIn addition we discuss why implementing ChromePic usingexisting facilities such as Chromersquos extension API is nota viable option

bull We demonstrate that ChromePic can successfully captureand aid the reconstruction of attacks on users Specificallywe report the analysis of an in-the-wild social engineeringdownload attack on Android a phishing attack and twodifferent clickjacking attacks

bull We evaluate the efficiency of our solution via a user studyinvolving 22 different users who produced more than 165hours of browsing activities on hundreds of websites Weprovide precise measurements about the overhead introducedby ChromePic on multiple devices including desktop andlaptop Linux systems as well as Android tablet devicesOur results show that the vast majority of webshots can betaken very efficiently making them practically unnoticeableto users

2

II WEBSHOTS


A What is a WebShot












III USE CASES



3




IV SYSTEM DETAILS

A Background





Browser IO Thread

IPCSend(input)

User


input



Browser UI Thread

Send(input)








4

Browser IO Thread

IPCSend(input fd)

take tabscreenshot

User



input


Notify(input fd)



take DOMsnapshot

TakeScreenshot()

Browser UI Thread

Send(input fd)










GL GPU

User


Browser UI Thread




Browser IO Thread


Result



Browser File Thread

Save

TakeScreenshot()







5















6










B Our Approach







7

(a) (b) (c)













8

(a) (b) (c)





B Phishing Attack













9













B User Study Setup



10













0 50 100 150 200 250 300


02

04

06

08

10






11




Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400


02

04

06

08

10












12






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References

II WEBSHOTS


A What is a WebShot












III USE CASES



3




IV SYSTEM DETAILS

A Background





Browser IO Thread

IPCSend(input)

User


input



Browser UI Thread

Send(input)








4

Browser IO Thread

IPCSend(input fd)

take tabscreenshot

User



input


Notify(input fd)



take DOMsnapshot

TakeScreenshot()

Browser UI Thread

Send(input fd)










GL GPU

User


Browser UI Thread




Browser IO Thread


Result



Browser File Thread

Save

TakeScreenshot()







5















6










B Our Approach







7

(a) (b) (c)













8

(a) (b) (c)





B Phishing Attack













9













B User Study Setup



10













0 50 100 150 200 250 300


02

04

06

08

10






11




Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400


02

04

06

08

10












12






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References




IV SYSTEM DETAILS

A Background





Browser IO Thread

IPCSend(input)

User


input



Browser UI Thread

Send(input)








4

Browser IO Thread

IPCSend(input fd)

take tabscreenshot

User



input


Notify(input fd)



take DOMsnapshot

TakeScreenshot()

Browser UI Thread

Send(input fd)










GL GPU

User


Browser UI Thread




Browser IO Thread


Result



Browser File Thread

Save

TakeScreenshot()







5















6










B Our Approach







7

(a) (b) (c)













8

(a) (b) (c)





B Phishing Attack













9













B User Study Setup



10













0 50 100 150 200 250 300


02

04

06

08

10






11




Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400


02

04

06

08

10












12






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References

Browser IO Thread

IPCSend(input fd)

take tabscreenshot

User



input


Notify(input fd)



take DOMsnapshot

TakeScreenshot()

Browser UI Thread

Send(input fd)










GL GPU

User


Browser UI Thread




Browser IO Thread


Result



Browser File Thread

Save

TakeScreenshot()







5















6










B Our Approach







7

(a) (b) (c)













8

(a) (b) (c)





B Phishing Attack













9













B User Study Setup



10













0 50 100 150 200 250 300


02

04

06

08

10






11




Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400


02

04

06

08

10












12






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References















6










B Our Approach







7

(a) (b) (c)













8

(a) (b) (c)





B Phishing Attack













9













B User Study Setup



10













0 50 100 150 200 250 300


02

04

06

08

10






11




Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400


02

04

06

08

10












12






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References










B Our Approach







7

(a) (b) (c)













8

(a) (b) (c)





B Phishing Attack













9













B User Study Setup



10













0 50 100 150 200 250 300


02

04

06

08

10






11




Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400


02

04

06

08

10












12






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References

(a) (b) (c)













8

(a) (b) (c)





B Phishing Attack













9













B User Study Setup



10













0 50 100 150 200 250 300


02

04

06

08

10






11




Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400


02

04

06

08

10












12






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References

(a) (b) (c)





B Phishing Attack













9













B User Study Setup



10













0 50 100 150 200 250 300


02

04

06

08

10






11




Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400


02

04

06

08

10












12






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References













B User Study Setup



10













0 50 100 150 200 250 300


02

04

06

08

10






11




Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400


02

04

06

08

10












12






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References













0 50 100 150 200 250 300


02

04

06

08

10






11




Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400


02

04

06

08

10












12






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References




Linux desktop 2236 9319 1902 7611 274 2380 3895 11804 2211 8586 1874 7053

0 50 100 150 200 250 300 350 400


02

04

06

08

10












12






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References






VIII DISCUSSION






IX RELATED WORK




13







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References







X CONCLUSION



ACKNOWLEDGMENT


REFERENCES










14








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References








































15

Introduction

WebShots

What is a WebShot


Use Cases

System Details

Background

ChromePic Overview






Our Approach



Phishing Attack



Experimental Setup

User Study Setup



Discussion

Related Work

Conclusion

References

Enabling Reconstruction of Attacks on Users via Efﬁcient ... · social engineering attack and...

Documents

Transcript of Enabling Reconstruction of Attacks on Users via Efﬁcient ... · social engineering attack and...