478 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 6, NO. 2, APRIL-JUNE 2018

Architectural Protection of Application Privacy against Software and Physical Attacks in Untrusted Cloud Environment

Lei Xu, JongHyuk Lee, Seung Hun Kim, Qingji Zheng, Shouhuai Xu, Taeweon Suh, Won Woo Ro, Member, IEEE, and Weidong Shi

Abstract—In cloud computing, it is often assumed that cloud vendors are trusted and that the guest Operating System (OS) and the Virtual Machine Monitor (VMM, also called hypervisor) are secure. However, these assumptions are not always true in practice, and existing approaches cannot protect the data privacy of applications when none of these parties is trusted. We investigate how to cope with a strong threat model in which any of the cloud vendor, the guest OS, and the VMM may be malicious or untrusted and can launch attacks against the privacy of trusted user applications. This model is relevant because applications may be small enough to be formally verified, while the guest OS and VMM are too complex to be formally verified. Specifically, we present the design and analysis of an architectural solution that integrates a set of on-chip components to protect the memory of trusted applications from potential software and hardware based attacks from untrusted cloud providers, a compromised guest OS, or a malicious VMM. Full-system performance evaluation results show that the design incurs only 9 percent overhead on average, which is a small performance price paid for the substantial security gain.

Index Terms—Virtualization, security, architectural support, hypervisor


1 INTRODUCTION

Cloud computing offers affordable, practically unlimited storage and computing resources on demand. While cloud computing can substantially lower the cost of maintaining complex IT infrastructures, it also brings a new range of problems. In particular, cloud users are confronted with a more complex software stack, which includes the Virtual Machine Monitor (VMM) or hypervisor, and the guest Operating System (OS). The importance of secure cloud computing has been well recognized (e.g., [1]), and there have been studies on various aspects of cloud security. However, it is often assumed that the cloud vendor is trusted and that the guest OS and VMM are secure. These assumptions are questionable given the discovery of vulnerabilities in commodity VMMs [2], [3], [4], [5], [6] and the fact that OS vulnerabilities often facilitate cyber attacks.

In this paper, we cope with a strong threat model in which the cloud vendor, the guest OS, and the VMM are not necessarily trusted. A compromised OS or VMM may launch malicious attacks against the trusted applications. Furthermore, an untrusted cloud vendor or a malicious insider may mount hardware based attacks against the privacy of hosted applications. The threat model is particularly relevant to cloud computing because trusted applications run in the cloud. Our architectural solution protects the privacy of the applications while assuming only that the hardware is trusted; the guest OS and VMM remain responsible for resource management.

Specifically, we make the following contributions. First, we initiate the study of a strictly stronger threat model in the setting of cloud computing by assuming that both the guest OS and the VMM are not trusted. This newly introduced threat model moves a step closer to the minimal trust assumption that the defender inevitably has to make. Second, we report the design, analysis, and full-system simulation of our solution for protecting trusted applications from potential software based and hardware based attacks on memory privacy. The solution tackles threats from multiple sources within a unified framework. Our approach handles the complex scenarios of cloud based systems that involve cloud vendors, the VMM, the guest OS, and user applications. Most existing hardware oriented solutions for protecting guest virtual machines only address a single challenge. Since our approach strives to deal with multiple challenges and to protect cloud outsourced applications within a unified framework, it drastically differs from these prior efforts on protecting a guest as a whole from either software or physical attacks. For example, the approaches described in [7] and [8] only handle software based attacks from compromised hypervisors. Approaches that protect virtual machines as a whole against physical attacks (e.g., [9], [10]) do not mitigate threats to application privacy when the guest OS is compromised. Experimental results show that our solution incurs only 9 percent performance overhead, which is a small price that can be paid for the substantial security gain.

- L. Xu and W. Shi are with the Department of Computer Science, University of Houston, TX 77004. E-mail: [email protected], [email protected].
- J. H. Lee is with Samsung Electronics, Seoul, Korea. E-mail: [email protected].
- S.H. Kim and W.W. Ro are with the School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea. E-mail: [email protected], [email protected].
- Q. Zheng and S. Xu are with the University of Texas at San Antonio, San Antonio, TX 78249. E-mail: {qzheng, shxu}@cs.utsa.edu.
- T. Suh is with Computer Science and Engineering, Korea University, Seoul, Korea. E-mail: [email protected].

Manuscript received 30 Oct. 2014; revised 7 July 2015; accepted 17 Dec. 2015. Date of publication 23 Dec. 2015; date of current version 6 June 2018. Recommended for acceptance by J. Chen, I. Stojmenovic, and I. Bojanova. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TCC.2015.2511728

2168-7161 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

The rest of the paper is organized as follows. Section 2 describes the threat model and the solution space. Section 3 presents our design. Section 4 reports our performance evaluation. Section 5 briefly reviews prior related work. Section 6 concludes the paper.

2 SECURITY MODEL AND GOALS

In this section, we discuss the security model and goals of the proposed solution.

2.1 Security Model and Potential Threats

Generally, five parties are involved in the cloud computing scenario: the infrastructure, the VMM, the guest OS, the cloud vendor that manages these resources, and the application. In an untrusted environment, the application may face the following threats:

- Untrusted cloud vendors: Cloud vendors cannot be (fully) trusted because they have incentives to turn off the data protection mechanisms to reduce the workload of cloud servers, or they may gather the users' private information. This concern is relevant not only to volunteer clouds [11], [12], nebula clouds [13], and social clouds [14], but also to public clouds because of insiders and regulations (e.g., data-collection requests from governmental authorities).

- Limitation of today's virtualization technology: A standard VMM typically runs a guest OS at a reduced privilege level (called a de-privileged guest), intercepts traps from the de-privileged guests, and emulates the trapping instruction against virtual machine (VM) states. This means that the VMM has access to the entire memory space and the states of the guest OS. Even in hardware-assisted virtualization (e.g., [15], [16]), system states (e.g., page tables) are still maintained by the VMM, which keeps a shadow copy of the system states. These expose the contents of trusted applications to the VMM.

- Insufficient memory protection against physical attacks: Contents residing in a guest virtual machine's memory space can be eavesdropped on using hardware-based methods. For example, hardware RAM capture devices can scan and dump physical memory contents while bypassing the guest OS [17].

- Insufficient OS security: The kernel, as well as the system services that run with privilege, is trusted in the OS. The OS kernel is often too complex to be subjected to formal verification, and it has a large attack surface that is vulnerable to exploits (e.g., [18], [19], [20]). The classic stack smashing attack can open doors for subverting the OS kernel even with advanced protection techniques such as address space layout randomization. Since the OS kernel can access an application's memory space, attackers can take complete control of users' private and confidential data once the OS kernel is compromised.

To sum up, the security model is as follows:

1) The processors deployed in the cloud computing infrastructure are fully trusted. They can store secret values safely, and it is very hard for an attacker to tamper with the computation process;

2) The VMM and the guest OS are not trusted. Due to potential vulnerabilities, an attacker may access the application's data without permission;

3) The cloud vendor is not trusted. The cloud computing vendor may also take advantage of its high privilege to snoop on the application's data even when both the VMM and the guest OS are running normally;

4) The application running on top of the guest OS is fully trusted. The application is free from backdoors/trojans, and it is hard for an attacker to break into the application and cause information leakage.

2.2 Solution Space

The importance of protecting trusted applications in untrusted environments has attracted due attention. There have been three types of architectural solutions, which are highlighted in three respective columns of Table 1. In the first type of architectures (e.g., XOMOS [21] and AEGIS [23]), trusted applications are protected by the trusted hardware, which copes with the untrusted OS. These solutions were proposed before the cloud computing era, and did not consider virtualization and the VMM. It is unclear how they should be retrofitted to environments with virtualization. In the second type of architectures (e.g.,

TABLE 1
Comparison of Four Paradigms

Paradigm                        | Untrusted OS (e.g., [21]) | Trusted Hypervisor (e.g., [22]) | Untrusted Hypervisor (e.g., [7], [9], [10]) | Ours
Untrusted                       | OS                        | Guest OS                        | VMM                                         | Guest OS & VMM & Vendor/Insider
Trusted                         | Hardware                  | Hypervisor, Hardware            | Hardware                                    | Hardware
Protected                       | Application               | Application                     | Entire guest as a whole                     | Application
Hypervisor Compatible           | No                        | Yes                             | Yes                                         | Yes
Oblivious Memory Management     | No                        | NA                              | Partial support, guest only                 | Yes
Arbitrary Secure Memory Sharing | No                        | No                              | No                                          | Yes


Overshadow [22]), the trusted VMM protects trusted applications from the malicious guest OS, while (implicitly) assuming that the hardware is also trusted. In the third type of architectures (e.g., [7]), the trusted hardware is employed to protect the trusted guest OS and applications as a whole from the untrusted VMM.

As highlighted in Table 1, the present paper introduces a stronger threat model, where the cloud vendors (e.g., insider attacks, physical memory eavesdropping), the guest OS, and the VMM are untrusted. In addition to realizing the protection of trusted applications from the untrusted computing environment using a unified approach, our solution offers a realization of de-privileged memory management in the untrusted cloud (especially, with an untrusted VMM), where the memory management by the VMM and the guest OS is performed with content blindness. In our design, system resource management and privacy protection are decoupled and orthogonal to each other. The untrusted guest OS and VMM only manage resources without seeing the (plaintext) memory contents. Moreover, our solution enables unlimited and arbitrary secure memory apertures for access-controlled sharing of memory spaces between the applications, the guest OS, and the VMM. To our knowledge, this is the first system that simultaneously offers these features.

2.3 Security Goals

The security goal of the proposed solution is to protect the data privacy of the applications in the cloud computing environment under the security model described in Section 2.1: neither the VMM, the guest OS, nor the cloud vendor can access the application's data without permission.

The proposed solution does not consider side channel attacks against the execution environment. In practice, any progress in side channel prevention techniques (e.g., [24], [25]) can be incorporated into the proposed solution to avoid side channel threats. Another security aspect that is not covered by the proposed solution is denial-of-service attacks [26]. Because the VMM and the host OS are responsible for resource management and task scheduling, and we assume these parties are not trusted in the security model, an adversary can always utilize these two parties to launch denial-of-service attacks. Denial-of-service prevention techniques can be used together with our solution to make the whole system more secure.

3 SOLUTION DESIGN AND ANALYSIS

In this section, we describe the design of our solution for protecting application privacy in an untrusted hosting environment.

3.1 Design Overview

We propose that processor vendors integrate a set of architectural components on-chip for safeguarding the memory space privacy of hosted applications or guest kernels against exploits from untrusted cloud vendors, compromised guest operating systems, or compromised hypervisors. The privacy enhanced trusted processor allows applications or guest kernels to create sealed memory spaces where access control, memory space cryptography, and memory integrity are enforced. Each sealed memory space has its own context (e.g., keys, states, and various identifiers, aka IDs). Fig. 1 shows

Fig. 1. Architectural support for privacy protection in untrusted environment. The processor architecture is extended to encapsulate the application's memory space with privacy safeguards against both software and physical exploits.


the architectural on-chip components. The function of each component can be summarized as follows.

- In the processor, security extensions are integrated to support special control registers and special instructions; details of the instructions are shown in Table 3.

- On-chip caches (including L1 and L2) have extended tags, and the cache controller is responsible for on-chip memory access control and policy enforcement. Section 3.4 presents the operation of the privacy controller.

- A sealed memory context buffer (SMCB) stores the sealed memory contexts of the running guest kernels and applications; details are given in the following section.

- The DRAM privacy controller governs the policy for enforcing off-chip memory access control as well as the policy for accessing the guarded memory spaces.

- A cryptography engine is integrated with the system request interface (SRI) for encryption and integrity verification of the sealed memory spaces.

- The proposed architecture supports oblivious memory management and various common data sharing scenarios, such as data sharing among user applications and sharing application memory space with the guest kernel.

The conventional on-chip processor states are expanded to cover these new architectural components, which are elaborated in the following sections.

3.2 Sealed Memory Context

3.2.1 Sealed Context

Support of sealed memory contexts allows secure migration of a user application's privacy settings for its memory space between trusted processors. The settings cannot be tampered with by the VMM or the guest kernel (when user applications are concerned). Once the privacy settings are conveyed to the trusted hardware, the trusted hardware will enforce them for both on-chip and off-chip memory accesses. This is achieved through a sealed memory context table stored off-chip and an on-chip buffer that stores the sealed memory contexts of the currently running guest kernels and applications. Each sealed memory context contains a list of settings and configurations such as the context ID, virtual machine ID, PID, program counter, sealed memory space range, cryptography keys, integrity codes, encrypted processor states, and a fine grained memory access setting vector (MASV). Each context is digitally signed by the processor after the context is verified and established by the processor. A processor hardware vendor (e.g., Intel, AMD) can create a public-private key pair (pk_vendor, sk_vendor), such that the private key sk_vendor is permanently fused into the processor while the public key pk_vendor is public. The public key is certified and signed by the vendor using standard approaches such as X.509. In the sealed context components, application virtual addresses are used. The unit size of each region is decided by a paging mechanism similar to AMD-V Nested Paging for reducing TLB misses.
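As a concrete illustration, the context fields listed above can be modeled as a record that the processor signs after verification. This is only a sketch: the field names, the serialization, and the HMAC standing in for the vendor-certified public-key signature are our assumptions, not the paper's actual layout.

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass
class SealedMemoryContext:
    # Identifiers binding the context to a VM and a process (names assumed).
    context_id: int
    vm_id: int
    pid: int
    program_counter: int
    # Sealed memory space range, expressed in application virtual addresses.
    space_begin: int
    space_end: int
    # Separate symmetric keys for executable code and for data (Section 3.3).
    code_key: bytes
    data_key: bytes
    # Base address of the memory access setting vector (MASV).
    masv_base: int
    signature: bytes = b""

    def _blob(self) -> bytes:
        # Deterministic serialization of the signed fields.
        return repr((self.context_id, self.vm_id, self.pid,
                     self.program_counter, self.space_begin, self.space_end,
                     self.code_key, self.data_key, self.masv_base)).encode()

def sign_context(ctx: SealedMemoryContext, processor_key: bytes) -> None:
    """Model the processor signing a verified context (HMAC stands in for
    the signature scheme certified via the vendor's X.509 chain)."""
    ctx.signature = hmac.new(processor_key, ctx._blob(), hashlib.sha256).digest()

def verify_context(ctx: SealedMemoryContext, processor_key: bytes) -> bool:
    expected = hmac.new(processor_key, ctx._blob(), hashlib.sha256).digest()
    return hmac.compare_digest(ctx.signature, expected)
```

Tampering with any signed field (say, the MASV base) invalidates the signature, which models the detection guarantee described above.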

A guest virtual machine can describe its context as part of the virtual machine configuration file. The VMM can read the context and link it to the hardware context of the virtual machine. When vm_enter is executed, the context is automatically processed. For a user application, its context is integrated with the ELF binary format as an extension. The extended ELF format provides backward compatibility, so an unprotected application can run without change. For instance, a pre-compiled binary can execute in the system without any problem. A cloud user provides encrypted binary images to a cloud vendor. The images are encrypted offline by the user before they are uploaded to the cloud.

3.2.2 Context Management

The context ID is created by the trusted processor and protected as part of the extended application process context (encrypted and signed by the trusted processor). Neither the VMM nor the guest operating system can alter or tamper with the ID of a sealed memory context without being detected by the trusted processor. A processor can enter a sealed memory space either explicitly, using the instruction sealed_memory_begin, or implicitly, via vm_enter (enter a sealed guest virtual machine) or syscall int80 (enter a sealed guest kernel). After sealed_memory_begin is executed, execution control transfers to the first instruction pointed to by the program counter of the sealed memory context. Starting from there, all the instructions within the sealed memory space are decrypted using the context's symmetric code key (which is encrypted under the vendor's public key when stored in the off-chip sealed memory context).

For supporting access control, the processor states comprise specific status registers as global settings. Based on the register values, one can tell whether the processor is running in VMM mode, in guest kernel mode, or within a sealed memory context. Further, the global settings can indicate which guest virtual machine is running and the ID of the application process memory space. These status registers are set automatically by the trusted processor. For example, whenever sealed_memory_begin, vm_enter, vm_exit, or syscall is executed, the trusted processor resets the sealed memory context ID register accordingly.
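The automatic register updates can be sketched as a small state machine. The instruction names follow the text, but the initial mode and the exact registers each instruction touches are assumptions for illustration:

```python
class StatusRegisters:
    """Software model of the global status settings that the trusted
    processor updates on mode-switching instructions (Section 3.2.2).
    An SMCR value of 0 means no sealed memory context is active."""

    def __init__(self):
        self.vmm_mode = True       # assume the machine boots into the VMM
        self.guest_os_mode = False
        self.vm_id = 0
        self.smcr = 0              # sealed memory context register

    def vm_enter(self, vm_id, sealed_context_id=0):
        # Enter a guest; a sealed guest carries its own context ID.
        self.vmm_mode = False
        self.guest_os_mode = True
        self.vm_id = vm_id
        self.smcr = sealed_context_id

    def vm_exit(self):
        self.vmm_mode = True
        self.guest_os_mode = False
        self.smcr = 0

    def sealed_memory_begin(self, context_id):
        # Explicit entry into a sealed application memory space.
        self.guest_os_mode = False
        self.smcr = context_id

    def syscall(self, kernel_context_id=0):
        # Trap into the (possibly sealed) guest kernel.
        self.guest_os_mode = True
        self.smcr = kernel_context_id
```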

The on-chip SMCB is managed by two instructions, smcb_allocate_row and smcb_invalidate_row. Instruction smcb_allocate_row reserves one specific row of the context buffer and generates a new context ID that is assigned to a context. Instruction smcb_invalidate_row de-allocates an SMCB row. When a context is de-allocated from the SMCB, the trusted processor flushes all the cached data belonging to the context. If execution switches to a sealed memory context without allocating an SMCB row first, an exception is triggered because the trusted processor cannot retrieve matching access settings and cryptography keys from the context buffer. To prevent tampering with the saved memory contexts, a trusted processor encrypts sensitive states such as the values stored in the architectural registers, and computes a signature over the stored context to protect the integrity of this information.
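A minimal software model of the SMCB management just described, including the flush on invalidation and the exception when no row is resident. The row count and the Python-level interface are assumptions; only the two instruction names come from the text:

```python
class SealedContextFault(Exception):
    """Raised when execution enters a sealed context with no SMCB row."""

class SMCB:
    def __init__(self, num_rows=8):
        self.rows = [None] * num_rows      # each row: (context_id, context)
        self._next_id = 1
        self.cached_lines = {}             # context_id -> set of cached lines

    def smcb_allocate_row(self, context):
        """Reserve a free row and generate a fresh context ID for it."""
        for i, row in enumerate(self.rows):
            if row is None:
                cid = self._next_id
                self._next_id += 1
                self.rows[i] = (cid, context)
                self.cached_lines[cid] = set()
                return cid
        raise RuntimeError("no free SMCB row")

    def smcb_invalidate_row(self, context_id):
        """De-allocate a row and flush all cached data of that context."""
        for i, row in enumerate(self.rows):
            if row is not None and row[0] == context_id:
                self.cached_lines.pop(context_id, None)   # flush cached data
                self.rows[i] = None
                return
        raise RuntimeError("context not resident")

    def lookup(self, context_id):
        """Fetch keys/settings for a running context; fault if absent."""
        for row in self.rows:
            if row is not None and row[0] == context_id:
                return row[1]
        raise SealedContextFault(context_id)
```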

3.3 Control of Privacy

Privacy control is achieved by encrypting the code and data in sealed memory spaces to prevent disclosure of the private data of outsourced application tasks or guest operating systems. In


addition, for a sealed memory space, the executables and data are encrypted with different keys. Although the contents of a sealed memory space are encrypted, the corresponding guest operating system or the VMM can still provide memory management services for it. The application page tables are managed by the guest operating system. For supporting virtualization, shadow page tables are kept and managed by the VMM. When a guest operating system or a VMM reads data from a sealed memory space, encrypted data instead of decrypted data is returned.

Table 2 lists the access scenarios and the format of the data returned (encrypted or decrypted). For a sealed memory space owned by an application or guest operating system, memory accesses issued by the owner always return decrypted data. However, in a modern operating system, data sharing between an application and the guest operating system occurs frequently (e.g., copy_from_user, copy_to_user). A VMM also needs to share data with the guest operating system. For example, a VMM needs to access the guest virtual machine's page tables in order to update its own version of the shadow page tables. For supporting sharing flexibility and compatibility, the solution assigns an access setting bit to each word memory location. If the access setting bit is set, then the word location is accessible by the guest operating system. For a word location, its access setting bit can be set or cleared by the owner of the sealed memory space using special instructions, sm_set_os_visible and sm_set_os_invisible. For a sealed memory space, all the access setting bits are placed together as a memory access setting vector. The MASV resides within the sealed memory space of an application or guest operating system. The base address of the MASV is defined as part of the sealed memory space context.

A user application can share decrypted memory with the guest operating system. To do so, it can call a custom function, malloc_os_allow(), that allocates memory space accessible by the guest operating system. Inside malloc_os_allow(), it goes through every word location of the allocated memory and executes the instruction sm_set_os_visible. Paired with malloc_os_allow(), there is free_os_allow(), which de-allocates memory space obtained using malloc_os_allow(). Inside free_os_allow(), for each freed word location, it executes the instruction sm_set_os_invisible. Both malloc_os_allow() and free_os_allow() are functions implemented in user application space and statically linked. They are encrypted together with the application binary code and protected with integrity verification. Furthermore, both sm_set_os_visible and sm_set_os_invisible can only be executed within a valid sealed memory context.
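The pair of wrappers can be sketched in software. A real implementation would execute the sm_set_os_visible/sm_set_os_invisible instructions on each word; here the per-word bits are modeled as a set, and the word size and allocator callbacks are assumptions:

```python
WORD_SIZE = 4  # bytes per word; an assumption for illustration

class MASV:
    """One OS-visibility bit per word of the sealed memory space."""
    def __init__(self):
        self._visible = set()

    def sm_set_os_visible(self, addr):
        self._visible.add(addr // WORD_SIZE)

    def sm_set_os_invisible(self, addr):
        self._visible.discard(addr // WORD_SIZE)

    def os_visible(self, addr):
        return (addr // WORD_SIZE) in self._visible

def malloc_os_allow(masv, allocate, size):
    """Allocate memory and mark every word of it guest-OS-visible."""
    base = allocate(size)
    for addr in range(base, base + size, WORD_SIZE):
        masv.sm_set_os_visible(addr)
    return base

def free_os_allow(masv, release, base, size):
    """Mark every word invisible again, then release the memory."""
    for addr in range(base, base + size, WORD_SIZE):
        masv.sm_set_os_invisible(addr)
    release(base)
```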

In the proposed method, the processor has extended instructions for these operations. Table 3 lists the semantics of the extended instructions.

3.4 Implementation of Access Privacy Control

Protection of sealed memory spaces is enforced at both the on-chip and off-chip levels. As shown in Fig. 2, additional tag fields and global status information are needed for the on-chip memory access controller. When data from a sealed memory space is stored in the on-chip caches, it can reside in decrypted or encrypted format depending on the access scenario. A single bit (IsDecrypted) indicates whether a cache line stores encrypted or decrypted data. The additional tag fields further include the context ID, VM ID, PID, the memory access setting bits for all the words located in the cache line, and a bit indicating whether the cache line stores data fetched from a sealed memory space (IsSealed).

There are five global status settings (see Fig. 2), and they are used when the processor reads from or writes to a regular cache (e.g., L1 or L2). The input tag is matched with the stored tag, and a cache hit/miss is determined according to the tag match result alone only if the IsSealed bit is clear. Otherwise, additional checking is performed using the extended cache line tag fields and the five global status settings. These five global status settings are: (i) VMM mode, indicating whether the access request is issued from the VMM; (ii) guest OS mode, indicating whether the access request is issued from a guest OS kernel; (iii) VM ID, a unique ID number of the guest virtual machine that issues the access; (iv) PID, indicating the process ID of the process memory space from which the access request is issued; and (v) the sealed memory context register (SMCR), indicating the currently active sealed memory context. The five global status settings and the extended cache tags are sent to a cache hit/miss resolve unit (see Fig. 2), where their values are combined to determine whether the access results in a cache hit or a cache miss. The cache hit/miss resolve unit makes its decision according to a logic implementation of the policies described in Table 2. Table 4 shows the privacy control policies for on-chip memory accesses implemented through cache access policies.
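The resolve unit's decision can be sketched as pure logic over the extended tag and the five global status settings. The policy encoding below is a simplification we infer from the discussion (the full Table 4 policies are not reproduced in this section), so treat it as illustrative:

```python
from dataclasses import dataclass

@dataclass
class ExtendedTag:
    addr_tag: int
    is_sealed: bool = False
    is_decrypted: bool = False
    context_id: int = 0
    vm_id: int = 0
    pid: int = 0

@dataclass
class GlobalStatus:
    vmm_mode: bool
    guest_os_mode: bool
    vm_id: int
    pid: int
    smcr: int  # active sealed memory context register

def cache_hit(line: ExtendedTag, addr_tag: int, st: GlobalStatus) -> bool:
    """Hit/miss decision: plain tag match for unsealed lines; sealed lines
    holding decrypted data additionally require the requester to be the
    owning sealed context (same VM, process, and context ID)."""
    if line.addr_tag != addr_tag:
        return False
    if not line.is_sealed or not line.is_decrypted:
        return True
    return (not st.vmm_mode and not st.guest_os_mode
            and st.vm_id == line.vm_id
            and st.pid == line.pid
            and st.smcr == line.context_id)
```

Under this logic, a VMM-mode access to a decrypted sealed line misses, forcing a fresh fetch that stores the line in encrypted form, consistent with the access policies above.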

Whenever a memory access misses the on-chip caches, the CPU must fetch the data from external memory. During the fetch, depending on the access scenario, a decision has to be made whether the fetched data should be stored in decrypted or encrypted format. The mechanism for fetching data from memory is illustrated in Fig. 3. As shown in the figure, the global status setting bits are transferred

TABLE 2
Hardware Assisted Memory Privacy and Access Control

Access type                                                                     Data returned
Application accesses its own sealed memory space                                Decrypted
Guest OS kernel accesses its own sealed memory space                            Decrypted
Application accesses a sealed memory space whose owner has the same data key    Decrypted
Application accesses a sealed memory space whose owner has a different data key Encrypted
OS kernel accesses a sealed application memory space                            Encrypted
OS kernel accesses a sealed application memory location whose MASV setting
  marks the location as accessible by the guest OS                              Decrypted
VMM accesses a sealed application memory space                                  Encrypted
VMM accesses a sealed guest kernel memory space                                 Encrypted
VMM accesses an unsealed guest kernel memory space                              Not encrypted



to the privacy controller. The controller also receives data from the SMCB and decides the control attributes (i.e., require access checking and require decryption) using the privacy control policy for physical memory accesses shown in Table 5. The physical address of the data to be fetched is provided by the TLB, which is extended with a sealed memory context buffer index.

The MASV unit computes the address and offset of the MASV entry that should be fetched. After that, the cache line is fetched from

memory according to the control attributes and the MASV. If the cache line should be decrypted, the on-chip crypto engine decrypts it; otherwise, the encrypted data is stored in the cache. As shown in Fig. 3, a multiplexer controlled by the result of access checking selects between the encrypted and decrypted data to write into the cache line. At the same time, the integrity of the sealed memory space from which the cache line is fetched is verified. The hardware sets the valid bit of the cache line only after the integrity of the sealed memory space has been verified (e.g., [27], [28]).
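The decrypt-or-not decision can be summarized as a small policy function. This is our own sketch of the behavior tabulated in Table 2 (the parameter names are ours, and the hardware realizes the per-word OS-visibility part via the MASV):

```python
def returned_form(requester: str, space_sealed: bool,
                  same_data_key: bool = False,
                  masv_os_visible: bool = False) -> str:
    """Form in which fetched data is handed to the requester.
    requester is one of 'app', 'os', 'vmm' (illustrative encoding)."""
    if not space_sealed:
        return "plain"                      # unsealed space: no transformation
    if requester == "app":
        # Plaintext only if the requesting context holds the same data key
        # (its own space, or a space sealed under a shared key).
        return "decrypted" if same_data_key else "encrypted"
    if requester == "os":
        # The guest OS sees plaintext for its own sealed space (same key)
        # or where the application's MASV opens the location to it.
        return "decrypted" if (same_data_key or masv_os_visible) else "encrypted"
    return "encrypted"                      # the VMM never sees sealed plaintext
```

This mirrors the key property of the design: an untrusted VMM or OS can still read and move sealed memory (so paging and scheduling keep working), but it only ever observes ciphertext unless the owner explicitly opens an aperture.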

With the access privacy implementation described above, an application can always prevent a malicious VMM or guest OS from accessing its memory space (by defining an appropriate MASV), because the processor is trusted and responsible for the protection mechanism. For a hostile cloud vendor, illegal memory access attempts either return ciphertext or are blocked by the access privacy control mechanism.

3.5 Necessary VMM Support

In hardware-facilitated virtualization, virtual CPU support is implemented for running instances of unmodified guest operating systems. The states of the virtual CPUs are preserved or restored during vm_exit or vm_enter. To support the proposed privacy protection mechanisms and sealed memory spaces, one has to extend the current virtual CPU definition and states. For example, if a guest operating

TABLE 3
Instruction Extensions

Instruction: Semantics

sealed_memory_begin reg: Register reg stores the address of a sealed memory context (SMC). Verify the signature of the SMC pointed to by reg, set the corresponding SMCB entry whose id matches reg->sealed_memory_id, start execution from reg->pc, set the sealed memory context register (SMCR) to reg->sealed_memory_id, and reset sealed_memory_status. If an error occurs (e.g., unverified context), set sealed_memory_status.

sealed_memory_end: Yield from the current context, reset the SMCR, and store the encrypted processor core states (e.g., register state) to SMCB[SMCR].context_addr->states.

sealed_memory_resume reg: Resume execution of a context. Similar to sealed_memory_begin, except there is no need to set an SMCB entry.

vm_enter (implicit begin): Extended version of vm_enter. If the guest OS memory space is sealed, carry out actions similar to sealed_memory_begin.

vm_exit (implicit end): Extended version of vm_exit. If running in sealed mode, carry out actions similar to sealed_memory_end.

int80 (implicit begin and end): Extended version of syscall. If the application is in sealed memory mode, first exit with actions similar to sealed_memory_end. Then, if the guest OS is in sealed memory mode, apply actions similar to sealed_memory_begin.

smcb_allocate_row reg1, reg2: Allocate an SMCB entry. If SMCB[reg1].valid is set, invalidate the SMCB entry and flush all cache lines whose context tags match SMCB[reg1].id. A new context id is generated by the hardware: SMCB[reg1].id = new id, SMCB[reg1].valid = 1, and reg2->id = SMCB[reg1].id.

smcb_access reg: Read the public portions of the SMCB context buffer: *reg = public_part(SMCB). The public part of the SMCB includes the valid bit, context id, etc.

smcb_invalidate_row reg: If SMCB[reg].valid is set, invalidate the SMCB entry and flush all cache lines whose context tags match SMCB[reg].id.

sm_set_os_visible reg: Validate the context id; make the word address at *reg accessible by the guest OS.

sm_set_os_invisible reg: Validate the context id; make the word address at *reg inaccessible to the guest OS.

sm_set_handler reg: Set SMCB[SMCR].handler = PC + reg, where reg stores the instruction offset of a handler for processing sealed-memory-related access exceptions.

sm_set_register_mask reg: Validate the context id and set the register protection mask, sealed_memory_register_mask (SMRM) = reg. The mask decides which registers can pass their values across a sealed memory context switch.

Fig. 2. Privacy controller for on-chip memory accesses. Privacy protection policies are enforced for each on-chip memory access.



system is protected as a sealed memory space, its hardware virtual machine state should include a reference to the declaration of the sealed memory context. During vm_enter and vm_exit, the associated sealed memory context is automatically established or preserved.

In addition, unlike the unvirtualized scenario, resources are shared by multiple guest virtual machines. For instance, on a multi-core machine, a sealed memory context buffer is likely shared by multiple guest virtual machines (supported by our solution). One guest virtual machine should not monopolize all the entries of the sealed memory

context buffer. Proper sharing of the sealed memory context buffer can be enforced by the VMM. For example, when smcb_allocate_row is called by a guest virtual machine, execution switches to the VMM. The VMM keeps track of how many sealed memory context buffer entries each guest virtual machine has occupied and requests a new entry on behalf of the guest virtual machine. If a guest machine has used up the quota allocated by the VMM, the VMM can fail smcb_allocate_row from the guest. In this way, the sealed memory context buffer can be allocated and shared among multiple guest machines. Furthermore, the VMM maintains shadow

TABLE 4
Privacy Control Policies for On-Chip Memory Accesses Implemented through Cache Access Policies

Note: G is guest OS and C is cache.

Fig. 3. Privacy controller for off-chip memory accesses.



page tables for the guest virtual machines. The shadow tables are crucial for translating guest physical addresses to host machine addresses. In the proposed solution, the TLB and page tables are extended with a sealed context id; consequently, the shadow page tables should be extended as well.
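The VMM-side sharing policy described above can be sketched as follows. The quota value and data structure are our own assumptions for illustration; the paper does not fix a particular policy.

```python
class SMCBQuotaVMM:
    """Sketch of VMM quota enforcement for trapped smcb_allocate_row calls."""

    def __init__(self, total_entries: int = 64, quota_per_vm: int = 16):
        self.free = total_entries
        self.quota = quota_per_vm
        self.used = {}                  # vm_id -> SMCB entries currently held

    def handle_allocate(self, vm_id: int) -> bool:
        """Called when a guest's smcb_allocate_row traps into the VMM.
        Returns True if the allocation may proceed, False to fail it."""
        held = self.used.get(vm_id, 0)
        if held >= self.quota or self.free == 0:
            return False                # fail the instruction in the guest
        self.used[vm_id] = held + 1
        self.free -= 1
        return True
```

Note that the VMM only arbitrates slot occupancy here; it never learns the context keys, so a malicious VMM can deny service but cannot use this path to read sealed plaintext.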

Furthermore, because the cryptographic protection of memory spaces is implemented in the system request interface rather than in each individual processor core, the solution applies to multi-core and multi-processor systems. The design can be extended to cover both snoopy-bus-based symmetric multi-processor systems and distributed-shared-memory systems such as AMD Opteron servers or Intel Xeon servers based on QuickPath.

3.6 Security Remarks

Though it enhances privacy protection for computing tasks outsourced to the cloud, the proposed solution is not intended to be a panacea that solves all existing security problems. Our goal is to improve privacy protection in untrusted computing environments, not to claim that all security issues can be solved using trusted hardware. Currently, the protected application has to statically include all its libraries. It is assumed that the libraries are trusted and contain no backdoors or trojans.

Software-based and hardware-based tampering of a sealed memory space can be detected and prevented due to the following protection and design measures: (i) the integrity of a sealed memory space is enforced using a MAC tree and verified for every memory access; (ii) a newly updated cache line is not set valid until successful verification of the sealed memory space; (iii) access control is applied to data cached on-chip using the extended cache line fields (e.g., context id, access vector, VM id, PID); (iv) separate cryptographic keys are used for sealed code and data, and the code key is used only for decryption; (v) processor states (e.g., registers) are encrypted during a sealed memory context switch.1 Furthermore, for a protected memory space, its

MASV is situated inside the protected memory space itself. Updates to the MASV are performed through a special atomic instruction that is protected.

It is important to point out that our solution does not prevent the OS from carrying out normal I/O operations. When an application wants to share data with the I/O services of the operating system, it creates an aperture in its sealed memory space that grants the operating system permission to access the shared data. In this case, the OS kernel accesses the shared data in plaintext. This way, the application can access network and disk I/O without problems. Furthermore, the shared memory space apertures can also support DMA transfers.

To prevent tampering with guest file systems and the files used by user applications, files inside a guest virtual machine should be encrypted. Since the guest operating system is not fully trusted by the applications and may be compromised, decryption and encryption of individual files should not be performed by the guest operating system; once the operating system is compromised via remote attacks, the privacy of the files is no longer guaranteed. The solution is to encrypt and decrypt output/input file data blocks within the sealed memory spaces of applications. This can be achieved by a user-space file system wrapper library that temporarily stores file data blocks in sealed memory buffers (see Fig. 4). Output data is encrypted before leaving the sealed user memory space, and encrypted input data is decrypted inside the sealed user memory space. To support kernel operations such as copy_from_user or copy_to_user, a user application can open certain memory regions to be accessible by the guest operating system. The user-space file system wrapper is statically linked with the application code, encrypted with the application itself, and protected with a digital signature.
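The wrapper's read/write paths can be sketched as follows. This is a toy model: a SHA-256-derived XOR keystream stands in for the application's data-key cipher, and a dict stands in for the blocks handed to the untrusted OS; none of these choices is the paper's concrete implementation.

```python
import hashlib

def _keystream(key: bytes, block_no: int, length: int) -> bytes:
    """Toy counter-mode keystream derived with SHA-256 (illustrative only)."""
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + block_no.to_bytes(8, "big")
                              + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:length]

def _xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))

def wrapper_write(disk: dict, key: bytes, block_no: int, plaintext: bytes) -> None:
    # Encrypt inside the sealed buffer; only ciphertext leaves it for the OS.
    disk[block_no] = _xor(plaintext, _keystream(key, block_no, len(plaintext)))

def wrapper_read(disk: dict, key: bytes, block_no: int) -> bytes:
    # Ciphertext enters the sealed buffer and is decrypted there.
    ct = disk[block_no]
    return _xor(ct, _keystream(key, block_no, len(ct)))
```

The point of the sketch is the trust boundary: the guest OS (the dict here) only ever stores and moves ciphertext, while key material and plaintext never leave the sealed user memory space.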

4 PERFORMANCE EVALUATION

Evaluation of the designed security enhancements is based on system emulation and architecture simulation using cycle-based machine models.

TABLE 5
Privacy Control Policies for Physical Memory Accesses

Note: G is guest OS and S is sealed memory context buffer.

1. The only exception is during system calls and system call returns, where parameters are passed via registers.



4.1 Evaluation Methods

Table 6 shows the evaluation environment, including the implementation method. As shown in the table, we perform the evaluation in two ways: performance measurement and functional verification. To evaluate performance, we use the functional full-system emulator Simics [29] combined with the timing simulator FeS2 [30]. Performance measurement requires the analysis of the hardware components, including sealed memory contexts, the on-chip cache with extended access control, the pipelined crypto and sealed memory space encryption engine, memory space integrity verification, etc. To tune our simulation model, we calculate latencies using a reference RTL implementation. After setting up all the parameters, we build our model on FeS2 and study its performance using different benchmark applications. FeS2 decodes x86 instructions into uops; the implementation of the uops is based on [31]. Also, Simics "magic instructions" are used to trigger and terminate the operation of FeS2. In addition, for accurate modeling of the memory system, we integrated the simulator with DRAMSim2 [32], a cycle-accurate open-source memory system simulator that provides a DDR2/3 memory system model and a library mode usable with many architectural simulators. Functional verification is performed using Bochs [33]. Details are given in the following section.

4.2 Implementations

Cipher and crypto unit. The Advanced Encryption Standard (AES) specifies the encryption of electronic data. It has a fixed data block size of 128 bits and a key size of 128, 192, or 256 bits. Based on a round function, the AES cipher consists of a number of transformation rounds that convert the input plaintext into ciphertext (10 rounds for a 128-bit key, 12 rounds for a 192-bit key, and 14 rounds for a 256-bit key).

AES is often unrolled, with each round pipelined into multiple stages (4-7), to achieve high decryption/encryption throughput. The total area of an unrolled and pipelined AES is about 100 K-400 K gates to achieve 15-50 Gbit/s throughput [34]. We evaluate the pipelined

AES module using an existing Verilog RTL implementation and synthesis results from OpenCores [35]. The pipelined AES takes around 30 cycles to encrypt 16 bytes of data. The design can operate at around 330 MHz at a cost of around 14 K LUTs, with over 40 Gbps throughput. This encryption module is integrated with the system request interface. The total area cost is 1,000 K gates.
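The quoted figures are mutually consistent: a fully pipelined design retires one 128-bit block per cycle once the pipeline is full, so the roughly 30-cycle latency is pipeline fill time rather than a per-block cost. A quick check:

```python
freq_hz = 330e6                 # reported operating frequency
block_bits = 128                # AES block size
# One block retired per cycle in steady state:
throughput_gbps = freq_hz * block_bits / 1e9
assert throughput_gbps > 40     # consistent with the "over 40 Gbps" figure
```

This also explains why the encryption latency can largely be hidden behind memory fetches: the per-line cost is dominated by pipeline latency that overlaps with the DRAM access.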

Integrity verification. Integrity verification based on a message authentication code (MAC) is a standard operation, but the choice among MAC approaches can have a significant impact on verification latency. In the reference implementation, we use a hierarchical message authentication code tree. A MAC value is generated using the SHA-256 hash function [36] for each cache-line-sized memory block of a virtual machine. All the MACs form one layer of nodes and are stored linearly. Similarly, a new MAC value for the next level in the MAC tree is computed by concatenating the new MAC line and the secret key of the application as the inputs to the SHA-256 function, until the root MAC is generated. The root MAC is stored inside the processor once the program enters the trusted environment, to avoid any potential tampering with the root node. Whenever the external memory of a cache line is modified, the root is updated along the path from the leaf node to itself. The MAC tree is 8-way. The leaf-level MACs are stored as part of the L2 cache lines, so only the internal MAC tree nodes are cached by the MAC tree cache. Operation and design details of the MAC tree can be found in related work [37], [38]. Performance simulation of the MAC tree is based on a Verilog implementation of SHA-256, synthesized using the Synopsys compiler. The design is fully asynchronous and has a gate count of 19,000 gates. Its latency is 74 ns for 512 bits of padded input (padding is required in SHA-256).
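The hierarchical MAC computation can be sketched as follows: per-block MACs at the leaves, then 8-way fan-in with the application key concatenated, as described above. This is our simplified model; details such as nonce handling and incremental root updates are omitted.

```python
import hashlib

ARITY = 8  # the MAC tree is 8-way

def _mac(key: bytes, data: bytes) -> bytes:
    # The paper concatenates the data and the application's secret key
    # as SHA-256 input (a keyed hash, not full HMAC).
    return hashlib.sha256(data + key).digest()

def mac_tree_root(key: bytes, blocks: list) -> bytes:
    """Compute the root MAC over cache-line-sized memory blocks."""
    level = [_mac(key, b) for b in blocks]          # leaf layer
    while len(level) > 1:
        level = [_mac(key, b"".join(level[i:i + ARITY]))
                 for i in range(0, len(level), ARITY)]
    return level[0]
```

Tampering with any block changes the root, which the processor keeps on-chip, so the modification is detected at the next verification; an update only needs to recompute the single leaf-to-root path rather than the whole tree.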

On-chip hardware overhead. The on-chip hardware resources required include the SMCB, additional cache line tags, TLB extensions with an SMCB index, the cache access resolve unit, and the pipelined crypto engines. In Fig. 1, each entry in the SMCB includes a 1-bit valid tag, 128-bit context ID, 6-bit VM ID, 64-bit PID, 64-bit address line, 128-bit data/code key, and 128-bit integrity code. Assuming the SMCB has 64 entries, the hardware cost is about 50.4 K bits per core. The overhead of the resolve units can be safely ignored because they are very simple combinational logic. For a 64-byte cache line, there are an additional 7-bit context ID index, 6-bit PID index, 6-bit VM ID, 16-bit MASV, 1-bit IsDecrypted tag, and 1-bit IsSealed tag. The hardware cost is around 1 K bits when the cache size is 2 MB. A state-of-the-art high-performance crypto engine integrated with the SRI costs less than 1,000 K gates. For accelerating memory space encryption, one can use a small nonce cache, typically 32 to 64 KB in size. The overall on-chip hardware cost remains

TABLE 6
Evaluation Environment

Purpose          Performance Measurement    Functional Verification
Environment      Simics with FeS2           Bochs
Sealed Memory    DRAMSim2 and GEMS          Bochs

Fig. 4. Protection of application files.



small when considering the typical transistor count of today's server processors (e.g., a commercial Xeon processor has over 2.6 billion transistors) and the security benefits.

System emulation and functional evaluation. We use Bochs [33], a full-system open-source x86 emulator, to evaluate our design at the system and functional level. Bochs models an entire platform, including the network device, hard drive, VGA, multiple processors, and other devices, to support the execution of a complete OS and its applications. It emulates x86 instructions, supports emulation of Intel VMX hardware support for virtualization, and can be extended to emulate new instructions and architectural designs, including those described in this paper. We extended Bochs to emulate the effects of sealed memory spaces at the system level. Our emulation framework emulates a multi-processor platform; it supports Xen 3.3 and runs the Ubuntu 8.04 Linux distribution as the guest OS. For cycle-based performance simulation, we use Simics and FeS2 as the evaluation environment.

4.3 Benchmarks and Parameters

For performance evaluation, we used the Phoronix Test Suite [39], including clamav, diff, gzip, jpython, luindex, snort, sphinx, and xalan. The Phoronix Test Suite includes a comprehensive set of applications, covering the application domains of scientific computing, compression, cryptography, media encoding, web serving, database query processing, and graphics rendering. We use the test scripts provided by the Phoronix Test Suite; detailed descriptions can be found in [40].

The simulation is performed with a 4-wide out-of-order superscalar processor running at 2 GHz with the x86 ISA. The processor uses a bimodal branch predictor. It has a 32-entry load/store queue, a 128-entry reorder buffer, and non-blocking caches with a 16-entry MSHR. The I-TLB and D-TLB each have 64 fully associative entries. The L1 instruction and L1 data caches are 32 KB write-back caches with a 64-byte block size and a two-cycle access latency. The L2 cache is unified, non-blocking, 4 MB in size, 16-way associative, with a 128-byte block size and a 10-cycle access latency. The

simulation started after the application passed the initialization stage (using Simics checkpoint support). The cycle-based simulation executed each benchmark application for one billion instructions.

4.4 Evaluation Results

We conducted full-system simulations for the 15 benchmark applications to evaluate the impact of sealed memory space protection (e.g., encryption/decryption and integrity verification). We measure performance as the reciprocal of the execution time and normalize the measured values against the results of the baseline system (i.e., a system with the same specifications but without architectural protection support).
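Concretely, the normalization amounts to the following (a trivial helper with names of our own choosing):

```python
def normalized_performance(exec_time: float, baseline_time: float) -> float:
    # performance = 1 / execution time, normalized to the baseline system,
    # which simplifies to baseline_time / exec_time
    return (1.0 / exec_time) / (1.0 / baseline_time)
```

For instance, a 9 percent execution-time overhead corresponds to a normalized performance of about 0.92.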

4.4.1 Kernel Mode versus User Mode

We analyzed the characteristics of the benchmark applications (i.e., memory accesses, cache miss rates, and copying between kernel and user memory spaces). Fig. 5 shows the ratio of cycles spent in kernel and user modes to the total cycles of the fifteen benchmark applications: more than 73 percent of the total cycles are spent in user space.

Fig. 6 shows the miss rates of the L1 and L2 caches. Across all the benchmarks, the average miss rate of the L1 cache is 1.1 percent, and that of the L2 cache is only 0.1 percent.

Fig. 7 shows the percentage of total execution time spent copying data between the kernel and user memory spaces using kernel functions. As the figure indicates, different applications have different memory access patterns in kernel and user modes; for example, luindex spends over 3 percent of its time copying data between memory spaces, while python spends nearly none. This diversity makes the benchmark applications appropriate for experiments related to memory spaces.

Fig. 8 shows the ratio of the number of user memory accesses by the kernel to the total number of memory accesses by the kernel. On average, about 6.6 percent of the memory access instructions executed by the kernel access user memory.

Fig. 5. Distribution of kernel and user space execution.

Fig. 6. Miss rate of L1 and L2 caches.

Fig. 7. Percentage of time spent moving data between kernel and user memory spaces.

Fig. 8. Percentage of user memory accesses by kernel.



4.4.2 Encryption/Decryption and Integrity Check Evaluation

We evaluated the performance overhead of using sealed memory spaces for the benchmark applications. The overhead results are shown in Fig. 9; the average performance overhead is less than 9 percent. The encryption/decryption overhead is small because the memory encryption/decryption approach employed by our design allows encryption/decryption to be overlapped with memory fetches when possible. However, the overhead of the fhourstones benchmark is relatively higher than the others (about 25 percent). The reason is that although the L2 cache miss rate of fhourstones is low, as shown in Fig. 6, its absolute number of L2 cache misses is relatively high, which causes encryption/decryption overhead. The effect of L2 cache misses can also be observed in other applications: 7zip, clamav, diff, and sphinx have relatively large overheads compared to gcrypt, openssl, and python, which have small L2 cache miss rates. It is therefore important to decrease the number of misses in the lowest-level cache.

Fig. 10 shows the miss rate of the MAC tree cache under different L2 cache sizes. As the results indicate, the miss rate varies with the L1 miss rate shown in Fig. 6; that is, it is not affected by the size of the L2 cache, even though the leaf level of the MAC tree resides in the L2 cache lines. Above all, the observed miss rate is less than 0.3 percent, which is quite small: the MAC tree cache used in this experiment can hold a sufficient number of nodes of the hierarchical message authentication code tree and thus achieves a high hit rate.

4.4.3 Shared Memory Evaluation

We also evaluated the performance overhead of shared memory in our solution. For this experiment, we developed a program in which the first process creates a sealed memory space as the shared memory and passes a string to a second process acting as a peer, with both processes running simultaneously. Fig. 11 shows the ratio of the execution time with privacy protection to the time without privacy protection under different shared memory sizes. As the results indicate, for any shared memory size and any L2 cache size, the performance overhead is less than 20 percent. Fig. 12 shows the actual performance overhead of the fifteen benchmarks for moving data between kernel and user memory spaces with privacy protection. As shown in Fig. 12, the performance overhead is 3.5 percent on average. The gzip benchmark shows the highest performance overhead because it moves a larger amount of data (about twice as much) than the other benchmarks. The number of user memory accesses by the kernel, shown in Fig. 8, is also related to the performance overhead: for example, clamav, jpython, and sphinx have relatively high overheads compared to fhourstones, gcrypt, and python, which have fewer user memory accesses by the kernel.

4.5 Discussion of the Performance

The extra overhead of the proposed solution comes mainly from the privacy-protected memory, i.e., the encryption/decryption and integrity-related operations for memory accesses. The performance can be further improved by adopting more efficient AES and MAC implementations (e.g., utilizing Intel's AES instructions for both encryption/decryption and integrity protection [41]).

For a hybrid cloud computing infrastructure with both CPUs and GPUs, the proposed solution also applies, as long as the GPU is also trusted. Since GPUs usually do not provide special instructions for AES operations, the extra overhead may be more pronounced. However, data transfers between GPUs and CPUs are not affected, as encryption does not increase the data size.

5 RELATED WORK

Trusted computing base and related research. There is a body of literature on the trusted computing base (TCB) and the

Fig. 9. Performance of the benchmark applications using memory space protection and integrity check, L2 size = 2 MB. The average extra overhead is around 9 percent.

Fig. 10. Miss rate of the MAC tree cache of the benchmark applications under different L2 cache sizes.

Fig. 11. Performance of privacy protection under different shared memory sizes and different L2 cache sizes.

Fig. 12. Performance of moving data between kernel and user memory spaces, L2 size = 2 MB. The average extra overhead is around 3.5 percent.



applications of the TCB for creating secure systems. The TCB and related hardware implementations such as the TCM (trusted computing module) primarily focus on supporting secure boot; they do not and cannot provide confidentiality and integrity protection for the entire physical memory space. The TPM, as a peripheral device, is a discrete unit situated on the peripheral bus. It cannot protect the contents of a virtual machine memory space from being eavesdropped. This limitation applies to all security approaches derived from the concept of a trusted computing base. For example, Flicker [42] concentrates on executing an application's security-sensitive code in isolation from all other software. Flicker is not designed to protect the complete execution of an entire virtual machine, nor does it mitigate physical eavesdropping attacks on a virtual machine memory space. In contrast, our solution protects the privacy and integrity of the entire virtual machine under both software-based and hardware-based memory attacks, instead of focusing on supporting a minimal trusted computing base.

Architectural support for physical RAM privacy. There have been many proposals for encrypting physical memory to defeat hardware attacks against data privacy and integrity (e.g., [21], [23], [27], [43], [44], [45], [46], [47], [48]). These solutions do not tackle the scenarios and challenges of protecting virtual machine privacy and integrity in the context of cloud computing, and it is not straightforward to retrofit them for the multi-tenant, resource-sharing cloud environment. Furthermore, none of these solutions handles the complex scenario of protecting application privacy against attacks from compromised hypervisors, cloud vendors, and compromised guest operating systems.

Architectural support for virtual machine security. Though many secure and trusted cloud computing approaches have been proposed recently (e.g., [8]), none of these solutions addresses the fundamental cause of the lack of privacy protection in the cloud, which stems from the fact that VMMs and cloud vendors have unrestricted access to the hosted virtual machines. There are architectural solutions for protecting applications and data from powerful software attacks [49], [50]. The recent efforts of [7] and [8] try to address this issue by means of architectural support; however, these architectural solutions primarily deal with software-based exploits from compromised hypervisors. In [9] and [10], solutions are presented to protect the privacy of virtual machines as a whole. One important feature that sets our solution apart from this related work is that we propose a unified approach to handle the complex scenarios of cloud-based systems involving cloud vendors, the VMM, the guest OS, and user applications. Most existing hardware-oriented approaches for protecting guest virtual machines address only a single challenge. Since our solution strives to deal with multiple challenges and protect cloud-outsourced applications within a unified framework, it drastically differs from these prior efforts, which protect a guest as a whole from either software or physical attacks. For example, the approaches described in [7] and [8] only handle software-based attacks from compromised hypervisors, while the approaches in [9] and [10] do not mitigate threats to application privacy when the guest OS is compromised.

Cryptography approaches for protecting cloud computing applications. Different schemes utilizing cryptography have been proposed to protect applications in the cloud environment. These approaches usually do not need to assume that any party on the cloud side is trusted. Lei et al. proposed a solution for secure matrix inversion in the cloud environment [51], Tysowski and Hasan designed a cloud-based secure data sharing scheme [52], and He et al. provided a secure P2P cloud [53]. These approaches usually focus on specialized applications and suffer from a high computational burden.

6 CONCLUSION

We introduced a strong threat model for secure cloud computing, in which cloud vendors, the guest OS, and the VMM are untrusted (i.e., insiders, a malicious VMM, or a compromised OS) and can launch attacks against the privacy of trusted applications. We presented the design, implementation, and evaluation of an architectural solution for protecting the privacy and integrity of trusted applications from both software-based and hardware-based attacks in untrusted computing environments. Our cycle-based full-system simulation shows that the solution incurs only a small (9 percent) performance overhead, an affordable price for the substantial security gain. In the near future, we plan to develop a hypervisor that can be used with our proposed system to support a large number of virtual machines.

ACKNOWLEDGMENTS

T. Suh is the corresponding author.

REFERENCES

[1] M. D. Ryan, “Cloud computing privacy concerns on our doorstep,” Commun. ACM, vol. 54, no. 1, pp. 36–38, Jan. 2011.

[2] Secunia. Advisory SA37081 - VMware ESX server update for DHCP, kernel, and JRE [Online]. Available: http://secunia.com/advisories/37081/

[3] P. Ferrie, “Attacks on virtual machine emulators,” Symantec Security Response, vol. 5, pp. 1–13, 2006.

[4] K. Kortchinsky. (2009). Cloudburst – hacking 3D and breaking out of VMware. in Black Hat USA. [Online]. pp. 1–15. Available: http://www.blackhat.com/presentations/bh-usa-09/KORTCHINSKY/BHUSA09-Kortchinsky-Cloudburst-PAPER.pdf

[5] T. Ormandy, “An empirical study into the security exposure to hosts of hostile virtualized environments,” in Proc. CanSecWest, 2007, pp. 1–10.

[6] R. Wojtczuk. (2008). Subverting the Xen hypervisor. in Black Hat USA. [Online]. pp. 1–9. Available: https://www.blackhat.com/presentations/bh-usa-08/Wojtczuk/BH_US_08_Wojtczuk_Subverting_the_Xen_Hypervisor.pdf

[7] J. Szefer and R. B. Lee, “Architectural support for hypervisor-secure virtualization,” in Proc. 17th Int. Conf. Archit. Support Program. Lang. Oper. Syst., 2012, pp. 437–450.

[8] S. Jin, J. Ahn, S. Cha, and J. Huh, “Architectural support for secure virtualization under a vulnerable hypervisor,” in Proc. 44th Annu. IEEE/ACM Int. Symp. Microarchit., 2011, pp. 272–283.

[9] Y. Xia, Y. Liu, and H. Chen, “Architecture support for guest-transparent VM protection from untrusted hypervisor and physical attacks,” in Proc. Int. Symp. High Perform. Comput. Archit., 2013, pp. 246–257.

[10] Y. Wen, J. Lee, Z. Liu, Q. Zheng, W. Shi, S. Xu, and T. Suh, “Multi-processor architectural support for protecting virtual machine privacy in untrusted cloud environment,” in Proc. Int. Conf. Comput. Frontiers, 2013, pp. 25:1–25:10.

[11] S. Caton and O. Rana, “Towards autonomic management for cloud services based upon volunteered resources,” Concurrency and Computation: Practice and Experience. Hoboken, NJ, USA: Wiley, 2011.

[12] S. Distefano, V. D. Cunsolo, A. Puliafito, and M. Scarpa, “Cloud@Home: A new enhanced computing paradigm,” in Handbook of Cloud Computing. New York, NY, USA: Springer.


[13] A. Chandra and J. Weissman, “Nebulas: Using distributed voluntary resources to build clouds,” in Proc. Conf. Hot Topics Cloud Comput., 2009, pp. 1–5.

[14] S. Xu and M. Yung, “SocialClouds: Concept, security architecture and some mechanisms,” in Proc. 1st Int. Conf. Trusted Syst., 2010, pp. 104–128.

[15] G. Neiger, A. Santoni, F. Leung, D. Rodgers, and R. Uhlig, “Intel virtualization technology: Hardware support for efficient processor virtualization,” Intel Technol. J., vol. 10, no. 3, pp. 167–177, Aug. 2006.

[16] Advanced Micro Devices. (2005). Secure Virtual Machine Architecture Reference Manual. [Online]. Available: http://www.amd.com/

[17] N. L. Petroni, Jr., T. Fraser, J. Molina, and W. A. Arbaugh, “Copilot - a coprocessor-based kernel runtime integrity monitor,” in Proc. 13th Conf. USENIX Security Symp., 2004, pp. 1–13.

[18] Z. Cutlip. (2012). SQL injection to MIPS overflows: Rooting SOHO routers [Online]. Available: http://media.blackhat.com/bh-us-12/Briefings/Cutlip/BH_US_12_Cutlip_SQL_Exploitation_WP.pdf

[19] R. Hund, T. Holz, and F. C. Freiling, “Return-oriented rootkits: Bypassing kernel code integrity protection mechanisms,” in Proc. 18th Conf. USENIX Security Symp., 2009, pp. 383–398.

[20] A. Cui, M. Costello, and S. J. Stolfo, “When firmware modifications attack: A case study of embedded exploitation,” in Proc. NDSS, 2013, pp. 1–13.

[21] D. Lie, C. A. Thekkath, and M. Horowitz, “Implementing an untrusted operating system on trusted hardware,” SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 178–192, Oct. 2003.

[22] X. Chen, T. Garfinkel, E. C. Lewis, P. Subrahmanyam, C. A. Waldspurger, D. Boneh, J. Dwoskin, and D. R. Ports, “Overshadow: A virtualization-based approach to retrofitting protection in commodity operating systems,” SIGOPS Oper. Syst. Rev., vol. 42, no. 2, pp. 2–13, Mar. 2008.

[23] G. E. Suh, D. Clarke, B. Gassend, M. van Dijk, and S. Devadas, “Aegis: Architecture for tamper-evident and tamper-resistant processing,” in Proc. 17th Annu. Int. Conf. Supercomput., 2003, pp. 160–171.

[24] T. Güneysu and A. Moradi, “Generic side-channel countermeasures for reconfigurable devices,” in Proc. Cryptographic Hardw. Embedded Syst., 2011, pp. 33–48.

[25] M. Godfrey and M. Zulkernine, “Preventing cache-based side-channel attacks in a cloud environment,” IEEE Trans. Cloud Comput., vol. 2, no. 4, pp. 395–408, Oct./Dec. 2013.

[26] M. Ficco and M. Rak, “Stealthy denial of service strategy in cloud computing,” IEEE Trans. Cloud Comput., vol. 3, no. 1, pp. 80–94, Jan.-Mar. 2014.

[27] C. Yan, D. Englender, M. Prvulovic, B. Rogers, and Y. Solihin, “Improving cost, performance, and security of memory encryption and authentication,” in Proc. 33rd Annu. Int. Symp. Comput. Archit., 2006, pp. 179–190.

[28] W. Shi, H.-H. S. Lee, M. Ghosh, and C. Lu, “Architectural support for high speed protection of memory integrity and confidentiality in multiprocessor systems,” in Proc. 13th Int. Conf. Parallel Archit. Compilation Techn., 2004, pp. 123–134.

[29] P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner, “Simics: A full system simulation platform,” Computer, vol. 35, no. 2, pp. 50–58, Feb. 2002.

[30] (2007). FeS2: A full-system execution-driven simulator for x86 [Online]. Available: http://fes2.cs.uiuc.edu/index.html

[31] M. T. Yourst, “PTLsim: A cycle accurate full system x86-64 microarchitectural simulator,” in Proc. IEEE Int. Symp. Perform. Anal. Syst. Softw., 2007, pp. 23–34.

[32] P. Rosenfeld, E. Cooper-Balis, and B. Jacob, “DRAMSim2: A cycle accurate memory system simulator,” IEEE Comput. Archit. Lett., vol. 10, no. 1, pp. 16–19, Jan. 2011.

[33] K. Lawton. (2015). Welcome to the Bochs x86 PC emulation software home page [Online]. Available: http://bochs.sourceforge.net/

[34] (2001, Nov.). Advanced encryption standard [Online]. Available: http://en.wikipedia.org/wiki/Advanced_Encryption_Standard

[35] S. Das. (2010). Pipelined AES OpenCore [Online]. Available: http://opencores.org/project/aes_pipe

[36] (2002). Secure hash standard [Online]. Available: http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf

[37] C. Lu, T. Zhang, W. Shi, and H.-H. S. Lee, “M-tree: A high efficiency security architecture for protecting integrity and privacy of software,” J. Parallel Distrib. Comput., vol. 66, no. 9, pp. 1116–1128, 2006.

[38] W. Shi and H.-H. S. Lee, “Authentication control point and its implications for secure processor design,” in Proc. 39th Annu. IEEE/ACM Int. Symp. Microarchit., 2006, pp. 103–112.

[39] Phoronix test suite [Online]. Available: http://www.phoronix-test-suite.com/

[40] Open benchmarking [Online]. Available: http://openbenchmarking.org/

[41] E. Ozturk and V. Gopal. (2012, Oct.). Enabling high-performance Galois-counter-mode on Intel architecture processors. [Online]. Available: http://www.intel.com/content/www/us/en/intelligent-systems/network-security/enabling-high-performance-gcm.html

[42] J. M. McCune, B. J. Parno, A. Perrig, M. K. Reiter, and H. Isozaki, “Flicker: An execution infrastructure for TCB minimization,” in Proc. 3rd ACM SIGOPS/EuroSys Eur. Conf. Comput. Syst., 2008, pp. 315–328.

[43] G. Suh, C. O’Donnell, and S. Devadas, “Aegis: A single-chip secure processor,” IEEE Des. Test Comput., vol. 24, no. 6, pp. 570–580, Nov./Dec. 2007.

[44] J. Yang, Y. Zhang, and L. Gao, “Fast secure processor for inhibiting software piracy and tampering,” in Proc. 36th Annu. IEEE/ACM Int. Symp. Microarchit., 2003, p. 351.

[45] W. Shi, H.-H. S. Lee, M. Ghosh, C. Lu, and A. Boldyreva, “High efficiency counter mode security architecture via prediction and precomputation,” in Proc. 32nd Int. Symp. Comput. Archit., 2005, pp. 14–24.

[46] B. Rogers, S. Chhabra, M. Prvulovic, and Y. Solihin, “Using address independent seed encryption and bonsai Merkle trees to make secure processors OS- and performance-friendly,” in Proc. 40th Annu. IEEE/ACM Int. Symp. Microarchit., 2007, pp. 183–196.

[47] J. Valamehr, M. Chase, S. Kamara, A. Putnam, D. Shumow, V. Vaikuntanathan, and T. Sherwood, “Inspection resistant memory: Architectural support for security from physical examination,” in Proc. 39th Int. Symp. Comput. Archit., 2012, pp. 130–141.

[48] S. Chhabra, B. Rogers, Y. Solihin, and M. Prvulovic, “SecureMe: A hardware-software approach to full system security,” in Proc. Int. Conf. Supercomput., 2011, pp. 108–119.

[49] J. S. Dwoskin and R. B. Lee, “Hardware-rooted trust for secure key management and transient trust,” in Proc. 14th ACM Conf. Comput. Commun. Security, 2007, pp. 389–400.

[50] R. Lee, P. Kwan, J. McGregor, J. Dwoskin, and Z. Wang, “Architecture for protecting critical secrets in microprocessors,” in Proc. 32nd Int. Symp. Comput. Archit., Jun. 2005, pp. 2–13.

[51] X. Lei, X. Liao, T. Huang, H. Li, and C. Hu, “Outsourcing large matrix inversion computation to a public cloud,” IEEE Trans. Cloud Comput., vol. 1, no. 1, pp. 78–86, 2013.

[52] P. K. Tysowski and M. A. Hasan, “Hybrid attribute- and re-encryption-based key management for secure and scalable mobile applications in clouds,” IEEE Trans. Cloud Comput., vol. 1, no. 2, pp. 172–186, Jul. 2013.

[53] H. He, R. Li, X. Dong, and Z. Zhang, “Secure, efficient and fine-grained data access control mechanism for P2P storage cloud,” IEEE Trans. Cloud Comput., vol. 2, no. 4, pp. 471–484, Oct.-Dec. 2013.

Lei Xu received the BSc degree in applied mathematics from Hebei University, China, in 2004, and the PhD degree in computer science from the Institute of Software, Chinese Academy of Sciences, in 2011. He is currently a postdoctoral researcher at the University of Houston. From 2011 to 2013, he worked as a research engineer at the Central Research Institute, Huawei Technologies Co. Ltd. His research interests include cloud computing and big data security, applied cryptography, and algebraic algorithms.


JongHyuk Lee received the PhD degree in computer science education from Korea University, where he did research in distributed systems. He was previously a research professor at Korea University and a research scientist at the University of Houston. Currently, he is employed as a senior engineer by Samsung Electronics. He has authored and coauthored publications covering research problems in distributed systems, computer architecture and systems, mobile computing, P2P computing, grid computing, cloud computing, computer security, and computer science education.

Seung Hun Kim received the BS and MS degrees in electrical and electronic engineering from Yonsei University, Korea, in 2009 and 2011, respectively. He is currently working toward the PhD degree in the Embedded Systems and Computer Architecture Laboratory, School of Electrical and Electronic Engineering, Yonsei University. His research interests include transactional memory systems and multi-core architecture.

Qingji Zheng received the PhD degree in computer science from the University of Texas at San Antonio. He is currently a research scientist at Futurewei Technologies in Santa Clara, California. His research interests lie in security technologies for cloud computing with applied cryptography.

Shouhuai Xu received the PhD degree in computer science from Fudan University, China. He is a professor in the Department of Computer Science, University of Texas at San Antonio. His research interests include cyber security modeling and analysis. He is an associate editor for IEEE Transactions on Dependable and Secure Computing and IEEE Transactions on Information Forensics and Security. More information about his research can be found at www.cs.utsa.edu/~shxu.

Taeweon Suh received the BS degree in electrical engineering from Korea University, Korea, the MS degree in electronics engineering from Seoul National University, Korea, and the PhD degree in computer engineering from the Georgia Institute of Technology. He is an associate professor in the Graduate School of Information Security, Korea University. Prior to joining academia, he was a systems engineer at Intel Corporation in Hillsboro, Oregon. His research interests include embedded systems, computer architecture, multiprocessors, and virtualization.

Won Woo Ro received the BS degree in electrical engineering from Yonsei University, Seoul, Korea, in 1996, and the MS and PhD degrees in electrical engineering from the University of Southern California in 1999 and 2004, respectively. He worked as a research scientist in the Electrical Engineering and Computer Science Department, University of California, Irvine. He currently works as an associate professor in the School of Electrical and Electronic Engineering at Yonsei University. Prior to joining Yonsei University, he worked as an assistant professor in the Department of Electrical and Computer Engineering at California State University, Northridge. His industry experience also includes a college internship at Apple Computer, Inc., and work as a contract software engineer at ARM, Inc. His current research interests are high-performance microprocessor design, compiler optimization, and embedded system design. He is a member of the IEEE.

Weidong Shi received the PhD degree in computer science from the Georgia Institute of Technology, where he did research in computer architecture and computer systems. He was previously a senior research staff engineer at Motorola Research Lab and Nokia Research Center, and a cofounder of a technology startup. Currently, he is employed as an assistant professor by the University of Houston. In the past, he contributed to the design of multiple Nvidia platform products and was credited on a published Electronic Arts console game. He has authored and coauthored publications covering research problems in computer architecture, computer systems, multimedia/graphics systems, mobile computing, and computer security. He has multiple issued and pending USPTO patents.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.
