
This work was supported by The National Natural Science Foundation of China (Grant No.60773093,60873209), The Key Program for Basic Research of Shanghai (Grant No.08JC1411800) , The Ministry of Education and Intel joint research foundation (Grant No.MOE-INTEL-08-11).

A New Software Approach to Defend against Cache-based Timing Attacks

He Yuemei, Guan Haibing*, Chen Kai, Liang Alei School of Electronic, Information and Electrical Engineering

Shanghai Jiao Tong University Shanghai, China

{heidysjtu, hbguan, kchen, liangalei}@sjtu.edu.cn

Abstract—Cache-based timing attacks recover cipher keys by exploiting side-channel information leaks caused by the implementations of cryptographic algorithms and the data-dependent behavior of cache memory. This kind of attack has been proven effective in experiments and is even feasible in practice. A number of software-based mechanisms have been proposed to protect against such attacks; however, most of them aim only at a specific sort of cache-based attack by altering the implementation of the algorithm. In this paper, we put forward a novel idea with the goal of providing general protection. With the help of dynamic binary translation, we create a sandbox in which cryptographic implementations are executed. At runtime, redundant instructions are inserted into the binary code of the cipher routine, so that the leaked information is skewed and becomes useless to attackers. Preliminary experimental results indicate that this defending mechanism provides strong protection against cache-based timing attacks. Moreover, in the conclusion, we discuss how this mechanism can also be effective against other types of cache-based side-channel attacks.

Keywords - cache-based timing attacks; countermeasure; dynamic binary translation technique; sandbox

I. INTRODUCTION

Different from traditional attacks on cryptographic systems, side-channel attacks exploit vulnerabilities in cryptographic implementations rather than in their mathematical models. Cache-based side-channel attacks are one kind of such attacks. Using the cache-memory architecture as a side channel, they retrieve keys by exploiting the relationship between the information leaked from the cache and the data-dependent table lookups employed in cryptographic implementations. Back in 1996, Kocher predicted that RAM cache hits could produce timing characteristics in the implementations of Blowfish, SEAL, DES and other ciphers if the tables in memory were not used identically in every encryption [1]. Later, Page proposed a theoretical model for narrowing the possible values of secret information and pointed out that the cache could be used as a side channel [2]. Since then, cache-based side-channel attacks have been carried out in different ways. According to the type of leaked information they use, such attacks fall into three categories: time-driven attacks, which exploit the aggregate execution time over a large number of samples; trace-driven attacks, which analyze individual cache hits and misses to yield information; and access-driven attacks, in which the cache accesses are spied on by another process.

Among these categories of cache-based side-channel attacks, time-driven attacks appear to be the easiest to implement, as they require the least leaked information. Bernstein first put this idea into practice and successfully carried out such an attack on the AES algorithm [3]. He presented the hypothesis that the graph of the execution time as a function of plaintext[i] ⊕ key[i] looks the same for any key. His work was continued by other researchers. For example, Bonneau proposed a different strategy aimed at the last round of the AES algorithm rather than the first round, assuming that an AES encryption takes longer when fewer cache collisions occur during its execution [4]. In spite of the differences between these strategies, cache-based timing attacks all depend on the fact that the execution time of an encryption is directly affected by the number of cache misses, and they share the goal of narrowing the possible values of the cipher key. Since this sort of attack has been proven effective in experiments and, worse, can be carried out remotely [10], it has become a serious threat to modern computer systems with a cache-memory architecture.
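To make the recovery step behind this hypothesis concrete, the following brief sketch uses our own notation and is not taken verbatim from [3]. In a profiling phase with a known key k, the attacker records an average timing signature u_i[b] over all samples whose i-th plaintext byte equals b; against the unknown key k', the analogous signature v_i[b] is recorded. Since both signatures are assumed to depend only on plaintext[i] ⊕ key[i], the shift d that best aligns the two signatures reveals the unknown key byte, roughly as

    k'[i] = k[i] ⊕ argmax_d  Σ_{b=0..255}  u_i[b ⊕ d] · v_i[b]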

Consequently, many researchers have been working on defending mechanisms against cache-based timing attacks. Some have suggested that the non-deterministic factors that cause execution-time variability, such as caches, pipelines and branch predictors, should be eliminated from current hardware architectures [5]. However, it is not realistic to expect manufacturers to design new hardware architectures in the near future. Other researchers concentrate on software-based countermeasures. Some propose that part of the lookup tables be removed from the cryptographic implementations or that a smaller set of tables be used [7][8], but this degrades performance, since the lookup tables were introduced precisely to achieve better performance. Others suggest that dummy code can be used to skew the execution time [6], and in this paper we adopt this strategy for our defending mechanism. However, rather than directly modifying the source code by adding a piece of dummy computation, we utilize dynamic binary translation (which translates binary code from one architecture to another at runtime) to inject redundant instructions. In other words, we save the work of modifying the implementations and recompiling the source code, and instead create a sandbox in which cryptographic implementations are safely executed. Although this sandbox cannot stop attackers from measuring the leaked information, i.e., the execution time, it skews that execution time through redundant instructions injected by the dynamic binary translation layer. Moreover, as discussed later in the conclusion, cryptographic implementations running inside the sandbox can also be immune to other kinds of cache-based side-channel attacks.

The remainder of this paper is organized as follows. Section 2 introduces related work on countermeasures against cache-based timing attacks. Section 3 describes our idea in detail, and Section 4 gives preliminary experimental results. Finally, we conclude the paper and discuss future work.

II. RELATED WORK

In this section, we introduce several hardware-based and software-based countermeasures against cache-based timing attacks.

Though the cache memory architecture is the root cause of the leakage of secret information, removing the cache from modern architectures seems unacceptable because the cache itself brings an enormous performance benefit. Page therefore proposes a partitioned cache architecture that provides hardware-assisted defense [11]. The cache is dynamically split into protected regions and can be specifically configured for an application. Based on this idea, the partition-locked cache has been suggested to achieve the effect of cache partitioning with less performance degradation [12]. In [5], a precision-timed architecture is put forward to make the timing behavior of the hardware invariant. The researchers claim that their mechanisms completely eliminate the root cause of the problem, but hardware countermeasures cannot be expected to be in place within the near future.

Among software-based countermeasures, a fundamental solution seems to be removing the elements that provoke cache activity, i.e., implementing the cryptographic algorithms without any lookup tables. However, this can lead to unacceptable performance degradation. An alternative is to reduce the size of the lookup tables or to use a smaller set of tables [7][8], but this only increases the attackers' workload. Cache warming is another way to avoid cache misses at runtime [6], but it cannot prevent the attackers from flushing the cache before the encryption routine starts. Some researchers suggest that a piece of dummy code can be used to increase the variation of the execution time [6], but this requires extra work to modify the original source code. We therefore come up with a better solution: a sandbox in which redundant instructions can be injected at runtime with the help of dynamic binary translation.

III. ELIMINATION OF CACHE-BASED TIMING ATTACKS

In this section, we present a defending mechanism based on the technique of dynamic binary translation. Before that, we will first discuss the characteristics of cache-based timing attacks, which are the theoretical foundation of our idea.

A. Discussion of cache-based timing attacks

As introduced above, cache-based timing attacks extract cipher keys by exploiting the relationship between the execution time and the data-dependent table lookups. Specifically, in cryptographic implementations that adopt large lookup tables to achieve better performance, the table lookup operations (i.e., a series of memory accesses to contiguous space) account for a large proportion of all operations, so the cache behavior (hits and misses) caused by these lookups directly influences the execution time: an encryption that incurs more cache misses takes longer. In this sense, the execution time reveals the number of cache misses incurred by the table lookup operations of an encryption. By collecting a large number of samples, i.e., the plaintext (or ciphertext) and the execution time of each encryption, attackers can deduce the runtime cache behavior from the relationship between the execution time and the data-dependent table lookups, and thereby narrow the key space significantly rather than enumerating all possible key values by brute-force search.
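As an illustration of the kind of measurement such attacks rely on, the following C sketch (our own illustration, not the attack code used in [3] or [4]) times OpenSSL's AES_encrypt with the x86 time-stamp counter and accumulates per-plaintext-byte timing averages of the sort a time-driven attack analyzes; the key value and sample count are arbitrary placeholders.

    #include <stdint.h>
    #include <stdlib.h>
    #include <x86intrin.h>     /* __rdtsc() */
    #include <openssl/aes.h>   /* AES_set_encrypt_key, AES_encrypt */

    /* sum[i][b] / cnt[i][b] is the average encryption time over all samples
       whose i-th plaintext byte equals b; a time-driven attack correlates
       these profiles with those collected under a known key. */
    static double   sum[16][256];
    static uint64_t cnt[16][256];

    static void collect_samples(const AES_KEY *key, long n_samples)
    {
        unsigned char pt[16], ct[16];
        for (long s = 0; s < n_samples; s++) {
            for (int i = 0; i < 16; i++)
                pt[i] = (unsigned char)rand();      /* random plaintext block */

            uint64_t t0 = __rdtsc();
            AES_encrypt(pt, ct, key);
            uint64_t t1 = __rdtsc();

            for (int i = 0; i < 16; i++) {          /* attribute the timing to each byte value */
                sum[i][pt[i]] += (double)(t1 - t0);
                cnt[i][pt[i]] += 1;
            }
        }
    }

    int main(void)
    {
        unsigned char key_bytes[16] = {0};          /* placeholder profiling key */
        AES_KEY key;
        AES_set_encrypt_key(key_bytes, 128, &key);
        collect_samples(&key, 1L << 20);            /* sample count is illustrative */
        return 0;
    }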

B. The defending mechanism

Since cache-based timing attacks depend on the assumption that the execution time is proportional to the number of cache misses incurred by the table lookup operations, breaking this assumption is an effective way to defend against such attacks. As mentioned before, redundant code can be used to skew the execution time: the execution time then also depends on the amount of extra code, and the relationship between the execution time and the table lookup operations no longer holds. Instead of directly adding a piece of dummy code to the source code, we provide a sandbox in which the binary code of the cryptographic implementation is executed and redundant instructions are injected at runtime with the help of dynamic binary translation. As a result, the sandbox provides a secure execution environment for cryptographic implementations.

Dynamic binary translation is a technique that provides cross-platform compatibility: it allows applications compiled for a source ISA (Instruction Set Architecture) to run on a different target platform. When the binary code of an application runs on a dynamic binary translator, its instructions are first translated into target instructions; once a block of instructions has been translated, the translated code can be cached in memory and repeatedly executed [9]. In our defending mechanism, however, the translation step is used for injecting redundant instructions rather than for its original purpose, since the source ISA is the same as the target ISA. In this sense, the sandbox can be regarded as a special type of dynamic binary translator.
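As a rough sketch of how such a translator drives execution (this is a generic structure, not our implementation, and the helper functions are hypothetical placeholders), each basic block is translated at most once and the cached host code is reused on later executions:

    #include <stdint.h>
    #include <stddef.h>

    typedef uintptr_t guest_pc_t;

    /* Hypothetical helpers that a real translator would supply. */
    void      *code_cache_lookup(guest_pc_t pc);   /* NULL if the block is not yet translated */
    void      *translate_block(guest_pc_t pc);     /* decode and translate one basic block    */
    void       code_cache_insert(guest_pc_t pc, void *code);
    guest_pc_t execute_translated(void *code);     /* run the block, return the next guest PC */

    /* Core dispatch loop of a dynamic binary translator: translation happens
       once per block; afterwards execution stays in the code cache. */
    void dbt_run(guest_pc_t entry)
    {
        guest_pc_t pc = entry;
        for (;;) {
            void *host = code_cache_lookup(pc);
            if (host == NULL) {
                host = translate_block(pc);   /* in our sandbox, redundant instructions
                                                 would be injected during this step */
                code_cache_insert(pc, host);
            }
            pc = execute_translated(host);
        }
    }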


Similar to a dynamic binary translator, the sandbox in our defending mechanism is mainly composed of two parts:

• Interpreter, which decodes the binary code into source instructions.

• Translator, which injects redundant instructions into the target instruction stream when necessary, and then converts the instructions back into binary code.

As shown in Fig. 1, when binary code is executed in the sandbox, it is first transformed into instruction form by the interpreter. If an instruction is a memory access, the translator copies the original instruction and injects redundant instructions into the target instruction stream; otherwise, the translator only copies the original instruction. The target instructions are then converted back into binary code. Once a block of instructions has been translated, the translated code is cached in memory and put into execution.

Figure 1. Infrastructure of the sandbox

Because the translated code is cached in memory after its first translation, the interpretation and translation steps can be skipped when the same block of binary code is executed again.
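A simplified sketch of this per-instruction pass might look as follows (the decoded-instruction type and the emit_* helpers are hypothetical; a real translator operates on decoded machine instructions):

    #include <stdbool.h>

    typedef struct insn    insn_t;    /* one decoded instruction (hypothetical type) */
    typedef struct emitter emitter_t; /* sink that collects translated instructions  */

    bool insn_is_memory_access(const insn_t *ins);
    void emit_copy(emitter_t *out, const insn_t *ins);  /* copy the instruction unchanged        */
    void emit_redundant_access(emitter_t *out);         /* dummy access into the reserved memory */

    /* Translate one basic block: every instruction is copied unchanged, and a
       redundant memory access is injected after each memory access instruction. */
    void translate_block_with_injection(const insn_t *block, int n, emitter_t *out)
    {
        for (int i = 0; i < n; i++) {
            emit_copy(out, &block[i]);
            if (insn_is_memory_access(&block[i]))
                emit_redundant_access(out);
        }
    }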

Since the redundant instructions are executed together with the original instructions, they should be injected according to the following principles:

• The redundant instructions should be independent of the original instructions, i.e., the computation results must remain the same with or without them.

• The extra time introduced by the redundant instructions should vary across different plaintexts.

We therefore adopt the following strategy. First, a large region of memory is allocated during initialization. We then inject memory access instructions that only access this reserved memory, so the computation results are not influenced by the extra code. To achieve the required time variability, the extra instructions should access different addresses within the reserved region. We therefore use the RDTSC (Read Time-Stamp Counter) instruction as a pseudo-random number generator and derive the memory addresses from the values it returns.
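In C, the behavior of one injected sequence could be sketched as follows (this is a stand-in for the actual injected machine instructions; the buffer size and the modulo-based address derivation are illustrative choices of ours):

    #include <stdint.h>
    #include <stdlib.h>
    #include <x86intrin.h>               /* __rdtsc() */

    #define RESERVED_SIZE (1u << 20)     /* size of the reserved region (illustrative) */

    static volatile unsigned char *reserved;  /* allocated once during initialization */

    void sandbox_init(void)
    {
        reserved = malloc(RESERVED_SIZE);
    }

    /* One redundant access: read the time-stamp counter, derive a pseudo-random
       offset inside the reserved region, and touch that address.  The load does
       not affect the protected computation but perturbs cache state and timing. */
    void redundant_access(void)
    {
        uint64_t tsc    = __rdtsc();
        size_t   offset = (size_t)(tsc % RESERVED_SIZE);
        (void)reserved[offset];          /* volatile read, so it is not optimized away */
    }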

IV. CASE STUDY

In the experiments, we studied the vulnerability of the AES algorithm with and without our defending mechanism. We used the AES implementation in OpenSSL 0.9.7a for analysis. The experiments ran on a 2.80 GHz Pentium 4 with 512 KB of L2 cache. We first ran the original AES implementation without any protection using three different cipher keys and collected 2^24 samples for each. We then used Bonneau's attack method [4] to test the vulnerability of AES. As shown in TABLE I, the cipher keys were recovered successfully after at most 2^16 samples had been analyzed by the attack program. For comparison, we repeated the same procedure, except that the AES implementation ran under our defending mechanism. The results show that the attack program failed to retrieve the cipher keys even after using up all the samples.

TABLE I. EXPERIMENTAL RESULTS

Cipher Key    AES (OpenSSL 0.9.7a)    AES running in sandbox
Key 1         Success (< 2^15)        Failure (2^24)
Key 2         Success (< 2^15)        Failure (2^24)
Key 3         Success (< 2^16)        Failure (2^24)

As the experimental results indicate, our defending mechanism can protect cryptographic algorithms against cache-based timing attacks.

V. CONCLUSION AND FUTURE WORK

In this paper, we studied cache-based timing attacks and demonstrated a new defending mechanism against them. With the help of dynamic binary translation, we create a sandbox in which cryptographic implementations are executed and into which special redundant instructions are injected at runtime. In this way, the execution time observed by attackers is skewed by the redundant instructions, and the relationship between the execution time and the table lookup operations performed by the cryptographic implementation no longer holds, so the attack strategies are unable to retrieve the cipher keys even with the leaked information.

Apart from that, our defending mechanism can, in theory, also protect against other kinds of cache-based side-channel attacks. As described above, redundant instructions are injected after each memory access instruction, which changes the trace of memory accesses during the encryption and thereby skews the observable cache behavior. As a result, the secret information leaked from the cache becomes useless. Verifying this experimentally is left as future work.


REFERENCES

[1] Paul C. Kocher, Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS and other Systems. Lecture Notes in Computer Science, Springer, 1996.

[2] D. Page, Theoretical Use of Cache Memory as a Cryptanalytic Side-Channel. Technical Report CSTR-02-003, Department of Computer Science, University of Bristol, June 2002. Available at: http://www.cs.bris.ac.uk/Publications/Papers/1000625.pdf.

[3] Daniel J. Bernstein, Cache-timing Attacks on AES. Available at: http://cr.yp.to/antiforgery/cachetiming-20050414.pdf.

[4] J. Bonneau, I. Mironov, Cache-collision Timing Attacks against AES. Workshop on Cryptographic Hardware and Embedded Systems, 2006.

[5] Isaac Liu, David McGrogan, Elimination of Side Channel attacks on a Precision Timed Architecture. Technical Report, 2009. Available at: http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-15.pdf.

[6] D. Page, Defending against cache-based side-channel attacks. Information Security Technical Report, 8(1). ISSN 1363-4127, pp. 30–44. April 2003.

[7] Dag Arne Osvik, Adi Shamir, Eran Tromer, Cache Attacks and Countermeasures: the Case of AES. Topics in Cryptology - CT-RSA 2006, The Cryptographers' Track at the RSA Conference, 2006.

[8] Ernie Brickell , Gary Graunke , Michael Neve , Jean-Pierre Seifert, Software mitigations to hedge AES against cache-based software side channel vulnerabilities. Cryptology ePrint Archive, Report 2006/052, February 2006.

[9] Erik R. Altman, David Kaeli, Yaron Sheffer, Welcome to the Opportunities of Binary Translation. IEEE Computer, vol. 33, no. 3, pp. 40-45, Mar. 2000.

[10] David Brumley, Dan Boneh, Remote Timing Attacks are Practical. Proceedings of the 12th conference on USENIX Security Symposium, Volume 12, 2003.

[11] D. Page, Partitioned cache architecture as a side-channel defence mechanism. Available at: http://eprint.iacr.org/2005/280.

[12] Zhenghong Wang, Ruby B. Lee, New cache designs for thwarting software cache-based side channel attacks. Proceedings of the 34th annual international symposium on Computer architecture, 2007.