Accelerating OpenSSL* Using Intel® QuickAssist Technology · cryptographic level for SSL/TLS...

4
SOLUTION BRIEF Intel ® QuickAssist Technology Accelerating OpenSSL * Using Intel ® QuickAssist Technology Modifications to open source implementation yield dramatic performance improvement. Wider Use of Encryption Historically, the demand for secure data transmissions over the Internet was driven primarily by institutions conducting e-commerce and banking transactions. Today, the volume of secured communications is skyrocketing, as personal information of all sorts is being encrypted by applications like Gmail * , Twitter * , and Facebook * using the HTTPS protocol. As a result, servers in data centers, telecom networks, and enterprises are expected to handle increasing amounts of traffic using the Secure Sockets Layer (SSL) protocol, increasing compute requirements. Industry forecasts, shown in Figure 1, suggest SSL traffic could increase by as much as 16-fold from 2012 to 2018. 1 Open Source Optimization The OpenSSL project 2 provides an open source implementation of the SSL/TLS 3 protocols and is a commonly deployed library for SSL/TLS world-wide. The SSL/TLS protocols consist of two phases: an initial session- initiation/handshake and a bulk data transfer. Over time, the implementation has been modified to increase performance, including contributions by Intel that speed up the cryptographic workloads in both phases. OpenSSL performance can be further improved with Intel QuickAssist Technology, which is supported by select Intel architecture platforms. Equipment manufacturers who utilize the current version of OpenSSL will benefit from the optimizations already integrated in the release, as well as the straightforward installation process. Not surprisingly, contributors to OpenSSL continue to work to make the implementation more efficient, and one of these efforts is referred to as Asynchronous OpenSSL, which is expected to be delivered as a patch in a development branch (i.e., parallel development project) with the potential of many-fold performance increases. This solution brief discusses two models for increasing the performance of OpenSSL* on Intel® architecture, and is one in a series of five briefs describing how to maximize the benefits from Intel® QuickAssist Technology. Please see the Resources section for links to the series. Figure 1. Projected Growth in SSL Traffic 2012 to 2018 2012 2013 2014 2015 2016 2017 2018 Sandvine GIPR Projection Coyote Point* Projection 45,000 40,000 35,000 30,000 25,000 20,000 15,000 10,000 5,000 0 Exabytes per Year Data courtesy of Sandvine* Global Internet Phenomena Report - 2H 2012 SSL Traffic Growth

Transcript of Accelerating OpenSSL* Using Intel® QuickAssist Technology · cryptographic level for SSL/TLS...

Page 1: Accelerating OpenSSL* Using Intel® QuickAssist Technology · cryptographic level for SSL/TLS protocols, which in turn allows for other types of optimizations. Although the concept

SOLUTION BRIEFIntel® QuickAssist Technology

Accelerating OpenSSL* Using Intel® QuickAssist TechnologyModifications to open source implementation yield dramatic performance improvement.

Wider Use of Encryption

Historically, the demand for secure data transmissions over the Internet was driven primarily by institutions conducting e-commerce and banking transactions. Today, the volume of secured communications is skyrocketing, as personal information of all sorts is being encrypted by applications like Gmail*, Twitter*, and Facebook* using the HTTPS protocol. As a result, servers in data centers, telecom networks, and enterprises are expected to handle increasing amounts of traffic using the Secure Sockets Layer (SSL) protocol, increasing compute requirements. Industry forecasts, shown in Figure 1, suggest SSL traffic could increase by as much as 16-fold from 2012 to 2018.1

Open Source Optimization

The OpenSSL project2 provides an open source implementation of the SSL/TLS3 protocols and is a commonly deployed library for SSL/TLS world-wide. The SSL/TLS protocols consist of two phases: an initial session-initiation/handshake and a bulk data transfer.

Over time, the implementation has been modified to increase performance, including contributions by Intel that speed up the cryptographic workloads in both phases. OpenSSL performance can be further improved with Intel QuickAssist Technology, which is supported by select Intel architecture platforms.

Equipment manufacturers who utilize the current version of OpenSSL will benefit from the optimizations already integrated in the release, as well as the straightforward installation process. Not surprisingly, contributors to OpenSSL continue to work to make the implementation more efficient, and one of these efforts is referred to as Asynchronous OpenSSL, which is expected to be delivered as a patch in a development branch (i.e., parallel development project) with the potential of many-fold performance increases.

This solution brief discusses two models for increasing the performance of OpenSSL* on Intel® architecture, and is one in a series of five briefs describing how to maximize the benefits from Intel® QuickAssist Technology. Please see the Resources section for links to the series.

Figure 1. Projected Growth in SSL Traffic 2012 to 2018

2012 2013 2014 2015 2016 2017 2018

Sandvine GIPR Projection Coyote Point* Projection

45,000

40,000

35,000

30,000

25,000

20,000

15,000

10,000

5,000

0

Exab

ytes

per

Yea

r

Data courtesy of Sandvine* Global Internet Phenomena Report - 2H 2012

SSL Traffic Growth

Page 2: Accelerating OpenSSL* Using Intel® QuickAssist Technology · cryptographic level for SSL/TLS protocols, which in turn allows for other types of optimizations. Although the concept

2

Figure 2. OpenSSL* Speed: Asynchronous Versus Synchronous3

Figure 3. OpenSSL* Speed: Asynchronous Versus Synchronous With Two and Four Processes3

Asynchronous OpenSSL*

The standard release of OpenSSL is serial in nature, meaning it handles one connection within one context. From the point of view of cryptographic operations, the release is based on a synchronous/blocking programming model. A major limitation is throughput can be scaled higher only by adding more threads (i.e., processes) to take advantage of core parallelization, but this will also increase context management overhead.

Asynchronous OpenSSL is a non-blocking approach that supports a parallel-processing model at the cryptographic level for SSL/TLS protocols, which in turn allows for other types of optimizations. Although the concept of a non-blocking mode of operation is not new within OpenSSL, it previously had not been applied to cryptographic transforms. This capability allows cryptographic transforms to be processed on dedicated hardware engines or on separate logical cores, thereby allowing the protocol stack and applications to run other tasks unencumbered. Asynchronous operation also enables single-threaded applications to efficiently handle multiple SSL connections because individual flows are not blocked when using hardware acceleration.

Two major benefits of asynchronous OpenSSL are increased single-flow throughput, leading to maximum performance, and fewer contexts, thus reducing context management overhead.

Performance Results

Preliminary test results demonstrate that asynchronous OpenSSL running on an Intel architecture platform with Intel QuickAssist Technology is significantly faster than the standard (synchronous) release. Figure 2 compares the OpenSSL speed for the two versions at various packet sizes running on one process. These measurements are indicative of the throughput during the bulk data transfer phase.

For 64 byte requests, asynchronous operation is about 49 times faster than synchronous, and over 7.5 times faster for 16,384 byte requests.4 When the number of processes is increased to two and four (Figure 3), the difference in performance decreases somewhat, with asynchronous being 20 and 2 times faster than the standard release for 64 byte and 16,384 byte requests, respectively.

Synchronous Asynchronous

Gbp

s

AES-128-CBC-HMAC-SHA1 - 1 process

OpenSSL* Speed: Asynchronous Versus Synchronous

Request Size (Bytes)

60

50

40

30

20

10

0

Higher is better 50.12

25.03

6.85

0.430.01 0.12 0.49 1.83

64 1024 4096 16384

Gbp

s

AES-128-CBC-HMAC-SHA1 - 2 and 4 processes

OpenSSL* Speed: Asynchronous Versus Synchronous

Request Size (Bytes)

60

50

40

30

20

10

0

Higher is better

64 1024 4096 16384

0.01 0.03 0.80 1.56 0.25 0.47

12.6

23.1

0.94 1.88

41.15 42.76

3.697.01

50.09 51.38

Synchronous - 2 processes Synchronous - 4 processes

Asynchronous - 2 processes Asynchronous - 4 processes

Page 3: Accelerating OpenSSL* Using Intel® QuickAssist Technology · cryptographic level for SSL/TLS protocols, which in turn allows for other types of optimizations. Although the concept

3

Figure 4 shows the number of RSA 2028 bit sign operations per second for asynchronous and synchronous OpenSSL. This operation is used during the initial session-initiation/handshake phase. The key finding from this test is asymmetric cryptography produces full throughput with only one process (i.e., context), whereas the synchronous version delivers significantly less throughput with four processes. Consequently, synchronous OpenSSL may require significantly more cores than asynchronous OpenSSL to achieve the same performance.

Figure 4. RSA 2028 Bit Sign Operations: Asynchronous Versus Synchronous3

Simplifying Accelerator Integration

With more and more traffic being encrypted, servers and security appliances will rely more heavily on accelerators to offload cryptography workloads. For this reason, Intel is working with the OpenSSL Software Foundation to optimize its implementation for use with hardware accelerators, such as those provided by Intel QuickAssist Technology. In addition, the findings from this effort will be used to optimize the performance of proprietary SSL/TLS-based solutions running on Intel QuickAssist Technology-enabled platforms.

1

Synchronous Asynchronous

Sign

Ope

rati

ons

Per

Seco

nd

RSA 2028 Bit Sign OperationsAsynchronous Versus Synchronous

Number of Processes

Higher is better

1,001

2 4

4,316

34,93334,99335,714

2,112

40,000

30,000

20,000

10,000

0

Page 4: Accelerating OpenSSL* Using Intel® QuickAssist Technology · cryptographic level for SSL/TLS protocols, which in turn allows for other types of optimizations. Although the concept

4

For For more information About Intel QuickAssist Technology, visithttp://www.intel.com/content/www/us/en/io/quickassist-technology/quickassist-technology-developer.html

1 Global Internet Phenomena Report: Sandvine,20122 OpenSSL: http://www.openssl.org3 The TLS protocol: http://www.ietf.org/rfc/rfc2246.txt4 Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel® products as measured by those tests. Any difference in system hardware or software design or configuration, as well as system use patterns including wireless connectivity, may affect actual test results and ratings.

Copyright © 2013 Intel Corporation. All rights reserved. Intel, the Intel logo, and Xeon are trademarks ofIntel Corporation in the United States and/or other countries.

*Other names and brands may be claimed as the property of others.

Printed in USA MS/VC/1113 Order No. 329877-001US

Resources

Solution Brief Series: Intel® QuickAssist Technology

Part 1: Integrated Cryptographic and Compression Accelerators on Intel® Architecture Platforms

Part 2: Bridging Open Source Applications and Intel® QuickAssist Technology Acceleration

Part 3: Accelerating OpenSSL* Using Intel® QuickAssist Technology

Part 4: Accelerating Hadoop* Applications Using Intel® QuickAssist Technology

Part 5: Scaling Acceleration Capacity from 5 to 50 Gbps Intel® QuickAssist Technology

Processor

Chipset

Memory

Operating System

Compiler

Appendix A: Platform Configuration

Dual socket Intel® Xeon® processor E5-2680 v2

Intel® Communications Chipset 8950 (Intel® DH8920 PCH)

DDR3 1333MHz

Linux* version 3.1.0-7.fc16.x86_64

gcc version 4.6.2 20111027 (Red Hat 4.6.2-1) (GCC)

Hardware Platform Configuration