REDUCED VECTOR TECHNIQUE HOMOMORPHIC ENCRYPTION WITH VERSORS
A SURVEY AND A PROPOSED APPROACH
by
SUNEETHA TEDLA
M.C.A, Osmania University, India 1998
A dissertation submitted to the Graduate Faculty of the
University of Colorado at Colorado Springs
in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
Department of Computer Science
2019
© COPYRIGHT BY SUNEETHA TEDLA 2019
ALL RIGHTS RESERVED
This dissertation for the Doctor of Philosophy degree by
Suneetha Tedla has
been approved for the
Department of Computer Science
By
Dr. Carlos Araujo, co-Chair
Dr. C. Edward Chow, co-Chair
Dr. T.S. Kalkur
Dr. Jonathan Ventura
Dr. Yanyan Zhuang
Date: 20 September 2023
Tedla, Suneetha (Ph.D., Security)
Reduced Vector Technique Homomorphic Encryption with Versors
A Survey and a Proposed Approach
Dissertation directed by Professors Carlos Araujo and C. Edward Chow
ABSTRACT
In this research, a new type of homomorphic encryption technique, based on
geometric algebra and versors, called Reduced Vector Technique Homomorphic
Encryption (RVTHE), is designed, developed, and analyzed. This new cipher method
is optimized to be faster and more compact in ciphertext length while preserving
security strength.
Performance criteria are proposed to generate benchmarks for evaluating
homomorphic encryption, allowing a fair comparison with the benchmarks used for
non-homomorphic encryption. The basic premise behind these performance criteria is to
establish a baseline against which to measure the variations in performance
between different encryption methods for Cloud Storage type Solid State Drives
(SSDs). Significant differences in throughput performance, up to 20-50%, are
observed between our proposed encryption method and the AES software methods on
Cloud storage SSDs or encrypted SSDs.
The central thesis of the research is to verify that homomorphic encryption is
better accomplished with the use of versors instead of multi-vectors. Using properties
of versors, it is possible to design a homomorphic cipher that has a simple structure
and versatile key assignments while achieving a speed that rivals existing
non-homomorphic ciphers. In the thesis, I demonstrate that the versor-based
homomorphic encryption is faster than an existing non-homomorphic encryption. It is
shown that RVTHE is a symmetric somewhat homomorphic encryption supporting
addition, subtraction, scalar multiplication, and scalar division. The evaluation of the
implementation shows that a file can be edited/appended in 0.001 sec. It also shows
that, in the case of full file encryption, RVTHE is 75% faster on encryption and 25%
slower on decryption, compared with the AES-Crypt encryption software, which
implements the AES standard. The ciphertext sizes of RVTHE are found to be reduced
by 25% on average from those of previous approaches using multi-vectors and Clifford
Geometric Algebra. RVTHE has the potential for use as an encryption method on real
workloads.
Keywords: Encryption, Homomorphic, AES, SSD, AES-Crypt, Vectors, Versors.
DEDICATION
I wish to dedicate this body of research to my husband and my best friend
Shravan Tedla; with him everything is possible for me.
ACKNOWLEDGEMENTS
I am blessed with beautiful people in my life. I am very thankful to all who
supported me with my journey of schooling. I really appreciate all the support,
encouragement, love and understanding provided by my family, friends, colleagues
and Advisory Committee.
A special thank you to Dr. Carlos Araujo and Dr. C. Edward Chow for their
support, for sharing their knowledge, and for guiding me over the last several years.
Dr. Xiaobo Charles Zhou advised me prior to Dr. Carlos Araujo, and I am very thankful
to Dr. Xiaobo for providing me the skills and insight needed to pursue my Ph.D. I very
much enjoyed and admired Dr. Carlos Araujo’s knowledge and the way he develops
his thoughts to create a new way of doing security, which helped me
tremendously in my research. I really appreciate Dr. Chow’s support and
knowledge while discussing the ideas and analyzing how to put my thoughts and
ideas into action. I am very thankful to both of you. I appreciate my Advisory
Committee members, Dr. Jonathan Ventura, Dr. Yanyan Zhuang, and Dr. T.S. Kalkur,
for providing me their feedback and support. Many thanks to Ali Langfels, who helps
all the students with a great smile while managing all the administrative work.
I am very thankful to my parents and my in-laws; one gave me a beautiful life
and the other gave me a beautiful life partner, both with unconditional love and
support. I am blessed with a beautiful friend, my husband Shravan Tedla, and my kids
SaiKiran and Siddhartha, and I am grateful to them for supporting me in all aspects of
my life, including my Ph.D. I am very thankful to my friend Tim Murphy for spending
so many hours helping me write this thesis.
TABLE OF CONTENTS
CHAPTER 1
INTRODUCTION
1.1 Security Terminology
1.2 Security systems
1.3 Cloud Storage Security
1.4 Design Criteria for Cryptographic Algorithm
1.5 Encryption
1.6 Homomorphic Encryption
1.7 Possible fully homomorphic encryption method
1.8 Vector product spaces with Clifford Geometric Algebra
1.9 Reduced Vector Technique for Homomorphic Encryption
CHAPTER 2
BACKGROUND
2.1 Cloud Storage SSD
2.1.1 Data Reliability and Integrity
2.1.2 Sanitization and Secure Deletion of SSD
2.2 Survey of Various Encryption Approaches
2.2.1 Block Ciphers
2.2.2 Block Cipher Modes
2.2.3 Encryption Methods for SSD
2.2.4 Comparable Encryption for Evaluations
2.2.5 Homomorphic Encryption
2.3 Mathematical Foundation
2.3.1 Geometric Algebra Overview
2.3.2 Inner Product
2.3.3 Outer Product
2.3.4 Geometric Product
2.3.5 Inverse of Vector
2.3.6 Versors
CHAPTER 3
PROBLEMS AND LIMITATIONS
3.1 Defining the Problem
3.1.1 Encryption Security Limitations and Problem
3.1.2 Encryption Limitations
3.2 Other problems contributed for research motivation
3.2.1 Cyber Attacks
3.2.2 Real Randomness
3.2.3 Storage Security Limitations
3.2.4 SSD System Level Induced Limitations
3.2.5 Existing research to mitigate the software limitations
CHAPTER 4
STORAGE ENCRYPTION ANALYSIS
4.1 Measurement Environment
4.1.1 Selection of Encryption methods
4.1.2 Experimental Tools and Workloads
4.2 SSD performance without Encryption
4.2.1 Performance differences between Amazon EC2 VMs
4.2.2 Did various block sizes significantly affect I/O throughput?
4.2.3 Did various levels of parallelism affect I/O throughput?
4.2.4 Did random and sequential jobs have a different IOPS?
4.2.5 SSD Random Workload Analysis on t2.micro VM
4.3 SSD performance with Encryption
4.3.1 Did various block sizes significantly affect IOPS
4.3.2 Did various block sizes affect Performance Throughput
4.3.3 Did various Encryptions Versus Performance Throughput
4.3.4 Reads, Writes and Mixed workloads Versus Block Sizes
4.4 Fully Homomorphic Encryption Limitations
4.4.1 FHE with Vector Space
4.4.2 Previous homomorphic encryption using multivector technique
CHAPTER 5
RVTHE
5.1 Design of RVTHE
5.1.1 RVTHE Encryption and Decryption
5.1.2 Encryption of RVTHE
5.1.3 Decryption of RVTHE
5.2 Mathematical Implementation of RVTHE Using Versors
5.3 Homomorphism of RVTHE
5.3.1 Addition
5.3.2 Subtraction
5.3.3 Multiplication
5.3.4 Division
5.4 Security of RVTHE
CHAPTER 6
IMPLEMENTATION AND EVALUATION OF RVTHE
6.1 Implementation of RVTHE
6.2 Experimental Systems
6.3 Experimental Evaluations
6.3.1 Time measurements on various key sizes
6.3.2 Time measurements on various file sizes
6.3.3 Size measurements on Encrypted Files
6.4 Security Evaluation of RVTHE
CHAPTER 7
LESSONS LEARNED AND FUTURE WORK
7.1 Challenges and Lessons Learned
7.2 Contributions
7.3 Success of work
7.4 Future Work
CHAPTER 8
CONCLUSION
REFERENCES
Appendix A – Cloud Storage SSD
Appendix B – Cloud Storage and Encryptions
Appendix C – Multi-Vector Based Encryption
Appendix D – RVTHE
Appendix E – Acronym List
LIST OF FIGURES
Figure 1 - Data Encryption Standard [27]
Figure 2 - TDEA [27]
Figure 3 - AES encryption process
Figure 4 - Blowfish Algorithm
Figure 5 - Twofish process [41]
Figure 6 - Serpent Algorithm [45]
Figure 7 - CBC Encryption and Decryption
Figure 8 - CFB mode with 8 bits
Figure 9 - XTS mode
Figure 10 - GCM mode
Figure 11 - Outer Product
Figure 12 - Address Mapping between physical to logical
Figure 13 - Flashes and their parallel architecture
Figure 14 - Consumer Vs Enterprise SSD
LIST OF GRAPHS
Graph 1 - IOPS Vs Block Size
Graph 2 - Parallelism Vs Throughput
Graph 3 - Random Versus Sequential Operations
Graph 4 - t2.micro Block Size Versus IOPS
Graph 5 - t2.micro Block Size Versus KB/Sec
Graph 6 - Encrypted SSD Block Size Versus IOPS
Graph 7 - Best Crypt Block Size Vs IOPS
Graph 8 - Dm-crypt Block Size Vs IOPS
Graph 9 - Encrypted EBS SSD Volume Block Size Versus throughput
Graph 10 - BestCrypt Block Size Versus Throughput
Graph 11 - Dm-Crypt Block Size Versus Throughput
Graph 12 - Encryption Methods versus IOPS
Graph 13 - Encryption Methods versus Throughput
Graph 14 - Read workloads for various Block Sizes
Graph 15 - Write workloads IOPS for various Block Sizes
Graph 16 - Mixed Workloads IOPS for Various Block Sizes
Graph 17 - Multivector Based Homomorphic Encryption
Graph 18 - Multivector based encrypted file sizes
Graph 19 - Key size Vs Encryption/Decryption time in Sec
Graph 20 - File size and Encryption/Decryption times
Graph 21 - Key Size and Time on Regular SSD
Graph 22 - Encrypted file sizes in MB
LIST OF TABLES
Table 1 - AES Key Size and Number of Rounds
Table 2 - Key and data location in versors
CHAPTER 1
INTRODUCTION
Rapid changes in information technology, specifically the need to use data from
anywhere, are leading users to adopt Cloud environments with the expectations of
availability (providing data access as needed), reliability, solid integrity
(maintaining data accuracy throughout its life cycle), and full security
(assuring the data is accessed only by authorized parties with an authorized level of
access). In this digital age, protecting PII (Personally Identifiable Information) is
imperative. Tax IDs, medical information, credit information, and other extremely
sensitive data need to be secured at the highest level, because they can be used for
identity theft and other information crimes [1]. Various methods or processes are
implemented to secure the data; among these methods, encryption techniques are the
most commonly used. Scholars have been implementing different cryptographic
algorithms and methods such as the following: Secure Channel, Public-Key
encryption, Digital Signatures, and PKI. Cryptographic algorithms consist of Block
Ciphers (DES, AES, Serpent, and Twofish), Block Cipher Modes (Padding, ECB,
CBC, Fixed IV, Counter IV, Random IV, Nonce-Generated IV, OFB, CTR,
Combined Encryption and Authentication), and Hash functions (MD5, SHA-1, SHA-2,
SHA-256, SHA-512). But all of these encryption methods require full decryption of
the data, including the sensitive data, before it can be processed. Also, I observed
significant throughput penalties, up to 20-50%, when using encryption
software methods on Cloud storage SSD or encrypted SSDs, as described in the abstract.
FHE (Fully Homomorphic Encryption) allows computing on encrypted data without
decrypting it, keeping sensitive data encrypted and thus not exposed [2] [3].
This thesis is organized as follows: Chapter 1 introduces the research. Chapter
2 presents the background: cloud storage SSD, a survey of encryption approaches, and
the mathematical foundation. Chapter 3 defines the problems and limitations. Chapter 4
analyzes storage encryption and shows the performance penalties of cloud storage SSD
and encryption software methods. Chapter 5 introduces RVTHE. Chapter 6 presents the
implementation and evaluation of RVTHE. Chapter 7 discusses lessons learned and
future work. Chapter 8 concludes the thesis.
This chapter introduces the research through a general survey of
overall security and storage. It discusses the terminology of storage, cloud, security
systems, and various encryption methods and ciphers.
1.1 Security Terminology
The word “security” originated in “Late Middle English: from Old French
securite or Latin securitas, from securus ‘free from care’”; a dictionary usage
example is “check to ensure that all nuts and bolts are secure” [4]. The following are
some of the most used terms in the field of cyber security. Defining them clarifies
their role in Information Technology System Security [5].
Assurance: Confidence that a specific security method implementation has
adequately met these four security goals: integrity, availability, confidentiality,
and accountability.
Integrity: Ensuring the data is intact, with modifications made only under
proper authorization.
Availability: The ability to provide timely, reliable access to an entity.
Confidentiality: Ensuring that data is disclosed only to authorized parties,
supported by a set of practices and procedures that implement a security policy.
Accountability: The principle that an authorized individual is responsible for
following the safeguard controls of the system.
Asymmetric Encryption: An encryption method that uses two unique
keys, a public key for encryption and a private key for decryption. It is
computationally infeasible to derive the private key from the public key.
Authentication: The ability to verify the identity of an individual or system
accessing an entity.
Block Cipher: A cipher that operates on blocks, that is, arrays of bytes in the form
of binary bits that are used as input, output, state, and round key in the encryption
process.
o State: Intermediate cipher result of the encryption process.
o Round Key: Values derived from the cipher key.
Cipher and Ciphertext: A cipher is a procedure containing a series of operations that
convert plaintext to ciphertext. Ciphertext is the output generated by applying a
cipher to plaintext.
Classified Information: Information requiring the highest level of security
and mandating authorized access.
Cloud Computing: A way to provide network access to shared resources that
can be rapidly provisioned with minimal effort.
Cryptography: Study that incorporates the foundations, mechanisms, or
methods used to hide data and protect it from unauthorized access.
Cyber Attack: Intentionally disrupting the assurance of a system or the data.
Decryption: A technique of converting ciphertext to plaintext.
Encryption: A technique of converting plaintext to ciphertext.
Key: A secret code needed to perform encryption and decryption.
Private Key: A key needed for the decrypting process of asymmetric
encryption.
Public Key: A key needed for the encrypting process of asymmetric
encryption.
Reliability: The property of a system that consistently performs with quality.
Symmetric Encryption: A form of encryption that uses the same key for
the encryption and decryption processes.
User: An individual who has the proper level of authorization to access the
system.
1.2 Security systems
“A security system is only as strong as its weakest link.” [6]
We can guarantee the security of a system only to the degree that
we have secured its weakest links. If we create an attack tree for any real system,
it will provide insight into possible lines of attack [7]. If we leave a single weak
link, the whole system remains vulnerable, even with the strongest
security elsewhere. Secrecy systems are divided into three categories:
Concealment Systems use methods such as a fake covering cryptogram to hide a
message.
Privacy Systems need special equipment to recover the original message.
“True” Secrecy Systems use a cipher to conceal and recover the message.
To build a “True” secrecy system, one must follow the design criteria for a
cryptographic algorithm [8].
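The weakest-link observation can be made concrete with a small attack tree. The following is a minimal Python sketch, not taken from the thesis: the `Node` class, the `cheapest_attack` function, and all effort values are hypothetical and chosen only for illustration. An OR node succeeds if any child attack succeeds, so its cost is the minimum over its children; an AND node needs every child, so its cost is the sum.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a hypothetical attack tree."""
    name: str
    kind: str = "leaf"   # "leaf", "or" (any child suffices), "and" (all children needed)
    cost: float = 0.0    # attacker effort for a leaf, in arbitrary illustrative units
    children: List["Node"] = field(default_factory=list)


def cheapest_attack(node: Node) -> float:
    """Return the minimum total effort needed to achieve the goal at `node`."""
    if node.kind == "leaf":
        return node.cost
    child_costs = [cheapest_attack(child) for child in node.children]
    if node.kind == "or":
        return min(child_costs)
    if node.kind == "and":
        return sum(child_costs)
    raise ValueError(f"unknown node kind: {node.kind}")


# Hypothetical tree: three ways to read encrypted data.
tree = Node("read encrypted data", "or", children=[
    Node("brute-force the cipher key", cost=1_000_000),
    Node("phish an administrator password", cost=10),
    Node("steal disk and dump key from memory", "and", children=[
        Node("steal disk", cost=50),
        Node("dump key from memory", cost=40),
    ]),
])

# The attacker follows the cheapest path, i.e. the weakest link.
weakest_link_cost = cheapest_attack(tree)
print(weakest_link_cost)
```

In this sketch, making the cipher even stronger (raising the first leaf's cost) leaves the result unchanged; only hardening the cheapest path raises the attacker's effort.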
1.3 Cloud Storage Security
Cloud storage uses SSD, and a lot of research has been done on SSD
characteristics, internal design, and performance for different types of workloads [9]
[10]. Previous studies have shown that SSD outperforms HDD in data access
speed [11], but this research did not consider encryption on an
SSD. There has also been a lot of research on different types of encryption
methods, attacks and vulnerabilities, and secure methods [6] [12]. These existing
algorithms were suitable for regular HDD, but they may not be optimal for SSD. This
is because the physical structure of SSD is different, so encryption algorithms
designed for HDD might not be ideal for, or even compatible with, SSD.
Research is needed to determine whether these encryption methods are good
enough; they can be assessed by measuring their impact on SSD in terms of
performance and security. This means either rethinking existing encryption algorithms
for SSD or coming up with new algorithms that accommodate new environments like
the Cloud. The best encryption method can be found by comparing
existing encryption methods with new ones. To this end, I first study
SSD’s physical and logical limitations.
Prior research showed that workload performance always improved when SSD was
added to the storage system or used as the only storage. SSD is faster than HDD [39]
[13], so adding it to the storage system is expected to improve performance. Very
little research has been done to show the performance impact of different
encryption methodologies on different types of workloads. When exploring which type
of encryption is better for the cloud, we need to consider data at all stages, which
means data in transit and data at rest in the cloud [14]. This can be accomplished by
using fully data-centric security [15]. It can also be accomplished by using
homomorphic encryption methods.
1.4 Design Criteria for Cryptographic Algorithm
Encryption is a small component of the system but provides a higher level of
security during cyber-attacks [6]. Encryption is the original goal of cryptography.
Encryption converts plaintext into unreadable data, which is called ciphertext. A
good encryption makes it impossible to find the plaintext from the ciphertext without
knowing the key. With good encryption, the only information that will be accessible
is the plaintext length and the time stamp [16]. The following are some of the design
principles that help to generate a stronger cipher [6].
Algorithm should provide effective security, should be easy to use, and
completely stated.
Security should depend on the key secrecy, not on the algorithm secrecy.
Algorithm should be available to users, adaptable to applications and systems.
Algorithm must be implementable on a targeted system.
Algorithm should be efficient, verifiable, and portable between systems.
The cipher must depend on the key, and modifications to the message should not
mask the key. Randomness of the key is critical for the security of the system: the
key must be hard to regenerate or guess [8]. In 1999, NIST reported the criteria for
selecting the AES encryption method. The evaluation criteria were divided into three
major categories [17]:
Security:
Resistance of the algorithm to cryptanalysis.
Soundness of its mathematical basis.
Randomness of the algorithm output.
Cost:
Licensing requirements.
Computation efficiency on various platforms.
Algorithm and Implementation Characteristics:
Flexibility.
Hardware and software suitability.
Algorithm simplicity.
Key and block size agility.
1.5 Encryption
Encryption is a creative technical derivative of cryptography. Building the
optimal encryption technique is still very important. In the encryption process the
“key” plays an important role in encrypting and decrypting data; without the key the
data cannot be interpreted. The strength of the key depends on its secrecy, randomness,
length (size), and complexity. Over the years, encryption processes have become more
complex, applying multiple iterations of ciphertext generation. Various encryption
methods use a unique key generated for each iteration. However, Kerckhoffs’
principle states that the “security of the encryption depends on the secrecy of the key
not the algorithm.” This means that everybody may know how the key is applied in the
algorithm, so the secrecy and strength of the key are all that matter. Most of the
common cryptographic methods follow this principle.
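Kerckhoffs’ principle can be illustrated with a toy symmetric cipher whose algorithm is fully public. The sketch below is an illustration under stated assumptions, not RVTHE or AES: `keystream` and `xor_cipher` are hypothetical helpers that XOR the data with a SHA-256-derived keystream, and the construction is not suitable for real use (it has no nonce and no authentication). Only the randomly generated key is secret.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` by hashing key || counter."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


# The algorithm above is public; the security rests on these key bytes alone.
key = secrets.token_bytes(32)        # 256-bit key from the OS random source
wrong_key = secrets.token_bytes(32)  # stands in for an attacker's guess

plaintext = b"Tax ID: 000-00-0000"   # placeholder sensitive data
ciphertext = xor_cipher(key, plaintext)

recovered = xor_cipher(key, ciphertext)      # correct key restores the plaintext
garbage = xor_cipher(wrong_key, ciphertext)  # wrong key yields unrelated bytes
```

The key is drawn with `secrets.token_bytes`, which reflects the points above that a key's strength depends on its secrecy, randomness, and length rather than on hiding the algorithm.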
In 1997, NIST (the National Institute of Standards and Technology) received
fifteen new security algorithms from twelve countries. Out of these encryption
methods, MARS, RC6, Rijndael, Serpent, and Twofish were selected as finalists [18].
Out of these finalists, Rijndael, Serpent, and Twofish took the top three places,
respectively. The winning algorithm, Rijndael, also called AES (Advanced Encryption
Standard), is still in use by different encryption methods [19]. All these methods are
symmetric encryption ciphers.
1.6 Homomorphic Encryption
The idea of homomorphism for cryptography was first theorized in 1978
by Rivest, Adleman, and Dertouzos [20]. Homomorphism can be defined in
abstract algebra in terms of functions and algebraic structures. When a function (map)
is applied to an algebraic structure, the result preserves the same algebraic structure
from the domain to the range. In group theory, homomorphism theorems
are developed on subgroups and quotient groups. Ideals, introduced in the 19th
century, play a parallel role in defining quotient rings and in the comparable
homomorphism theorems in ring theory [21]. In algebra, if A and B are the same type of
algebraic structure, a homomorphism from A to B is a function f from A to B that
preserves the operations. That is, for every operation µ of arity k, with versions
µ_A on A and µ_B on B, and for all elements a_1, a_2, ..., a_k in A,
\[ f(\mu_A(a_1, \ldots, a_k)) = \mu_B(f(a_1), \ldots, f(a_k)) \tag{1.1} \]
If f is surjective (an epimorphism), B is a homomorphic image of A.
A homomorphism from a structure to itself (A = B) is called an endomorphism.
This same homomorphism can also be defined for lattices, groups, modules, and
monoids [22]. A homomorphism that is also a bijection is called an isomorphism. If A
and B are two rings and f is a function from A to B, where A is the domain of f and B
is the range of f, then for each element a of A, f(a) belongs to B. This homomorphism
can preserve the addition, subtraction, and multiplication operations (each denoted
∗), as shown below.
\[ f(a * a') = f(a) * f(a') \tag{1.2} \]
If f satisfies the above, the following are true:
\[ f(0) = 0 \tag{1.3} \]
\[ f(1) = 1 \tag{1.4} \]
\[ f(-a) = -f(a) \tag{1.5} \]
and, if f is additionally one-to-one,
\[ f(a) = f(b) \implies a = b \tag{1.6} \]
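As a concrete check of properties (1.2) to (1.5), consider reduction modulo n, which maps the ring of integers onto the ring of integers modulo n. The sketch below is only an illustration: the modulus `N = 12` and the helpers `f` and `preserves_operations` are hypothetical names. It also shows that this particular homomorphism is not one-to-one, so (1.6) holds only for injective maps.

```python
N = 12  # hypothetical modulus chosen for illustration


def f(a: int) -> int:
    """Map an integer to its residue class in Z_N."""
    return a % N


def preserves_operations(values) -> bool:
    """Check that f preserves +, - and * when the right side is computed in Z_N."""
    for a in values:
        for b in values:
            if f(a + b) != (f(a) + f(b)) % N:
                return False
            if f(a - b) != (f(a) - f(b)) % N:
                return False
            if f(a * b) != (f(a) * f(b)) % N:
                return False
    return True


sample = range(-30, 31)
is_homomorphism = preserves_operations(sample)              # property (1.2)
maps_zero = f(0) == 0                                       # property (1.3)
maps_one = f(1) == 1                                        # property (1.4)
maps_negation = all(f(-a) == (-f(a)) % N for a in sample)   # property (1.5)

# f(1) and f(13) coincide although 1 != 13, so this f is not injective
# and property (1.6) does not hold for it.
injective_counterexample = f(1) == f(13)
```

An encryption function must of course be invertible with the key, which is why injectivity, as in (1.6), matters once homomorphisms are used as ciphers.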
If the properties of homomorphism are incorporated in an encryption method or
cipher, then it is a homomorphic encryption method. Homomorphic encryption can
be organized into three categories: partial, somewhat, and full. Partial
Homomorphic Encryption allows only one type of operation, with unlimited iterations.
Somewhat Homomorphic Encryption allows more than one but not all types of
operations and limits the iterations. Fully Homomorphic Encryption allows all types
of operations with unlimited iterations [23].
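Partial homomorphic encryption can be illustrated with unpadded textbook RSA, which is well known to be multiplicatively homomorphic: multiplying two ciphertexts yields an encryption of the product of the plaintexts. The sketch below is unrelated to RVTHE and uses tiny, insecure parameters chosen as assumptions for illustration (`p`, `q`, `e`, and the helper names are hypothetical); it needs Python 3.8 or later for the modular inverse via `pow`.

```python
# Toy textbook RSA parameters (far too small for real security).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    """Textbook RSA encryption without padding."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Textbook RSA decryption without padding."""
    return pow(c, d, n)


d1, d2 = 7, 6
# Multiply the two ciphertexts without ever decrypting them.
product_ciphertext = (encrypt(d1) * encrypt(d2)) % n

# The product of ciphertexts equals the ciphertext of the product (mod n) ...
multiplicative_ok = product_ciphertext == encrypt(d1 * d2)
# ... so decrypting the combined ciphertext gives the plaintext product.
decrypted_product = decrypt(product_ciphertext)
```

Because only multiplication is preserved, this is a partially homomorphic scheme in the terminology above; adding two ciphertexts does not, in general, produce an encryption of the sum.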
FHE (Fully Homomorphic Encryption) can be defined as follows: an encryption
method (E) is applied to data 1 (D1) and data 2 (D2), where ‘⨳’ represents any
operation (addition, subtraction, multiplication, and division), and the operation
carries over to the ciphertexts.
This is the mathematical representation: E(D1 ⨳ D2) = E(D1) ⨳ E(D2)
The first feasible form of FHE was proposed by Craig Gentry in 2009 using ideal
lattices with “bootstrappable” encryption methods [2].
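The relation E(D1 ⨳ D2) = E(D1) ⨳ E(D2) can be illustrated for a single operation with textbook RSA, which is partially homomorphic with respect to multiplication. The sketch below is only an illustration under stated assumptions: the primes, the exponent, and the function names are toy values chosen for this example, there is no padding, and the parameters are far too small for real use. It is not FHE and it is not the method proposed in this dissertation.

```python
# Toy textbook-RSA parameters (illustrative only; insecure sizes, no padding).
p, q = 61, 53
n = p * q
phi = (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)  # modular inverse of e (requires Python 3.8+)


def encrypt(m: int) -> int:
    """E(m) = m^e mod n."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """D(c) = c^d mod n."""
    return pow(c, d, n)


d1, d2 = 7, 6
# Multiply the two ciphertexts without decrypting either of them.
c_product = (encrypt(d1) * encrypt(d2)) % n

# The product of ciphertexts equals the encryption of the product of plaintexts.
assert c_product == encrypt(d1 * d2)
print(decrypt(c_product))  # 42
```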
1.7 Possible fully homomorphic encryption method
In 2009, Craig Gentry introduced the first possible fully homomorphic encryption
method, which evaluates circuits of arbitrary depth (composed of additions and
multiplications) on encrypted data. This research provided the blueprint of FHE. The
blueprint starts from a SwHE (Somewhat Homomorphic Encryption) scheme, which
evaluates circuits of limited depth built from additions and multiplications [2]. This
research helped develop encryption methods based on lattices, integers, LWE
(learning with errors), and RLWE (ring learning with errors). Further research on SwHE
and FHE showed promise for potential usage in cloud computing environments and
other MPC (multi-party computation) settings [24] [25]. In the Gentry method, the
lattice-based scheme takes too long to generate the key (ranging from 2.5 seconds to
2.2 hours), the implementation is complex, noise growth can exceed thresholds, and the
large key sizes (17 MB to 2.25 GB) require high memory resources; all of this makes the
method impractical in real systems [3]. Fully Homomorphic Encryption (FHE) is on the
“bleeding edge” of encryption technology, but currently there is no FHE available for
real-time applications [26]. There is still a lot of work that needs to be done to have a
“production ready” version of FHE.
Gentry defines a public-key encryption scheme $\varepsilon$ as consisting of three
algorithms: $\mathrm{KeyGen}_\varepsilon$, $\mathrm{Encrypt}_\varepsilon$, and $\mathrm{Decrypt}_\varepsilon$. $\mathrm{KeyGen}_\varepsilon$ is a randomized
algorithm that takes a security parameter $\lambda$ as input and outputs a secret key $sk$
and a public key $pk$. The plaintext space $\mathcal{P}$ and the ciphertext space $\mathcal{C}$ are defined
by $pk$. $\mathrm{Encrypt}_\varepsilon$ is also a randomized algorithm; it takes $pk$ and a plaintext
$\pi \in \mathcal{P}$ as input and outputs a ciphertext $\psi \in \mathcal{C}$. $\mathrm{Decrypt}_\varepsilon$ takes $sk$ and $\psi$
as input and outputs the plaintext $\pi$. The computational complexity of all of these
algorithms must be polynomial in $\lambda$. The correctness condition is:
if $(sk, pk) \xleftarrow{R} \mathrm{KeyGen}_\varepsilon$, $\pi \in \mathcal{P}$, and $\psi \xleftarrow{R} \mathrm{Encrypt}_\varepsilon(pk, \pi)$, then $\mathrm{Decrypt}_\varepsilon(sk, \psi) \to \pi$.
A homomorphic encryption scheme additionally has a (possibly randomized) efficient
algorithm $\mathrm{Evaluate}_\varepsilon$, which takes as input the public key $pk$, a circuit $C$ from a
permitted set $\mathcal{C}_\varepsilon$ of circuits, and a tuple of ciphertexts $\Psi = \langle \psi_1, \ldots, \psi_t \rangle$ for the
input wires of $C$; it outputs a ciphertext $\psi \in \mathcal{C}$. Informally, the desired functionality
of $\mathrm{Evaluate}_\varepsilon$ is that, if $\psi_i$ “encrypts $\pi_i$” under $pk$, then
$\psi \leftarrow \mathrm{Evaluate}_\varepsilon(pk, C, \Psi)$ “encrypts $C(\pi_1, \ldots, \pi_t)$” under $pk$, where
$C(\pi_1, \ldots, \pi_t)$ is the output of $C$ on inputs $(\pi_1, \ldots, \pi_t)$. The minimal
requirement for such a scheme is correctness. Gentry formalizes homomorphic
encryption in several ways and defines them as follows.
“Definition 1: (Correctness of Homomorphic Encryption). Gentry says a
homomorphic encryption scheme $\varepsilon$ is correct for circuits in $\mathcal{C}_\varepsilon$ if, for any key-pair
$(sk, pk)$ output by $\mathrm{KeyGen}_\varepsilon(\lambda)$, any circuit $C \in \mathcal{C}_\varepsilon$, any plaintexts
$\pi_1, \ldots, \pi_t$, and any ciphertexts $\Psi = \langle \psi_1, \ldots, \psi_t \rangle$ with
$\psi_i \leftarrow \mathrm{Encrypt}_\varepsilon(pk, \pi_i)$, it is the case that: if $\psi \leftarrow \mathrm{Evaluate}_\varepsilon(pk, C, \Psi)$,
then $\mathrm{Decrypt}_\varepsilon(sk, \psi) \to C(\pi_1, \ldots, \pi_t)$ except with negligible probability over
the random coins in $\mathrm{Evaluate}_\varepsilon$.
By itself, mere correctness fails to exclude trivial schemes. Suppose I define
$\mathrm{Evaluate}_\varepsilon(pk, C, \Psi)$ to just output $(C, \Psi)$ without “processing” the circuit or
ciphertexts at all, and $\mathrm{Decrypt}_\varepsilon$ to decrypt the component ciphertexts and apply $C$
to results.
Definition 2: (Compact Homomorphic Encryption). We say that a homomorphic
encryption scheme $\varepsilon$ is compact if there is a polynomial $f$ such that, for every value of
the security parameter $\lambda$, $\varepsilon$’s decryption algorithm can be expressed as a circuit
$D_\varepsilon$ of size at most $f(\lambda)$.
Definition 3: (“Compactly Evaluates”). We say that a homomorphic encryption
scheme $\varepsilon$ “compactly evaluates” circuits in $\mathcal{C}_\varepsilon$ if $\varepsilon$ is compact and correct for
circuits in $\mathcal{C}_\varepsilon$.
Definition 4: (Fully Homomorphic Encryption). We say that a homomorphic
encryption scheme $\varepsilon$ is fully homomorphic if it compactly evaluates all circuits.
Definition 5: (Leveled Fully Homomorphic Encryption). We say that a family of
homomorphic encryption schemes $\{\varepsilon^{(d)} : d \in \mathbb{Z}^+\}$ is leveled fully homomorphic if,
for all $d \in \mathbb{Z}^+$, they all use the same decryption circuit, $\varepsilon^{(d)}$ compactly evaluates all
circuits of depth at most $d$ (that use some specified set of gates), and the
computational complexity of $\varepsilon^{(d)}$’s algorithms is polynomial in $\lambda$, $d$, and (in the case
of $\mathrm{Evaluate}_{\varepsilon^{(d)}}$) the size of the circuit $C$.
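Definition 6: ((Statistical) Circuit Private Homomorphic Encryption). We say that a
homomorphic encryption scheme $\varepsilon$ is circuit-private for circuits in $\mathcal{C}_\varepsilon$ if, for any
keypair $(sk, pk)$ output by $\mathrm{KeyGen}_\varepsilon(\lambda)$, any circuit $C \in \mathcal{C}_\varepsilon$, and any fixed
ciphertexts $\Psi = \langle \psi_1, \ldots, \psi_t \rangle$ that are in the image of $\mathrm{Encrypt}_\varepsilon$ for plaintexts
$\pi_1, \ldots, \pi_t$, the following distributions (over the random coins in $\mathrm{Encrypt}_\varepsilon$,
$\mathrm{Evaluate}_\varepsilon$) are (statistically) indistinguishable:
$\mathrm{Encrypt}_\varepsilon(pk, C(\pi_1, \ldots, \pi_t)) \approx \mathrm{Evaluate}_\varepsilon(pk, C, \Psi)$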
The obvious correctness condition must still hold.
Definition 7: (Leveled Circuit Private Homomorphic Encryption). Like circuit private
homomorphic encryption, except that there can be a different distribution associated
to each level, and the distributions only need to be equivalent if they are associated to
the same level (in the circuit) [3].”
All of the above definitions are given to show, at a very high level, the thinking behind
Gentry’s work on homomorphic encryption. Gentry’s scheme is an asymmetric
encryption scheme, and his work was revolutionary in bringing the idea of
homomorphic encryption back to the world. For that reason all the
definitions are mentioned in this thesis, but the details of his work are out of the scope
of this research. The mathematics used in his scheme has some shortcomings
because the primitive itself is not homomorphic; rather, his circuit computation
algorithm provides the homomorphic properties. His algorithm organizes the data and
manipulates the circuits to achieve the computations on encrypted data.
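The KeyGen, Encrypt, Decrypt, and Evaluate interface described in this section can be sketched with a much simpler scheme than Gentry's. The example below uses the Paillier cryptosystem, which is additively homomorphic, so the permitted circuit set here contains only sums. It is a stand-in chosen for illustration, not Gentry's lattice-based construction; the toy primes and all function names are assumptions made for this sketch.

```python
import math
import random


def keygen(p: int = 61, q: int = 53):
    """Return (pk, sk) for toy Paillier parameters (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return (n,), (n, lam, mu)


def encrypt(pk, m: int) -> int:
    """Randomized encryption: c = (n+1)^m * r^n mod n^2."""
    (n,) = pk
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(sk, c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def evaluate_sum(pk, ciphertexts) -> int:
    """Evaluate an addition circuit: multiplying ciphertexts adds the plaintexts."""
    (n,) = pk
    acc = 1
    for c in ciphertexts:
        acc = (acc * c) % (n * n)
    return acc


pk, sk = keygen()
psi = evaluate_sum(pk, [encrypt(pk, m) for m in (5, 7, 30)])
print(decrypt(sk, psi))  # 42
```

The correctness condition of Definition 1 corresponds to the final line: decrypting the evaluated ciphertext yields the circuit output on the plaintexts, as long as the sum stays below n.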
1.8 Vector product spaces with Clifford Geometric Algebra
Mathematics is used as the basis for various encryption methods. For
example, RSA (Rivest–Shamir–Adleman) uses mathematics in the form of factoring, with
key sizes based on large prime numbers. This approach makes factoring hard for RSA [27].
AES uses mathematics in the form of bit manipulations to increase the “diffusion” of
the ciphertext and register-based operations to increase the “confusion” of the shared
key. Applying Clifford Geometric Algebra on vector product spaces gives results
that are intractable, because the output is a vector in a different
direction, space, or volume. The geometric product, which is a Clifford Geometric
Algebra operation, is an extension of the inner product of vectors, and it represents
the geometric objects of all dimensions in vector space. Versors represent the
geometric product of multiple vectors and hold the properties of vectors in vector space.
Selecting multiple vectors with smaller dimensions and performing a geometric
product on them results in an intractable vector in vector product space.
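To make the geometric product concrete, the sketch below computes it for two vectors in three dimensions, where the product of vectors a and b is the sum of the inner product (a scalar) and the outer product (a bivector). It also shows why a single nonzero vector, the simplest versor, is easy to invert. The function names and the tuple layout (bivector components on e12, e23, e31) are assumptions made for this example and are not the RVTHE implementation.

```python
from typing import Tuple

Vector = Tuple[float, float, float]


def geometric_product(a: Vector, b: Vector):
    """Return (scalar part, bivector part) of the geometric product ab."""
    scalar = sum(x * y for x, y in zip(a, b))  # inner product a . b
    bivector = (
        a[0] * b[1] - a[1] * b[0],  # e12 component of a ^ b
        a[1] * b[2] - a[2] * b[1],  # e23 component of a ^ b
        a[2] * b[0] - a[0] * b[2],  # e31 component of a ^ b
    )
    return scalar, bivector


def vector_inverse(a: Vector) -> Vector:
    """Inverse of a nonzero vector under the geometric product: a / (a . a)."""
    norm_sq, _ = geometric_product(a, a)
    return tuple(x / norm_sq for x in a)


# Orthogonal unit vectors: no scalar part, pure bivector e12.
print(geometric_product((1, 0, 0), (0, 1, 0)))

# A vector times its inverse: scalar part 1 and a vanishing bivector part.
a = (1.0, 2.0, 2.0)
print(geometric_product(a, vector_inverse(a)))
```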
1.9 Dissertation Contributions
This dissertation contributes to the state of the art with the following <number>
contributions:
1.
2.
3.
1.10 How this dissertation is organized
Chapter 2 is . Chapter 3….
1.10 Reduce Vector Technique for Homomorphic Encryption
The use of multivectors for
homomorphic encryption was demonstrated by David Williams Honorio Araujo
Da Silva in his master's thesis; the algorithm was designed using a concept invented by
Dr. Carlos Paz de Araujo in 2017. However, there is another way to use vectors
with the geometric product in vector space. Versors are geometric products of vectors,
and they have simpler inverse characteristics. RVTHE (Reduced Vector
Technique for Homomorphic Encryption) is a cryptographic cipher powered by Clifford
Geometric Algebra and versors. This approach is an efficient method for
encryption, decryption, and real-time usage [28].
Securing data involves two stages: data at rest and data in transit. “Data
at rest” describes the data before or after it is sent to a server, storage, or cloud.
“Data in transit” describes the data being sent between a client and a server, storage,
or cloud. I will refer to these two stages in this dissertation as ESD (Every Stage of
Data) security. Enterprises have been securing networks, servers, and storage, but the
data has not always been stored in a secure state; therefore, there is a need for fully
data-centric security. Data-centric encryption is a way to achieve data-centric security.
RVTHE is a data-centric encryption cipher that is simple to implement and provides ESD
security for the entire system. This method requires fewer resources to encrypt and
decrypt, and it offers real-time data updates. It is also scalable and adaptable from
small devices to large enterprise storage.
CHAPTER 2
BACKGROUND
This chapter discusses the background research related to storage and security
methods. It mainly covers SSD storage device characteristics, various
encryption approaches, and the mathematical foundation of the new cipher that will
be presented in this research.
2.1 Cloud Storage SSD
Most Cloud environments use SSDs as data storage or in the form of a flash
cache to increase performance. An SSD retains data regardless of power availability
status. It does not contain an actual disk (platter) as in a traditional HDD (Hard Disk
Drive). SSD technology uses electronic interfaces such as SATA (Serial ATA) Express to
be compatible with any host. It also uses the traditional block input/output (I/O)
provided by any host, thus permitting simple replacement of traditional hard disk drive
technology in common applications. SSDs are used as the primary data storage for
communication devices, storage systems, modern computers, etc. [11].
From the perspective of security, an SSD strives to achieve the best data
reliability, integrity, secure deletion, and encryption, given the unique physical nature
of the device. These aspects depend on ECCs (error-correcting codes), reliably erasing
data from storage media (leaving no digital footprint), and proper encryption methods.
An SSD’s built-in commands are effective for ECCs and deletion, but manufacturers
sometimes implement them incorrectly. Previous research has addressed some of the
above issues by implementing a variety of approaches for achieving better ECCs
and encryption methods. However, that research did not consider the performance
sacrificed due to encryption when the methods were implemented. This thesis will
consider those factors in the form of SSD performance in IOPS for different
workloads while the encryption process is running.
2.1.1 Data Reliability and Integrity
ECC is one of the functions of the FTL (flash translation layer). ECC schemes are
implemented to ensure the raw reliability of data. ECC usually adds overhead on
resources; thus, it impacts performance. The reliability of conventional ECCs, such as
the commonly used BCH (Bose-Chaudhuri-Hocquenghem) code, degrades as SSD capacity
grows. It is important to implement a powerful ECC engine with LDPC (low-density
parity-check) code to improve the reliability of SSDs [29]. Previous research proposed
different ECC approaches to increase data reliability. One of these approaches is a
lightweight EDC (Error Detection Code) for the block to achieve better cache
performance [30].
2.1.2 Sanitization and Secure Deletion of SSD
The physical architecture of a non-encrypted SSD has limitations for
sanitizing the disk or securely deleting a file from the SSD. If the vendor did not
implement the host interface's built-in commands correctly, the sanitization of the SSD
will not be achieved. No full sanitizing technique that works for an HDD is
guaranteed to work for an SSD. Usually, sanitization of an SSD can be attempted by
writing to the visible address space twice using FTL procedures. But this is a
time-consuming process and it is not a true sanitization, because it does not take care
of the invisible address space (files marked as deleted whose physical data still
exists). The data in an SSD can be erased with erasure-based sanitization techniques
(overwriting the disk with multiple I/O operations), but these techniques
have shortcomings and fail to achieve a real sanitization [31].
Completely deleting and securely erasing data in an SSD is challenging. For that
reason, storing unencrypted data creates a risk of exposing that data to unauthorized
access. Although erasing files via sanitization methods makes the SSD more
secure, it also creates a lot of wear and tear on the device, which shortens its lifespan.
To avoid these problems, the best option is encrypting the data on an SSD. Previous
research created a couple of methods for encrypting files on SSDs: node-level
and password-based file-level encryption. Node-level encryption encrypts the
nodes and stores the keys in a dedicated KSA (Key Storage Area). The KSA is a concern,
though, because KSA blocks can turn into bad blocks, and at that point the keys can be
read [32]. In password-based file-level encryption, files are encrypted using
passwords, but encryption and deletion of the files are slow and accessing the files each
time is tedious [33].
Even with all the challenges of encryption, it is still the best option for securing the
data on an SSD.
2.2 Survey of Various Encryption Approaches
This section describes encryption methods and algorithms and looks at their
strengths and weaknesses in detail. They are used in encryption software for
SSDs. This section also discusses real randomness for creating keys and the most
common types of existing encryption methods.
2.2.1 Block Ciphers
An encryption function on fixed-size blocks of data is called a block cipher. “A
secure block cipher is one for which no attack exists” [6]. A block cipher
encrypts a fixed-size block of plaintext and generates a ciphertext block of the same
size using a secret key (the key size can be different from the block size).
Without the secret key, no plaintext can be produced from the ciphertext. An attack on
a block cipher is defined as a non-generic method of distinguishing
the block cipher from an ‘ideal block cipher’ [6]. “Block cipher written in terms of
E(K,p) or Ek(p) for encryption of plaintext p with key K and D(K,c) or Dk(c) for
decryption of ciphertext c with key K” [6]. In block cipher encryption, the key is a
critical component and its integrity is essential: changing a single bit in the key value
results in a different ciphertext [6].
For a block cipher on k-bit values, each key value selects a permutation that maps
the $2^k$ possible plaintext values to $2^k$ cipher values [6]. For a single permutation
on 128-bit values, this is a table of $2^{128}$ cipher values (each 128 bits long). The
ideal cipher should have a random permutation for each key value, which amounts to
choosing the lookup table randomly. The “distinguisher” is an algorithm that
is given a black-box function computing either the block cipher or an ideal block
cipher and must decide which of the two it is.
The distinguisher does not have knowledge of the internal process of the black-box
function. The amount of computing that a distinguisher can do is limited, because
it must finish within a practical amount of work. A practical block cipher should be
designed such that, for each key, the encryption
function appears to be a randomly chosen invertible function [6].
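The idea of an ideal block cipher as one randomly chosen lookup table per key can be sketched on a deliberately tiny block size. In the example below the 4-bit block size, the use of Python's non-cryptographic `random` module as the source of the permutation, and the function names are all assumptions made for illustration; a real 128-bit cipher cannot be stored as a table.

```python
import random

BLOCK_BITS = 4  # toy block size; the table has 2**4 = 16 entries


def ideal_cipher_table(key: int):
    """Return a pseudo-randomly chosen permutation of all block values for this key."""
    table = list(range(2 ** BLOCK_BITS))
    random.Random(key).shuffle(table)  # the key selects the permutation
    return table


def encrypt(key: int, block: int) -> int:
    """Look up the ciphertext block in the key's table."""
    return ideal_cipher_table(key)[block]


def decrypt(key: int, block: int) -> int:
    """Invert the table lookup; possible because the table is a permutation."""
    return ideal_cipher_table(key).index(block)


for key in range(3):
    print(key, ideal_cipher_table(key))

# Every key gives an invertible mapping on the whole block space.
assert all(decrypt(7, encrypt(7, b)) == b for b in range(2 ** BLOCK_BITS))
```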
A block cipher approaches an ‘ideal block cipher’ if it can withstand known-plaintext,
ciphertext-only, related-key, chosen-plaintext, and other types of attacks. In
SSDs, encryption software uses one or more of the following block ciphers. The
following sections address them and some of their attacks.
2.2.1.1 DES (Data Encryption Standard)
The DES standard specifies both enciphering and
deciphering operations, which are based on a binary number called a key. DES uses
a Feistel cipher design with 16 rounds.
Figure 1 - Data Encryption Standard [27]
In Figure 1, DES starts with a 64-bit input block (binary digits). DES then applies
a key that is 64 bits long. Out of this 64-bit key,
56 bits are used directly by the algorithm, and the other 8 bits are used for error
detection. These 8 bits set the parity of each 8-bit byte of the key, so that each
byte has an odd number of "1"s. In each of the 16 rounds, a 48-bit sub-key derived from
the 56-bit key is combined with the data. XOR operations performed between the key
and the data, along with permutations, produce the final cipher [18].
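The odd-parity rule for DES key bytes can be sketched in a few lines. The example assumes the usual convention that the least significant bit of each key byte is the parity bit; the function names are hypothetical and only serve this illustration.

```python
def set_odd_parity(key: bytes) -> bytes:
    """Set the low bit of every byte so that each byte has an odd number of 1 bits."""
    out = bytearray()
    for byte in key:
        seven_bits = byte & 0xFE               # the 7 bits used by the algorithm
        ones = bin(seven_bits).count("1")
        parity_bit = 0 if ones % 2 == 1 else 1  # add a 1 only when the count is even
        out.append(seven_bits | parity_bit)
    return bytes(out)


def has_odd_parity(key: bytes) -> bool:
    """Check the error-detection property on every key byte."""
    return all(bin(byte).count("1") % 2 == 1 for byte in key)


raw_key = bytes([0x00, 0xFF, 0x01, 0xFE, 0x12, 0x34, 0x56, 0x78])
fixed_key = set_odd_parity(raw_key)
print(fixed_key.hex())
print(has_odd_parity(fixed_key))
```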
Figure 2 - TDEA [27]
In Figure 2, a 3DES (TDEA) key is made out of three DES keys,
which are also referred to as a key bundle. The keys inside the key bundle are different
from each other. This key bundle is used for encryption and decryption. The
encryption process encrypts using the first key, decrypts using the
second key, and then encrypts using the third key. The decryption process follows
the reverse order of the encryption process: it decrypts using the third key, encrypts
using the second key, and decrypts using the first key. The encryption algorithms
specified in this standard are well known among users of standard encryption [34].
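The encrypt-decrypt-encrypt order of the key bundle can be sketched independently of DES itself. In the example below, `toy_encrypt` and `toy_decrypt` are a made-up invertible 64-bit function standing in for the DES block operation (an assumption for illustration only, with no security value); only the composition order reflects TDEA.

```python
MASK = (1 << 64) - 1  # 64-bit block, as in DES


def _rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK


def _rotr(x: int, r: int) -> int:
    return ((x >> r) | (x << (64 - r))) & MASK


def toy_encrypt(key: int, block: int) -> int:
    """Stand-in for single-block DES encryption (NOT DES)."""
    return _rotl(((block ^ key) + key) & MASK, 13)


def toy_decrypt(key: int, block: int) -> int:
    """Inverse of toy_encrypt."""
    return ((_rotr(block, 13) - key) & MASK) ^ key


def ede_encrypt(k1: int, k2: int, k3: int, block: int) -> int:
    """Encrypt with K1, decrypt with K2, encrypt with K3."""
    return toy_encrypt(k3, toy_decrypt(k2, toy_encrypt(k1, block)))


def ede_decrypt(k1: int, k2: int, k3: int, block: int) -> int:
    """Reverse order: decrypt with K3, encrypt with K2, decrypt with K1."""
    return toy_decrypt(k1, toy_encrypt(k2, toy_decrypt(k3, block)))


k1, k2, k3 = 0x0123456789ABCDEF, 0x1122334455667788, 0x0F1E2D3C4B5A6978
plain = 0x48656C6C6F212121
cipher = ede_encrypt(k1, k2, k3, plain)
assert ede_decrypt(k1, k2, k3, cipher) == plain
```

With three identical keys the middle decryption cancels the first encryption, so the construction reduces to a single encryption. That is why the middle step is a decryption rather than a second encryption.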
3DES was heavily used by organizations until researchers discovered active
collision attacks on different modes (e.g., CBC, CTR, GCM, and OCB) [35]. The
small key size, the small data block size (64 bits), and the use of the same key for
encryption became a vulnerability, because a repeated ciphertext block is expected
after about $2^{32}$ blocks, that is, two to the power of half the data block size. These
colliding ciphertext blocks expose the cipher to attacks such as
birthday attacks. Because of the XOR operation in the mode, a collision reveals the XOR
of the two corresponding plaintext blocks. A cipher collision alone is not enough to
discover the plaintext, but together with the same secret key and some fraction of
known plaintext, it makes successful attacks easier. Due to ever-increasing computing
power, these attacks can be carried out in practice, for example through a
man-in-the-browser attack [35].
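The collision bound above can be turned into a data volume with simple arithmetic: a block cipher with an n-bit block is expected to repeat a ciphertext block after roughly 2^(n/2) blocks under one key. The sketch below works this out for 64-bit and 128-bit blocks; the function name is an assumption made for this example.

```python
def birthday_bound_bytes(block_bits: int) -> int:
    """Approximate data volume (bytes) before a ciphertext block collision is expected."""
    blocks = 2 ** (block_bits // 2)      # about 2^(n/2) blocks
    return blocks * (block_bits // 8)    # each block carries n/8 bytes


GIB = 2 ** 30
print(birthday_bound_bytes(64) // GIB)    # 32 (GiB) for 3DES-sized blocks
print(birthday_bound_bytes(128) // GIB)   # 274877906944 (GiB) for AES-sized blocks
```

For a 64-bit block this is only about 32 GiB of traffic under a single key, which a long-lived connection can reach in practice.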
2.2.1.2 AES (Advanced Encryption Standard)
Figure 3 – AES encryption process
AES is a symmetric encryption algorithm that was created to replace DES and 3DES
encryption. Joan Daemen and Vincent Rijmen developed AES. It was
adopted as a NIST encryption standard in 2001 [36]. Figure 3 shows the AES encryption
process.
AES supports key lengths of 128, 192, and 256 bits. It encrypts a 128-bit
block of plaintext and generates a 128-bit block of ciphertext. It uses only one key for
encryption and decryption. AES encryption consists of repeated rounds of
the following steps: sub bytes (replacing bytes using the S-box table), shift
rows, mix columns, and add round key. The last round performs all steps except mix
columns. AES decryption reverses the encryption process.
Key Size (bits)    Total Rounds
128                10
192                12
256                14
Table 1 - AES Key Size and Number of Rounds
Table 1 shows the total rounds performed based on key size [36] [37]. In recent
years most AES implementations use the 256-bit key length instead of the
192-bit key. Even though AES-256 has a longer key, the design of its key schedule
makes it more vulnerable to attacks on the sub-keys (related-key attacks). AES is
subject to a theoretical brute-force attack, but even with current technology it would
take on the order of a quintillion years to break the encryption key. Some additional
theoretical attacks are documented: cryptanalytic attacks, related-key attacks on
AES-192 and AES-256, meet-in-the-middle attacks on AES-128, and the first key-recovery
attack on all versions of AES. Exploits of AES-256 have
received more focus from the security community than those of AES-128 and AES-192.
Despite this, all AES versions are considered unbreakable with today’s technology [37]
[38].
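The round counts in Table 1 follow a simple rule: with the key length expressed in 32-bit words (Nk), AES performs Nk + 6 rounds. The short sketch below reproduces the table from that rule; the function name is an assumption made for this example.

```python
def aes_rounds(key_bits: int) -> int:
    """Number of AES rounds for a given key length in bits."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192, or 256 bits")
    key_words = key_bits // 32   # Nk: key length in 32-bit words
    return key_words + 6         # Nr = Nk + 6 for the 128-bit AES block


for bits in (128, 192, 256):
    print(bits, aes_rounds(bits))
```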
2.2.1.3 Blowfish
Blowfish is a symmetric block cipher that was designed by Bruce Schneier to
replace DES in 1993.
Figure 4 - Blowfish Algorithm
In Figure 4, Blowfish processes data in 64-bit blocks, split into two 32-bit halves.
In the original design the variable key size scales from 32 bits to 256 bits. The
algorithm uses the XOR operation, table lookup (S-box), and modular
addition. It has the same Feistel structure as the DES algorithm. This algorithm uses
precomputed sub-keys to increase the speed of encryption. After a year, to increase
security, the maximum key size was increased from 256 bits to 448 bits and the design
was published in Dr. Dobb's Journal [39].
If a small key length is chosen, the key behaves like a weak
key in Blowfish, which makes the cipher vulnerable to chosen-key and related-key
attacks. Due to its Feistel structure and key-dependent S-box substitution, it is also
prone to slide and simple power attacks. Because Blowfish is a block cipher, it is
vulnerable to the attacks that block ciphers in general are prone to, such as
side-channel, exhaustive search, and birthday attacks, to name a few [40].
2.2.1.4 Twofish
The Twofish symmetric encryption algorithm is similar to AES. It uses key lengths of
128, 192, and 256 bits and a 128-bit block size. The National Institute of Standards and
Technology selected it as one of the top 5 finalists for AES, but it was not selected for
standardization in the end. Still, encryption software for storage and
file systems has incorporated this algorithm (e.g., TrueCrypt, BestCrypt, Dm-crypt, and
DiskCryptor). The Twofish algorithm is one of the ciphers included in the OpenPGP
standard, and it is free to use with no restrictions.
Figure 5 – Twofish process [41]
Figure 5 shows that the Twofish algorithm uses predefined, key-dependent S-boxes
and a key schedule. The first half of the key is used for encryption and the
second half is used for the S-box lookup and for modifying the encryption algorithm.
Twofish’s design looks like a mix of DES and AES: one half is like DES in that it uses
a Feistel structure, and the other half is like AES in that it uses S-boxes and a Maximum
Distance Separable matrix. Twofish’s 128-bit key encryption is slower than its AES
counterpart, but the 256-bit key encryption is faster [41].
Researchers claimed that when weak key pairs were present, the Twofish cipher might
be vulnerable to partial chosen-key and related-key
attacks. But it was determined that the existence of these key pairs was not realistic, so
the proposed attacks would not work [42]. With time, scholars found that there are
vulnerabilities in the Twofish cipher after all. One attack, SPA (Simple Power
Analysis), revealed the secret key of the cipher. Because Twofish uses S-boxes with 8-bit
predefined permutations and round operations, it is prone to a side-channel attack that
discovers the encryption key in one iteration [43].
2.2.1.5 Serpent
Serpent is also a block cipher, and it was published in 1998 by Ross Anderson, Eli
Biham and Lars Knudsen. This algorithm was selected as one of the finalists by the
US National Institute of Standards and Technology [44].
Figure 6 – Serpent Algorithm - [45]
Figure 6 shows the process of Serpent. Serpent uses 128-, 192-, and 256-bit key
lengths. It operates on a block of four 32-bit words using a 32-round
substitution-permutation network with 4-bit S-boxes and a key mixing operation in each
round [44].
In 2011, a cryptographic analysis using a multidimensional
linear method was performed to find vulnerabilities in Serpent. The researchers showed
that Serpent reduced to 11 rounds, using a 128-bit key length, can be broken by an
attack that finds the encryption key [46].
2.2.2 Block Cipher Modes
A block cipher mode applies a block cipher's single-block operation repeatedly
to data longer than one block in order to achieve confidentiality and authenticity.
Different modes suit different operating environments and requirements. The following
are some of the most commonly used block cipher modes.
2.2.2.1 CBC (CIPHER BLOCK CHAINING) MODE:
CBC mode uses the IV (Initialization Vector) to encrypt the first plaintext block using
the XOR operation. Each later plaintext block is XORed with the previous ciphertext
block before it is encrypted. The ciphertext block is stored in a feedback register and
used as input to the XOR function with the next plaintext block. This process repeats
until all the plaintext has been processed. From the second block onwards, each block
depends on all the previous blocks. In the decryption process, the same steps are
applied in reverse order. To decrypt a ciphertext block, apply the decryption function
with the key to that block and XOR the result with the previous ciphertext block (or
with the IV for the first block) to get the plaintext block. After each
decryption cycle, the ciphertext block is stored in the feedback register.
Encryption: $C_i = E_k(P_i \oplus C_{i-1})$ and Decryption: $P_i = C_{i-1} \oplus D_k(C_i)$, with $C_0 = IV$
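The chaining in the CBC equations can be sketched as follows. Because the Python standard library has no AES or DES, the example uses a made-up invertible 64-bit function as the block cipher; `toy_encrypt`, `toy_decrypt`, and the 8-byte block size are assumptions for illustration only and provide no security. Only the chaining logic reflects CBC.

```python
BLOCK = 8                 # toy block size in bytes
MASK = (1 << 64) - 1


def toy_encrypt(key: int, block: bytes) -> bytes:
    """Stand-in for E_k on one block (NOT a real cipher)."""
    x = int.from_bytes(block, "big")
    x = ((x ^ key) + key) & MASK
    x = ((x << 13) | (x >> 51)) & MASK      # rotate left by 13 bits
    return x.to_bytes(BLOCK, "big")


def toy_decrypt(key: int, block: bytes) -> bytes:
    """Stand-in for D_k, the inverse of toy_encrypt."""
    x = int.from_bytes(block, "big")
    x = ((x >> 13) | (x << 51)) & MASK      # rotate right by 13 bits
    x = ((x - key) & MASK) ^ key
    return x.to_bytes(BLOCK, "big")


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(i ^ j for i, j in zip(a, b))


def cbc_encrypt(key: int, iv: bytes, plaintext: bytes) -> bytes:
    """C_i = E_k(P_i xor C_{i-1}), with C_0 = IV. Input must be whole blocks."""
    assert len(plaintext) % BLOCK == 0
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: int, iv: bytes, ciphertext: bytes) -> bytes:
    """P_i = C_{i-1} xor D_k(C_i), with C_0 = IV."""
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(xor(toy_decrypt(key, block), prev))
        prev = block
    return b"".join(out)


key = 0x0123456789ABCDEF
iv = bytes(range(8))
message = b"SAMEBLK!" * 2          # two identical plaintext blocks
ciphertext = cbc_encrypt(key, iv, message)
assert cbc_decrypt(key, iv, ciphertext) == message
assert ciphertext[:BLOCK] != ciphertext[BLOCK:]   # chaining hides the repetition
```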
Figure 7 - CBC Encryption and Decryption
Figure 7 shows the CBC encryption and decryption process. The CBC structure may be
exposed to some vulnerabilities. For example, in CBC mode the encryption process
will not start until there is enough plaintext data to fill the entire block being
processed. In secured network communications, the terminals need to immediately
send each character or string of bytes to the destination host; they cannot wait until
the block is full. When the string of bytes is smaller than a block, CBC mode cannot
encrypt it without padding. Another weakness is that, because of the birthday paradox,
chaining exposes identical patterns of the plaintext about every $2^{m/2}$ blocks
(m = block size in bits). These issues can be mitigated, for example by taking care of
the message starting point and endpoint and by including controlled redundancy and
authentication [27].
If an attacker alters some bits of a ciphertext block and the change is undetected
during the decryption cycle, that block will decrypt to gibberish. Sometimes this may
not be an issue, but other times it can cause problems. Altering the ciphertext by
even one bit will also cause the subsequent block to have the wrong input, and that will
affect the decryption of that block. The combination of SSL v3 or TLS v1 with CBC
is not recommended, as it uses a single set of initialization vectors for the entire
traffic of the communication. This exposes the targeted block to a padding oracle
attack, in which an attacker learns the padding information and then determines the
plaintext bytes from the ciphertext by running multiple queries [47]. This was
addressed in TLS 1.2, which checks for multiple queries and stops the connection to
prevent that type of query, and it was recommended to upgrade all secure
communications by implementing this change.
2.2.2.2 CFB (CIPHER-FEEDBACK) MODE
Usually block ciphering will not start until a full block of data is received. As
mentioned in the CBC section, CBC cannot handle a string of bytes that is smaller
than a block. CFB mode, on the other hand, can handle this smaller string of bytes.
This process derives the next keystream segment by encrypting the previous ciphertext.
This keystream is XORed with the next plaintext bytes in the next iteration.
Figure 8 - CFB mode with 8 bits
Figure 8 shows the encryption and decryption for an n-bit block. The encryption and
decryption use the block size with shifting and XOR operations.
Encryption:
$C_i = P_i \oplus E_k(C_{i-1})$
Decryption:
$P_i = C_i \oplus E_k(C_{i-1})$
CFB operates like a stream cipher on both the encryption side and the
decryption side. The encryption and decryption keystream generators need to derive the
exact same keystream on corresponding iterations. If either of them misses a cycle, it
can result in generating the wrong ciphertext or plaintext. CFB mode is like CBC mode
in that one incorrect bit can propagate into the subsequent processing [27].
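The CFB equations use only the forward function E_k, for both encryption and decryption, and they can process a final segment shorter than a block without padding. The sketch below illustrates this with full-block feedback. The function `keyed_block` is a stand-in built from SHA-256 and is an assumption for illustration, not a real block cipher; the names and the 16-byte block size are likewise hypothetical.

```python
import hashlib

BLOCK = 16  # feedback size in bytes


def keyed_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for E_k: a keyed function built from SHA-256 (NOT a real block cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """C_i = P_i xor E_k(C_{i-1}), with C_0 = IV."""
    prev, out = iv, bytearray()
    for i in range(0, len(plaintext), BLOCK):
        chunk = plaintext[i:i + BLOCK]            # the last chunk may be short
        keystream = keyed_block(key, prev)
        cipher_chunk = bytes(p ^ k for p, k in zip(chunk, keystream))
        out += cipher_chunk
        prev = cipher_chunk
    return bytes(out)


def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """P_i = C_i xor E_k(C_{i-1}); the same forward function is used."""
    prev, out = iv, bytearray()
    for i in range(0, len(ciphertext), BLOCK):
        chunk = ciphertext[i:i + BLOCK]
        keystream = keyed_block(key, prev)
        out += bytes(c ^ k for c, k in zip(chunk, keystream))
        prev = chunk
    return bytes(out)


key = b"demo key"
iv = bytes(16)
short = b"short"                                  # smaller than one block
ct = cfb_encrypt(key, iv, short)
assert len(ct) == len(short)                      # no padding needed
assert cfb_decrypt(key, iv, ct) == short
```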
2.2.2.3 LRW (Liskov, Rivest, and Wagner) mode
The LRW mode was introduced to prevent the attacks on CBC mode. This is a
tweakable narrow-block encryption, which is a random permutation using a key with
a known tweak I on the plaintext P, the result of which is the ciphertext block C. This
method uses two keys: the first key K is used by the block cipher to encrypt the
plaintext, and the second key F is used for a finite field multiplication. The key F is
the same size as a block; it is multiplied in the finite field with the tweak I, and
precomputation speeds this up. The outcome X is XORed with the block before and
after encryption [48].
Encryption:
$C = E_K(P \oplus X) \oplus X$
where $X = F \otimes I$
Decryption:
$P = D_K(C \oplus X) \oplus X$
The addition $\oplus$ (XOR) and the multiplication $\otimes$ are performed in the
finite field ($GF(2^{128})$ for AES). With a precomputation on the tweak:
$F \otimes I = F \otimes (I_0 \oplus \delta) = F \otimes I_0 \oplus F \otimes \delta$
where $\delta$ represents all possible values in the binary finite field $GF(2^{128})$.
This method protects against CBC mode attacks, but it still has its own leak. If an
attacker changes a single block, it only affects that cipher block and not the
subsequent cipher blocks.
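The finite field multiplication behind the LRW tweak can be sketched with integers standing for polynomials over GF(2). The example assumes a plain polynomial basis with the reduction polynomial x^128 + x^7 + x^2 + x + 1 and ignores the bit-ordering conventions that individual standards impose; the function names and sample values are assumptions for illustration. It also checks the distributive identity that makes the precomputation possible.

```python
REDUCTION = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1


def gf128_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^128), each given as an integer below 2**128."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2^n) is XOR
        b >>= 1
        a <<= 1                # multiply a by x
        if a >> 128:
            a ^= REDUCTION     # reduce modulo the field polynomial
    return result


def lrw_tweak(f_key: int, index: int) -> int:
    """X = F (x) I: the whitening value for block index I."""
    return gf128_mul(f_key, index)


F = 0x0123456789ABCDEF0FEDCBA987654321
i0, delta = 0x1000, 0x2A
# F (x) (I0 xor delta) = F (x) I0 xor F (x) delta
assert lrw_tweak(F, i0 ^ delta) == lrw_tweak(F, i0) ^ lrw_tweak(F, delta)
print(hex(lrw_tweak(F, 1)))
```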
2.2.2.4 XTS Mode
Figure 9 - XTS mode
In Figure 9, XTS mode is AES used in XEX (XOR
Encrypt XOR) tweaked-codebook mode with ciphertext stealing. In the simplified XEX
method, the plaintext is XORed with a tweak value X, the result is encrypted with AES,
and the output is XORed with X again to generate the final ciphertext. The tweak value
X is itself derived by AES-encrypting the sector number. Ciphertext stealing is a
technique that allows the encryption of messages whose length is not
divisible by the block size without padding; this results in a ciphertext of the same
size as the plaintext, but it is more complex.
$X = E_k(I) \otimes \alpha^j$
$C = E_k(P \oplus X) \oplus X$
P - The plaintext.
I - The number of the sector.
α - The primitive element of $GF(2^{128})$ defined by the polynomial x.
j - The number of the block within the sector.
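Because α corresponds to the polynomial x, multiplying by α^j amounts to j successive doublings in GF(2^128), so the tweak for each block in a sector is obtained from the previous one with a shift and a conditional XOR. The sketch below shows only this tweak sequence. The encrypted sector number is passed in as an integer because the standard library has no AES, and the integer representation ignores the byte-ordering convention of the storage standard; both are assumptions for illustration, as are the function names.

```python
def xts_next_tweak(tweak: int) -> int:
    """Multiply a 128-bit tweak by alpha (the polynomial x) in GF(2^128)."""
    tweak <<= 1
    if tweak >> 128:
        tweak ^= (1 << 128) | 0x87   # reduce by x^128 + x^7 + x^2 + x + 1
    return tweak


def xts_tweaks(encrypted_sector_number: int, block_count: int):
    """Return X_j = E_k(I) (x) alpha^j for j = 0 .. block_count - 1."""
    tweaks, t = [], encrypted_sector_number
    for _ in range(block_count):
        tweaks.append(t)
        t = xts_next_tweak(t)
    return tweaks


# Hypothetical value standing in for E_k(I), the encrypted sector number.
e_k_of_i = 0x0F1E2D3C4B5A69788796A5B4C3D2E1F0
for j, x_j in enumerate(xts_tweaks(e_k_of_i, 4)):
    print(j, hex(x_j))
```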
XTS mode has vulnerabilities similar to those of CBC mode. For example, tampering with
data can go unrecognized, and the tampered data will decrypt to gibberish. The
system must be built to recognize this potential threat and be able to protect the data
using checksums and authentication tags. This mode is prone to other vulnerabilities
such as replay attacks and randomization attacks. If an attacker has access to ciphertext
blocks, the attacker can analyze them and use them for replay attacks and randomization
attacks [49].
2.2.2.5 GCM (Galois/Counter Mode)
Figure 10 - GCM mode
In Figure 10, GCM is a mode of operation for symmetric-key block ciphers. It is closely
related to GMAC (Galois Message Authentication Code), which provides incremental
message authentication. All blocks are numbered and then encrypted using the
XOR operation with an encrypted counter (similar to a stream cipher).
GCM uses a hash key H, which is a string of 128 zero bits encrypted using the block
cipher. For encryption, along with the hash key, it uses a unique arbitrary-length
initialization vector for each stream [50].
GCM mode does not have vulnerabilities like CBC. For example, in CBC mode
tampering can occur without being noticed, but GCM is an
authenticated encryption method, which keeps data and communication
confidential. It also maintains integrity by using an authentication
tag to verify the data. It uses reasonable hardware resources (memory, CPU,
etc.), performs very efficiently due to parallel processing, and provides high-speed
communication [50].
The hash key in GCM mode is used similarly to the key in LRW mode (multiplication in
the Galois field) for each 128-bit block ($GF(2^{128})$ for AES). The GF polynomial is
defined as:
$x^{128} + x^7 + x^2 + x + 1$
Feeding the blocks of data into the GHASH function and encrypting the output
generates the authentication tag.
$\mathrm{GHASH}(H, A, C) = X_{m+n+1}$
H - The hash key.
A - The authenticated data (plaintext).
C - The ciphertext.
m - The number of 128-bit blocks in A.
n - The number of 128-bit blocks in C.
This encryption method has been shown to be secure and efficient. Currently,
Google uses it as the mode for its website certificate.
2.2.3 Encryption Methods for SSD
SSD serves as a typical alternative to HDD. In fact, SSD largely emulates the
technology of HDD, such as the communication protocol and hardware interfaces, so
the technology of HDD can quickly be adapted to SSD. However, the methods that
SSDs employ to process data are different from those of HDDs in storing, managing,
accessing, and securing. Because of the differences between the two technologies, it is
possible that processing the same commands on an HDD will produce different results on
an SSD [11]. When it comes to encryption, we need to consider these differences.
There are a couple of encryption techniques that have been used for SSD. This section
will discuss those methods.
2.2.3.1 Dm-crypt
Dm-crypt is a disk encryption method compatible with Linux kernel version 2.6 or
later. It uses the kernel's Crypto API routines. Devices are mapped to encrypted
containers using a device mapper [51]. This API uses the AES-256 cryptographic method
along with other methods. Dm-crypt uses Linux Unified Key Setup (LUKS) to create
encrypted containers which are independent from outside platforms. LUKS was developed
by Clemens Fruhwirth in 2004 [52]. Using this method, a user can even encrypt the root
device. A passphrase is required to create encrypted containers.
There has been some research around the drawbacks of dm-crypt. For example, it has
been discovered that hackers can sidestep the passphrase to access encrypted
containers by hitting the ‘Enter’ key a couple of times. They can also delete the
containers, because deleting a container does not require the passphrase.
Utilizing disk commands on the system, an intruder can determine critical components
of the hidden containers relatively easily [53] [54].
2.2.3.1.1 Process Method
Dm-crypt uses the device mapper and the Linux kernel’s Crypto API routines. This API is
built with a cryptographic method using an AES-256 algorithm. Dm-crypt supports
XTS, LRW, and other modes for the encryption. The encrypted containers are stored
as files inside a folder. Users can create these containers (volumes) with the LUKS
(Linux Unified Key Setup) encryption specification, protected by a passphrase. Using
the system device mapper, it mounts the encrypted containers on top of existing
devices. Clemens Fruhwirth created LUKS in 2004. Dm-crypt uses this method to
create encrypted containers, which are independent from the existing platform and
allow compatibility from system to system.
2.2.3.1.2 Weaknesses
Using this method, a user can encrypt the root device, but they may need a smart
device attached to the system so that they can boot to the primary system. When
creating the containers, a passphrase is required, but to delete a container a passphrase
is not even requested. This method is mainly used for Linux-like systems. Some
research has shown that the passphrase protecting the containers can be bypassed by just
pressing the Enter key a couple of times. The file system information displays the
sizes of volumes, which may allow someone to guess information about the hidden
containers.
2.2.3.2 BestCrypt
BestCrypt is encryption software implemented in 1995 and it is still in use. It
creates, mounts, and manages encrypted volumes called containers. Because this
encryption software is still in use, it will be selected for evaluation.
2.2.3.2.1 Process Method
This encryption method stores files in encrypted containers and keeps them safe from
unauthorized access. A benefit of BestCrypt is that the system disk volumes can be
mounted and stored as encrypted files when not in use. This method can be applied to
removable media, network shares, archived storage, and email attachments on
Windows or Linux OS. It uses the following cryptographic methods: AES, Blowfish,
DES, Triple DES, Twofish, Serpent, and GOST 28147-89. All these cryptographic
methods use LRW and CBC modes. AES, Twofish and Serpent also use XTS mode
[55].
2.2.3.2.2 Weaknesses
It seems a viable option, but like any software it can have bugs; these errors can be
severe enough to damage entire partitions.
2.2.3.3 FDE (Full Disk Encryption)
FDE is a hardware encryption method that was first implemented in 2009 and is still in
use. It encrypts all the partitions, system files, and the operating system using a
hardware component. This technique is used by Samsung SSDs, which are commonly used.
An SSD with FDE applied is called an SED (Self-Encrypting Drive). Self-encrypting
SSDs provide better performance than SSDs where the encryption software is installed
[12]. This encryption implementation method will be selected for evaluation.
When Full Disk Encryption (FDE) is applied on an SSD, the drive is called a
Self-Encrypting Drive (SED). FDE, developed in 2009, is a literal encryption of the
entire system, which includes all the partitions, system files, and the operating
system. This encryption method assigns the process to the hardware component of the
drive. This helps to enhance security by utilizing the Opal Storage Specification
(a set of specification features of SEDs) [12]. An SED needs a master password for the
SED and a user password for each user. They are stored in the BIOS and handled by the
hard disk controller. SED uses AES 128 and AES 256.
Researchers have found the following vulnerabilities of this method: Hot Plug Attack,
Hot Unplug Attack, Forced Restart Attack, and Key Capture Attack. They have also
shown that attackers can bypass the encryption and access data; this undermines the
purpose of securing the data [56].
2.2.3.3.1 Process Method
This encryption method delegates the process logic to a dedicated hardware
component of the drive, using the Opal storage specification (a set of specification
features of SEDs) to enhance security. The hard disk controller handles key
management, which enhances security and protects the data from unauthorized access.
An SED has two passwords: a User password and a Master password. Both
passwords are stored in the BIOS. The Master password is generated by the SED, and
the User password is generated by users for system access. In situations where the User
password is lost or forgotten, the Master password can be used to unlock the
system. It uses the following cryptographic methods: AES 128 and AES 256. A
BIOS password is used for pre-boot authentication of the system.
2.2.3.3.2 Weaknesses
There are some attacks that are related to this method: Hot Plug Attack, Hot Unplug
Attack, Forced Restart Attack, and Key Capture Attack. Research has shown the
attacker can bypass the encryption and access data; this undermines the purpose of
securing the data [56].
2.2.4 Comparable Encryption for Evaluations
“AES Crypt is a file encryption software available on several operating systems
that uses the industry standard Advanced Encryption Standard (AES) to easily and
securely encrypt files.” Represented in this paper as AES-Crypt [57].
2.2.5 Homomorphic Encryption
The first practical and feasible version of homomorphic encryption was introduced
by Craig Gentry applying addition and multiplication on the encrypted data over
circuits in 2009 [58] [2]. Research had shown that there were advantages of
leveraging the homomorphic encryption in the Cloud and in Multi-Party Computing
environments [24] [25]. Most of the previous implementations were asymmetric
homomorphic methods. However, researchers observed that some of their behaviors were
not practical for real-world usage:
Key Sizes: Ranged from 17MB to 2.25GB
Key Generation Time: Ranged from 2.5secs to 2.2hours
Cipher Text size: Much larger cipher texts
Noise: Creation exceeding thresholds
Time: Very long execution times
These weaknesses made homomorphic encryption impractical to use in the cloud
or in real time systems [3] [59]. Currently there is no encryption method in production
which can take advantage of homomorphic features for any system [26].
There must be a way to create an encryption methodology that derives
great value from the unique features of homomorphic encryption.
Using versors from Clifford Algebra, I developed a symmetric
homomorphic encryption scheme. The next section discusses the mathematical
foundation of the new encryption method.
2.3 Mathematical Foundation
This section discusses the mathematical foundation which was used to architect
RVTHE.
Algebra is the base for most homomorphic encryption methods. It uses positive
numbers, real numbers, complex numbers, linear algebra, geometric algebra, and
function spaces (e.g., Hilbert Spaces and Clifford Algebra) for number fields. If a
Geometric Algebra uses vector spaces with a quadratic form and is associative, then
it is called a Clifford Algebra. I chose to use Clifford Algebra for RVTHE because it
calculates a geometric product of vectors and the generated results are not traceable;
this is ideal for the level of security that we want to achieve. So, it is important to
understand these Clifford Geometric Algebra terms [60]:
Vector: “a quantity having direction as well as magnitude, especially as
determining the position of one point in space relative to another.”
Vector Dimension: “Let V be a finite dimensional vector space over the field
𝔽. The Dimension of V denoted dim𝔽 V is the number of vectors in any
basis of V. If V is an infinite dimensional vector space over 𝔽 then we
write dim𝔽 V =∞”.
We can represent an n-dimensional vector as “nD”;
i.e., if n=2 then “2D” is used to represent a 2-dimensional vector.
Vector Space or Bivectors: “a space consisting of vectors, together with the
associative and commutative operation of addition of vectors, and the
associative and distributive operation of multiplication of vectors by scalars.”
Multivector: “a mathematical structure comprising a linear combination of
elements of different grade, such as scalars, vectors, bivectors, tri-vector, etc.”
Geometric Algebra Axioms: To understand combinations of scalars, vectors,
and bivectors, we first need to know the axioms behind geometric algebra.
These are the established axioms of geometric algebra. Vectors are represented by
$(a, b, c)$, scalars by $(\lambda, \varepsilon)$, and bivectors by $(ab, ba, ac,$ etc.$)$.
Axiom 1 (associative rule): $a(bc) = (ab)c$ (4.1.1.1)
Axiom 2 (distributive rules): $a(b+c) = ab + ac$, $(b+c)a = ba + ca$ (4.1.1.2)
Axiom 3: $(\lambda a)b = \lambda(ab) = \lambda ab$, $[\lambda \in \mathbb{R}]$ (4.1.1.3)
Axiom 4: $\lambda(\varepsilon a) = (\lambda\varepsilon)a$, $[\lambda, \varepsilon \in \mathbb{R}]$ (4.1.1.4)
Axiom 5: $\lambda(a+b) = \lambda a + \lambda b$, $[\lambda \in \mathbb{R}]$ (4.1.1.5)
Axiom 6: $(\lambda + \varepsilon)a = \lambda a + \varepsilon a$, $[\lambda, \varepsilon \in \mathbb{R}]$ (4.1.1.6)
Axiom 7: $a^2 = |a|^2$ (4.1.1.7)
Axiom 8: $|a \cdot b| = |a||b|\cos\theta$ (4.1.1.8)
Axiom 9: $|a \wedge b| = |a||b|\sin\theta$ (4.1.1.9)
Axiom 10: $ab = a \cdot b + a \wedge b$ (4.1.1.10)
Axiom 11: $a \wedge b = -b \wedge a$ (4.1.1.11)
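As a short consequence of Axioms 10 and 11 (a sketch, assuming the inner product is symmetric, i.e. $a \cdot b = b \cdot a$), adding and subtracting the two orderings of the geometric product separates its inner and outer parts:

```latex
\begin{align*}
ab &= a \cdot b + a \wedge b, \\
ba &= b \cdot a + b \wedge a = a \cdot b - a \wedge b, \\
a \cdot b &= \tfrac{1}{2}\,(ab + ba), \qquad
a \wedge b = \tfrac{1}{2}\,(ab - ba).
\end{align*}
```

This is why reversing the order of two vectors keeps the scalar part of the product and flips the sign of the bivector part.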
Product of Vectors: The result of multiplying vectors using the scalar and
cross products. These two products are the foundation for geometric algebra’s
inner, outer, and geometric products of vectors.
o Scalar Product: (Also known as dot product) The magnitude of
production of vector quotients.
o Cross Product: (Also known as vector product) A binary operation on
two vectors in three-dimensional space.
o Outer Product: (Also known as wedge product) The tensor product of
two coordinate vectors.
o Inner Product: The dot product of the Cartesian coordinates of
two vectors.
o Geometric Product: The sum of the inner and outer products
Vector Inverse: When performing the geometric product between vector A and
another vector B, if the result is “1” then vector B is called the inverse of
vector A and vice versa.
Blade: The outer product of k vectors is called a k-blade, where k indicates the
grade of the blade. For example, a 1-blade is a vector, a 2-blade is a bivector,
a 3-blade is a tri-vector, and so on.
Versors: A versor is the geometric product of multiple vectors, following Clifford Geometric Algebra.
To show how Clifford Geometric Algebra is represented in math, I will use two
dimensional (2D) vectors for inner product, outer product, and geometric product
representations [21] [60].
2.3.1 Geometric Algebra Overview
Geometric Algebra combines the work of Hamilton (quaternions) and Grassmann
(non-commutative algebra) into a field that generalizes the product of two vectors,
including the 3-dimensionally restricted “cross product,” to an n-dimensional
subspace of the vector space (V) over number fields ($\mathbb{Z}, \mathbb{R}, \mathbb{C}, \mathbb{N}$, etc.) such that the
subspace is a product space that allows two vectors to have a “geometric product” as
[60]:
$V_1 V_2 = V_1 \cdot V_2 + V_1 \wedge V_2$
where $V_1$ and $V_2$ are vectors or multivectors (i.e., a collection of “blades”). The
operation $V_1 \wedge V_2$ is known as a “wedge product” or “exterior product.” The operation
$V_1 \cdot V_2$ is the “dot product” or “interior product” (aka “inner product”).
For a simple pair of two-dimensional vectors:
$V_1 = a_1 e_1 + a_2 e_2$
$V_2 = b_1 e_1 + b_2 e_2$
where the set $\{e_1, e_2\}$ are unit vectors and $\{a_i\}, \{b_i\}, i = 1, 2$ are scalars, the geometric
product follows the rules of Geometric Algebra, as described below (for $i \neq j$):
$e_i \wedge e_i = 0$
$e_i \wedge e_j = -e_j \wedge e_i$
$e_i \wedge e_j = e_{ij}$ (compact notation)
$e_i \cdot e_i = 1$
$e_i \cdot e_j = 0$
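Thus, by performing the geometric product of $V_1$ and $V_2$ we have
\[
V_1 V_2 =
\underbrace{\Big[(a_1 b_1)\overbrace{e_1 \cdot e_1}^{e_i \cdot e_i = 1}
+ (a_1 b_2)\overbrace{e_1 \cdot e_2}^{e_i \cdot e_j = 0}
+ (a_2 b_1)\overbrace{e_2 \cdot e_1}^{e_j \cdot e_i = 0}
+ (a_2 b_2)\overbrace{e_2 \cdot e_2}^{e_j \cdot e_j = 1}\Big]}_{\text{dot product}}
+
\underbrace{\Big[(a_1 b_1)\overbrace{e_1 \wedge e_1}^{e_i \wedge e_i = 0}
+ (a_1 b_2)\, e_1 \wedge e_2
+ (a_2 b_1)\overbrace{e_2 \wedge e_1}^{e_j \wedge e_i = -e_i \wedge e_j}
+ (a_2 b_2)\overbrace{e_2 \wedge e_2}^{e_j \wedge e_j = 0}\Big]}_{\text{wedge product}}
\]
Resulting in
$V_1 V_2 = (a_1 b_1 + a_2 b_2) + (a_1 b_2 - a_2 b_1)\, e_1 \wedge e_2$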
The product $V_1 V_2$ produces a scalar and an object $e_1 \wedge e_2$, which in compact
notation is written as $e_{12}$ and represents an area created by the rotation $e_1 \wedge e_2$
(clockwise) or $e_2 \wedge e_1 = -e_1 \wedge e_2$ (anti-clockwise). The orientation is given by the
sign of the term in front of the $e_1 \wedge e_2$ component.
A versor is a product of vectors in the geometric product space which has simpler
inverse characteristics: $V = V_1 V_2 V_3 \dots V_n$
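The term-by-term expansion above can be mirrored in code. The following is a minimal sketch, assuming 2D vectors are given as coefficient pairs `(a1, a2)`; the table `BASIS_PRODUCT` and the function `vector_product` are hypothetical names introduced only for this illustration.

```python
# Products of the basis vectors, (i, j) -> (sign, blade), following the rules
# e_i . e_i = 1, e_i . e_j = 0, e_i ^ e_i = 0, and e_j ^ e_i = -e_i ^ e_j.
BASIS_PRODUCT = {
    (1, 1): (1, "1"),      # e1 e1 = 1
    (1, 2): (1, "e12"),    # e1 e2 = e12
    (2, 1): (-1, "e12"),   # e2 e1 = -e12
    (2, 2): (1, "1"),      # e2 e2 = 1
}


def vector_product(v1, v2):
    """Expand (a1 e1 + a2 e2)(b1 e1 + b2 e2) one term at a time."""
    result = {"1": 0, "e12": 0}
    for i, a in enumerate(v1, start=1):
        for j, b in enumerate(v2, start=1):
            sign, blade = BASIS_PRODUCT[(i, j)]
            result[blade] += sign * a * b
    return result


print(vector_product((1, 0), (0, 1)))  # e1 e2 has no scalar part, only e12
```

The `"1"` entry collects the scalar $(a_1 b_1 + a_2 b_2)$ and the `"e12"` entry collects the coefficient $(a_1 b_2 - a_2 b_1)$.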
2.3.2 Inner Product
Inner product (also called dot product or scalar product) is synonymous with
transforming vectors into scalars. The inner product of vectors ‘a’ and ‘b’ is
represented by “a · b”.
If ‘a’ and ‘b’ are vectors defined as $a = a_1 e_1 + a_2 e_2$ and $b = b_1 e_1 + b_2 e_2$, then:
$a \cdot b = (a_1 e_1 + a_2 e_2) \cdot (b_1 e_1 + b_2 e_2)$
$a \cdot b = a_1 b_1\, e_1 \cdot e_1 + a_1 b_2\, e_1 \cdot e_2 + a_2 b_1\, e_2 \cdot e_1 + a_2 b_2\, e_2 \cdot e_2$
$a \cdot b = a_1 b_1 + a_2 b_2$
The inner product is a scalar: the sum of the products of the corresponding vector
components. If we reverse the order of the vectors in the inner product, the resulting
value is always the same:
$a \cdot b = b \cdot a$
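Example:
When $a = 2e_1 + 3e_2$ and $b = 4e_1 + 5e_2$, the inner product $a \cdot b$ is:
$a \cdot b = (2e_1 + 3e_2) \cdot (4e_1 + 5e_2)$
$a \cdot b = 8\, e_1 \cdot e_1 + 10\, e_1 \cdot e_2 + 12\, e_2 \cdot e_1 + 15\, e_2 \cdot e_2$
$a \cdot b = 8 + 15$
$a \cdot b = 23$
Reversing the order of the vectors, the inner product $b \cdot a$ is:
$b \cdot a = (4e_1 + 5e_2) \cdot (2e_1 + 3e_2)$
$b \cdot a = 8\, e_1 \cdot e_1 + 12\, e_1 \cdot e_2 + 10\, e_2 \cdot e_1 + 15\, e_2 \cdot e_2$
$b \cdot a = 8 + 15$
$b \cdot a = 23 = a \cdot b$
2.3.3 Outer Product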
The outer product of vectors ‘a’ and ‘b’ (also called wedge product) is represented by
“a ⋀ b”. If ‘a’ and ‘b’ are vectors defined as $a = a_1 e_1 + a_2 e_2$ and $b = b_1 e_1 + b_2 e_2$,
then:
$a \wedge b = (a_1 e_1 + a_2 e_2) \wedge (b_1 e_1 + b_2 e_2)$
$a \wedge b = a_1 b_1\, e_1 \wedge e_1 + a_1 b_2\, e_1 \wedge e_2 + a_2 b_1\, e_2 \wedge e_1 + a_2 b_2\, e_2 \wedge e_2$
$a \wedge b = a_1 b_2\, e_1 \wedge e_2 - a_2 b_1\, e_1 \wedge e_2$
$a \wedge b = (a_1 b_2 - a_2 b_1)\, e_1 \wedge e_2$
$a \wedge b = (a_1 b_2 - a_2 b_1)\, e_{12}$
In the above formula, $(a_1 b_2 - a_2 b_1)$ is the scalar coefficient giving the
area of the parallelogram associated with the plane containing the two basis vectors
$e_1$ and $e_2$.
Figure 11 - Outer Product
Figure 11 shows the outer product. The outer product of two vectors is antisymmetric,
such that $a \wedge b = -b \wedge a$.
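Example:
‘a’ and ‘b’ are vectors, with $a = 2e_1 + 3e_2$ and $b = 4e_1 + 5e_2$:
$a \wedge b = (2e_1 + 3e_2) \wedge (4e_1 + 5e_2)$
$a \wedge b = 8\, e_1 \wedge e_1 + 10\, e_1 \wedge e_2 + 12\, e_2 \wedge e_1 + 15\, e_2 \wedge e_2$
$a \wedge b = 10\, e_1 \wedge e_2 - 12\, e_1 \wedge e_2$
$a \wedge b = -2\, e_1 \wedge e_2$
$a \wedge b = -2e_{12}$
If we reverse the order of the vectors, then the outer product $b \wedge a$ is:
$b \wedge a = (4e_1 + 5e_2) \wedge (2e_1 + 3e_2)$
$b \wedge a = 8\, e_1 \wedge e_1 + 12\, e_1 \wedge e_2 + 10\, e_2 \wedge e_1 + 15\, e_2 \wedge e_2$
$b \wedge a = 12\, e_1 \wedge e_2 - 10\, e_1 \wedge e_2$
$b \wedge a = 2\, e_1 \wedge e_2$
$b \wedge a = 2e_{12}$
$-b \wedge a = -2e_{12}$
The math confirms that the outer product is antisymmetric: $a \wedge b = -b \wedge a$
2.3.4 Geometric Product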
The geometric product of vectors ‘a’ and ‘b’ is represented by “ab”. If ‘a’ and ‘b’
are vectors defined as $a = a_1 e_1 + a_2 e_2$ and $b = b_1 e_1 + b_2 e_2$, then, following the
expansion of $V_1 V_2$ in Section 2.3.1 [60]:
$ab = (a_1 e_1 + a_2 e_2)(b_1 e_1 + b_2 e_2)$
$ab = (a_1 e_1 + a_2 e_2) \cdot (b_1 e_1 + b_2 e_2) + (a_1 e_1 + a_2 e_2) \wedge (b_1 e_1 + b_2 e_2)$
$ab = (a_1 b_1\, e_1 \cdot e_1 + a_1 b_2\, e_1 \cdot e_2 + a_2 b_1\, e_2 \cdot e_1 + a_2 b_2\, e_2 \cdot e_2) + (a_1 b_1\, e_1 \wedge e_1 + a_1 b_2\, e_1 \wedge e_2 + a_2 b_1\, e_2 \wedge e_1 + a_2 b_2\, e_2 \wedge e_2)$
$ab = (a_1 b_1 + a_2 b_2) + (a_1 b_2\, e_1 \wedge e_2 - a_2 b_1\, e_1 \wedge e_2)$
$ab = (a_1 b_1 + a_2 b_2) + (a_1 b_2 - a_2 b_1)\, e_1 \wedge e_2$
$ab = (a_1 b_1 + a_2 b_2) + (a_1 b_2 - a_2 b_1)\, e_{12}$
The output of the geometric product contains two terms. The first term,
$(a_1 b_1 + a_2 b_2)$, is a scalar. The second term is the bivector $e_{12}$ with a coefficient
of $(a_1 b_2 - a_2 b_1)$.
The geometric product of two vectors changes when we change the order of the
vectors, such that $ab \neq ba$. The exception is when the vectors are parallel; then
$ab = ba$.
Example:
When $a = 2e_1 + 3e_2$ and $b = 4e_1 + 5e_2$, applying the expansion of $V_1 V_2$ from
Section 2.3.1 gives:
$ab = (2e_1 + 3e_2) \cdot (4e_1 + 5e_2) + (2e_1 + 3e_2) \wedge (4e_1 + 5e_2)$
$ab = (8\, e_1 \cdot e_1 + 10\, e_1 \cdot e_2 + 12\, e_2 \cdot e_1 + 15\, e_2 \cdot e_2) + (8\, e_1 \wedge e_1 + 10\, e_1 \wedge e_2 + 12\, e_2 \wedge e_1 + 15\, e_2 \wedge e_2)$
$ab = (8 + 15) + (10\, e_1 \wedge e_2 - 12\, e_1 \wedge e_2)$
$ab = 23 - 2\, e_1 \wedge e_2$
$ab = 23 - 2e_{12}$
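Reversing the order of the vectors, the geometric product $ba$ is:
$ba = (4e_1 + 5e_2) \cdot (2e_1 + 3e_2) + (4e_1 + 5e_2) \wedge (2e_1 + 3e_2)$
$ba = (8\, e_1 \cdot e_1 + 12\, e_1 \cdot e_2 + 10\, e_2 \cdot e_1 + 15\, e_2 \cdot e_2) + (8\, e_1 \wedge e_1 + 12\, e_1 \wedge e_2 + 10\, e_2 \wedge e_1 + 15\, e_2 \wedge e_2)$
$ba = 8 + 15 + 2\, e_1 \wedge e_2$
$ba = 23 + 2e_{12}$
The math confirms that $ab \neq ba$.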
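The worked numbers above can be checked with a few lines of code. This is a minimal sketch, assuming 2D vectors are stored as coefficient pairs; `inner`, `outer`, and `geometric` are hypothetical helper names, and the geometric product is returned as the pair (scalar, coefficient of $e_{12}$).

```python
def inner(a, b):
    """Inner (dot) product: a1*b1 + a2*b2."""
    return a[0] * b[0] + a[1] * b[1]


def outer(a, b):
    """Outer (wedge) product, returned as the coefficient of e12."""
    return a[0] * b[1] - a[1] * b[0]


def geometric(a, b):
    """Geometric product of two 2D vectors as (scalar, e12 coefficient)."""
    return inner(a, b), outer(a, b)


a, b = (2, 3), (4, 5)
print(geometric(a, b))   # (23, -2), i.e. ab = 23 - 2 e12
print(geometric(b, a))   # (23, 2),  i.e. ba = 23 + 2 e12

# Parallel vectors are the exception: their wedge part vanishes, so ab = ba.
parallel = (4, 6)
print(geometric(a, parallel) == geometric(parallel, a))  # True
```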
2.3.5 Inverse of Vector
If the geometric product $A^{-1}_L A = 1$, then $A^{-1}_L$ is called the left inverse of
vector $A$, and if $A A^{-1}_R = 1$, then $A^{-1}_R$ is called the right inverse of vector $A$. The
geometric product is not commutative; therefore the left inverse and right inverse may
or may not be equal.
2.3.6 Versors
“One type of multivector that lends itself for inversion has the form
$A = a_1 a_2 a_3 \dots a_n$, where $a_1, a_2, a_3, \dots, a_n$ are vectors, and versor $A$ is their
collective geometric product. Such multivectors are called versors.”
Versor $A = a_1 a_2 a_3 \dots a_n$ is the geometric product of vectors.
The reverse of versor $A$ is $A^\dagger = a_n \dots a_3 a_2 a_1$.
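Multiplying $A^\dagger$ with $A$:
$A^\dagger A = (a_n \dots a_3 a_2 a_1)(a_1 a_2 a_3 \dots a_n)$
$A^\dagger A = |a_1|^2 |a_2|^2 |a_3|^2 \dots |a_n|^2$
because each adjacent pair $a_i a_i = |a_i|^2$ (Axiom 7) is a scalar and can be factored
out, starting from the middle of the product.
Furthermore, multiplying $A$ with $A^\dagger$:
$A A^\dagger = (a_1 a_2 a_3 \dots a_n)(a_n \dots a_3 a_2 a_1)$
$A A^\dagger = |a_1|^2 |a_2|^2 |a_3|^2 \dots |a_n|^2$
So $A^\dagger A = A A^\dagger$, and it is a scalar.
Since $A A^{-1} = 1$, we can say $A^\dagger A A^{-1} = A^\dagger$, so
$A^{-1} = \dfrac{A^\dagger}{A^\dagger A} = \dfrac{A^\dagger}{A A^\dagger}$
and
$A^{-1} A = \dfrac{A^\dagger}{A A^\dagger} A = \dfrac{A^\dagger A}{A^\dagger A} = 1$
For versors, this implies that $A^{-1}_L$ and $A^{-1}_R$ are the same.
Suppose $A = a$ is a single vector; writing $A$ in reverse order gives $A^\dagger = a$. Then
$A A^\dagger = |a|^2$ and $A^{-1} = a^{-1} = \dfrac{a}{|a|^2}$
Therefore, given $ab$, we can derive $b$ by multiplying with $a^{-1}$ on the left:
$a^{-1} a = 1$, $a^{-1} a b = b$, $b = \dfrac{a}{|a|^2}\, ab$
Similarly, we can obtain $a$ by multiplying with $b^{-1}$ on the right:
$a = ab\, \dfrac{b}{|b|^2}$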
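The reverse and inverse of a versor can be checked numerically. The sketch below assumes a 2D geometric algebra with $e_1^2 = e_2^2 = 1$ and stores a multivector as the tuple (scalar, $e_1$, $e_2$, $e_{12}$); the helper names `gp`, `reverse`, and `versor_inverse` are hypothetical, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction


def gp(p, q):
    """Geometric product of 2D multivectors given as (scalar, e1, e2, e12)."""
    s1, x1, y1, b1 = p
    s2, x2, y2, b2 = q
    return (
        s1 * s2 + x1 * x2 + y1 * y2 - b1 * b2,   # scalar part (e12 squares to -1)
        s1 * x2 + x1 * s2 - y1 * b2 + b1 * y2,   # e1 part
        s1 * y2 + y1 * s2 + x1 * b2 - b1 * x2,   # e2 part
        s1 * b2 + b1 * s2 + x1 * y2 - y1 * x2,   # e12 part
    )


def reverse(p):
    """Reverse of a multivector: in 2D only the bivector part changes sign."""
    s, x, y, b = p
    return (s, x, y, -b)


def versor_inverse(p):
    """A^{-1} = reverse(A) / (A reverse(A)), valid when A reverse(A) is a scalar."""
    rev = reverse(p)
    norm = gp(p, rev)[0]
    return tuple(Fraction(c, norm) for c in rev)


a = (0, 2, 3, 0)          # vector 2 e1 + 3 e2, |a|^2 = 13
b = (0, 4, 5, 0)          # vector 4 e1 + 5 e2, |b|^2 = 41
A = gp(a, b)              # versor A = ab = 23 - 2 e12

print(gp(A, reverse(A)))            # scalar 533 = 13 * 41, all other parts zero
print(gp(versor_inverse(A), A))     # left inverse gives 1
print(gp(A, versor_inverse(A)))     # right inverse gives 1 as well
```

For this versor the scalar $A A^\dagger$ equals $|a|^2 |b|^2$, and the same inverse works on both sides.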
Example: using versors and inverses, we derive a component of the geometric product.
Assume
secret key s1 = 5 is defined as a vector $a = 2e_1 + 3e_2$
data value d1 = 9 is defined as a vector $b = 4e_1 + 5e_2$
secret key s2 = 7 is defined as a vector $c = 3e_1 + 4e_2$
When $a = 2e_1 + 3e_2$, $b = 4e_1 + 5e_2$, and $c = 3e_1 + 4e_2$:
$ab = (2e_1 + 3e_2) \cdot (4e_1 + 5e_2) + (2e_1 + 3e_2) \wedge (4e_1 + 5e_2)$
$ab = 23 - 2e_{12}$
$abc = (23 - 2e_{12})(3e_1 + 4e_2)$
$abc = 61e_1 + 98e_2$
To derive the value of $b = a^{-1}(abc)\,c^{-1}$:
$(abc)\,c^{-1} = (61e_1 + 98e_2)\,\dfrac{3e_1 + 4e_2}{25} = 23 - 2e_{12}$
$b = \dfrac{2e_1 + 3e_2}{13}\,(23 - 2e_{12})$
$b = \dfrac{1}{13}\big((46 + 6)e_1 + (69 - 4)e_2\big)$
$b = 4e_1 + 5e_2$
This is the foundation for the new encryption cipher: the geometric product
and the inverse play a big role in the development of the new cipher using versors.
Versors give the choice of having multiple vectors in the geometric product, which
results in two types of output. The intermediate result ($ab$) contains a scalar and
a bivector term. The result of the geometric product of the three vectors is a vector.
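The example can be reproduced end to end. The sketch below only illustrates the worked example and is not the RVTHE cipher itself; it assumes the same 2D representation (scalar, $e_1$, $e_2$, $e_{12}$), and `gp` and `vector_inverse` are hypothetical helper names.

```python
from fractions import Fraction


def gp(p, q):
    """Geometric product of 2D multivectors given as (scalar, e1, e2, e12)."""
    s1, x1, y1, b1 = p
    s2, x2, y2, b2 = q
    return (
        s1 * s2 + x1 * x2 + y1 * y2 - b1 * b2,
        s1 * x2 + x1 * s2 - y1 * b2 + b1 * y2,
        s1 * y2 + y1 * s2 + x1 * b2 - b1 * x2,
        s1 * b2 + b1 * s2 + x1 * y2 - y1 * x2,
    )


def vector_inverse(v):
    """Inverse of a pure vector (0, x, y, 0): v / |v|^2."""
    _, x, y, _ = v
    n = x * x + y * y
    return (Fraction(0), Fraction(x, n), Fraction(y, n), Fraction(0))


a = (0, 2, 3, 0)   # secret key s1 as the vector 2 e1 + 3 e2
b = (0, 4, 5, 0)   # data value d1 as the vector 4 e1 + 5 e2
c = (0, 3, 4, 0)   # secret key s2 as the vector 3 e1 + 4 e2

product = gp(gp(a, b), c)                        # abc
middle = gp(product, vector_inverse(c))          # (abc) c^{-1} = ab
recovered = gp(vector_inverse(a), middle)        # a^{-1} (ab) = b

print(product)          # (0, 61, 98, 0), i.e. 61 e1 + 98 e2
print(recovered == b)   # True
```

Note that the order of the multiplications matters: $c^{-1}$ is applied on the right and $a^{-1}$ on the left, because the geometric product is not commutative.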
CHAPTER 3
PROBLEMS AND LIMITATIONS
In this chapter, I will present various security problems with Cloud and SSD
storage. I will present various types of cyberattacks and discuss the importance
of randomness in encryption methods and its limitations. I evaluate existing
encryption methods and their performance on SSD in the Cloud, and the performance
penalties in terms of IOPS. This chapter will show that encryption methods/techniques
affect workload performance. I used Amazon Web Services (AWS) for this
performance benchmarking. First, I studied the storage (SSD) performance impact
between various storage options provided by AWS without encryption. Next, I
benchmarked workloads with various block sizes, read/write ratios, and encryption
methods on VMs with regular SSDs, encrypted SSDs, and software-encrypted containers.
This chapter will also discuss existing encryption methods, including homomorphic
encryption methods.
3.1 Defining the Problem
In the cloud computing environment, there are several security threats. Cloud
Storage SSDs bring their own strengths and weaknesses. Here I consider the causes,
conditions, and limitations of enterprise cloud storage that can generate security
concerns, to see if there are practical solution(s) for all stages of ESD security. I
will also explain how these weaknesses are exploited using cyber-attacks. Currently
various encryption methods are used to handle this problem, but each has its
limitations. I will discuss the limitations and problems of existing and proposed
encryption methods, including FHE.
3.1.1 Encryption Security Limitations and Problem
Practicality of Homomorphic Encryption: The Practical Homomorphic Encryption
Survey [26] says, “A significant amount of research on homomorphic cryptography
appeared in the literature over the last few years; yet the performance of existing
implementations of encryption schemes remains unsuitable for real time applications”.
Homomorphic encryption speed is one of the main reasons for this
conclusion: key generation takes from 2.5 seconds to 2.2 hours, the
implementation is complex, noise creation can exceed thresholds, and
bigger key sizes (17MB to 2.25GB) require high memory resources; all this becomes
impractical in real systems [3]. Fully Homomorphic Encryption (FHE) is on the
“bleeding edge” of encryption technology, but currently there is no FHE available for
real time applications [26]. There is still a lot of work that needs to be done to have
a “production ready” version of FHE.
Execution of Encryption Methods in the Cloud: Conventional encryption
methods have several issues:
A large amount of data needs to be transferred between the client and the
cloud.
If the client is willing to keep the encryption key in the cloud, the very
item used to decrypt the file will be readily available if an attacker gets
into the cloud system, which is clearly a security concern.
If the client chooses not to store the key in the cloud, then to update a file they
must download the entire encrypted file, decrypt it, modify it, encrypt it again, and
upload the encrypted file back to the cloud. As the file grows, the
overhead on resources increases.
This research focuses on deriving a production-ready, secure, efficient, scalable,
and portable homomorphic encryption method, addressing the limitations listed in the
following section, Encryption Limitations.
3.1.2 Encryption Limitations
Key Strength: If the data is encrypted, customers must use a key to manage the data
storage process. If the key was generated with low randomness, that will create
weaker security.
Encryption Algorithm: The degree of the system’s security depends on the strength
of the cryptography method and its implementation. Increased computing power
allows hackers to break encryption algorithms that were once considered state of the
art.
Encryption vs Performance: There is very little research on how various encryption
software methodologies impact the performance of various workloads on SSD in the
cloud. The problem with these methods is that enterprises use the same encryption
software for all types of workloads and different storage systems. Encrypting and
running regular application workload functions simultaneously will adversely impact the
read/write performance of SSDs.
3.2 Other Problems Contributing to the Research Motivation
All of the following problems also motivated this work, although its main goal is
solving the problem described in Section 3.1.1.
SSD Physics: Some SSD vendors implemented their FTL (Flash Translation
Layer) with errors; those errors may prevent full sanitization or may delete all the data
by overwriting the entire visible address space. Overwriting the SSD address space is not
always sufficient to sanitize the drive because the data persists, and it is a
time-consuming process [31]. When a file is deleted, it is deleted from the OS’s
perspective, but on the SSD it may remain until garbage collection happens with the
TRIM process [11].
Persistence of Data: When an SSD write occurs, data writes to new cells, but the
data still exists in the old cells until a TRIM is executed [31]. If the key and encrypted
file are stored on the same system, there is a possibility to read the encryption key
from the SSD key storage area [32]. The SSD’s internal design and the way
I/O (Input/Output) operations happen are different from an HDD’s. Yet most encryption
software for SSDs was developed using the same cryptographic algorithms that were
used for HDDs, which does not account for an SSD’s ghost data.
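The persistence of old data can be illustrated with a toy model. The class below is a hypothetical, heavily simplified flash translation layer written only for this illustration (real FTLs manage blocks, wear leveling, and garbage collection very differently); it shows how an overwrite leaves the previous contents readable at the physical level until a TRIM-style cleanup runs.

```python
class ToyFTL:
    """Hypothetical, heavily simplified flash translation layer."""

    def __init__(self):
        self.pages = []      # physical pages, written append-only
        self.mapping = {}    # logical address -> physical page index

    def write(self, logical_addr, data):
        # Flash cannot overwrite in place: the data goes to a new page
        # and only the address mapping is updated.
        self.pages.append(data)
        self.mapping[logical_addr] = len(self.pages) - 1

    def read(self, logical_addr):
        return self.pages[self.mapping[logical_addr]]

    def stale_pages(self):
        """Data still physically present but no longer mapped ("ghost data")."""
        live = set(self.mapping.values())
        return [d for i, d in enumerate(self.pages)
                if i not in live and d is not None]

    def trim_and_collect(self):
        """Erase every page that no logical address points to."""
        live = set(self.mapping.values())
        for i in range(len(self.pages)):
            if i not in live:
                self.pages[i] = None


ftl = ToyFTL()
ftl.write(0, b"plaintext key material")
ftl.write(0, b"encrypted replacement")   # the host believes the old data is gone
print(ftl.read(0))                       # b'encrypted replacement'
print(ftl.stale_pages())                 # [b'plaintext key material']
ftl.trim_and_collect()
print(ftl.stale_pages())                 # []
```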
Data Exposure: If the data is not encrypted, there is a risk of exposing
personal data; this can pose a security threat while data is at rest or in transit.
The data can be accessed from different devices like PCs and phones and over public
networks, each of which can pose a security threat due to malware, adware, and
non-secured public networks if they can be accessed by a hacker. The public cloud poses
its own security issues due to other cloud security threats like account hijacking,
human error, etc.
Account Hijacking: One of the major security issues for the cloud is account
hijacking, where someone gains access to account credentials and uses them for
nefarious purposes.
Human Error: Human error and negligence can pose a security threat, for
example, not removing the key or plain-text file from the cloud system. In Cloud
computing, users must move the key between their system and the cloud. Security
issues can arise if users do not follow proper security procedures and
practices, such as writing passwords on sticky notes, forgetting passwords, sharing
passwords, or sharing keys in a non-secure way.
3.2.1 Cyber Attacks
There are various attacks that can be performed by attackers. When designing an
encryption cipher, one must remember that it should be able to protect the data from
these attacks.
Ciphertext-Only: When an attacker has access to ciphertext and nothing else, such as
the key or plaintext, then using statistical methods they can guess the distribution of
characters and use them to reveal the plaintext or secret key. This is called a
Ciphertext-Only attack. This is the most difficult type of attack for the attacker,
since the attacker has the smallest amount of information [27].
Known-Plaintext: In this case, an attacker has some of the plaintext/ciphertext
pairs and uses them to derive the key. This is called a Known-Plaintext
attack. I will show, using statistical methods and manipulation of mathematical
operations, whether I am able to derive the keys.
Chosen-Plaintext: It is similar to a Known-Plaintext attack, but an attacker can
choose and manipulate the plaintext input to the encryption algorithm, then evaluate
the resulting cipher text to obtain the key.
Distinguishing-Attack: The goal of a distinguishing attack is to distinguish the
keystream of the cipher from a truly random sequence. The attack succeeds when a
method is found that distinguishes the cipher output from random data faster than a
brute force search. This sort of information can be very valuable to an attacker
trying to reveal the plain-text.
Birthday Attack: A Birthday Attack is based on the statistical concept of the
Birthday Paradox, where the probability of a match between two random items increases
as the number of items increases. For example, if there are 23 people in a room, the
probability of two people having the same birthday reaches 50.7%. The Birthday Attack
applies this concept to determining the encryption key. While the
numbers are larger, the probability of matching the encryption key is statistically
much higher than the true randomness of the key would suggest.
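The 50.7% figure can be reproduced directly. This is a minimal sketch, assuming 365 equally likely birthdays; the function name `birthday_collision_probability` is hypothetical, and the same formula applies to any value space (for example, the number of possible key or hash values).

```python
def birthday_collision_probability(n, space=365):
    """Probability that at least two of n random draws from `space` values match."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct


print(round(birthday_collision_probability(23), 3))  # 0.507
```

With 22 people the probability is still below one half; the 23rd person tips it over.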
Meet-in-the-Middle Attack: In this method the attacker builds a table with keys and
MACs (Message Authentication Code). A MAC is computed using 50% of the
possible keys of key length on the same plaintext. Then the attacker eavesdrops on
each transaction and compares the cipher with MAC table and reveals the key.
There are several more methods of attack and cyber threats, like Spectre and
Meltdown. The impact of an attacker finding a key could be devastating; this would
give attackers access to personal, financial, and medical information and prevent
authorized users from accessing this information. All of these are a justification to
constantly increase the strength and complexity of ciphers, which are an important part
of security [6].
3.2.2 Real Randomness
To generate an encryption key, real randomness is critical but extremely hard to
achieve on a computer system. Pseudorandom numbers can be generated from the
system’s entropy resources: timing of keystrokes, exact movements of a mouse, and
fluctuations of hard-disk access time [61]. A key generated from the randomness of
these sources may become suspect if an attacker is able to measure those sources and
apply them to simulate the same random number generation; but this is difficult, due
to the amount of entropy generated from these resources.
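As a small illustration of drawing key material from the system's entropy sources, the sketch below uses Python's standard `secrets` module, which is documented as drawing on the operating system's cryptographically strong random source. The function name `generate_key` and the 16-byte (128-bit) default are assumptions made for this example.

```python
import secrets


def generate_key(n_bytes=16):
    """Return n_bytes of key material from the operating system's random source."""
    return secrets.token_bytes(n_bytes)


key = generate_key()
print(len(key) * 8)   # 128 bits
print(key.hex())      # different on every run
```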
The timing of a single keystroke generates 1 to 2 bytes of random data, and
cryptographers consider that insufficient entropy to thwart an attacker trying to
determine the key. Better typists have a consistent typing pace, where the timing
between keystrokes falls within milliseconds, limiting the frequency at which
keystroke timing can be scanned, so typing-timing data may not be random. In this
example, the attacker may have access to resources such as the computer’s
microphone to hear the keystrokes and determine the timings (pace). Even generating
randomness using quantum physics forces specific patterns that may be prone to
attack, because an attacker can use an RF (Radio Frequency) field to influence these
patterns [55]. Even a key with 128 bits of random data can still be vulnerable,
because an attacker can try 2^128 computations. This brute-force attack is of growing
concern as computation speeds increase.
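To put the size of a 128-bit key space in perspective, the following minimal sketch estimates how long an exhaustive search would take at a constant guess rate. The rate used in the example is a hypothetical figure chosen only for illustration; it is not a measurement from this research.

```python
# Rough estimate of exhaustive key-search time for an n-bit key.
# The guess rate below is an assumed, illustrative value.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def keyspace(bits: int) -> int:
    """Number of distinct keys of the given bit length."""
    return 2 ** bits


def years_to_exhaust(bits: int, guesses_per_second: float) -> float:
    """Years needed to try every key at a constant guess rate."""
    return keyspace(bits) / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e12  # assumption: one trillion guesses per second
    for bits in (56, 80, 128):
        print(bits, f"{years_to_exhaust(bits, rate):.3e} years")
```

The estimate scales inversely with the guess rate, so any increase in computation speed shortens the search time by the same factor.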
3.2.3 Storage Security Limitations
This thesis first evaluates SSD storage security and modern encryption software
for securing the SSD. First, I discuss the importance of reliability and
integration of the SSD, and then I address security. Cloud storage primarily uses
SSD as storage to achieve performance guarantees. Second, I study SSD
characteristics to understand SSD strengths and performance metrics when various
storage-specific encryption methods are used. Using performance benchmarking, I
aim to show that encryption impacts the performance of the read and write operations
of storage.
3.2.4 SSD System Level Induced Limitations
The SSD physical structure poses reliability and scalability limitations. This can
result in system-level limitations such as wear leveling (endurance), Bad Block
Management, and performance. Understanding the SSD limitations can help to determine
or derive better security techniques for the device.
3.2.4.1 Physical Limitations Contribute to Logical (Software) Limitations
This chapter will describe the SSD physical limitations and how they will impact
logical SSD functions. The following four major components of SSD functions will
detail the physical and logical limitations.
3.2.4.2 Physical Level Address Map
In an SSD, the address map is applied in the same way as in traditional hard disk
drives. The SSD FTL maintains all the address table information. In Figure 12, the top row is the
logical address space and the bottom row is the physical address space. From the
host’s perspective the writes and edits happen in plain sight.
Figure 12 - Address Mapping between physical to logical
Due to the limitations of SSD, it does not allow writes on the unused pages in the
block; instead, it writes to a new page in a new block, which is assigned in the
physical block (in the physical world it is the string). The old pages are not erased,
but they are marked as invalid pages. Writing and rewriting to a cell exposes it to
repeated voltage stress, which deteriorates the cell walls and reduces its life span.
To avoid deterioration of an individual SSD block or set of blocks, each rewrite
follows a wear leveling algorithm to make sure all the cells deteriorate consistently.
Also, when the current physical block is full, another free one is assigned to the
logical block. These changes add mapping addresses to the translation table (address
mapping table). Because the data for this table may be stored on the SSD itself, it
could decrease the storage capacity of the device [11] [62].
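The out-of-place update behavior described above can be sketched with a toy translation table. This is a simplified illustration, not an actual FTL implementation; the class name, field names, and the append-only allocator are all hypothetical.

```python
# Toy flash translation layer: out-of-place updates with page invalidation.
# All names here are hypothetical; real FTLs are far more elaborate.


class ToyFTL:
    def __init__(self, num_physical_pages: int):
        self.mapping = {}                            # logical page -> physical page
        self.state = ["free"] * num_physical_pages   # free | valid | invalid
        self.data = [None] * num_physical_pages
        self.next_free = 0                           # simple append-only allocator

    def write(self, logical_page: int, payload: bytes) -> int:
        """Write payload to a fresh physical page and remap the logical page."""
        if self.next_free >= len(self.state):
            raise RuntimeError("no free pages; garbage collection needed")
        old = self.mapping.get(logical_page)
        if old is not None:
            self.state[old] = "invalid"              # old copy stays until erased
        new = self.next_free
        self.next_free += 1
        self.state[new] = "valid"
        self.data[new] = payload
        self.mapping[logical_page] = new
        return new

    def read(self, logical_page: int) -> bytes:
        """Return the current contents of a logical page."""
        return self.data[self.mapping[logical_page]]
```

In this sketch, rewriting a logical page leaves the previous payload in the invalidated physical page until that page is erased, which is the same effect that leaves old data in address space the host cannot see.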
Even with the best wear leveling algorithm, bad blocks will be created due to the
inherent limitations of SSD writes and erases. Blocks that are no longer reliable
are called bad blocks; information about their addresses is maintained in the BBM
(Bad Block Management) map. The limitation is keeping the BBM up to date, which
is important for reliability. If the BBM is not maintained with correct information
about bad blocks, then the system will try to write to those blocks. The particular data
which is written to bad blocks will not be reliable. Monitoring the BER (Bit Error
Rate) is also important to achieve a reliable system. ECC (Error Correction Code) is
used to maintain the BER, but the ECC engine may cause performance issues if it is
not designed to perform in parallel for multiple channels. Correcting too many errors,
though, will negatively impact the efficiency of the drive [63].
3.2.4.3 Physical Wear Leveling Limitation
TOX (ZrO2) is a dielectric material and its thickness is a limiting factor in SSD.
Floating gate cells will lose their charge over time through TOX, due to the thinness
of the TOX layer. Floating gate cells also experience wear and tear due to additional
stresses caused by voltage fluctuations. Electric charge for “program” (write)
operations is transferred through the TOX in the form of oxide traps. The
concentration of the traps increases with each write and erase operation; this is
called oxide stress. When electrons leak from a floating gate, these traps are used as a
path for the electrons to travel toward the cell channel region [64]. The number of
electrons leaking through the border of TOX is lower than the number traveling
through SILC (stress-induced leakage current). A short distance between tunneling
steps for SILC increases the leakage. The TOX thickness scalability
limitation is defined by three important factors: the number of traps, SILC, and the
oxide voltage of the floating gate cell during retention. It has been determined that
the TOX thickness must be 7.5-8.0 nm [65].
The floating gate cells should be able to hold a charge for a minimum of 10 years.
This was determined based on how much leakage is acceptable in a 10-year time span.
The TOX thickness requirement plays an important role in defining the acceptable
leakage. The number of cycles of program/erase operations applied to that cell also
depends on TOX thickness. After about 10 thousand program/erase cycles, the cell
voltage threshold shifts upwards, which then requires more voltage to operate the
cell. Physically neighboring cells share the same sensing amplifier.
Because of this, a voltage shift in one cell will also be applied to neighboring
cells, which could damage cells that do not require more voltage. The effects of cells
going bad will change the over-provisioned cell amount (each SSD is manufactured with
more storage, at least 25% more than the stated amount). Over-provisioned cells play a
major role in endurance: as they decrease, the SSD life span also decreases [65].
3.2.4.4 Physical Limitation of Parallelism
When I discuss parallelism in terms of SSD, I am discussing parallelism of the
read, write, and erase operations. Performing these operations in parallel is
faster because multiple operations are processed at the same time. There are a
couple of ways to increase parallelism: one is increasing the dies per channel, and
another is increasing the number of channels. Increasing the dies per channel may
cause channel overloading and may not help write performance. Increasing the number
of channels can require different Error Correction Codes for each channel, which in
turn needs dedicated SRAM (Static RAM). This option is scalable and can increase
performance for the read and write operations. Hence, memory components must be
coordinated to operate in parallel. The serial ‘interface’ over flash packages can
cause a bottleneck for performance.
Other techniques to consider that may improve performance with parallelism: page
size, page spanning process, queueing methods, ganging multiple flash, interleaving
between flash, and the background cleaning process. With the page size technique, if
the page size is smaller this will make look up times faster and take less space than if
the page size table were larger. But this may not be good for performance if the data
blocks are not consistently accessed. With the page spanning process technique,
different flash packages can distribute the information to a single or multiple package.
If the data stays on the same package, the results will have faster performance;
otherwise it goes through different packages which will lower the performance. With
the separate queue technique, each package handles parallel requests simultaneously,
this means there is access to all the flash packages at the same time. This process is
scalable and flexible, and wear-leveling is maintained equally. The drawback is that
each queue needs to maintain its own ECC and SRAM, which also complicates the FTL.
Handling too many ECCs may decrease performance. In the ganging technique, SSD
algorithms combine multiple flash packages and then maintain the same queues, ECCs,
and FTL for that group of packages. It handles multi-page requests with fewer queues
than the separate queue technique uses. This reduces the overhead for the ECC, but
too few queues can cause a bottleneck for a busy system. With interleaving in
flash packages all processes occur within a single die to speed up the read and write
operations. To avoid the latency in this process, it can access all related blocks in one
place, which is faster than crossing between flash packages through a serial
connection. The drawback of this process is it may be writing to the same blocks over
and over. When we focus on interleaving the benefits of wear-leveling are lost.
Background cleaning process of SSD happens on packages when the system is not
busy. When the cleaning process occurs, crossing between different packages means
moving the erase blocks from one package to another through the serial connection.
This generally is slower than cleaning the same die, but it will maintain wear-leveling.
Each technique has its own pros and cons, so we need to carefully analyze which
technique is better depending on each workload situation [11] [66].
There is another form of parallelism which may improve performance: placing
continuously allocated data from one domain over a set of N domains (a domain is a
set of flash memories that share a specific set of resources such as channels, queues,
and ECCs, and that can be divided into sub-domains such as packages), like a stripe,
using a mapping policy.
Most flash memory packages support two-plane operations to read multiple pages
from two planes in parallel and the operation across the dies can be interleaved. Since
logical pages are normally striped over the flash memory array, reading multiple
logical continuous pages in parallel for read ahead can be performed efficiently [11].
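A minimal sketch of such a striping policy is shown below, assuming a simple round-robin mapping of logical pages to domains. The function name and the round-robin rule are illustrative assumptions; actual SSD mapping policies are vendor specific.

```python
from typing import Tuple


def stripe_location(logical_page: int, num_domains: int) -> Tuple[int, int]:
    """Map a logical page to (domain index, page offset within that domain).

    Assumes round-robin striping: consecutive logical pages land on
    consecutive domains, so they can be accessed in parallel.
    """
    if num_domains <= 0:
        raise ValueError("num_domains must be positive")
    return logical_page % num_domains, logical_page // num_domains


if __name__ == "__main__":
    # Eight consecutive logical pages striped over four domains.
    for page in range(8):
        print(page, stripe_location(page, 4))
```

With this mapping, a read-ahead of N consecutive logical pages touches each of the N domains exactly once.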
Figure 13 - Flashes and their parallel architecture
Most SSDs store two bits per MLC cell. It was theorized that
storing more (3 to 4) bits in each cell would increase the performance. But research
showed that reaching the Vth voltage threshold required for the read, write, and erase
operations took longer for 3 and 4 bits than for 2 bits per cell. Strategy-wise,
running NAND chips in parallel (Figure 13) would give the best performance, but it has
its own limitations. More chips require more current flow, and that may not be
possible due to the limiting factor of the maximum allowed current. Also, these
strings need to be read using thousands of reading circuits with many sensors, which
can make the process too complex and more error prone [11].
3.2.4.5 Physical Limitation of Workload Management
In the current market, the consumer and enterprise versions of SSD are different.
Vendors build them according to the anticipated workloads. Depending on the workload
requirements, they are built and programmed with different designs. The consumer
version does not need algorithms as complicated as those of the enterprise version.
In the real world, the consumer version of SSD falls short of the needs of the
enterprise version (Figure 14), in that it does not have algorithms for zero tolerance
of data loss, uptime reliability, endurance, performance, and error correction code
handling; plus, it does not need to work with multiple I/O operations. Usually
enterprise SSD systems come as pure flash (SSD) storage or hybrid (combined HDD
and SSD) storage. Enterprise SSDs must be able to simultaneously handle workloads
such as file, database, and email, which are generated by multiple users with various
traffic patterns. These traffic patterns are multi-threaded random workloads, and they
are handled independently using multiple initiators. Additionally, an enterprise SSD
must maintain consistent I/O throughput (IOPS), integrity, and availability. The
SSD controller needs to be tested thoroughly before it can be placed into enterprise
usage to handle workloads 24/7/365.
Figure 14 - Consumer Vs Enterprise SSD
In the case of power failures or other disruptions in a data center, the workloads
must be protected, so enterprise SSD systems are designed to handle those situations
with the help of ECCs and CRCs (Cyclic Redundancy Checks). Reliability of the
workloads is very important, and SSD systems are built using redundancy techniques
(RAID) to cover any hardware failures. If an enterprise wants higher
performance, it can replace HDD storage with SSD, but this can become expensive.
The details are discussed in the existing research section [11].
3.2.5 Existing research to mitigate the software limitations
Some of the main limitations in SSD are address mapping, parallelism
(performance), wear leveling, and workload management. The user will not have the
option to change the physical structure of the SSD. They will be limited to software
approaches to mitigate the physical limitations. This section explains the research that
has been done to mitigate these limitations. Most approaches have been focused on
improving processes within the FTL. The FTL is a core part of the SSD controller
that maintains sophisticated address mappings (indirect address mappings between
‘physical block address’ and ‘logical block address’), a log-like write mechanism, GC
(Garbage Collection), wear leveling, ECC, and over-provisioning [67].
3.2.5.1 Address Mapping
One of the FTL’s main functions is to maintain a mapping table of virtual addresses
to physical addresses. Write operations can only happen when the block is in a special
state called “Erased”. The erase operations happen at a much coarser spatial
granularity than write operations, since page-level erases are extremely time
consuming [68]. Page-level FTL mapping can provide compact and efficient
utilization of each block, but the issue is that this takes a large amount of
page-table space (a 32MB SRAM page table for 16GB of flash), and in some
situations the lookup time will also be higher than calculating the offset in
block-level mapping. Block-level FTL mapping uses an offset to calculate the page
number, so maintaining page information requires just a fraction of the page-table
space. However, looking up page information in this mapping is more time-consuming
than it is in page-level mapping. It also forces the logical page to be mapped to a
physical page within each block. As a result, garbage collection overhead grows. Still,
block-level address mapping is the better option to use because it uses much less
space [69]. The two schemes are opposite extremes in their weaknesses: page-level
mapping uses more space for the mapping table, while block-level mapping
generates more garbage collection [11].
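The space trade-off between the two schemes can be checked with simple arithmetic. The sketch below assumes 2KB pages, 64 pages per block, and 4-byte table entries; these parameters are not stated in the text and are chosen only because they reproduce the 32MB figure quoted above for 16GB of flash.

```python
# Mapping-table size for page-level versus block-level address mapping.
# Page size, pages per block, and entry size are assumed values.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def table_bytes(capacity_bytes: int, unit_bytes: int, entry_bytes: int) -> int:
    """Size of a table holding one entry per mapping unit."""
    return (capacity_bytes // unit_bytes) * entry_bytes


CAPACITY = 16 * GIB
PAGE_SIZE = 2 * KIB          # assumption
PAGES_PER_BLOCK = 64         # assumption
ENTRY_SIZE = 4               # assumption: 4-byte physical address

page_level = table_bytes(CAPACITY, PAGE_SIZE, ENTRY_SIZE)
block_level = table_bytes(CAPACITY, PAGE_SIZE * PAGES_PER_BLOCK, ENTRY_SIZE)

if __name__ == "__main__":
    print("page-level table:", page_level // MIB, "MiB")
    print("block-level table:", block_level // KIB, "KiB")
```

Under these assumptions the block-level table is smaller by a factor equal to the number of pages per block.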
To address this issue, researchers implemented hybrid FTL, which combines
page-level and block-level address mapping in the SRAM. In this method, some of the
address table is stored on SRAM while the rest is stored on flash. This results in a
problem with the hybrid FTL approach, because random writes (which need to look in
both areas for addresses) induce costly garbage collection, which impacts the
performance of subsequent operations. Demand-based page-mapped FTL, DFTL (Demand-based
Flash Translation Layer) addresses this problem in their approach. DFTL stores only
the most recently used address translations on SRAM, while the rest are stored on
flash [69]. The reason for this storage strategy is that most enterprise-scale workloads
exhibit significant temporal locality. However, the DFTL does not support spatial
locality of workloads, which means frequent “evict out” operations will cause extra
erase operations and page mapping lookup overhead for workloads with less temporal
locality. DFTL limits the space to store the page table and it suffers from frequent
updates to the page mapping table in the SSD flash for write intensive workloads and
garbage collection [69]. The CFTL (Convertible Flash Translation Layer) approach
tries not to depend on the space of SRAM. CFTL is a hybrid FTL with efficient
caching strategies and can dynamically change according to data access patterns.
CFTL’s concept is to use read-intensive data managed by block level mapping and
write-intensive data managed by page level mapping. CFTL uses a hot data (data that
is accessed the most by users) identification method to change the page mapping table.
The CFTL uses a bloom-filters-based scheme which can capture recent and frequently
accessed information at a fine-grained level. CFTL considers temporal and spatial
locality of workloads for page level cache. If the page size is large, this means the
chance that a file is spanning to multiple pages is lower; hence, the consecutive field
of CFTL will be less effective [70]. SCFTL (Strategy Caching Flash Translation
Layer) deals with the large page size and the spanning issue of pages. SCFTL stores a
page-mapping table in several TPs (translation pages) containing thousands of
physical page numbers and mapped to consecutive logical addresses. SCFTL’s PMT
(page-mapping table) contains TPD (translation page directory) and CMT (cache
mapping table). TPD is in RAM and indexes CMT by the most significant bits of
logical addresses. The performance degradation from offloading the mapping table is
reduced by caching several mapping entries in the CMT. CMT integrates two spatial
locality exploitation techniques and a customized cache replacement policy to enhance
the efficiency of SCFTL. SCFTL performs multilevel page table lookups for address
maps. If a cache miss occurs, the request goes to the TPs; if a miss occurs there
too, the requested block must be fetched from flash [71]. CA-SSD (Content
Aware SSD) is a modified FTL that adds minimal support in the form of additional
hardware for hash functions. It uses hashes as values in the mapping table instead of
page information. It also requires battery-backed RAM to store hashes. The drawback
of the approach of CA-SSD is that it depends on battery power and extra hardware
[72].
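The idea shared by the demand-based schemes above, keeping only recently used address translations in fast memory and fetching the rest from flash on a miss, can be sketched with a small least-recently-used cache. This is a simplified illustration and not the DFTL, CFTL, or SCFTL algorithm itself; the class name and the dictionary standing in for the on-flash table are hypothetical.

```python
from collections import OrderedDict


class CachedMappingTable:
    """LRU cache of logical-to-physical translations (illustrative only)."""

    def __init__(self, capacity: int, flash_table: dict):
        self.capacity = capacity
        self.cache = OrderedDict()      # stands in for the SRAM-resident entries
        self.flash_table = flash_table  # stands in for the full table on flash
        self.hits = 0
        self.misses = 0

    def lookup(self, logical_page: int) -> int:
        if logical_page in self.cache:
            self.cache.move_to_end(logical_page)   # mark as most recently used
            self.hits += 1
            return self.cache[logical_page]
        self.misses += 1
        physical = self.flash_table[logical_page]  # models an extra flash read
        self.cache[logical_page] = physical
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)         # evict least recently used
        return physical
```

Workloads with strong temporal locality produce mostly hits in such a cache, while workloads that cycle through many distinct pages cause frequent evictions and extra flash reads.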
Implementing encryption on top of the above approaches would be cumbersome.
When scholars studied address mapping enhancements, they may not have
considered encryption. The existing research results may not hold with
encryption, and that needs to be studied further.
3.2.5.2 Wear Leveling
Due to the locality in most workloads, writes are often performed over a subset of
blocks (e.g. file system metadata blocks). Some flash memory blocks may be
frequently overwritten and tend to wear out earlier than other blocks [11] [66]. FTLs
usually employ some wear-leveling mechanism to ‘shuffle’ cold blocks with hot
blocks to even out writes over flash memory blocks. There has been some research
with variations on how to approach wear-leveling in the form of managing
workloads. Researchers implemented CAFTL (content aware FTL) to remove
unnecessary duplicate writes, which improves the efficiency of garbage collection
and wear-leveling and reduces the write traffic to flash [73]. Other researchers
addressed the wear-leveling issue by reusing flash blocks that have been cycled to
the specified worn-out limit, in an algorithm called SR-FTL (Smart Retirement FTL)
[74]. Another approach is to use a dual-pool algorithm to store cold data in the
blocks that have been identified as more worn and
smartly leave them alone until wear leveling takes effect [75].
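As a point of reference for the approaches above, the simplest form of wear leveling always allocates the free block with the lowest erase count. The sketch below shows only this baseline policy under that assumption; it is not CAFTL, SR-FTL, or the dual-pool algorithm, and the function names are hypothetical.

```python
# Baseline wear leveling: allocate the least-erased free block.
# This is an illustrative policy, not any of the cited algorithms.


def pick_block(erase_counts, free_blocks):
    """Return the free block with the fewest erases (lowest index on ties)."""
    return min(free_blocks, key=lambda b: (erase_counts[b], b))


def simulate(num_blocks: int, num_erases: int):
    """Erase blocks one at a time using pick_block and return the counts."""
    counts = [0] * num_blocks
    for _ in range(num_erases):
        block = pick_block(counts, range(num_blocks))
        counts[block] += 1
    return counts


if __name__ == "__main__":
    print(simulate(4, 10))
```

When every block is always available, this policy keeps the erase counts within one of each other; real workloads pin cold data in place, which is why the cited approaches move or retire blocks more selectively.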
Despite the many bodies of research on wear leveling approaches, it remains a
complex process (full of unknown variables), and there may never be a perfect
solution. That is because there are no consistent workflows nor predictable usage of
storage. So, researchers weigh the pros and cons of various approaches to evaluate
performance versus endurance versus reliability with different workloads. But the
inherent nature of SSD is to move data around to maintain wear leveling. In doing so,
it leaves valuable data in the invisible address space; even though this data is not
retrievable by normal operations, it is still there. Ideally, the address space would
be purged or overwritten, but that may create a lot of wear on an SSD. Encrypting the
data allows us to retain existing wear-leveling algorithms without exposing this
valuable data.
3.2.5.3 Parallelism
The bandwidth and operation rate of any given flash chip is not enough to achieve
optimal performance. SSD has multiple flash arrays so we can run multiple I/O jobs
concurrently and this will improve the performance of the SSD. A single flash
memory package can only provide limited bandwidth (e.g. 32-40MB/sec). Writes are
slower than reads, and other necessary background jobs like garbage collection and
wear-leveling can incur latencies as high as milliseconds [66]. These limitations can
be addressed by the SSD’s clever structure, which is built with an array of flash
memory packages connected through multiple channels to flash memory controllers to
provide internal parallelism. The logical block addresses serve as the logical
interface to the host system, and they can be striped over multiple flash memory
packages. This way the data accesses can be conducted independently in parallel, which
provides high bandwidth in aggregate and hides high-latency operations; that
combination can result in high performance [73]. One way to improve sequential writes
is to divide the flash array into banks; each bank will be able to read/write/erase
independently. The
performance gains from internal parallelism are highly dependent on how the SSD
internal-mapping and resource management compete for critical hardware resources.
The workloads are in the form of mixing reads and writes, but they interfere with each
other, so proper address mapping management and design of applications is critical.
Most applications are designed for HDD storage. When they are run on an
SSD, this may not be optimal. The critical issues in SSD parallelism include: the thin
interface between the storage device and the host, workload access patterns,
asynchronous background operations generated by reads and writes, effect on read
ahead, ill-mapped data layout, and application designs [76]. There are different levels
of parallelism in SSD: Channel, Package, Die, and Plane. The previous research [76]
concluded that read ahead is not affected by access patterns in MLC-SSD, writes
though are strongly correlated to access patterns. Small size random writes suffer from
high latencies and high interference between reads and writes [76]. Adding a disk
cache helped improve the performance for read and write operations. But background
operations like the erase operation can cause interference with reads and writes and
internal fragmentation is too high for excessive random writes. Studies on the four
levels of parallelism such as channel, chip, die, and plane have shown a direct impact
to SSD performance, but they provided limited information, considering that the SSD
structure is a black box. The advanced commands utilize only die and plane levels of
parallelism; they explore how allocation schemes can determine priority order for
multiple levels of parallelism for different types of application loads. The channel-
level parallelism should be given the highest priority order among the four levels and
it was observed that chip level parallelism keeps chips very busy. The service request
can only be handled when chips are idle [76].
Parallelism has the biggest impact on SSD performance. The advantages of
existing parallelism can still be viable even with the addition of encryption
methodologies for storage.
3.2.5.4 Workload Management Integrated with SSD
Performance is highly workload-dependent. Well-designed systems, databases,
and applications improve performance. The following are some of the classic
examples of integrating SSD to systems to achieve better performance [11].
Integrating the SSD into an existing system is a complex process. Scalability
(replacing 1GB of HDD with 1GB of SSD) is limited by cost effectiveness, because the
gains in performance don’t justify the added expenses. HybridDyn (Integration of HDD
and SSD storage) is an innovative storage design that is cost-effective and improves
performance and endurance. It handles incoming workloads by dynamically
partitioning and distributing them between SSD and HDD. This design showed better
performance than HDD alone [77]. Another research approach is an LSM-tree-based
store with an open-channel SSD to utilize channel-level parallelism. LevelDB (a fast
key-value storage library in the LSM-tree-based store) is extended to be
multi-threaded to fully utilize the channel-level parallelism while evaluating optimal
I/O request
scheduling and dispatching. Evaluating the utilization of channel level parallelism’s
impact on I/O performance showed that it outperforms conventional SSDs [13].
Another system, Libra, tracks the I/O consumption of each tenant; it recognizes the
application’s dynamic I/O usage profiles and provides I/O resources accordingly.
The Libra-based VOP (virtual I/O operations) captures the non-linear relationship between
SSD I/O bandwidth and I/O operations throughput; it does this while considering the
disk-IO (disk Input Output) cost model [78]. Hadoop workloads showed a
performance increase over HDD alone when an SSD was integrated into the
underlying storage system.
The research showed that workload performance always improved when SSD was added
or used as the sole storage. SSD is faster than HDD, so adding it to the storage
system was expected to improve performance. But, in some cases, the applications won’t
be able to utilize the SSD performance fully due to the nature of write guarantees.
This research studies the impact of different encryption methodologies on the
performance of different types of workloads.
CHAPTER 4
STORAGE ENCRYPTION ANALYSIS
In this chapter, I show how SSD storage performance is affected by storage
type (t2.micro versus i2.xlarge) and by software encryption methods. I show that both
factors impose performance penalties on workloads.
4.1 Measurement Environment
Each Amazon EC2 (Elastic Compute Cloud) instance can access disk storage from
disks that are physically attached to the host computer. This disk storage is referred to
as an instance store or EBS (Elastic Block Store) volumes. An instance store provides
temporary block-level storage for use with an instance. The size of an Amazon
instance store ranges from 8GB to 48TB, and varies by instance type (i.e., larger
instance types have larger instance stores) for HDD. Using regular SATA SSD, the
storage ranges from 8GB to 6.4TB. If the storage type is NVMe (Non-Volatile
Memory express) SSD, then the storage ranges from 8GB to 16TB.
Amazon EBS provides two volume types: Standard volumes and Provisioned
IOPS volumes, which differ in performance characteristics and price. Standard
volumes offer storage for applications with moderate or burst I/O requirements.
These volumes deliver approximately 100 IOPS on average but can burst up to
hundreds of IOPS. Provisioned IOPS volumes offer storage with consistent and low-
latency performance, which allows users to predictably scale to thousands of I/O
operations per second per Amazon EC2 instance. These volume-types are designed
for applications with I/O-intensive workloads. Backed by SSDs, Provisioned IOPS
volumes support up to 30 IOPS per GB, which enables a system to be provisioned up
to a maximum of 4,000 IOPS per volume. It is possible to stripe multiple
volumes together to achieve up to 48,000 IOPS when attached to larger EC2
instances, but in theory these may appear as regular SSD disk volumes, so I did not
evaluate this type of VM. When attached to an EBS-optimized instance, Provisioned
IOPS volumes are designed to deliver consistent performance within 10 percent of
the guaranteed rate throughput (Provisioned IOPS) 99.9% of the time. In addition, the
delivered IOPS rate depends on the block size of the various reads and writes.
Amazon Provisioned IOPS volumes process reads and writes in I/O block sizes of
16KB or less; every increase in I/O size above 16KB linearly increases the resources
needed to achieve the same IOPS rate. A significant amount of data was produced during
the experiments, and it was used to analyze the main concepts about SSD performance
variations with different variables, including encryption methods.
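The relationship between volume size, the IOPS cap, and I/O block size can be illustrated with a small calculation. The 30 IOPS per GB ratio, the 4,000 IOPS cap, and the 16KB base I/O size come from the description above; the linear scaling for larger I/O sizes is an assumed reading of that description, and the function names are hypothetical.

```python
# Illustrative Provisioned IOPS arithmetic.
# The linear scaling above 16KB is an assumption, not a measured result.


def max_provisioned_iops(volume_gb: int, iops_per_gb: int = 30, cap: int = 4000) -> int:
    """Largest IOPS figure that can be provisioned for a volume of this size."""
    return min(volume_gb * iops_per_gb, cap)


def effective_iops(provisioned_iops: int, io_size_kb: int, base_kb: int = 16) -> float:
    """Achievable operations per second for a given I/O size.

    I/Os at or below the base size count once; larger I/Os are assumed to
    consume the budget in proportion to their size.
    """
    if io_size_kb <= base_kb:
        return float(provisioned_iops)
    return provisioned_iops * base_kb / io_size_kb


if __name__ == "__main__":
    for size_kb in (4, 16, 32, 64, 128):
        print(size_kb, "KB ->", effective_iops(4000, size_kb), "IOPS")
```

Under this model, a workload issuing 64KB I/Os against a 4,000 IOPS volume would be limited to about a quarter of the provisioned rate.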
The experiments in this study have been conducted on three different 64-bit VM
(Virtual Machine) instances in Amazon EC2; the first one was an Amazon Linux
AMI (HVM) 2014.03.1 and the remaining two VMs were Amazon Ubuntu Server
16.04 LTS (HVM). The first VM is an instance store (i2.xlarge) of an 800GB SSD,
which can provide up to 36,000 IOPS. The second VM (standard t2.micro) is an 8GB
instance store with 3,000 IOPS. And the third VM (standard t2.micro) is an 8GB
encrypted EBS General Purpose (SSD) Volume Type with 3,000 IOPS.
The first VM is drastically different from the other two (in memory, vCPUs, and
processor model); I chose these VMs to analyze their unique SSD characteristics. The
second and third VMs are similar (having the same ECUs, 1GB memory, 1 vCPU,
and processor (2.5 GHz, Intel Xeon Family)); the only difference between the two
VMs is that one is a standard instance store SSD without encryption and the other
has an attached EBS SSD volume with encryption.
4.1.1 Selection of Encryption methods
I selected two software encryption methods, along with an encrypted SSD and a
regular SSD. The following describes each at a very high level, along with the type
of algorithm I used in these evaluations.
Dm-crypt:
Dm-crypt is a disk encryption method compatible with Linux kernel version 2.6 or
later. It uses the kernel’s crypto API routines. Devices are mapped to encrypted
containers using a device mapper [51]. This API provides the AES-256 cryptographic
method along with other methods. Dm-crypt uses Linux Unified Key Setup (LUKS) to
create encrypted
containers which are independent from outside platforms. LUKS was developed by
Clemens Fruhwirth in 2004 [52]. Using this method, the user can even encrypt the
root device. A passphrase is required to create encrypted containers.
There has been some research around the drawbacks of dm-crypt. For example, it has
been discovered that hackers can sidestep the passphrase to access encrypted
containers by hitting the ‘Enter’ key a couple of times. They can also delete the
containers, because deleting containers does not require a passphrase. An intruder can
determine critical components of the hidden containers relatively easily by utilizing
disk commands on the system [53] [54].
BestCrypt:
BestCrypt is encryption software installed at the OS level that can create
encrypted containers or volumes. I downloaded it and created encrypted volumes,
which store data securely under an encryption password. These volumes are mounted as
file systems to store data. I applied the AES encryption algorithm option to gather
performance statistics [55].
Self-Encrypting Drive (SED):
When Full Drive Encryption (FDE) is applied on an SSD, it is called a Self-
Encrypting Drive (SED). FDE was developed in 2009. It is a literal encryption of the
entire system which includes all the partitions, system files, and operating system.
This encryption method assigns the process to use the hardware component of the
drive. This helps to enhance the security by utilizing the Opal Storage Specification
(a set of specifications for the features of SEDs). An SED needs a master password
for the drive and a user password for each user. They are stored in the BIOS and
handled by the hard disk controller. SEDs use AES-128 and AES-256.
Researchers have found the following vulnerabilities of this method: Hot Plug Attack,
Hot Unplug Attack, Forced Restart Attack, and Key Capture Attack. They have also
shown that attackers can bypass the encryption and access data; this undermines the
purpose of securing the data [56].
4.1.2 Experimental Tools and Workloads
To produce the workloads needed to evaluate the internal parallelism of SSDs in
this research, FIO (Flexible I/O) synthetic benchmarks were used
(http://freecode.com/projects/fio). FIO is a tool that generates multi-threaded
workloads with different configuration variables to fully utilize the hardware,
such as the read/write ratio, the block size, and the number of concurrent jobs.
It produces a report that contains the bandwidth, the IOPS, the latency, and many
other measurements. I used various SSD storage devices with different I/O workloads
to calculate their performance metrics; each workload was run for 60 seconds using
FIO. A sample FIO command is provided below:
fio --filename=/dmcrypt/4krandreadwrite6040j8 --direct=1 --rw=randrw --size=1024m --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=60 --iodepth=8 --numjobs=8 --runtime=60 --group_reporting --name=4krandreadwrite60j8 --output=/home/output/4kdmcryptrandreadwrite60j8
Sample FIO Command
In the Sample FIO Command, the size of the file to be written is 1024 MB
(size=1024m), in blocks of 4K (bs=4k). The workload is split between 60 percent
random reads (rwmixread=60) and 40 percent random writes (the remaining 100 - 60
percent), with 8 jobs (numjobs=8) running in parallel for 60 seconds (runtime=60).
Experiments were executed independently on each virtual machine to fully utilize
the SSD parallelism capability while introducing variations in the block size (4k, 8k,
16k, 32k, 64k, and 128k), the number of parallel jobs (8), and the random read/write
ratio (100 percent reads, 100 percent writes, and 60/40 read/write workloads). These
factors were tested on an unencrypted SSD, on SSDs using two software-based
encryption methods, and on one SSD fully encrypted by Amazon. Each experiment was
executed for a total of 60 seconds utilizing the FIO benchmark, version 2.1.7.
The FIO commands were run in the following order: 100% write, 60/40 read/write,
and 100% read. Each one was executed for six different block sizes with 8 jobs.
To emulate an enterprise workload environment, I used random read/write
workloads. The research examines how these workloads are affected by the
encryption method and its implementation. A queue depth of eight was
selected as sufficient, because only a handful of earlier trials utilized a depth
past eight.
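The run matrix described above (three workloads, six block sizes, eight jobs, 60
seconds each) can be sketched as a small command generator. This is only a minimal
Python sketch that reuses the options of the Sample FIO Command; the helper names
(build_fio_command, build_matrix), the file naming scheme, and the target directory
are hypothetical and are not taken from the original experiment scripts.

```python
from itertools import product

BLOCK_SIZES = ["4k", "8k", "16k", "32k", "64k", "128k"]
# (fio --rw mode, read percentage) in the order used in the experiments:
# 100% write, 60/40 read/write, 100% read.
WORKLOADS = [("randwrite", None), ("randrw", 60), ("randread", None)]


def build_fio_command(target_dir, block_size, rw, rwmixread=None,
                      numjobs=8, iodepth=8, runtime=60):
    """Return one fio command line as a list of arguments (hypothetical helper)."""
    name = f"{block_size}{rw}j{numjobs}"  # assumed naming scheme
    cmd = [
        "fio", f"--filename={target_dir}/{name}", "--direct=1", f"--rw={rw}",
        "--size=1024m", "--refill_buffers", "--norandommap", "--randrepeat=0",
        "--ioengine=libaio", f"--bs={block_size}",
    ]
    if rwmixread is not None:  # only meaningful for the mixed workload
        cmd.append(f"--rwmixread={rwmixread}")
    cmd += [
        f"--iodepth={iodepth}", f"--numjobs={numjobs}", f"--runtime={runtime}",
        "--group_reporting", f"--name={name}",
    ]
    return cmd


def build_matrix(target_dir):
    """All workload/block-size combinations, workloads in the outer loop."""
    return [build_fio_command(target_dir, bs, rw, mix)
            for (rw, mix), bs in product(WORKLOADS, BLOCK_SIZES)]


if __name__ == "__main__":
    for command in build_matrix("/dmcrypt"):
        print(" ".join(command))
```

The sketch only prints the commands; running them (for example with
subprocess.run) is left out so that nothing is written to disk by accident.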
4.2 SSD performance without Encryption
Having completed lengthy experiments and reviewed the internal structure of SSDs
and the background information regarding the storage options within Amazon EC2,
I am now positioned to evaluate the experimental results and answer several
related questions.
I created different types of VM instances using SSDs with different IOPS ranges.
This research considered all of them to understand the internal characteristics of
SSDs. Baseline metrics were created from those experiments to use for performance
comparisons with various encryption implementations.
4.2.1 Performance differences between Amazon EC2 VMs
There were significant differences in the performance between the two Amazon
EC2 instances. While this was expected, it was interesting to validate the actual
performance characteristics of the two different instances versus the specs that
Amazon provided about their VMs.
In Graph 1 and Graph 4, the i2.xlarge instance consistently outperformed the
t2.micro instance in all experimental runs with all block sizes. In addition, this
difference typically increases as the read/write ratio transitions closer to
100 percent reads, regardless of whether sequential or random reads/writes are
evaluated. This is likely because the instance store volume is physically attached
to the computer on which the EC2 instance is running. Our experiments focused on
random reads and writes. One of the limitations of this comparison is that the total
random reads and writes were limited to 35,000 IOPS on the i2.xlarge instance and
only 3,000 IOPS on the t2.micro instance. This prompted me to perform a more
in-depth comparison of the two storage mechanisms, the t2.micro instance store
and the EBS storage volume. Section 4.3.1 discusses the results.
4.2.2 Did various block sizes significantly affect I/O throughput?
In both Amazon EC2 instances I observed that as the block size increases, the
number of IOPS decreases, along with the execution time needed to complete the
required reading and writing of data by FIO. This is most likely because as the
block size increases, the overhead required to manage the writing of larger blocks
is incurred less frequently. In addition, as expected with increased block sizes,
the reading or writing of data is completed in increasingly larger chunks. The
metrics in Graph 1 plot the ratio of reads and writes versus the number of IOPS
completed for various block sizes. IOPS decreased as block size increased; the only
exception was that the 16K 100% read outperformed the 8K 100% read.
[Figure: "i2.xlarge Block size can affect number of IOPS". X-axis: Read Percentage (0 to 100). Y-axis: IOPS (0 to 70,000). Series: 4k, 8k, and 16k random read; 4k, 8k, and 16k random write.]
Graph 1 - IOPS Vs Block Size
4.2.3 Did various levels of parallelism affect I/O throughput?
Experiments were performed with 8, 16, and 32 threads, or jobs, operating in
parallel on all block sizes. As seen in Graph 2 (using a block size of 8K), I did
not see any significant improvement between 8, 16, and 32 threads; instead, IOPS
dropped for the 16-thread and 32-thread simulations. This may indicate that the SSD
is saturated at 8 threads: 8 jobs saturated the SSD parallelism, and increasing the
number of jobs did not provide any further increase in performance.
[Figure: "i2.xlarge Number of jobs VS IOPS". X-axis: Read Percentage (0 to 100). Y-axis: KB/Sec (0 to 45,000). Series: 8, 16, and 32 jobs random read; 8, 16, and 32 jobs random write; 8, 16, and 32 jobs random read/write.]
Graph 2 - Parallelism Vs Throughput
4.2.4 Did random and sequential jobs have a different IOPS?
In Graph 3, I observed no significant difference between the behavior of
sequential reads and writes and that of random reads and writes. The i2.xlarge
instance has been optimized by Amazon for random reads and writes, and over part of
the range it even performed better than the corresponding sequential reads and
writes. This occurs from around 55 percent reads and 45 percent writes and continues
until about 90 percent reads, where sequential outperforms random reads/writes
again. The results also showed that 100 percent sequential write was significantly
slower than the equivalent random write. I hypothesize this is related to garbage
collection or to a change of write mode at the FTL level. However, there is no such
gain for random reads/writes on the t2.micro machine. As can be seen in Graph 3, the
total random reads/writes are capped around 3,000 IOPS for 4k or 8k block sizes.
This performance is expected per the performance metrics provisioned by Amazon for
the EBS volume attached to this instance. Additionally, at no time do random
read/write operations outperform sequential read/write operations. This type of
performance is more in line with what is expected from a traditional SSD.
[Figure: "i2.xlarge Random vs Sequential". X-axis: Read Percentage (0 to 100). Y-axis: KB/Sec (0 to 50,000). Series (8 jobs each): random read, sequential read, random write, sequential write, random read/write, sequential read/write.]
Graph 3 - Random Versus Sequential Operations
4.2.5 SSD Random Workload Analysis on t2.micro VM
From Section 4.2.1 to 4.2.4, I observed that random and sequential operations are
very close in IOPS. Amazon provides different numbers of IOPS for various types of
SSD VMs. I chose the Amazon t2.micro (16.04 LTS HVM, SSD volume type VM
instance store in EC2) as the VM, and I used block sizes of 4k, 8k, 16k, 32k, 64k,
and 128k on random reads, writes, and read/writes to establish baseline metrics.
These metrics will be used for comparison with the workloads of the different
encryption methods. The experiments were done using random workloads for 100 percent
reads, 100 percent writes, and 60/40 read/writes (Mixed).
[Figure: "IOPS and Random Workloads Without Encryptions". X-axis: Block Size (4k, 8k, 16k, 32k, 64k, 128k). Y-axis: IOPS (0 to 3,500). Series: Read IOPS, Write IOPS, Mixed IOPS.]
Graph 4 - t2.micro Block Size Versus IOPS
[Figure: "Block Size and No Encrypted SSD Performance". X-axis: Block Size (4k, 8k, 16k, 32k, 64k, 128k). Y-axis: KB/Sec (0 to 7,000,000). Series: Read IO, Write IO, Mixed IO.]
Graph 5 - t2.micro Block Size Versus KB/Sec
In Graph 4 and Graph 5, workloads for 100 percent reads, 100 percent writes, and
60/40 read/writes showed similar IOPS (the maximum IOPS Amazon provisioned) for the
4k, 8k, and 16k block sizes. At 32k the IOPS decreased 40%, at 64k the IOPS
decreased 60%, and at 128k the IOPS decreased to 85% of the 4k block size IOPS;
but, as seen in Graph 5, the overall amount of data read from and written to the
disk increased because of the increased block size. I hypothesize that this is
related to the block size overhead, but the increase is not proportional to the
block size. Another important SSD characteristic I observed was that reads were
faster than writes, as shown in Graph 5 for the 32k, 64k, and 128k block sizes
(which were less impacted by Amazon's maximum provisioned IOPS). This type of
performance is more in line with what is expected from a traditional SSD. Going
back to Graph 4, the evidence of Amazon's capping is clear at 4k, 8k, and 16k, plus
32k mixed, because the IOPS hovers around 3,110. I used these metrics as the
baseline for future comparisons.
4.3 SSD performance with Encryption
In Section 4.2, I established a set of baseline metrics. I then ran the same
experiments with various encryption methods, block sizes, and workloads. I chose not
to vary the number of jobs, based on the data described in Section 4.2.3, which
showed little difference between 8 jobs and 16 or 32 jobs. So, I set the number of
jobs/threads to 8 for all block sizes and all workloads. These experiments were
conducted on two different software encryption methods (BestCrypt and Dm-crypt)
and one SSD encrypted by Amazon. Amazon EBS volumes were encrypted with a
unique 256-bit key using the AES-256 algorithm. Also, snapshots of these volumes (a
way of cloning storage volumes) share the same key
(http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html). Customers
maintain these keys using their own key management infrastructure.
To execute the experiments, I created a working environment by creating a VM in
Amazon EC2 and installing the encryption software and the FIO benchmarking software.
I used the same process for both software-based encryption methods. For the
encrypted SSD, I created a VM and attached an encrypted EBS SSD volume to it. The
following graphs (Graphs 6 to 16) show the different encryption methods and their
performance patterns for the different block sizes versus IOPS and KB/Sec.
4.3.1 Did various block sizes significantly affect IOPS
I observed that as the block size increases, the number of IOPS decreases, along
with the execution time needed to complete the required reading and writing of data
by FIO, for all of the encryption methods. The performance metrics in Graphs 6, 7,
and 8 show a similar decrease of IOPS for all types of encryption methods.
[Figure: "Encrypted SSD Block Size Versus IOPS". X-axis: Block Size (4k, 8k, 16k, 32k, 64k, 128k). Y-axis: IOPS (0 to 3,500). Series: Read IOPS, Write IOPS, Mixed IOPS.]
Graph 6 - Encrypted SSD Block Size Versus IOPS
[Figure: "BestCrypt Block Size Versus IOPS". X-axis: Block Size (4k, 8k, 16k, 32k, 64k, 128k). Y-axis: IOPS (0 to 500). Series: Read IOPS, Write IOPS, Mixed IOPS.]
Graph 7 - BestCrypt Block Size Vs IOPS
[Figure: "dm-crypt Block Size Versus IOPS". X-axis: Block Size (4k, 8k, 16k, 32k, 64k, 128k). Y-axis: 0 to 1,400 (axis labeled KB/Sec in the source). Series: Read IOPS, Write IOPS, Mixed IOPS.]
Graph 8 - Dm-crypt Block Size Vs IOPS
One of the main characteristics of SSDs is that reads outperform writes, but with
encryption software the results were the opposite: writes performed better than
reads (Graph 6 versus Graph 7 and Graph 8). This is a very significant discovery
about encryption on SSDs. This finding may indicate that when using software-based
encryption on an SSD, the decryption (read) process takes more time than the
encryption (write) process.
4.3.2 Did various block sizes affect Performance Throughput
In the previous section, I observed that IOPS decreased as block size increased in
all encryption methods.
[Figure: "Encrypted EBS SSD Volume Block Size Versus throughput". X-axis: Block Size (4k, 8k, 16k, 32k, 64k, 128k). Y-axis: KB/Sec (0 to 7,000,000). Series: Read IO, Write IO, Mixed IO.]
Graph 9 - Encrypted EBS SSD Volume Block Size Versus throughput
[Figure: "BestCrypt Block Size Versus Throughput". X-axis: Block Size (4k, 8k, 16k, 32k, 64k, 128k). Y-axis: KB/Sec (0 to 300,000). Series: Read IO, Write IO, Mixed IO.]
Graph 10 - BestCrypt Block Size Versus Throughput
[Figure: "Dm-Crypt Block Size Versus Throughput". X-axis: Block Size (4k, 8k, 16k, 32k, 64k, 128k). Y-axis: KB/Sec (0 to 1,800,000). Series: Read IO, Write IO, Mixed IO.]
Graph 11 - Dm-Crypt Block Size Versus Throughput
In Graph 9, I observed no significant difference in read and write throughput
compared with the unencrypted SSD. Also, at 32k and higher I did not see any
significant throughput increase. In Graph 10, using the software encryption method
BestCrypt, the 32k block size had the lowest performance of all block sizes. In
Graph 11, using the software encryption method Dm-crypt, the throughput increased
linearly as the block size increased.
4.3.3 Encryption Methods Versus Performance Throughput
The experiments showed that there is a significant difference between SSDs that
use software encryption methods and the encrypted SSD.
[Figure: "Encryption Methods Versus IOPS". X-axis: Encryption Methods (Without Encryption, Encrypted EBS SSD, Dm-crypt, BestCrypt). Y-axis: IOPS (0 to 3,500). Series: Read IOPS, Write IOPS, Mixed IOPS.]
Graph 12 - Encryption Methods versus IOPS
[Figure: "Encryption Methods Versus Throughput". X-axis: Encryption Methods (Without Encryption, Encrypted EBS SSD, Dm-crypt, BestCrypt). Y-axis: KB/Sec (0 to 1,600,000). Series: Read IO, Write IO, Mixed IO.]
Graph 13 - Encryption Methods versus Throughput
I ran workload performance experiments for all block sizes (4k, 8k, 16k, 32k, 64k,
and 128k). In Graph 12 and Graph 13, I observed that the encrypted SSD outperformed
the software-based encryption methods (the graphs show only the 4k block size). In
Graph 12, the encrypted volume showed performance very similar to that of the
regular unencrypted SSD.
4.3.4 Reads, Writes and Mixed workloads Versus Block Sizes.
[Figure: "Read workloads - Block Sizes Versus Throughput". X-axis: Block Size (4k, 8k, 16k, 32k, 64k, 128k). Y-axis: IOPS (0 to 3,500). Series: Without Encryption, Encrypted EBS SSD, Dm-crypt, BestCrypt.]
Graph 14 - Read workloads IOPS for various Block Sizes
[Figure: "Write workloads - Block Sizes Versus Throughput". X-axis: Block Size (4k, 8k, 16k, 32k, 64k, 128k). Y-axis: IOPS (0 to 3,500). Series: Without Encryption, Encrypted EBS SSD, Dm-crypt, BestCrypt.]
Graph 15 - Write workloads IOPS for various Block Sizes
[Figure: "Mixed Workloads - Block Sizes Versus Throughput". X-axis: Block Size (4k, 8k, 16k, 32k, 64k, 128k). Y-axis: IOPS (0 to 3,500). Series: Without Encryption, Encrypted EBS SSD, Dm-crypt, BestCrypt.]
Graph 16 - Mixed Workloads IOPS for Various Block Sizes
Graph 14, Graph 15, and Graph 16 indicate that as the block size increased, the
IOPS decreased. For the software encryption methods, block sizes of 64k and 128k
had lower performance than 4k, 8k, 16k, and 32k. For the 128k block size with the
BestCrypt encryption method, 100% reads showed such low performance that it could
be measured in a single digit (only 9 IOPS). This was by far the lowest performance
of all the encryption methods.
4.4 Fully Homomorphic Encryption Limitations
4.4.1 FHE with Vector Space
The first simple FHE cipher using multivectors, called EDCHE (Enhanced
Data-Centric Homomorphic Encryption), was presented by DaSilva [28]. It uses
geometric algebra and multivector spaces $\mathbb{R}^n$, where $n$ is 2 or 3,
corresponding to 2D and 3D vector spaces respectively. When using a 3D vector
space, it generates an encrypted file that is 8 to 10 times the size of the original
plaintext file [28]. This makes it hard for users to justify this method
for their applications. When creating the most robust secure algorithms, the
cryptographer needs to keep in mind that the algorithms should be simple, efficient,
secure, practical, and able to accommodate computer resources. This gives an
opportunity to develop a new FHE cipher to fulfill these requirements.
4.4.2 Previous homomorphic encryption using multivector technique.
[Figure: "Key Size and Time in sec on Regular SSD". X-axis ticks: 64, 128, 256, 512, 1024. Y-axis: 0 to 140. Series: AES-Crypt Encryption, Xlg-Crypt Encryption, AES-Crypt Decryption, Xlg-Crypt Decryption.]
Graph 17 - Multivector Based Homomorphic Encryption
Graph 17 shows the results for key sizes ranging from 64 bits to 1024 bits for
encryption and decryption. Comparing performance in terms of time, xlg-crypt
underperformed AES-Crypt for full file encryption and decryption. Xlg separates
itself from AES because it is a fully homomorphic encryption and does not need to
encrypt/decrypt an entire file on every update. Due to this unique characteristic,
xlg-crypt will outperform AES-Crypt on smaller updates.
Even though xlg-crypt takes more time to encrypt than AES-Crypt, it offers
additional security features. For example, the unique nature of xlg allows a client
to work with all, some, or even none of the encrypted files from the server. This
allows the system to expose only the necessary parts of encrypted files to the
client, keeping the rest of the files encrypted and secured on the server. When
using symmetric encryption methods, during any update process the decrypted
(plaintext) version of the file exists until it is deleted. Because xlg-crypt is
homomorphic encryption, no plaintext file needs to exist on the VM.
[Figure: "Encrypted file size in MB on Regular SSD VM". X-axis ticks: 32, 64, 128, 256, 512. Y-axis: 0 to 900. Series: AES-Crypt, xlg-Crypt.]
Graph 18 - Multivector based encrypted file sizes
I also observed that when encrypting a 100 MB file, AES-Crypt created a 101 MB
encrypted file while xlg-crypt created an 801 MB encrypted file. In general,
xlg-crypt generated an encrypted file about 8 times the size of the original
plaintext file. This growth is caused by the calculations of the xlg-crypt
algorithm, which uses a 3-dimensional, infinite field space. Because its output
file is 8 times larger, it also takes longer to decrypt. Even though it takes more
space, each update does not require a rewrite of cells in the SSD, which is more
aligned with endurance concerns on SSD storage devices.
CHAPTER 5
RVTHE
In this chapter I present a new symmetric homomorphic encryption method,
"Reduced Vector Technique Homomorphic Encryption" (RVTHE). The chapter describes its
design, mathematical implementation, and homomorphic properties.
5.1 Design of RVTHE
The design of RVTHE depends on versors. Mathematically, a versor can be represented as:
$A = a_1 a_2 a_3 \cdots a_n$
In the design of RVTHE we have $n-1$ vectors as the keys and one vector as the data. For example, if $n=5$ then there are 4 key vectors and 1 data vector.
Vectors   Example 1   Example 2
a1        Key3        Data
a2        Key1        Key1
a3        Data        Key4
a4        Key2        Key2
a5        Key4        Key3
Table 2 - Key and data location in versors
The locations of the keys and the data are flexible; they are determined by the designer.
To reduce the size of the generated ciphertext, we choose two-term vectors.
5.1.1 RVTHE Encryption and Decryption
Each key is a randomly generated number that is converted into base 10. We divide
each key and the data into two parts and use them as the two terms (coefficients)
of each vector.
5.1.2 Encryption of RVTHE
Once the key and data locations are designed, we perform a geometric product
operation on the first two vectors, which generates an intermediate ciphertext.
Next, we perform a geometric product operation between the intermediate ciphertext
and the next vector, and repeat this calculation for each remaining vector. This
generates the ciphertext.
$E(d_1) = s_1 s_2 s_3 d_1 \cdots s_n$
In the above, $s_1, s_2, s_3, \ldots, s_n$ are the keys, $d_1$ is the data, and $E$ is encryption.
5.1.3 Decryption of RVTHE
For the decryption process, finding the inverses of the key vectors is critical.
First, we perform a geometric product operation between the ciphertext and the
inverse of a key, which generates an intermediate ciphertext. Next, we perform a
geometric product operation between the intermediate ciphertext and the inverse of
the next key vector, and repeat this calculation for each key vector. This generates
the plaintext.
$D(c_1) = s_3^{-1} s_2^{-1} s_1^{-1} \, s_1 s_2 s_3 \, d_1 \cdots s_n \, s_n^{-1}$
In the above, $s_1^{-1}, s_2^{-1}, s_3^{-1}, \ldots, s_n^{-1}$ are the inverse vectors
of the keys $s_1, s_2, s_3, \ldots, s_n$, $c_1$ is the ciphertext, and $D$ is decryption.
In our implementation we chose to use three vectors: two vectors for the keys
($s_1$, $s_2$) and one vector for the data ($d_1$).
5.2 Mathematical Implementation of RVTHE Using Versors
In the versors example from Section 3.3, we derived the value of the vector $b$ by
using the vector inverse. If we present the same example in terms of encryption,
then $a$, $b$, and $c$ from the math become $s_1$, $d_1$, and $s_2$ in the RVTHE
scheme, and they represent the first secret key, the data value, and the second
secret key, respectively.
RVTHE's mathematical representation with two secret keys and one data vector has
the form $s_1 d_1 s_2$. In other words, we chose only three vectors $a_1, a_2, a_3$
in our implementation.
The encryption method is represented as $E$ and the decryption method as $D$.
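Assume
$a_1 = s_1 = a$
$a_2 = d_1 = b$
$a_3 = s_2 = c$
and assign the following values, where $e_1$ and $e_2$ are the basis vectors:
$a = 2e_1 + 3e_2$
$b = 4e_1 + 5e_2$
$c = 3e_1 + 4e_2$
Then, for the encryption of $d_1$, which is $E(d_1) = abc$:
$ab = (2e_1 + 3e_2) \cdot (4e_1 + 5e_2) + (2e_1 + 3e_2) \wedge (4e_1 + 5e_2)$
$ab = 23 - 2e_{12}$
$abc = (23 - 2e_{12})(3e_1 + 4e_2)$
$abc = 61e_1 + 98e_2$
For decryption, to derive $d_1$:
$D(E(d_1)) = D(abc) = a^{-1} a \, b \, c \, c^{-1} = d_1 = b$
$b = a^{-1} (61e_1 + 98e_2) \left( \frac{3e_1 + 4e_2}{25} \right)$
$b = \frac{2e_1 + 3e_2}{13} (23 - 2e_{12})$
$b = \frac{1}{13} \left( (46 + 6) e_1 + (69 - 4) e_2 \right)$
$b = 4e_1 + 5e_2$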
This encryption implementation is based on versors, providing a new way to
utilize the geometric product of geometric algebra.
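The worked example above can be checked numerically. The following Python sketch
is only an illustration under stated assumptions, not the C implementation
described in Chapter 6: a multivector of the 2D geometric algebra is stored as a
tuple (scalar, e1, e2, e12), and the function names gp, vec, inverse, encrypt, and
decrypt are hypothetical. Exact fractions are used so that the divisions by 13 and
25 introduce no rounding.

```python
from fractions import Fraction


def gp(u, v):
    """Geometric product in 2D, with e1*e1 = e2*e2 = 1 and e1*e2 = -e2*e1 = e12."""
    a0, a1, a2, a3 = u
    b0, b1, b2, b3 = v
    return (
        a0 * b0 + a1 * b1 + a2 * b2 - a3 * b3,  # scalar part
        a0 * b1 + a1 * b0 - a2 * b3 + a3 * b2,  # e1 part
        a0 * b2 + a2 * b0 + a1 * b3 - a3 * b1,  # e2 part
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,  # e12 part
    )


def vec(x, y):
    """The vector x*e1 + y*e2 as a multivector tuple."""
    return (0, x, y, 0)


def inverse(v):
    """Inverse of a nonzero vector: v / |v|^2."""
    _, x, y, _ = v
    norm_sq = Fraction(x * x + y * y)
    return (0, x / norm_sq, y / norm_sq, 0)


def encrypt(s1, d, s2):
    """E(d) = s1 d s2, computed left to right."""
    return gp(gp(s1, d), s2)


def decrypt(s1, c, s2):
    """D(c) = s1^-1 c s2^-1."""
    return gp(gp(inverse(s1), c), inverse(s2))


a, b, c = vec(2, 3), vec(4, 5), vec(3, 4)  # s1, d1, s2 from the example
cipher = encrypt(a, b, c)
print(gp(a, b))               # (23, 0, 0, -2), i.e. 23 - 2 e12
print(cipher)                 # (0, 61, 98, 0), i.e. 61 e1 + 98 e2
print(decrypt(a, cipher, c))  # equals 4 e1 + 5 e2
```

Because the product of three vectors in 2D has only e1 and e2 components, the
ciphertext in this sketch has the same two-term shape as the data vector.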
5.3 Homomorphism of RVTHE
In this section I show the homomorphic properties of RVTHE. The homomorphism
covers the addition, subtraction, multiplication, and division properties [79].
5.3.1 Addition
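We represent data 1 as $d_1$, data 2 as $d_2$, secret key 1 as $s_1$, and secret
key 2 as $s_2$, and show the following:
$E(d_1 + d_2) = E(d_1) + E(d_2)$
Example:
When $d_1 = 8$ and $d_2 = 6$, then $d_1 + d_2 = 14$.
$s_1 = 2e_1 + 3e_2$, $d_1 = 4e_1 + 4e_2$, and $s_2 = 3e_1 + 4e_2$
Applying the regular geometric product:
$V_1 V_2 = \underbrace{\left[ (a_1 b_1) \overbrace{e_1 \cdot e_1}^{e_i \cdot e_i = 1} + (a_1 b_2) \overbrace{e_1 \cdot e_2}^{e_i \cdot e_j = 0} + (a_2 b_1) \overbrace{e_2 \cdot e_1}^{e_j \cdot e_i = 0} + (a_2 b_2) \overbrace{e_2 \cdot e_2}^{e_j \cdot e_j = 1} \right]}_{\text{dot product}} + \underbrace{\left[ (a_1 b_1) \overbrace{e_1 \wedge e_1}^{e_i \wedge e_i = 0} + (a_1 b_2) \, e_1 \wedge e_2 + (a_2 b_1) \overbrace{e_2 \wedge e_1}^{e_j \wedge e_i = -e_i \wedge e_j} + (a_2 b_2) \overbrace{e_2 \wedge e_2}^{e_j \wedge e_j = 0} \right]}_{\text{wedge product}}$
Then the encryption of $d_1$ is
$E(d_1) = s_1 d_1 s_2$
$E(d_1) = (2e_1 + 3e_2)(4e_1 + 4e_2)(3e_1 + 4e_2)$
$E(d_1) = 44e_1 + 92e_2$
and the encryption of $d_2$ is
$E(d_2) = s_1 d_2 s_2$
$E(d_2) = (2e_1 + 3e_2)(3e_1 + 3e_2)(3e_1 + 4e_2)$
$E(d_2) = 33e_1 + 69e_2$
$E(d_1 + d_2) = (2e_1 + 3e_2)(7e_1 + 7e_2)(3e_1 + 4e_2)$
$E(d_1 + d_2) = 77e_1 + 161e_2$
$E(d_1) + E(d_2) = 77e_1 + 161e_2$
This shows $E(d_1 + d_2) = E(d_1) + E(d_2)$.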
5.3.2 Subtraction
$E(d_1 - d_2) = E(d_1) - E(d_2)$
Example:
When $d_1 = 8$ and $d_2 = 6$, then $d_1 - d_2 = 2$.
$s_1 = 2e_1 + 3e_2$
$d_1 = 4e_1 + 4e_2$
$s_2 = 3e_1 + 4e_2$
Then the encryption gives
$E(d_1) = 44e_1 + 92e_2$
$E(d_2) = 33e_1 + 69e_2$
$E(d_1 - d_2) = (2e_1 + 3e_2)(e_1 + e_2)(3e_1 + 4e_2)$
$E(d_1 - d_2) = 11e_1 + 23e_2$
$E(d_1) - E(d_2) = 11e_1 + 23e_2$
This shows $E(d_1 - d_2) = E(d_1) - E(d_2)$.
5.3.3 Multiplication
In vectors we have scalar multiplication.
With $d_1 = 8$ and scalar $r_1 = 2$, we show $E(r_1 d_1) = r_1 E(d_1)$.
$s_1 = 2e_1 + 3e_2$, $d_1 = 4e_1 + 4e_2$, and $s_2 = 3e_1 + 4e_2$
Then the encryption gives
$E(r_1 d_1) = (2e_1 + 3e_2)(8e_1 + 8e_2)(3e_1 + 4e_2)$
$E(r_1 d_1) = 88e_1 + 184e_2$
$E(d_1) = 44e_1 + 92e_2$
$r_1 E(d_1) = 88e_1 + 184e_2$
This shows $E(r_1 d_1) = r_1 E(d_1)$ for scalar multiplication.
5.3.4 Division
In vectors we have scalar division.
With $d_1 = 8$ and scalar $r_1 = 1/2$, we show $E(r_1 d_1) = r_1 E(d_1)$.
$s_1 = 2e_1 + 3e_2$, $d_1 = 4e_1 + 4e_2$, and $s_2 = 3e_1 + 4e_2$
Then the encryption gives
$E(r_1 d_1) = (2e_1 + 3e_2)(2e_1 + 2e_2)(3e_1 + 4e_2)$
$E(r_1 d_1) = 22e_1 + 46e_2$
$E(d_1) = 44e_1 + 92e_2$
$r_1 E(d_1) = 22e_1 + 46e_2$
This shows $E(r_1 d_1) = r_1 E(d_1)$ for scalar division.
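The four examples in Sections 5.3.1 to 5.3.4 can be reproduced with a few lines of
code. This is a minimal Python sketch under the same assumptions as the examples
(keys $s_1 = 2e_1 + 3e_2$ and $s_2 = 3e_1 + 4e_2$, data 8 and 6 encoded as
$4e_1 + 4e_2$ and $3e_1 + 3e_2$); the tuple layout (scalar, e1, e2, e12) and the
helper names are hypothetical and are not taken from the RVTHE program.

```python
from fractions import Fraction


def gp(u, v):
    """Geometric product in 2D, with e1*e1 = e2*e2 = 1 and e1*e2 = -e2*e1 = e12."""
    a0, a1, a2, a3 = u
    b0, b1, b2, b3 = v
    return (
        a0 * b0 + a1 * b1 + a2 * b2 - a3 * b3,  # scalar part
        a0 * b1 + a1 * b0 - a2 * b3 + a3 * b2,  # e1 part
        a0 * b2 + a2 * b0 + a1 * b3 - a3 * b1,  # e2 part
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,  # e12 part
    )


S1 = (0, 2, 3, 0)  # secret key 1: 2 e1 + 3 e2
S2 = (0, 3, 4, 0)  # secret key 2: 3 e1 + 4 e2


def encrypt(d):
    """E(d) = s1 d s2 with the two fixed example keys."""
    return gp(gp(S1, d), S2)


def add(u, v):
    return tuple(x + y for x, y in zip(u, v))


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def scale(r, u):
    return tuple(r * x for x in u)


d1 = (0, 4, 4, 0)  # data 8 encoded as 4 e1 + 4 e2
d2 = (0, 3, 3, 0)  # data 6 encoded as 3 e1 + 3 e2
half = Fraction(1, 2)

# Each line compares the encryption of the combined data with the same
# operation applied to the individual ciphertexts.
print(encrypt(add(d1, d2)) == add(encrypt(d1), encrypt(d2)))      # addition
print(encrypt(sub(d1, d2)) == sub(encrypt(d1), encrypt(d2)))      # subtraction
print(encrypt(scale(2, d1)) == scale(2, encrypt(d1)))             # multiplication
print(encrypt(scale(half, d1)) == scale(half, encrypt(d1)))       # division
```

All four comparisons hold because the geometric product is linear in each factor,
so the checks are not specific to these particular key values.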
5.4 Security of RVTHE
The security of this encryption method needs to be assessed. The design of RVTHE
depends on versors (built here from two-dimensional vectors), the geometric
product, and the vector inverse. The dimensions of the vectors contribute an extra
layer of security. The security of RVTHE is evaluated by applying simple
mathematical manipulations to known information, that is, known plaintext and known
ciphertext, in an attempt to derive the keys.
CHAPTER 6
IMPLEMENTATION AND EVALUATION OF
RVTHE
In this chapter I discuss how I converted RVTHE into an executable program.
The chapter discusses the implementation of RVTHE in detail and compares it with
the AES-Crypt encryption method in terms of encryption and decryption speed. I
evaluated the security of RVTHE at a high level by analyzing mathematical operations
and performing statistical evaluations on ciphertexts.
I evaluated encryption, decryption, and the ability of the RVTHE scheme to
update/append real-time files without decrypting and re-encrypting. I ran these
evaluations on a cloud system provided by one of the leading cloud providers,
Amazon (AWS EC2).
6.1 Implementation of RVTHE
AES-Crypt is one of the most widely known methods for encrypting individual
real-time files. It offers high speed and security. Following the AES-Crypt program,
I created an executable crypto program in the C language based on the RVTHE scheme,
packaged as a single program for encryption, decryption, and append. I executed it
on real-time files for encryption, decryption, and appending new data to the end of
the encrypted file without decrypting the original ciphertext.
I used the following commands to run encryption and decryption.
AES-Crypt:
time aescrypt -e -p key plaintext_file_name
time aescrypt -d -p key plaintext_file_name.aes
RVTHE:
time xlg -e -x key1 -y key2 plaintext_file_name
time xlg -d -x key1 -y key2 plaintext_file_name.xlg
time xlg -a -x key1 -y key2 "data" plaintext_file_name.xlg
In the above commands, '-e' indicates encryption, '-d' indicates decryption,
and '-a' indicates append.
When a 512-bit key was used for the AES-Crypt program, that key was split into two
256-bit keys for the RVTHE (xlg) program. I did this for all key sizes from 64 to
1024 bits. For the evaluation, I chose the 256-bit key size.
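The key split can be illustrated as follows. This is a minimal Python sketch that
assumes the AES-Crypt key is cut at its midpoint and that the first half is used as
key1 (-x) and the second half as key2 (-y); the text does not state how the halves
are assigned, and split_key is a hypothetical helper, not part of the xlg program.

```python
import secrets


def split_key(key: bytes):
    """Split a key into two equal halves (assumed midpoint split)."""
    if len(key) % 2 != 0:
        raise ValueError("key length must be an even number of bytes")
    half = len(key) // 2
    return key[:half], key[half:]


if __name__ == "__main__":
    aes_key = secrets.token_bytes(64)  # a random 512-bit key
    key1, key2 = split_key(aes_key)    # two 256-bit keys
    print(len(key1) * 8, len(key2) * 8)
```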
6.2 Experimental Systems
Our evaluations were conducted on a 64-bit Amazon EC2 virtual machine SSD
instance. I chose the t2.micro instance type, which has 1 vCPU, 1 GB of memory,
and 8 GB maximum storage. I specifically selected a VM with SSD storage because
SSDs offer high performance and have become an industry standard for cloud
computing.
I chose AES-Crypt, described in Section 6.1, to develop baseline statistics on
speed and output file size after encryption. It uses the AES algorithm and is a
symmetric method like RVTHE. I compared the two in terms of encryption speed,
decryption speed, and encrypted file size (disk storage used by the ciphertext).
In addition, I will explain the additional security and efficiency benefits that
are unique to a homomorphic encryption method.
6.3 Experimental Evaluations
I ran both executables (AES-Crypt and RVTHE) on various key sizes and file
sizes. The key sizes were 64, 128, 256, 512, and 1024 bits, and the file sizes were
1 MB, 10 MB, and 100 MB. From these runs I measured the speed of encryption and
decryption plus the storage size of the encrypted file on the cloud server.
6.3.1 Time measurements on various key sizes
[Figure: "Key Size and Time in sec on Regular SSD for 100MB file". X-axis: Key Sizes in Bits (64, 128, 256, 512, 1024). Y-axis: Time in Sec (0 to 12). Series: AES-Crypt Encryption, RVTHE Encryption, AES-Crypt Decryption, RVTHE Decryption.]
Graph 19 - Key size Vs Encryption/Decryption time in Sec
Graph 19 shows the key size versus the time taken for encryption and decryption
using RVTHE and AES-Crypt on a regular SSD. For encryption, I did not observe a
sizeable increase in time for any key size with either encryption method. Across the
board, RVTHE required less time for encryption than AES-Crypt. For decryption, the
fastest method was RVTHE at 64-bit. However, the decryption process for the RVTHE
method took longer as the ciphertext size grew. The most commonly used key
size is 256-bit; at that size the speeds of both encryption methods are almost the
same.
Note: Having homomorphic features, as in RVTHE, means that full file decryption
should be rare. In other words, file updates do not require the full file to be
decrypted.
6.3.2 Time measurements on various file sizes
[Figure: "Key Size and Time in sec on Regular SSD". X-axis: File size in MB (1, 10, 100). Y-axis: Time in Sec (0 to 6). Series: AES-Crypt Encryption, RVTHE Encryption, AES-Crypt Decryption, RVTHE Decryption.]
Graph 20 - File size and Encryption/Decryption times
I chose 256-bit key size for all tests. Encryption and decryption for both RVTHE
and AES-Crypt methods took more time with larger file sizes.
[Figure: "Key Size and Time in sec on Regular SSD". X-axis: File size in MB (1, 10, 100). Y-axis: Time in Sec (0.01 to 0.08). Series: AES-Crypt Encryption Rate, RVTHE Encryption Rate, AES-Crypt Decryption Rate, RVTHE Decryption Rate.]
Graph 21 - Key Size and Time on Regular SSD
Graph 20 and Graph 21 show that, across the board, RVTHE required less time
for encryption than AES-Crypt. Also, RVTHE encryption performed at about the same
rate regardless of file size. As with encryption, the RVTHE decryption process
performed at about the same rate regardless of file size.
6.3.3 Size measurements on Encrypted Files
[Figure: "Encrypted file size in MB on Encrypted Volume in SSD". X-axis: Key Size (64, 128, 256, 512, 1024). Y-axis: Size of the file in MB (0 to 250). Series: AES-Crypt, xlg-Crypt.]
Graph 22 - Encrypted file sizes in MB
Graph 22 shows that the output file generated by the encryption process is always
double the size of the original file for RVTHE, whereas AES-Crypt has only a 10%
penalty. When you decrypt a 1 GB file using AES-Crypt you need an extra 1.2 GB of
space. For full file decryption with RVTHE you need 2.2 GB of extra space, but this
will be rare because RVTHE allows computation on the ciphertext.
6.4 Security Evaluation of RVTHE
There are various attacks that can be performed by attackers. I evaluate RVTHE in
two major type of attacks to show designing the encryption cipher of RVTHE is very
secure.
Ciphertext-Only:
Assume that an attacker has access to the cipher-text produced by RVTHE and
nothing else. In such cases, it is not possible to find the plaintext or secret key by
107
using mathematical and statistical operations. I will show you a high-level evaluation
of this.
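I represent data 1 ($d_1$), data 2 ($d_2$), secret key 1 ($s_1$), and secret key 2 ($s_2$).
Example:
When $d_1 = 8$ and $d_2 = 6$, then $d_1 + d_2 = 14$, with
\[
s_1 = 2e_1 + 3e_2, \quad d_1 = 4e_1 + 4e_2, \quad d_2 = 3e_1 + 3e_2, \quad s_2 = 3e_1 + 4e_2
\]
Applying the regular geometric product to $V_1 = a_1 e_1 + a_2 e_2$ and $V_2 = b_1 e_1 + b_2 e_2$:
\[
\begin{aligned}
V_1 V_2 ={}& \underbrace{\Big[ (a_1 b_1)\,\overbrace{e_1 \cdot e_1}^{e_i \cdot e_i = 1}
 + (a_1 b_2)\,\overbrace{e_1 \cdot e_2}^{e_i \cdot e_j = 0}
 + (a_2 b_1)\,\overbrace{e_2 \cdot e_1}^{e_j \cdot e_i = 0}
 + (a_2 b_2)\,\overbrace{e_2 \cdot e_2}^{e_j \cdot e_j = 1} \Big]}_{\text{dot product}} \\
 &+ \underbrace{\Big[ (a_1 b_1)\,\overbrace{e_1 \wedge e_1}^{e_i \wedge e_i = 0}
 + (a_1 b_2)\, e_1 \wedge e_2
 + (a_2 b_1)\,\overbrace{e_2 \wedge e_1}^{e_j \wedge e_i = -e_i \wedge e_j}
 + (a_2 b_2)\,\overbrace{e_2 \wedge e_2}^{e_j \wedge e_j = 0} \Big]}_{\text{wedge product}}
\end{aligned}
\]
Then the encryption of $d_1$ is
\[
E(d_1) = s_1 d_1 s_2 = (2e_1 + 3e_2)(4e_1 + 4e_2)(3e_1 + 4e_2) = 44e_1 + 92e_2 = C_1
\]
Then the encryption of $d_2$ is
\[
E(d_2) = s_1 d_2 s_2 = (2e_1 + 3e_2)(3e_1 + 3e_2)(3e_1 + 4e_2) = 33e_1 + 69e_2 = C_2
\]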
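The two triple products can be checked by hand in two steps each. The expansion below is a worked illustration rather than part of the original derivation; it uses only $e_1 e_1 = e_2 e_2 = 1$ and $e_2 e_1 = -e_1 e_2$, and the shorthand $e_{12} = e_1 e_2$ is a notational assumption, which gives $e_{12} e_1 = -e_2$ and $e_{12} e_2 = e_1$.
\[
\begin{aligned}
s_1 d_1 &= (2e_1 + 3e_2)(4e_1 + 4e_2) = (8 + 12) + (8 - 12)\,e_{12} = 20 - 4e_{12} \\
(20 - 4e_{12})\,s_2 &= 60e_1 + 80e_2 - 4\,(3\,e_{12}e_1 + 4\,e_{12}e_2) = 60e_1 + 80e_2 - 4\,(-3e_2 + 4e_1) = 44e_1 + 92e_2 \\
s_1 d_2 &= (2e_1 + 3e_2)(3e_1 + 3e_2) = (6 + 9) + (6 - 9)\,e_{12} = 15 - 3e_{12} \\
(15 - 3e_{12})\,s_2 &= 45e_1 + 60e_2 - 3\,(-3e_2 + 4e_1) = 33e_1 + 69e_2
\end{aligned}
\]
The intermediate product is a scalar plus a bivector, and multiplying by the second key returns a pure two-dimensional vector.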
Cipher-texts $C_1$ and $C_2$ are produced from data 1 ($d_1$) and data 2 ($d_2$).
Applying statistical methods to them is very hard because the cipher-texts are stored
as two-dimensional vectors. Even after applying mathematical operations such as
addition and subtraction, I do not see a way to derive the keys.
\[
C_1 + C_2 = 77e_1 + 161e_2
\]
\[
C_1 - C_2 = 11e_1 + 23e_2
\]
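The numbers in this example can be reproduced with a few lines of Python. This is a minimal sketch, assuming a multivector is stored as a tuple (scalar, e1, e2, e12); the helper names gp, vec, encrypt, add, and sub are hypothetical and are not taken from the C implementation of RVTHE.

```python
# Toy 2D geometric algebra: a multivector is (scalar, e1, e2, e12).
# Illustrative only; names and layout are assumptions, not the RVTHE C code.

def gp(a, b):
    """Geometric product, using e1*e1 = e2*e2 = 1 and e2*e1 = -e1*e2."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 + a1 * b1 + a2 * b2 - a3 * b3,  # scalar part
        a0 * b1 + a1 * b0 - a2 * b3 + a3 * b2,  # e1 part
        a0 * b2 + a2 * b0 + a1 * b3 - a3 * b1,  # e2 part
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,  # e12 part
    )

def vec(x, y):
    """Pure vector x*e1 + y*e2."""
    return (0, x, y, 0)

def encrypt(d, s1, s2):
    """E(d) = s1 d s2, as in the example above."""
    return gp(gp(s1, d), s2)

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

s1, s2 = vec(2, 3), vec(3, 4)          # secret keys from the example
c1 = encrypt(vec(4, 4), s1, s2)        # d1 = 8 represented as 4e1 + 4e2
c2 = encrypt(vec(3, 3), s1, s2)        # d2 = 6 represented as 3e1 + 3e2

print(c1)            # (0, 44, 92, 0)  -> 44e1 + 92e2
print(c2)            # (0, 33, 69, 0)  -> 33e1 + 69e2
print(add(c1, c2))   # (0, 77, 161, 0)
print(sub(c1, c2))   # (0, 11, 23, 0)
```

The scalar and e12 components of both cipher-texts are zero, which is why each cipher-text can be stored as a two-dimensional vector.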
Known-Plaintext:
In this case, an attacker holds some plaintext/ciphertext pairs and uses
them to try to derive the key. This is called a Known-Plaintext attack. I work through
the same statistical methods and mathematical operations to show that I cannot
derive the keys.
I represent data 1 ($d_1$), data 2 ($d_2$), secret key 1 ($s_1$), and secret key 2 ($s_2$).
Example:
As in the Ciphertext-Only case, $d_1 = 8$ and $d_2 = 6$, so $d_1 + d_2 = 14$, with
\[
s_1 = 2e_1 + 3e_2, \quad d_1 = 4e_1 + 4e_2, \quad d_2 = 3e_1 + 3e_2, \quad s_2 = 3e_1 + 4e_2
\]
Applying the regular geometric product $V_1 V_2$ exactly as in the Ciphertext-Only
case, the encryption of $d_1$ is
\[
E(d_1) = s_1 d_1 s_2 = (2e_1 + 3e_2)(4e_1 + 4e_2)(3e_1 + 4e_2) = 44e_1 + 92e_2 = C_1
\]
and the encryption of $d_2$ is
\[
E(d_2) = s_1 d_2 s_2 = (2e_1 + 3e_2)(3e_1 + 3e_2)(3e_1 + 4e_2) = 33e_1 + 69e_2 = C_2
\]
Cipher-text $C_1$ corresponds to data 1 ($d_1$) and $C_2$ to data 2 ($d_2$).
Performing the following operations on the ciphertexts:
\[
C_1 + C_2 = 77e_1 + 161e_2
\]
\[
C_1 - C_2 = 11e_1 + 23e_2
\]
Applying statistical methods is very hard because RVTHE is designed with two
keys. Even after applying statistical methods and mathematical operations
such as addition and subtraction, I am not able to derive the secret keys, because
the math is implemented with two keys and two-dimensional vectors. There is a pattern
in the $e_1$ coefficients (11, 33, 44, and 77), but I see no way to guess the plaintext or keys from it.
CHAPTER 7
LESSONS LEARNED AND FUTURE WORK
7.1 Challenges and Lessons Learned
This section presents the research flow, some of the challenges and issues that
I faced during the study of encryption methods, and how I overcame those
challenges.
1. New research builds on previous work and improves it by
eliminating shortcomings and risks in the same domain. When I began
working on my research, I first educated myself about SSDs using surveys and
previous literature. To understand these SSDs further, I ran various
benchmarks to measure their performance on AWS. After these tests, I was able
to show that modern SSDs perform better under random workloads than
previous SSDs. Because I found no deficiencies to investigate, it
was difficult to find a problem for my research. I felt lost, so I took a step back
and thought about what area of research I wanted to pursue, and I
started reading the existing literature related to SSD security.
2. I deleted all the data from a local SSD drive and was able to recreate the file
using freely available recovery software. That gave me the first step:
investigating the sanitization of SSDs. I demonstrated that SSDs have limitations
with respect to sanitization. This gave me some hope of finding a problem case,
which is critical for any research. I read previous research papers related
to Cloud security and Storage security. Most of the previous work in the
domain of Storage and Cloud security used encryption as the primary
method to protect the data. All this showed that we can't leave plain text on the
SSD. Still, I was not sure whether my findings were enough to pursue this as a
research problem for further study.
3. I decided to survey the various encryption methods for Cloud Storage. I
researched and read about these encryption methods and their weaknesses.
While reading about these methods, I learned that there is some overhead
for encrypting the data in the Cloud. I conveyed this to my advisor, and he
gave me the confidence and motivation I needed to do research in this area. With
that support, I started my research vigorously.
4. I started investigating encryption methods and running them in the Cloud to
see how much performance overhead and which security issues existed. It was hard
to choose which encryption software to use for performance
benchmarking. Over a few months, I chose some encryption methods to
benchmark. I found that performance decreases by 20-50%
when we use encryption software. Also, I found
that hidden folders exist for encrypted containers. Now I knew that there are
issues related to encryption methods. My next step was expanding my
knowledge of encryption methods. I met my advisor, and he suggested that I
consider homomorphic encryption. I started reading the existing literature
on homomorphic encryption.
Homomorphic encryption allows computation on encrypted files without
decrypting them, and my thought was that this method would incur less overhead. At
this time, I felt that I was one step closer and confident about my direction. I
had a conversation with my advisor, and he explained how multi-vectors can
be used to achieve encryption. With that, I knew my next step, which was to
learn mathematics in the area of geometric algebra.
5. Mathematical knowledge is needed to create a secure and efficient cipher. My
last work in math was about 25 years ago. Trying to understand what my
advisor was telling me about this new primitive of multi-vectors was not easy,
to say the least. For many months, I studied geometric algebra to gain
knowledge in that area. Finding an implementation of this math as an
encryption cipher was challenging. By this time, another researcher had
completed an implementation of a cipher using multi-vectors for his master's
thesis. I read his thesis and converted his design into an
executable program. This multi-vector approach created a ciphertext file
eight times the size of the original plaintext file. This large ciphertext file
increased the encryption and decryption times, so the method was going to
be hard to use on bigger files. I discussed the results with my advisor,
and he asked me to consider using compression techniques. I
ran the program on storage with deduplication enabled, thinking that
this might reduce the output file size, but encryption and
decryption were very slow. I learned that compression or any type of
deduplication adds overhead on top of the encryption. My main
challenge was to improve the multi-vector-based encryption. I thought about
this challenge all the time. It took many months and I was getting discouraged,
but I decided to keep on learning. One night around 3:00 AM, I woke up with
an idea for decreasing the size of the encrypted file using versors. I
worked through the math by hand to make sure it was feasible. Immediately, I put
these thoughts on paper with all the homomorphic properties and sent an email
to my advisor. I learned to keep pushing myself.
6. Once I found that I could decrease the encrypted file size, the challenge
was to design RVTHE, because it produces a scalar and a multi-vector as
intermediate products before generating the final output vector. I developed
RVTHE into an executable program in the C language. It was challenging to find
a comparable encryption method. I found AES-Crypt, which is symmetric and
uses AES encryption, and which was a good match for comparison with RVTHE.
I enjoyed the process, but at times it was very stressful. I experienced how joyful it is
to find a solution to a problem. This process helped me grow into a better person.
7.2 Contributions
This chapter presented the challenges of
coming up with a new encryption method. Next, I discuss how I designed and
implemented the new encryption cipher RVTHE and measured its performance and
security strength.
The goal and purpose of this study is to explain what encryption can do for
devices in terms of tradeoffs between performance and security. The main thought
behind this thesis is that there is a way to have high security and performance without
having to compromise either. This challenge prompted me to study how different
types of encryption, like BestCrypt, Dm-crypt, and SED, can affect SSD security
and performance. Each of the above-mentioned encryption methods has its own drawbacks.
The study showed that these encryption methods have performance differences for
sequential and random reads and writes. Most enterprise workloads are random. However, little
research has been done on workloads such as random reads and writes. With my
experiments, I showed that random workloads perform better on newer SSD storage
systems. I then evaluated how modern SSDs handle random workloads when using
encryption. Evaluating different workloads with many types of functions for different
encryption methods produces various performance metrics. I went through selected
methods (BestCrypt, Dm-crypt, and an encrypted Elastic Block Store volume from
Amazon) to analyze their strengths and weaknesses for SSDs, evaluating both security
and performance. This helped me conclude that homomorphic
encryption is best suited for each workload in the Cloud.
By applying fully homomorphic encryption, it is possible to achieve cyber security,
as it allows a series of new computations on encrypted data. Technically, I can start
with a zero-byte file, encrypt it, and apply homomorphic operations to that
file without ever exposing or leaving a data footprint on the disk. Rivest et al. [20]
first mentioned this idea, and Gentry first proposed fully homomorphic encryption [2]
using binary circuits on encrypted data and performed basic mathematical operations.
Other scholars, inspired by Gentry's approach, improved his scheme or contributed
new approaches. His theoretical approach to homomorphic encryption provided a
new way to solve security problems with encryption, but his solution was not ready to be applied
easily and thus was impractical.
RVTHE is a new homomorphic symmetric encryption scheme based on Clifford
Geometric Algebra. The foundation of this encryption is the mathematics
of versors, the geometric product, and the inverse. Geometric Algebra
is a critical part of its design and framework. The design of this new
cipher is simple, but combining versors, the geometric product, and the inverse generates
a strong cipher. This cipher fulfils the
requirements for building a new cipher. These requirements include defending the
system against various attacks while allowing small updates to the cipher-text. In this
work, I showed how to design a cipher following these design principles, and I
showed its real-world application, its performance, and a mathematical approach to
defending against attacks. I created a measurable benchmark to calculate encryption speeds.
First, I did experiments to understand encryption performance penalties on SSDs in the
Cloud, and then I created an experimental environment to measure RVTHE
performance.
7.3 Success of work
In this work, I developed Reduced Vector Technique Homomorphic Encryption
(RVTHE), a symmetric and somewhat homomorphic encryption scheme. RVTHE
was developed using Versors and Clifford Geometric Algebra properties.
The evaluation of the implementation shows that I can edit or append to a file in 0.001 sec. In
the case of full file encryption, RVTHE is 75% faster on encryption and 25% slower
on decryption compared with the encryption software AES-Crypt. RVTHE reduced the
ciphertext size to 25% of that of previous approaches, which used multi-vectors
and Clifford Geometric Algebra; RVTHE has the potential to be used on real
workloads. The work is a success: RVTHE is faster and more efficient, and its cipher
text takes only twice the size of the plain text.
7.4 Future Work
In cloud computing, homomorphic encryption provides secure computing on
encrypted data. It is an encryption method that allows users to compute in the
Cloud without converting the ciphertext into plaintext. In recent years, there has been a lot of
research and interest in the domain of homomorphic encryption; most of it focuses
on asymmetric homomorphic encryption. Very little research has been done on
symmetric homomorphic encryption. Some applications can use symmetric
homomorphic encryption very well. We proposed a very simple cryptographic
primitive with low encryption and decryption times.
The RVTHE encryption method is a symmetric homomorphic encryption scheme that
supports addition, subtraction, scalar multiplication, and scalar division. RVTHE is
developed using Clifford Geometric Algebra as a foundation. It uses Vectors, Versors,
and the Inverse of a Vector. RVTHE showed promising preliminary results, indicating that it is
feasible to apply to real files. The RVTHE algorithm was implemented as
a program and executed on various file types and sizes. I also added an append
feature to the program. A comparison was conducted between AES-Crypt and
RVTHE in terms of time to encrypt, time to decrypt, and generated output file size.
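The four supported operations can be illustrated on a two-dimensional toy example. This is a minimal sketch under stated assumptions, not the C implementation: it reuses the keys s1 = 2e1 + 3e2 and s2 = 3e1 + 4e2 from the security evaluation, assumes encryption of the form E(d) = s1 d s2, and assumes decryption multiplies by the vector inverses, v^-1 = v / (v . v). The names gp, vec, inverse, encrypt, decrypt, add, sub, and scale are hypothetical.

```python
from fractions import Fraction

# Toy 2D geometric algebra: a multivector is (scalar, e1, e2, e12).
def gp(a, b):
    """Geometric product, using e1*e1 = e2*e2 = 1 and e2*e1 = -e1*e2."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 + a1 * b1 + a2 * b2 - a3 * b3,  # scalar part
        a0 * b1 + a1 * b0 - a2 * b3 + a3 * b2,  # e1 part
        a0 * b2 + a2 * b0 + a1 * b3 - a3 * b1,  # e2 part
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,  # e12 part
    )

def vec(x, y):
    """Pure vector x*e1 + y*e2 with exact rational coefficients."""
    return (Fraction(0), Fraction(x), Fraction(y), Fraction(0))

def inverse(v):
    """Inverse of a nonzero pure vector: v / (v . v)."""
    _, x, y, _ = v
    norm_sq = x * x + y * y
    return vec(x / norm_sq, y / norm_sq)

def encrypt(d, s1, s2):
    return gp(gp(s1, d), s2)

def decrypt(c, s1, s2):
    # Assumed decryption: s1^-1 (s1 d s2) s2^-1 = d
    return gp(gp(inverse(s1), c), inverse(s2))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def scale(k, a):
    return tuple(Fraction(k) * x for x in a)

s1, s2 = vec(2, 3), vec(3, 4)
d1, d2 = vec(4, 4), vec(3, 3)
c1, c2 = encrypt(d1, s1, s2), encrypt(d2, s1, s2)

# Each operation is applied to cipher-texts only, then decrypted once.
print(decrypt(add(c1, c2), s1, s2) == add(d1, d2))                   # addition
print(decrypt(sub(c1, c2), s1, s2) == sub(d1, d2))                   # subtraction
print(decrypt(scale(3, c1), s1, s2) == scale(3, d1))                 # scalar multiplication
print(decrypt(scale(Fraction(1, 2), c1), s1, s2) == scale(Fraction(1, 2), d1))  # scalar division
```

Exact rational arithmetic is used here only so that the equality checks are not affected by floating-point rounding; the choice of number representation in a real implementation is a separate design decision.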
RVTHE was designed with three vectors in the current work. In future research, it
can be extended to other designs, including designs with more vectors as secret keys in the
algorithm. The performance of the algorithm can then be measured. Experimenting
with new designs for encryption and decryption of various file sizes and file types is a
good way to explore RVTHE. At this time, I have added the code for addition, but it
can be expanded to deletion, scalar multiplication, and scalar division. These additions
to the program can be tested to see how they enhance the user experience of computing
on encrypted data. RVTHE can be extended to various applications, operating systems, and hardware.
We can also explore using RVTHE in various applications and databases, such as
password stores. Also, leveraging multithreading for encryption and
decryption should improve performance.
There is room to explore the scalability and availability of the
algorithm. RVTHE has been implemented and applied to various types of files, such
as .txt, .doc, .pdf, .xlsx, and .jpeg. It worked as expected, without any noise and
with integrity preserved, but it would be helpful to extend this program to databases and other
application updates such as add, update, and delete operations. It would also be worth investigating whether
this approach can be used in device-level encryption systems and whether it is
possible to extend it to all types of devices, including mobile and IoT.
When using Cloud computing, we can separate the application part from the computation
part. The RVTHE encryption method can be used to perform heavy computation
by outsourcing it to the Cloud while upholding security. I performed a high-level
security analysis of RVTHE, but an in-depth security analysis would
give more confidence in its security strength. Any encryption adds overhead to the performance of
reads and writes on SSD.
The theory of Geometric Algebra is vast, and there is always room to
find new encryption methods through deeper study in the area
of geometric algebra. Geometric algebra can be studied in more depth along with new
encryption algorithms and new ciphers like RVTHE.
Overall, future work can be summarized as the use of the RVTHE encryption
method with different types of technologies and systems, because RVTHE is a
symmetric cipher like AES. This can include hardware systems and other features. In
cloud computing, homomorphic encryption provides secure computing. It does
this by allowing users to compute in the Cloud without converting the cipher text into
plain text. RVTHE (an implementation of homomorphic encryption) satisfies that
requirement efficiently, and it can be explored in applications and
databases such as password stores. Geometric algebra can be studied in more depth,
and new encryption algorithms and ciphers can still be created.
CHAPTER 8
CONCLUSION
An SSD supports the following data functions: read, write, erase, purging, and securing.
These functions are processed differently from the same functions on an HDD. This
study started by showing that SSD storage has performance differences for sequential reads,
sequential writes, and random reads and writes. Most enterprise workloads are a mix
of random reads and random writes. This research showed that SSDs have changed over the years to
handle random workloads better. On an SSD, the writing,
purging, and securing functions are drastically different. Previous research has shown
that some data is nearly impossible to delete completely on some SSDs. Research showed
that data deleted from an SSD can be restored via block-level recovery software.
This ultimately prompted me to study different types of encryption. These
include TrueCrypt, DiskCryptor, BestCrypt, Dm-crypt, VeraCrypt, Tomb,
BitLocker, and SED. Each was studied to identify its weaknesses and
strengths. I selected two software-based encryption methods (BestCrypt, Dm-crypt)
and an encrypted volume (Elastic Block Store volume from Amazon) for
further study, to analyze their strengths and weaknesses for SSD security and
performance. I chose sequential read, sequential write, sequential mixed, random read,
random write, and random mixed workloads with different block sizes (4k, 8k, 16k, 32k, 64k, 128k).
Evaluating different workloads with different block sizes and read/write
ratios for BestCrypt, Dm-crypt, and the encrypted Elastic Block Store
volume from Amazon produced various performance metrics. This research presented
IOPS (Input/Output Operations Per Second) performance metrics to show how each of the
encryption methods impacted different workloads. Results showed that encryption
can cause a 20-50% performance decrease on SSD for TrueCrypt and BestCrypt
software. The results showed how modern encryption software methods impact
storage devices such as SSDs in the Cloud. This showed that traditional symmetric
encryption imposes high performance penalties on workloads.
Existing symmetric encryption software uses system resources such as
memory and CPU when it encrypts and decrypts, which causes delays in operations.
Securing the data involves two stages: data at rest and data while transiting. “Data at
rest” is the stage before or after sending data to the Cloud. “Data while transiting” is
the stage in which the data travels between the Client and the Cloud. The further
study of encryption methods leads to homomorphic encryption approaches. So, we
can’t ignore the possibilities and potential of homomorphic cryptography in cloud
computing environments. Simple homomorphic encryption methods can be made
feasible in cloud computing without sacrificing security, while enhancing the user
experience when performing operations as needed on encrypted data. I conducted a
study of the existing homomorphic encryption literature and found that most of the
schemes are asymmetric and very slow. I found no homomorphic encryption scheme that
is both fast and easy to implement on real systems. RVTHE is the solution to this
problem.
Using properties from Clifford Geometric Algebra, including Versors, Vectors, and
the Inverse of a Vector, it is possible to design a homomorphic cipher that has a simple
structure, versatility, flexibility of key assignments, and a speed that rivals
previous approaches.
In conclusion, homomorphic encryption provides secure cloud computing. It does
this by allowing users to compute in the cloud without converting the cipher text into
plain text. RVTHE (an implementation of homomorphic encryption) satisfies this
requirement efficiently.
REFERENCES
[1] E. Aïmeur and D. Schönfeld, "The ultimate invasion of privacy: Identity
theft," in 2011 Ninth Annual International Conference on Privacy, Security
and Trust, 2011.
[2] C. Gentry, "Fully Homomorphic Encryption Using Ideal Lattices," in
Proceedings of the Forty-first Annual ACM Symposium on Theory of
Computing, New York, NY, USA, 2009.
[3] C. Gentry and S. Halevi, "Implementing Gentry's Fully-homomorphic
Encryption Scheme," in Proceedings of the 30th Annual International
Conference on Theory and Applications of Cryptographic Techniques:
Advances in Cryptology, Berlin, 2011.
[4] O. Dictionaries, "Definition of security," [Online]. Available:
https://en.oxforddictionaries.com/definition/security.
[5] R. Kissel, "Glossary of key information security terms," NIST
Interagency Report NIST IR 7298 Revision 1, National Institute of
Standards and Technology, 2011.
[6] N. Ferguson, B. Schneier and T. Kohno, Cryptography Engineering:
Design Principles and Practical Applications, Wiley Publishing, 2010.
[7] S. Mauw and M. Oostdijk, "Foundations of Attack Trees," in
Proceedings of the 8th International Conference on Information Security
and Cryptology, Berlin, 2006.
[8] C. E. Shannon, "Communication theory of secrecy systems," The Bell
System Technical Journal, vol. 28, pp. 656-715, Oct 1949.
[9] "Intro-Samsung Elec. Datasheet (K9LBG08U0M).," 2007.
[10] J.-U. Kang, J.-S. Kim, C. Park, H. Park and J. Lee, "A Multi-channel
Architecture for High-performance NAND Flash-based Storage System," J.
Syst. Archit., vol. 53, pp. 644-658, Sep 2007.
[11] R. Micheloni, A. Marelli and K. Eshghi, Inside Solid State Drives
(SSDs), Springer Publishing Company, Incorporated, 2012.
[12] B. Bosen, "Full Drive Encryption with Samsung Solid State Drives,"
Nov 2010.
[13] P. Wang, G. Sun, S. Jiang, J. Ouyang, S. Lin, C. Zhang and J. Cong,
"An Efficient Design and Implementation of LSM-tree Based Key-value
Store on Open-channel SSD," in Proceedings of the Ninth European
Conference on Computer Systems, New York, NY, USA, 2014.
[14] D. E. Denning and P. J. Denning, "Data Security," ACM Comput. Surv.,
vol. 11, pp. 227-249, Sep 1979.
[15] M. Tebaa, S. E. Hajji and A. E. Ghazi, "Homomorphic encryption
method applied to Cloud Computing," in 2012 National Days of Network
Security and Systems, 2012.
[16] S. Singh, The Code Book: The Science of Secrecy from Ancient Egypt to
Quantum Cryptography, New York: Anchor Books, 2000.
[17] J. Nechvatal, E. Barker, L. Bassham, W. Burr and M. Dworkin, "Report
on the development of the Advanced Encryption Standard (AES)," 2000.
[18] National Institute of Standards and Technology (NIST), "FIPS Publication
46-2: Data Encryption Standard," 1993.
[19] J. Nechvatal, E. Barker, D. Dodson, M. Dworkin, J. Foti and E. Roback,
"Status report on the first round of the development of the Advanced
Encryption Standard," Journal of Research of the National Institute of
Standards and Technology, vol. 104, 1999.
[20] R. L. Rivest, L. Adleman and M. L. Dertouzos, "On Data Banks and
Privacy Homomorphisms," Foundations of Secure Computation, Academic
Press, pp. 169-179, 1978.
[21] L. N. Childs, A Concrete Introduction to Higher Algebra, Volume1,
Springer, 1979.
[22] S. Burris and H. P. Sankappanavar, A Course in Universal Algebra-
With 36 Illustrations, 2006.
[23] A. Acar, H. Aksu, A. S. Uluagac and M. Conti, "A Survey on
Homomorphic Encryption Schemes: Theory and Implementation," CoRR,
vol. abs/1704.03578, 2017.
[24] A. López-Alt, E. Tromer and V. Vaikuntanathan, "On-the-fly
Multiparty Computation on the Cloud via Multikey Fully Homomorphic
Encryption," in Proceedings of the Forty-fourth Annual ACM Symposium
on Theory of Computing, New York, NY, USA, 2012.
[25] M. Tebaa and S. E. Hajji, "Secure Cloud Computing through
Homomorphic Encryption," CoRR, vol. abs/1409.0829, 2014.
[26] C. Moore, M. O'Neill, E. O'Sullivan, Y. Doröz and B. Sunar, "Practical
homomorphic encryption: A survey," in 2014 IEEE International
Symposium on Circuits and Systems (ISCAS), 2014.
[27] B. Schneier, Applied Cryptography (2nd ed.): Protocols, Algorithms,
and Source Code in C, New York, NY, USA: John Wiley & Sons, Inc.,
1995.
[28] D. W. H. A. D. A. Silva, "Fully Homomorphic Encryption over exterior
product spaces," 2017.
[29] K. Zhao, W. Zhao, H. Sun, X. Zhang, N. Zheng and T. Zhang, "LDPC-
in-SSD: Making Advanced Error Correction Codes Work Effectively in
Solid State Drives," in Presented as part of the 11th USENIX Conference
on File and Storage Technologies (FAST 13), San, 2013.
[30] P. Huang, P. Subedi, X. He, S. He and K. Zhou, "FlexECC: Partially
Relaxing ECC of MLC SSD for Better Cache Performance," in
Proceedings of the 2014 USENIX Conference on USENIX Annual
Technical Conference, Berkeley, 2014.
[31] M. Wei, L. M. Grupp, F. E. Spada and S. Swanson, "Reliably Erasing
Data from Flash-based Solid State Drives," in Proceedings of the 9th
USENIX Conference on File and Storage Technologies, Berkeley, 2011.
[32] J. Reardon, S. Capkun and D. Basin, "Data Node Encrypted File
System: Efficient Secure Deletion for Flash Memory," in Proceedings of
the 21st USENIX Conference on Security Symposium, Berkeley, 2012.
[33] Y. Choi, D. Lee, W. Jeon and D. Won, "Password-based Single-file
Encryption and Secure Data Deletion for Solid-state Drive," in Proceedings
of the 8th International Conference on Ubiquitous Information
Management and Communication, New York, NY, USA, 2014.
[34] National Institute of Standards and Technology, FIPS PUB 46-3: Data
Encryption Standard (DES), NIST, 1999.
[35] K. Bhargavan and G. Leurent, "On the Practical (In-)Security of 64-bit
Block Ciphers: Collision Attacks on HTTP over TLS and OpenVPN," in
Proceedings of the 2016 ACM SIGSAC Conference on Computer and
Communications Security, New York, NY, USA, 2016.
[36] M. A. Wright, "Feature: The Advanced Encryption Standard," Netw.
Secur., vol. 2001, pp. 11-13, Oct 2001.
[37] N. Ferguson, J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner and
D. Whiting, "Improved Cryptanalysis of Rijndael," in Proceedings of the
7th International Workshop on Fast Software Encryption, London, 2001.
[38] A. Biryukov, O. Dunkelman, N. Keller, D. Khovratovich and A.
Shamir, Key Recovery Attacks of Practical Complexity on AES Variants
With Up To 10 Rounds, 2009.
[39] B. Schneier, "Description of a New Variable-Length Key, 64-bit Block
Cipher (Blowfish)," in Fast Software Encryption, Cambridge Security
Workshop, London, 1994.
[40] A. Biryukov and D. Wagner, Slide Attacks, L. Knudsen, Ed., Berlin,
Heidelberg: Springer Berlin Heidelberg, 1999, pp. 245-259.
[41] B. Schneier, J. Kelsey, D. Whiting, D. Wagner and C. Hall, "On the
Twofish Key Schedule," in Proceedings of the Selected Areas in
Cryptography, London, 1999.
[42] N. Ferguson, J. Kelsey, B. Schneier and D. Whiting, "A Twofish
Retreat: Related-Key Attacks Against Reduced-Round Twofish," 2000.
[43] J. J. G. Ortiz and K. J. Compton, "A Simple Power Analysis Attack on
the Twofish Key Schedule," CoRR, vol. abs/1611.07109, 2016.
[44] R. Anderson, E. Biham and L. Knudsen, Serpent: A Proposal for the
Advanced Encryption Standard, 1998.
[45] User:Dake commonswiki, "File:Serpent-linearfunction.png," 2005.
[Online]. Available: https://commons.wikimedia.org/wiki/File:Serpent-
linearfunction.png.
[46] M. Hermelin, J. Y. Cho and K. Nyberg, "Multidimensional Linear
Cryptanalysis of Reduced Round Serpent," in Proceedings of the 13th
Australasian Conference on Information Security and Privacy, Berlin,
2008.
[47] J. Rizzo and T. Duong, "Practical Padding Oracle Attacks," in
Proceedings of the 4th USENIX Conference on Offensive Technologies,
Berkeley, 2010.
[48] M. Liskov, R. L. Rivest and D. Wagner, "Tweakable Block Ciphers," in
Proceedings of the 22nd Annual International Cryptology Conference on
Advances in Cryptology, London, 2002.
[49] L. Martin, "XTS: A Mode of AES for Encrypting Hard Disks," IEEE
Security and Privacy, vol. 8, pp. 68-69, May 2010.
[50] D. A. McGrew and J. Viega, "The Security and Performance of the
Galois/Counter Mode (GCM) of Operation," in Proceedings of the 5th
International Conference on Cryptology in India, Berlin, 2004.
[51] Dm-crypt, "Dm-crypt," [Online]. Available:
https://wiki.archlinux.org/index.php/dm-crypt/Device_encryption.
[Accessed 10 12 2016].
[52] C. Fruhwirth, "LUKS- Wikipedia," [Online]. Available:
https://en.wikipedia.org/wiki/Linux_Unified_Key_Setup. [Accessed 2018].
[53] L. s. weakness, "https://thehackernews.com/2016/11/hacking-linux-
system.html," https://thehackernews.com/2016/11/hacking-linux-
system.html. [Online].
[54] d.-c. plausible-deniability, "https://blog.linuxbrujo.net/posts/plausible-
deniability-with-luks/," https://blog.linuxbrujo.net/posts/plausible-
deniability-with-luks/. [Online].
[55] M. Bauer, "Paranoid Penguin: BestCrypt: Cross-platform Filesystem
Encryption," Linux J., vol. 2002, p. 9, Jun 2002.
[56] B. Daniel and K. Fowler, "Bypassing Self-Encrypting Drives (SED) in
Enterprise Environments," Europe, 2015.
[57] packetizer, "AES Crypt or AES-Crypt," 2018. [Online]. Available:
https://www.aescrypt.com.
[58] C. Gentry, "A fully homomorphic encryption scheme," 2009.
[59] W. Wang, Y. Hu, L. Chen, X. Huang and B. Sunar, "Accelerating fully
homomorphic encryption using GPU," in 2012 IEEE Conference on High
Performance Extreme Computing, 2012.
[60] J. Vince, Geometric Algebra: An Algebraic System for Computer
Games and Animation, 1st ed., Springer Publishing Company,
Incorporated, 2009.
[61] D. Davis, R. Ihaka and P. Fenstermacher, Cryptographic Randomness
from Air Turbulence in Disk Drives, Y. G. Desmedt, Ed., Berlin, Heidelberg:
Springer Berlin Heidelberg, 1994, pp. 114-120.
[62] J. Kim, J. M. Kim, S. H. Noh, S. L. Min and Y. Cho, "A Space-efficient
Flash Translation Layer for CompactFlash Systems," IEEE Trans. on
Consum. Electron., vol. 48, pp. 366-375, May 2002.
[63] R. Micheloni, A. Marelli and R. Ravasio, Error Correction Codes for
Non-Volatile Memories, 1st ed., Springer Publishing Company,
Incorporated, 2010.
[64] J. H. Stathis, "Reliability Limits for the Gate Insulator in CMOS
Technology," IBM J. Res. Dev., vol. 46, pp. 265-286, Mar 2002.
[65] P. Olivo, T. N. Nguyen and B. Ricco, "High-field-induced degradation
in ultra-thin SiO2 films," IEEE Transactions on Electron Devices, vol. 35,
pp. 2259-2267, Dec 1988.
[66] N. Agrawal, V. Prabhakaran, T. Wobber, J. D. Davis, M. Manasse and
R. Panigrahy, "Design Tradeoffs for SSD Performance," in USENIX 2008
Annual Technical Conference, Berkeley, 2008.
[67] A. Birrell, M. Isard, C. Thacker and T. Wobber, "A Design for High-
performance Flash Disks," New York, NY, USA, 2007.
[68] F. Chen, D. A. Koufaty and X. Zhang, "Understanding Intrinsic
Characteristics and System Implications of Flash Memory Based Solid
State Drives," in Proceedings of the Eleventh International Joint
Conference on Measurement and Modeling of Computer Systems, New
York, NY, USA, 2009.
[69] A. Gupta, Y. Kim and B. Urgaonkar, "DFTL: A Flash Translation Layer
Employing Demand-based Selective Caching of Page-level Address
Mappings," in Proceedings of the 14th International Conference on
Architectural Support for Programming Languages and Operating Systems,
New York, NY, USA,, 2009.
[70] D. Park, B. Debnath and D. H. C. Du, "A Workload-Aware Adaptive
Hybrid Flash Translation Layer with an Efficient Caching Strategy," in
2011 IEEE 19th Annual International Symposium on Modelling, Analysis,
and Simulation of Computer and Telecommunication Systems, 2011.
[71] P. Thontirawong, M. Ekpanyapong and P. Chongstitvatana, "SCFTL:
An efficient caching strategy for page-level flash translation layer," in 2014
International Computer Science and Engineering Conference (ICSEC),
2014.
[72] A. Gupta, R. Pisolkar, B. Urgaonkar and A. Sivasubramaniam,
"Leveraging Value Locality in Optimizing NAND Flash-based SSDs," in
Proceedings of the 9th USENIX Conference on File and Storage
Technologies, Berkeley, 2011.
[73] F. Chen, T. Luo and X. Zhang, "CAFTL: A Content-aware Flash
Translation Layer Enhancing the Lifespan of Flash Memory Based Solid
State Drives," in Proceedings of the 9th USENIX Conference on File and
Storage Technologies, Berkeley, 2011.
[74] P. Huang, G. Wu, X. He and W. Xiao, "An Aggressive Worn-out Flash
Block Management Scheme to Alleviate SSD Performance Degradation," in
Proceedings of the Ninth European Conference on Computer Systems, New
York, NY, USA, 2014.
[75] L.-P. Chang, "On Efficient Wear Leveling for Large-scale Flash-
memory Storage Systems," in Proceedings of the 2007 ACM Symposium on
Applied Computing, New York, NY, USA, 2007.
[76] Y. Hu, H. Jiang, D. Feng, L. Tian, H. Luo and C. Ren, "Exploring and
Exploiting the Multilevel Parallelism Inside SSDs for Improved
Performance and Endurance," IEEE Transactions on Computers, vol. 62,
pp. 1141-1155, Jun. 2013.
[77] Y. Kim, A. Gupta, B. Urgaonkar, P. Berman and A. Sivasubramaniam,
"HybridStore: A Cost-Efficient, High-Performance Storage System
Combining SSDs and HDDs," in 2011 IEEE 19th Annual International
Symposium on Modelling, Analysis, and Simulation of Computer and
Telecommunication Systems, 2011.
[78] D. Shue and M. J. Freedman, "From Application Requests to Virtual
IOPs: Provisioned Key-value Storage with Libra," in Proceedings of the
Ninth European Conference on Computer Systems, New York, NY, USA,
2014.
[79] F. Armknecht, C. Boyd, C. Carr, K. Gjøsteen, A. Jäschke, C. A. Reuter
and M. Strand, A Guide to Fully Homomorphic Encryption, 2015.
[80] S. S. Wagstaff Jr., Cryptanalysis of number theoretic ciphers, CRC Press,
2002.
[81] C. Swenson, Modern cryptanalysis: techniques for advanced code
breaking, John Wiley & Sons, 2008.
[82] J. Yi-ming and L. Sheng-li, "The Analysis of Security Weakness in
BitLocker Technology," in Proceedings of the 2010 Second International
Conference on Networks Security, Wireless Communications and Trusted
Computing - Volume 01, Washington, 2010.
[83] J. Suter, Geometric Algebra Primer, 2013.
[84] K.-D. Suh, B.-H. Suh, Y.-H. Lim, J.-K. Kim, Y.-J. Choi, Y.-N. Koh, S.-
S. Lee, S.-C. Kwon, B.-S. Choi, J.-S. Yum and others, "A 3.3 V 32 Mb
NAND flash memory with incremental step pulse programming scheme,"
IEEE Journal of Solid-State Circuits, vol. 30, pp. 1149-1156, 1995.
[85] D. Stehlé, Floating-Point LLL: Theoretical and Practical Aspects,
Springer, 2010, pp. 179-213.
[86] D. Stehlé and R. Steinfeld, "Faster Fully Homomorphic Encryption,"
IACR Cryptology ePrint Archive, vol. 2010, p. 299, 2010.
[87] D. Stehlé and R. Steinfeld, "Faster Fully Homomorphic Encryption," in
ASIACRYPT, 2010.
[88] R. Snyder, "Some Security Alternatives for Encrypting Information on
Storage Devices," in Proceedings of the 3rd Annual Conference on
Information Security Curriculum Development, New York, NY, USA,
2006.
[89] B. Schneier, Secrets & Lies: Digital Security in a Networked World, 1st
ed., New York, NY, USA: John Wiley & Sons, Inc., 2000.
[90] V. Rijmen and B. Preneel, "Improved Characteristics for Differential
Cryptanalysis of Hash Functions Based on Block Ciphers," in Fast
Software Encryption: Second International Workshop. Leuven, Belgium,
14-16 December 1994, Proceedings, 1994.
[91] N. Palaniswamy, D. M. Dipesh, J. N. D. Kumar and S. G. Raaja,
"Notice of Violation of IEEE Publication Principles Enhanced Blowfish
algorithm using bitmap image pixel plotting for security improvisation," in
2010 2nd International Conference on Education Technology and
Computer, 2010.
[92] E. O'Sullivan and F. Regazzoni, "Efficient Arithmetic for Lattice-based
Cryptography: Special Session Paper," in Proceedings of the Twelfth
IEEE/ACM/IFIP International Conference on Hardware/Software
Codesign and System Synthesis Companion, New York, NY, USA, 2017.
[93] R. Olsson, Performance differences in encryption software versus
storage devices, 2012, p. 36.
[94] D. Mittal, D. Kaur and A. Aggarwal, "Secure Data Mining in Cloud
Using Homomorphic Encryption," in 2014 IEEE International Conference
on Cloud Computing in Emerging Markets (CCEM), 2014.
[95] K. Minematsu, "Improved Security Analysis of XEX and LRW Modes,"
in Proceedings of the 13th International Conference on Selected Areas in
Cryptography, Berlin, 2007.
[96] G. Campardo, R. Micheloni and D. Novosel, VLSI-Design of Non-Volatile
Memories, New York: Springer, 2005.
[97] D. Micciancio, The Geometry of Lattice Cryptography, A. Aldini and
R. Gorrieri, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp.
185-210.
[98] L. Martin, "XTS: A Mode of AES for Encrypting Hard Disks," IEEE
Security & Privacy, vol. 8, pp. 68-69, May 2010.
[99] J.-D. Lee, S.-H. Hur and J.-D. Choi, "Effects of floating-gate
interference on NAND flash memory cell operation," IEEE Electron Device
Letters, vol. 23, pp. 264-266, May 2002.
[100] S. K. Lai, J. Lee and V. K. Dham, "Electrical properties of nitrided-
oxide systems for use in gate dielectrics and EEPROM," in 1983
International Electron Devices Meeting, 1983.
[101] D. Kahng and S. M. Sze, "A floating gate and its application to memory
devices," The Bell System Technical Journal, vol. 46, pp. 1288-1295, Jul.
1967.
[102] C. Gentry, S. Halevi and N. P. Smart, "Homomorphic Evaluation of the
AES Circuit," in Proceedings of the 32nd Annual Cryptology Conference
on Advances in Cryptology --- CRYPTO 2012 - Volume 7417, New York,
NY, USA, 2012.
[103] N. Ferguson, J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner and
D. Whiting, "Improved Cryptanalysis of Rijndael," in Proceedings of the
7th International Workshop on Fast Software Encryption, London, 2001.
[104] Y. Doröz, J. Hoffstein, J. Pipher, J. H. Silverman, B. Sunar, W. Whyte
and Z. Zhang, "Fully Homomorphic Encryption from the Finite Field
Isomorphism Problem," IACR Cryptology ePrint Archive, vol. 2017, p.
548, 2017.
[105] Diskcryptor, "Diskcryptor," 2011. [Online]. Available:
https://diskcryptor.net/wiki/Main_Page.
[106] W. Dai, Y. Doröz, Y. Polyakov, K. Rohloff, H. Sajjadpour, E. Savas
and B. Sunar, "Implementation and Evaluation of a Lattice-Based Key-
Policy ABE Scheme," IEEE Trans. Information Forensics and Security,
vol. 13, pp. 1169-1184, 2018.
[107] A. Czeskis, D. J. S. Hilaire, K. Koscher, S. D. Gribble, T. Kohno and B.
Schneier, "Defeating Encrypted and Deniable File Systems: TrueCrypt
V5.1a and the Case of the Tattling OS and Applications," Berkeley, 2008.
[108] J. H. Cheon and D. Stehlé, "Fully Homomorphic Encryption over the
Integers Revisited," IACR Cryptology ePrint Archive, vol. 2016, p. 837,
2016.
[109] J. H. Cheon and D. Stehlé, "Fully Homomorphic Encryption over the
Integers Revisited," in EUROCRYPT (1), 2015.
[110] K. K. Chauhan, A. K. S. Sanger and A. Verma, "Homomorphic
Encryption for Data Security in Cloud Computing," in 2015 International
Conference on Information Technology (ICIT), 2015.
[111] N. Chan, M. F. Beug, R. Knoefler, T. Mueller, T. Melde, M.
Ackermann, S. Riedel, M. Specht, C. Ludwig and A. T. Tilke, "Metal
control gate for sub-30nm floating gate NAND memory," in 2008 9th
Annual Non-Volatile Memory Technology Symposium (NVMTS), 2008.
[112] A. Chakraborti, C. Chen and R. Sion, "POSTER: DataLair: A Storage
Block Device with Plausible Deniability," in Proceedings of the 2016 ACM
SIGSAC Conference on Computer and Communications Security, New
York, NY, USA, 2016.
[113] Z. Brakerski, C. Gentry and V. Vaikuntanathan, "(Leveled) Fully
Homomorphic Encryption Without Bootstrapping," in Proceedings of the
3rd Innovations in Theoretical Computer Science Conference, New York,
NY, USA, 2012.
[114] E. Biham, O. Dunkelman and N. Keller, "The Rectangle Attack -
Rectangling the Serpent," in Advances in Cryptology – Proceedings of
EUROCRYPT 2001, LNCS 2045, 2001.
[115] E. Biham, "New Types of Cryptanalytic Attacks Using Related Keys,"
in Advances in Cryptology --- Eurocrypt'93, Berl, 1994.
[116] D. Benarroch, Z. Brakerski and T. Lepoint, "FHE over the Integers:
Decomposed and Batched in the Post-Quantum Regime," in Proceedings,
Part II, of the 20th IACR International Conference on Public-Key
Cryptography --- PKC 2017 - Volume 10175, New York, NY, USA, 2017.
Appendix A – Cloud Storage SSD
This Appendix walks through the steps for creating VMs with various types of SSD
storage in Amazon Cloud. Log in to Amazon Cloud and select EC2. Launch an
instance and select the instance types i2.xlarge and t2.micro. Both are SSD storage VMs.
1. Visit Amazon Cloud Services EC2 website at
https://us-west-2.console.aws.amazon.com/console.
2. Create a VM following the instructions from Amazon.
In the first evaluation, I compared these two types of VMs and found that the
storage-optimized VM performs better. First, I created two types of
VMs in AWS. Instance type i2.xlarge follows:
Instance type t2.micro follows:
3. Installed the FIO benchmark tool.
root@ip-172-31-17-80:/home/ubuntu/Desktop# wget http://brick.kernel.dk/snaps/fio-2.1.10.tar.gz
root@ip-172-31-17-80:/home/ubuntu/Desktop# gunzip fio-2.1.10.tar.gz
root@ip-172-31-17-80:/home/ubuntu/Desktop# tar -xf fio-2.1.10.tar
Run the following command to collect the performance benchmarks:
fio --filename=/dmcrypt/4krandreadwrite6040j8 --direct=1 --rw=randrw --size=1024m --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=60 --iodepth=8 --numjobs=8 --runtime=60 --group_reporting --name=4krandreadwrite60j8 --output=/home/output/4kdmcryptrandreadwrite60j8
Sample generated output:
I used this output to gather IOPS information for block sizes from 4 KB to 1024 KB
on 1 GB files. I also measured the time for sequential and random reads and writes.
I used these performance metrics to understand SSD performance characteristics in the
Cloud.
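The IOPS figures gathered from the FIO output relate to throughput through the block size. The following is a minimal sketch of that arithmetic, not part of the benchmark scripts; the function names iops and throughput_mib_s are hypothetical, and any numbers passed to them are illustrative rather than measured results.

```c
#include <stdint.h>

/* Hypothetical helper: I/O operations per second, given the number of
 * completed operations and the elapsed run time in milliseconds. */
double iops(uint64_t total_ios, uint64_t runtime_ms) {
    if (runtime_ms == 0) {
        return 0.0;                      /* avoid division by zero */
    }
    return (double)total_ios * 1000.0 / (double)runtime_ms;
}

/* Hypothetical helper: throughput in MiB/s for a given IOPS value and
 * block size in bytes (for example 4096 for --bs=4k). */
double throughput_mib_s(double iops_value, uint64_t block_bytes) {
    return iops_value * (double)block_bytes / (1024.0 * 1024.0);
}
```

For example, a run that completes 120000 operations in 60 seconds corresponds to iops(120000, 60000), and throughput_mib_s converts that value to MiB/s for a 4 KB block size.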
Appendix B – Cloud Storage and Encryptions
In this Appendix, we demonstrate what are problems in …
The evaluation of the storage-optimized SSD VM versus the regular SSD VM
showed that the storage-optimized SSD outperformed the regular SSD. After that, I
evaluated a regular SSD, a hardware-encrypted SSD, and software-encrypted
containers by running the FIO benchmarks to understand the performance
penalties of encryption software in a Cloud environment.
1. Visit Amazon Cloud Services EC2 website at
https://us-west-2.console.aws.amazon.com/console.
All the VM types are t2.micro (Variable ECUs, 1 vCPU, 2.5 GHz, Intel Xeon
Family, 1 GiB memory, EBS only). Ubuntu Server 16.04 LTS (HVM), SSD Volume
Type - ami-efd0428f
2. Create two VMs following the instructions from Amazon.
Instance type t2.micro with regular SSD
3. Created an instance of type t2.micro with encrypted SSD.
4. Installed the BestCrypt encryption software on a 3 GB volume and the Dm-Crypt
software on a 3 GB volume on one of the t2.micro VMs with regular SSD.
root@ip-172-31-17-80:yum install gcc kernel-devel kernel-headers dkms
root@ip-172-31-17-80: wget -O /etc/yum.repos.d/bestcrypt.repo https://www.jetico.com/packages/el/bestcrypt.repo
root@ip-172-31-17-80: yum install bestcrypt bestcrypt-panel
root@ip-172-31-17-80: bctool new /root/BestCrypt -a Rijndael -s 3gb -d password
root@ip-172-31-17-80: bctool format /root/BestCrypt -t ext3
root@ip-172-31-17-80: Enter password:
root@ip-172-31-17-80:/sys/block/xvda/queue# apt-get install cryptsetup
Reading package lists... Done
Building dependency tree
Reading state information... Done
cryptsetup is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 50 not upgraded.
root@ip-172-31-17-80:/sys/block/xvda/queue# fallocate -l 2048M /root/dmcrypt
root@ip-172-31-17-80:/sys/block/xvda/queue# cryptsetup -y luksFormat /root/dmcrypt
WARNING!
========
This will overwrite data on /root/dmcrypt irrevocably.
Are you sure? (Type uppercase yes): y
root@ip-172-31-17-80:/sys/block/xvda/queue# cryptsetup -y luksFormat /root/dmcrypt
WARNING!
========
This will overwrite data on /root/dmcrypt irrevocably.
Are you sure? (Type uppercase yes): yes
root@ip-172-31-17-80:/sys/block/xvda/queue# cryptsetup -y luksFormat /root/dmcrypt
WARNING!
========
This will overwrite data on /root/dmcrypt irrevocably.
Are you sure? (Type uppercase yes): YES
Enter passphrase:
Verify passphrase:
Passphrases do not match.
root@ip-172-31-17-80:/sys/block/xvda/queue# cryptsetup -y luksFormat /root/dmcrypt
WARNING!
========
This will overwrite data on /root/dmcrypt irrevocably.
Are you sure? (Type uppercase yes): YES
Enter passphrase:
Verify passphrase:
root@ip-172-31-17-80:/sys/block/xvda/queue# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            492M   12K  492M   1% /dev
tmpfs           100M  384K   99M   1% /run
/dev/xvda1      7.8G  3.7G  3.7G  50% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            497M   68K  497M   1% /run/shm
none            100M  8.0K  100M   1% /run/user
root@ip-172-31-17-80:/sys/block/xvda/queue# cd /root
root@ip-172-31-17-80:~# ls -las
total 2097208
      4 drwx------  8 root   root         4096 Apr  9 20:40 .
      4 drwxr-xr-x 22 root   root         4096 Apr  9 09:06 ..
      8 -rw-------  1 root   root         6914 Apr  9 13:17 .bash_history
      4 -rw-r--r--  1 root   root         3106 Feb 20  2014 .bashrc
      4 drwxr-xr-x  3 root   root         4096 Apr  9 10:11 BestCrypt
      4 drwx------  2 root   root         4096 Apr  9 09:07 .cache
      4 drwxr-xr-x  3 root   root         4096 Apr  9 10:02 .config
      4 drwx------  3 root   root         4096 Apr  9 10:02 .dbus
2097156 -rw-r--r--  1 root   root   2147483648 Apr  9 20:41 dmcrypt
      4 drwxr-xr-x  2 ubuntu ubuntu       4096 Apr  9 13:09 plain
      4 -rw-r--r--  1 root   root          140 Feb 20  2014 .profile
      4 drwx------  2 root   root         4096 Apr  9 09:06 .ssh
      4 -rw-------  1 root   root          648 Apr  9 10:41 .viminfo
root@ip-172-31-17-80:~# file /root/dmcrypt
/root/dmcrypt: LUKS encrypted file, ver 1 [aes, xts-plain64, sha1] UUID: 5302390d-a47a-47cc-99a7-a846d164197c
root@ip-172-31-17-80:~# cryptsetup luksOpen /root/dmcrypt dmcrypt
Enter passphrase for /root/dmcrypt:
root@ip-172-31-17-80:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            492M   12K  492M   1% /dev
tmpfs           100M  388K   99M   1% /run
/dev/xvda1      7.8G  3.7G  3.7G  50% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            497M   68K  497M   1% /run/shm
none            100M  8.0K  100M   1% /run/user
root@ip-172-31-17-80:~# mkfs.ext4 -j /dev/mapper/dmcrypt
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
131072 inodes, 523776 blocks
26188 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=536870912
16 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
root@ip-172-31-17-80:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            492M   12K  492M   1% /dev
tmpfs           100M  388K   99M   1% /run
/dev/xvda1      7.8G  3.7G  3.7G  50% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            497M   68K  497M   1% /run/shm
none            100M  8.0K  100M   1% /run/user
root@ip-172-31-17-80:~# mkdir dmcrypt
mkdir: cannot create directory ‘dmcrypt’: File exists
root@ip-172-31-17-80:~# pwd
/root
root@ip-172-31-17-80:~# cd /
root@ip-172-31-17-80:/# mkdir dmcrypt
root@ip-172-31-17-80:/# mount /dev/mapper/dmcrypt /dmcrypt
root@ip-172-31-17-80:/# df -h
Filesystem           Size  Used Avail Use% Mounted on
udev                 492M   12K  492M   1% /dev
tmpfs                100M  388K   99M   1% /run
/dev/xvda1           7.8G  3.7G  3.7G  50% /
none                 4.0K     0  4.0K   0% /sys/fs/cgroup
none                 5.0M     0  5.0M   0% /run/lock
none                 497M   68K  497M   1% /run/shm
none                 100M  8.0K  100M   1% /run/user
/dev/mapper/dmcrypt  2.0G  3.0M  1.9G   1% /dmcrypt
root@ip-172-31-17-80:/#
5. Run the FIO benchmarks on these four types of storage: SSD, Encrypted SSD, Dm-Crypt
container, and BestCrypt container.
fio --filename=/dmcrypt/4krandreadwrite6040j8 --direct=1 --rw=randrw --size=1024m --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=60 --iodepth=8 --numjobs=8 --runtime=60 --group_reporting --name=4krandreadwrite60j8 --output=/home/output/4kdmcryptrandreadwrite60j8
6. Sample generated output:
I used similar output to gather IOPS information for block sizes from 4 KB to 1024 KB
on 1 GB files. I also measured the time for sequential and random reads and writes for
all VMs. I used these performance metrics to compare SSD characteristics with the
performance penalties of encryption software in the Cloud. The results showed a
performance overhead for software-based encryption compared with regular or encrypted
SSDs.
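The performance overhead can be expressed as a percentage drop relative to the unencrypted baseline. The sketch below shows one way to compute it; the function name penalty_percent is hypothetical, and the formula (baseline minus measured, divided by baseline) is an assumption about how the penalty is defined, since the text does not state it explicitly.

```c
/* Hypothetical helper: percentage of baseline IOPS lost when the same
 * workload runs on an encrypted volume. A positive result means the
 * encrypted volume is slower than the baseline. */
double penalty_percent(double baseline_iops, double measured_iops) {
    if (baseline_iops <= 0.0) {
        return 0.0;                      /* no meaningful baseline */
    }
    return (baseline_iops - measured_iops) / baseline_iops * 100.0;
}
```

With illustrative numbers, a baseline of 2000 IOPS and an encrypted-container result of 1500 IOPS would give a penalty of 25 percent.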
7. Hidden encrypted containers information.
The df -h command shown above can be run by any user and reveals the
encrypted container information, which may be a security concern.
Appendix C – Multi-Vector Based Encryption
In this Appendix, we demonstrate…..
After that, I performed a survey of homomorphic encryption techniques. I took the
multi-vector-based homomorphic encryption proposed by David Williams Honorio
Araujo Da Silva in his Master's thesis, converted it into an executable program, and
ran it on an AWS VM for file-level encryption. Because this is a symmetric encryption
scheme, I chose the AES-Crypt symmetric encryption software for comparing the
results, and I wrote a program of a similar type to AES-Crypt.
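The listing that follows is built on products of two-coefficient multivectors. As a minimal sketch of that building block, assuming plain long coefficients in place of GMP rationals, the code below computes the standard geometric product of two 2D vectors (a scalar part plus a bivector part) and the product of such a result with a vector. The type and function names (vec2, rotor2, gp_vec_vec, gp_rotor_vec) are hypothetical and do not appear in the listing; the coefficient formulas have the same form as xlm_lambda_0, xlm_lambda_1, xlm_lambda_0_bivector and xlm_lambda_1_bivector.

```c
/* Hypothetical illustration types; the listing uses xlm_t with mpq_t fields. */
typedef struct { long e1, e2; } vec2;    /* vector  a = e1*e_1 + e2*e_2   */
typedef struct { long s, b; } rotor2;    /* scalar + bivector: s + b*e_12 */

/* Geometric product of two 2D vectors: ab = (a . b) + (a ^ b) e_12. */
rotor2 gp_vec_vec(vec2 a, vec2 b) {
    rotor2 r;
    r.s = a.e1 * b.e1 + a.e2 * b.e2;     /* scalar part (dot product)     */
    r.b = a.e1 * b.e2 - a.e2 * b.e1;     /* bivector part (wedge product) */
    return r;
}

/* Product of (s + b e_12) with a vector v, using e_12 e_1 = -e_2 and
 * e_12 e_2 = e_1:  (s v1 + b v2) e_1 + (s v2 - b v1) e_2. */
vec2 gp_rotor_vec(rotor2 r, vec2 v) {
    vec2 out;
    out.e1 = r.s * v.e1 + r.b * v.e2;
    out.e2 = r.s * v.e2 - r.b * v.e1;
    return out;
}
```

A quick consistency check is the identity (ab)b = |b|^2 a: multiplying the product of a and b by b again returns a scaled by the squared length of b, which is why dividing by that squared length acts as an inverse.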
//
// main.c
// XLogos with MPQ
//

#include <stdio.h>
#include "test_xlg.h"
#include "test_xlm.h"
#include "test_xlg_massive_encryption.h"

int main(int argc, const char * argv[]) {

    //====================== TEST XLM ======================
    //test_xlm_set_xlz();
    //test_xlm_set_int();
    //test_xlm_import();
    //test_xlm_encryption_decryption();
    //test_xlm_pack_unpack();

    //xlg_test_pair_unpair();
    //xlg_test_compression();

    //====================== TEST XLG ======================
    //test_xlg_encrypt_decrypt_str();
    //test_xlg_encrypt_decrypt_int();
    //test_xlg_encrypt_decrypt_file();
    //test_xlm_encode();
    //test_xlm_decode();

    encrypt_decrypt_file(argc,argv);

    return 0;
}
//
// xlg.h

#ifndef xlg_h
#define xlg_h

#include <stdio.h>
#include "xlm.h"
#include <time.h>

struct xlg_t{
    xlm_t key1;
    xlm_t key2;
    xlm_t key1_inverse;
    xlm_t key2_inverse;
};
typedef struct xlg_t xlg_t;

//============================== INIT and SET ==============================

void xlg_init(xlg_t *xlg);
void xlg_generate_keys(xlg_t *xlg, int key_size);
void xlg_set_keys(xlg_t *xlg, xlm_t k1, xlm_t k2);

//============================== OPERATIONS ==============================

void xlg_encrypt(xlm_t *dest_cypher, xlm_t message, xlg_t xlg);

void xlg_decrypt(xlm_t *dest_decrypt, xlm_t cypher, xlg_t xlg);

//============================== UTILS ==============================

void xlg_clear(xlg_t *xlg);
void xlg_print(xlg_t xlg);

#endif /* xlg_h */
//
// xlg.c

#include "xlg.h"

//============================== INIT and SET ==============================

void xlg_init(xlg_t *xlg){
    xlm_init(&xlg->key1);
    xlm_init(&xlg->key2);
    xlm_init(&xlg->key1_inverse);
    xlm_init(&xlg->key2_inverse);
}

void xlg_generate_keys(xlg_t *xlg, int key_size){
    gmp_randstate_t state;
    gmp_randinit_default(state);
    time_t t;
    gmp_randseed_ui(state, time(&t));

    mpz_t key1_z;
    mpz_init(key1_z);
    mpz_urandomb(key1_z, state, key_size);
    xlm_t key1_m;
    xlm_init(&key1_m);
    xlm_set_z(&key1_m, key1_z);

    mpz_t key2_z;
    mpz_init(key2_z);
    mpz_urandomb(key2_z, state, key_size);
    xlm_t key2_m;
    xlm_init(&key2_m);
    xlm_set_z(&key2_m, key2_z);

    xlg_set_keys(xlg, key1_m, key2_m);

    //Clean up
    mpz_clear(key1_z);
    mpz_clear(key2_z);
    xlm_clear(&key1_m);
    xlm_clear(&key2_m);
    gmp_randclear(state);
}

void xlg_set_keys(xlg_t *xlg, xlm_t k1, xlm_t k2){
    //Set keys
    xlm_set(&xlg->key1,k1);
    xlm_set(&xlg->key2,k2);

    //Set inverse keys
    xlm_t key1_inverse;
    xlm_init(&key1_inverse);
    xlm_set(&key1_inverse,k1);
    xlm_inverse(&key1_inverse);
    xlm_set(&xlg->key1_inverse, key1_inverse);
    xlm_clear(&key1_inverse);

    xlm_t key2_inverse;
    xlm_init(&key2_inverse);
    xlm_set(&key2_inverse,k2);
    xlm_inverse(&key2_inverse);
    xlm_set(&xlg->key2_inverse, key2_inverse);
    xlm_clear(&key2_inverse);
}

//============================== OPERATIONS ==============================

void xlg_encrypt(xlm_t *dest_cypher, xlm_t message, xlg_t xlg){
    //Encrypt
    xlm_t gp1_encryption;
    xlm_init(&gp1_encryption);
    gmp_printf("Key: %Qd + %Qd\n", xlg.key1.m0, xlg.key1.m1);
    gmp_printf("Message: %Qd + %Qd\n", message.m0, message.m1);
    xlm_geometric_product(&gp1_encryption,&xlg.key1,&message);
    xlm_t cypher;
    xlm_init(&cypher);
    xlm_geometric_product_bivector(&cypher,&gp1_encryption,&xlg.key2);
    gmp_printf("Encrypted Values: %Qd + %Qd\n", cypher.m0, cypher.m1);
    xlm_set(dest_cypher,cypher);

    //Clean up
    xlm_clear(&gp1_encryption);
    xlm_clear(&cypher);
}

void xlg_decrypt(xlm_t *dest_decrypt, xlm_t cypher, xlg_t xlg){
    //Decrypt
    xlm_t gp1_decryption;
    xlm_init(&gp1_decryption);
    //%Qd expects an mpq_t argument, so the fields are passed without '&'
    gmp_printf("\nKey1 Inverse: %Qd + %Qd\n", xlg.key1_inverse.m0, xlg.key1_inverse.m1);
    gmp_printf("Key2 Inverse: %Qd + %Qd\n", xlg.key2_inverse.m0, xlg.key2_inverse.m1);
    xlm_geometric_product(&gp1_decryption,&cypher,&xlg.key2_inverse);
    gmp_printf("Decrypted Values: %Qd + %Qd\n", gp1_decryption.m0, gp1_decryption.m1);
    xlm_t decrypt;
    xlm_init(&decrypt);
    xlm_geometric_product_bivector_vector(&decrypt,&xlg.key1_inverse, &gp1_decryption);
    gmp_printf("Decrypted Values: %Qd + %Qd\n", decrypt.m0, decrypt.m1);
    xlm_set(dest_decrypt,decrypt);

    //Clean up
    xlm_clear(&gp1_decryption);
    xlm_clear(&decrypt);
}

//============================== UTILS ==============================

void xlg_clear(xlg_t *xlg){
    xlm_clear(&xlg->key1);
    xlm_clear(&xlg->key2);
    xlm_clear(&xlg->key1_inverse);
    xlm_clear(&xlg->key2_inverse);
}

void xlg_print(xlg_t xlg){
    mpz_t key1;
    mpz_init(key1);
    xlm_get_z(&key1, xlg.key1);

    mpz_t key2;
    mpz_init(key2);
    xlm_get_z(&key2, xlg.key2);

    gmp_printf("key 1 => %Zd\n", key1);
    gmp_printf("key 2 => %Zd\n", key2);

    mpz_clear(key1);
    mpz_clear(key2);
}
//
// xlm.h

#ifndef xlm_h
#define xlm_h

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <gmp.h>
#include <math.h>
#include "xlg_compression.h"

struct xlm_t {
    mpq_t m0;
    mpq_t m1;
};
typedef struct xlm_t xlm_t;

//============================== INIT and SET ==============================

void xlm_init(xlm_t * dest);
void xlm_set(xlm_t *dest, xlm_t src);
void xlm_set_z(xlm_t * dest, mpz_t z);
void xlm_set_si(xlm_t * dest, signed long int si);
void xlm_import_str(xlm_t * dest,char* str);
void xlm_import_str_w_size(xlm_t * dest,char* str, long size);

//============================== XLM EXPORT ==============================

void xlm_get_z(mpz_t *dest, xlm_t xlm);
signed long int xlm_get_si(xlm_t xlm);
char* xlm_export_str(xlm_t xlm, long *buffer_size);

//============================== UTILS ==============================

void xlm_print(xlm_t m);
void xlm_clear(xlm_t * m);

void xlm_pack(mpz_t dst, xlm_t src);
void xlm_unpack(xlm_t *dst, mpz_t src);
size_t xlm_out_raw(FILE* stream, xlm_t src);
size_t xlm_inp_raw(xlm_t *dst,FILE* stream);

//============================== OPERATIONS ==============================

void xlm_geometric_product(xlm_t *dest, xlm_t * m0, xlm_t * m1);
void xlm_geometric_product_bivector(xlm_t *dest, xlm_t * m0, xlm_t * m1);
void xlm_geometric_product_bivector_vector(xlm_t *dest, xlm_t * m0, xlm_t * m1);
void xlm_clifford_conjugation(xlm_t *m);
void xlm_reverse(xlm_t *m);
void xlm_amplitude_squared(xlm_t * m);
void xlm_amplitude_squared_reversed(xlm_t * m);
void xlm_rationalize(xlm_t *m);
void xlm_scalar_div(xlm_t * m, mpq_t scalar);
void xlm_inverse(xlm_t *m);
void xlm_lambda_0(mpq_t *m0, xlm_t * mv1, xlm_t * mv2);
void xlm_lambda_1(mpq_t *m1, xlm_t * mv1, xlm_t * mv2);
void xlm_lambda_0_bivector(mpq_t *m0, xlm_t * mv1, xlm_t * mv2);
void xlm_lambda_1_bivector(mpq_t *m1, xlm_t * mv1, xlm_t * mv2);
void xlm_lambda_0_bivector_vector(mpq_t *m0, xlm_t * mv1, xlm_t * mv2);
void xlm_lambda_1_bivector_vector(mpq_t *m1, xlm_t * mv1, xlm_t * mv2);

#endif /* xlm_h */
//
// xlm.c

#include "xlm.h"

//============================== INIT and SET ==============================

void xlm_init(xlm_t * dest){
    mpq_init(dest->m0);
    mpq_init(dest->m1);
}

void xlm_set(xlm_t *dest, xlm_t src){
    mpq_set(dest->m0, src.m0);
    mpq_set(dest->m1, src.m1);
}

void xlm_set_z(xlm_t * dest, mpz_t z){
    //Init base and reminder
    mpz_t base;
    mpz_init(base);
    mpz_t reminder;
    mpz_init(reminder);

    //Compute values
    mpz_div_ui(base,z,2);
    mpz_mod_ui(reminder,z,2);

    //Get reminder in mpq
    mpq_t reminder_mpq;
    mpq_init(reminder_mpq);
    mpq_set_z(reminder_mpq,reminder);

    mpq_set_z(dest->m0,base);
    mpq_set_z(dest->m1, base);

    mpq_add(dest->m1,dest->m1,reminder_mpq);

    //Adjust coefficients
    if(mpz_cmp_ui(reminder,0) == 0){
        mpq_t mpq_1;
        mpq_init(mpq_1);
        mpq_set_ui(mpq_1,1,1);
        mpq_add(dest->m1, dest->m1, mpq_1);
        mpq_clear(mpq_1);
    }

    mpz_clear(base);
    mpz_clear(reminder);
    mpq_clear(reminder_mpq);
}

void xlm_set_si(xlm_t * dest, signed long int si){
    mpz_t z;
    mpz_init_set_si(z,si);
    xlm_set_z(dest,z);
    mpz_clear(z);
}
void xlm_import_str(xlm_t * dest,char* str){
    mpz_t z;
    mpz_init(z);
    //Use the string length; sizeof(str) would only give the pointer size
    mpz_import(z,strlen(str),1,sizeof(str[0]), 0, 0,str);
    xlm_set_z(dest,z);
    mpz_clear(z);
}

void xlm_import_str_w_size(xlm_t * dest,char* str, long size){
    mpz_t z;
    mpz_init(z);
    mpz_import(z,size,1,sizeof(str[0]),0, 0,str);
    xlm_set_z(dest,z);
    mpz_clear(z);
}

//============================== XLM GET ==============================

void xlm_get_z(mpz_t *dest, xlm_t xlm){
    mpz_t mpz_m0;
    mpz_init(mpz_m0);
    mpz_t mpz_m1;
    mpz_init(mpz_m1);

    mpz_set_q(mpz_m0,xlm.m0);
    mpz_set_q(mpz_m1,xlm.m1);

    mpz_add(*dest, mpz_m0, mpz_m1);

    mpz_clear(mpz_m0);
    mpz_clear(mpz_m1);
}

signed long int xlm_get_si(xlm_t xlm){
    mpz_t z;
    mpz_init(z);
    xlm_get_z(&z,xlm);
    signed long int si = mpz_get_si(z);
    mpz_clear(z);
    return si;
}

char* xlm_export_str(xlm_t xlm, long *buffer_size){
    mpz_t z;
    mpz_init(z);
    xlm_get_z(&z,xlm);

    //Alloc memory to destination buffer
    long size =sizeof(char);
    long nail = 0;
    long numb = 8*size - nail;
    long count = (mpz_sizeinbase (z, 2) + numb-1) / numb;
    char* buffer;
    buffer = malloc(count * size);

    //Report the buffer size only if the caller supplied a destination
    if(buffer_size != NULL){
        *buffer_size =count * size;
    }

    //Export to buffer
    mpz_export(buffer, NULL, 1, size, 0, nail, z);
    mpz_clear(z);

    return buffer;
}
//============================== UTILS ==============================

void xlm_clear(xlm_t * m){
    mpq_clear(m->m0);
    mpq_clear(m->m1);
}

void xlm_print(xlm_t m){
    gmp_printf("%+Qd e0 ", m.m0);
    gmp_printf("%+Qd e1 \n", m.m1);
}

void xlm_pack(mpz_t dst, xlm_t src){
    mpz_t m0_m1;
    mpz_t m0,m1;
    mpz_inits(m0_m1,m0,m1,NULL);

    //Get mpz values of coefficients
    mpz_set_q(m0,src.m0);
    mpz_set_q(m1,src.m1);

    //Set absolute values
    mpz_abs(m0,m0);
    mpz_abs(m1,m1);

    //Pair coefficients
    xlg_pair(dst, m0, m1);

    //Pack signs of coefficients (bit 7 for m0, bit 6 for m1)
    unsigned int signs = 0;
    signs = signs + ((mpq_sgn(src.m0) < 0) ? 128 : 0);
    signs = signs + ((mpq_sgn(src.m1) < 0) ? 64 : 0);

    mpz_mul_ui(dst,dst,256);
    mpz_add_ui(dst,dst,signs);

    mpz_clears(m0_m1,m0,m1,NULL);
}

void xlm_unpack(xlm_t *dst, mpz_t src){
    mpz_t m0_m1;
    mpz_t m0,m1;
    mpz_inits(m0_m1,m0,m1,NULL);

    //Get signs
    mpz_t signs_z;
    mpz_init(signs_z);
    mpz_mod_ui(signs_z,src,256);
    mpz_div_ui(src,src,256);
    unsigned long signs = mpz_get_ui(signs_z);

    //Unpair coefficients
    xlg_unpair(m0, m1, src);

    //Adjust sign (same bit positions as written by xlm_pack)
    if((signs & 128) > 0)
        mpz_mul_si(m0,m0,-1);
    if((signs & 64) > 0)
        mpz_mul_si(m1,m1,-1);

    //Set coefficients
    mpq_set_z(dst->m0,m0);
    mpq_set_z(dst->m1,m1);

    mpz_clear(signs_z);
    mpz_clears(m0_m1,m0,m1,NULL);
}
size_t xlm_out_raw(FILE* stream, xlm_t src){
    mpz_t blades[2];
    for (int i = 0; i < 2; i++) {
        mpz_init(blades[i]);
    }

    mpz_set_q(blades[0],src.m0);
    mpz_set_q(blades[1],src.m1);

    size_t size = 0;
    for (int i = 0; i < 2; i++) {
        size += mpz_out_raw(stream,blades[i]);
    }

    for (int i = 0; i < 2; i++) {
        mpz_clear(blades[i]);
    }
    return size;
}

size_t xlm_inp_raw(xlm_t *dst,FILE* stream){
    mpz_t blades[2];
    for (int i = 0; i < 2; i++) {
        mpz_init(blades[i]);
    }
    size_t rsize = 0;
    for (int i = 0; i < 2; i++) {
        rsize += mpz_inp_raw(blades[i],stream);
    }

    mpq_set_z(dst->m0,blades[0]);
    mpq_set_z(dst->m1,blades[1]);

    for (int i = 0; i < 2; i++) {
        mpz_clear(blades[i]);
    }

    return rsize;
}
//============================== OPERATIONS ==============================

void xlm_geometric_product(xlm_t * dest, xlm_t * m0, xlm_t * m1){
    xlm_lambda_0(&dest->m0,m0,m1);
    xlm_lambda_1(&dest->m1,m0,m1);
}

void xlm_geometric_product_bivector(xlm_t * dest, xlm_t * m0, xlm_t * m1){
    xlm_lambda_0_bivector(&dest->m0,m0,m1);
    xlm_lambda_1_bivector(&dest->m1,m0,m1);
}

void xlm_geometric_product_bivector_vector(xlm_t * dest, xlm_t * m0, xlm_t * m1){
    xlm_lambda_0_bivector_vector(&dest->m0,m0,m1);
    xlm_lambda_1_bivector_vector(&dest->m1,m0,m1);
}

void xlm_lambda_0(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m0,mv2->m0);
    mpq_mul(mb,mv1->m1,mv2->m1);
    mpq_add(*m,*m,ma);
    mpq_add(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}

void xlm_lambda_1(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m0,mv2->m1);
    mpq_mul(mb,mv1->m1,mv2->m0);
    mpq_add(*m,*m,ma);
    mpq_sub(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}

void xlm_lambda_0_bivector(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m0,mv2->m0);
    mpq_mul(mb,mv1->m1,mv2->m1);
    mpq_add(*m,*m,ma);
    mpq_add(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}

void xlm_lambda_1_bivector(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m0,mv2->m1);
    mpq_mul(mb,mv1->m1,mv2->m0);
    mpq_add(*m,*m,ma);
    mpq_sub(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}

void xlm_lambda_0_bivector_vector(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m0,mv2->m0);
    mpq_mul(mb,mv1->m1,mv2->m1);
    mpq_add(*m,*m,ma);
    mpq_sub(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}

void xlm_lambda_1_bivector_vector(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m1,mv2->m0);
    mpq_mul(mb,mv1->m0,mv2->m1);
    mpq_add(*m,*m,ma);
    mpq_add(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}
void xlm_clifford_conjugation(xlm_t *m) {
    //All sign changes below are commented out, so m is returned unchanged
    mpq_t minus_one;
    mpq_init(minus_one);
    mpq_set_si(minus_one,-1,1); //-1/1: the denominator argument of mpq_set_si is unsigned
    //mpq_mul(m->m0,m->m0,minus_one);
    //mpq_mul(m->m1,m->m1,minus_one);
    //mpq_mul(m->m1,m->m1,minus_one);
    mpq_clear(minus_one);
}

void xlm_reverse(xlm_t *m) {
    mpq_t minus_one;
    mpq_init(minus_one);
    mpq_set_si(minus_one,-1,1); //-1/1: the denominator argument of mpq_set_si is unsigned
    //mpq_mul(m->m0,m->m0,minus_one);
    // mpq_mul(m->m0,m->m0,minus_one);
    mpq_mul(m->m1,m->m1,minus_one);
    mpq_clear(minus_one);
}
void xlm_amplitude_squared(xlm_t * m) {
    gmp_printf("Input1: %Qd\n", m->m0);
    gmp_printf("Input2: %Qd\n", m->m1);
    //Compute Clifford conjugation of m and store it in clifford_conj
    xlm_t clifford_conj;
    xlm_init(&clifford_conj);
    xlm_set(&clifford_conj, *m);
    xlm_clifford_conjugation(&clifford_conj);
    gmp_printf("Clifford Conjugate: %Qd\n", clifford_conj.m0);
    gmp_printf("Clifford Conjugate: %Qd\n", clifford_conj.m1);
    //Compute geometric product of m and clifford_conj and store it in amplitude_squared
    xlm_t amplitude_squared;
    xlm_init(&amplitude_squared);
    xlm_geometric_product(&amplitude_squared, m, &clifford_conj);
    gmp_printf("Amplitude squared: %Qd\n", amplitude_squared.m0);
    //Make the pointer content equal amplitude_squared
    xlm_clear(m);
    xlm_init(m);
    xlm_set(m, amplitude_squared);
    //Clean up
    xlm_clear(&clifford_conj);
    xlm_clear(&amplitude_squared);
}
void xlm_amplitude_squared_reversed(xlm_t * m) {
    //Compute amplitude squared of m
    xlm_t amplitude_squared_reversed;
    xlm_init(&amplitude_squared_reversed);
    xlm_set(&amplitude_squared_reversed, *m);
    xlm_amplitude_squared(&amplitude_squared_reversed);
    //Reverse the amplitude squared in place
    xlm_reverse(&amplitude_squared_reversed);
    //Make the pointer content equal amplitude_squared_reversed
    xlm_clear(m);
    xlm_init(m);
    xlm_set(m, amplitude_squared_reversed);
    //Clean up
    xlm_clear(&amplitude_squared_reversed);
}
void xlm_rationalize(xlm_t *m) {
    //Compute amplitude squared of m and store it in mv_amplitude_squared
    xlm_t mv_amplitude_squared;
    xlm_init(&mv_amplitude_squared);
    xlm_set(&mv_amplitude_squared, *m);
    xlm_amplitude_squared(&mv_amplitude_squared);
    //Compute amplitude squared reversed of m and store it in mv_amplitude_squared_reversed
    xlm_t mv_amplitude_squared_reversed;
    xlm_init(&mv_amplitude_squared_reversed);
    xlm_set(&mv_amplitude_squared_reversed, *m);
    xlm_amplitude_squared_reversed(&mv_amplitude_squared_reversed);
    //Compute geometric product of mv_amplitude_squared and mv_amplitude_squared_reversed and store it in mv_geometric_product
    xlm_t mv_geometric_product;
    xlm_init(&mv_geometric_product);
    xlm_geometric_product(&mv_geometric_product,&mv_amplitude_squared,&mv_amplitude_squared_reversed);
    //Make the pointer content equal mv_geometric_product
    xlm_clear(m);
    xlm_init(m);
    xlm_set(m,mv_geometric_product);
    //Clean up
    xlm_clear(&mv_amplitude_squared);
    xlm_clear(&mv_amplitude_squared_reversed);
    xlm_clear(&mv_geometric_product);
}
void xlm_scalar_div(xlm_t *m, mpq_t scalar) {
    mpq_div(m->m0, m->m0, scalar);
    mpq_div(m->m1, m->m1, scalar);
}
void xlm_inverse(xlm_t *m){
    //Compute Clifford conjugation of m and store it in clifford_conj
    xlm_t clifford_conj;
    xlm_init(&clifford_conj);
    xlm_set(&clifford_conj, *m);
    xlm_clifford_conjugation(&clifford_conj);
    //Compute amplitude squared of m and store it in mv_amplitude_squared
    xlm_t mv_amplitude_squared;
    xlm_init(&mv_amplitude_squared);
    xlm_set(&mv_amplitude_squared, *m);
    xlm_amplitude_squared(&mv_amplitude_squared);
    //Rationalize
    xlm_t mv_rationalize;
    xlm_init(&mv_rationalize);
    xlm_set(&mv_rationalize, *m);
    xlm_rationalize(&mv_rationalize);
    //Copy m and divide it by the scalar part of the amplitude squared
    xlm_t mv_geometric_product;
    xlm_init(&mv_geometric_product);
    xlm_set(&mv_geometric_product, *m);
    xlm_scalar_div(&mv_geometric_product, mv_amplitude_squared.m0);
    //Make the pointer content equal mv_geometric_product
    xlm_clear(m);
    xlm_init(m);
    xlm_set(m, mv_geometric_product);
    //Clean up (the two temporaries below were never released in the original listing)
    xlm_clear(&clifford_conj);
    xlm_clear(&mv_amplitude_squared);
    xlm_clear(&mv_geometric_product);
    xlm_clear(&mv_rationalize);
}
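As a cross-check on the operations above, the following is a minimal sketch, not the GMP-based listing itself, that reproduces the two-coefficient product computed by xlm_lambda_0 and xlm_lambda_1 and the inverse computed by xlm_inverse using a small exact-fraction type. The names rat, mv2, gp, mv2_inverse and inverse_is_consistent are hypothetical and introduced only for illustration. The sketch assumes, as in the listing, that xlm_clifford_conjugation leaves its argument unchanged, so the amplitude squared reduces to m0*m0 + m1*m1, and that the input is not (0, 0).

```c
#include <assert.h>

/* Hypothetical exact fraction standing in for mpq_t; den is kept positive. */
typedef struct { long long num, den; } rat;

static long long gcd_ll(long long a, long long b) {
    while (b != 0) { long long t = a % b; a = b; b = t; }
    return a < 0 ? -a : a;
}

/* Build a fraction in lowest terms. */
static rat rat_make(long long n, long long d) {
    if (d < 0) { n = -n; d = -d; }
    long long g = gcd_ll(n, d);
    if (g == 0) g = 1;
    return (rat){ n / g, d / g };
}
static rat rat_add(rat a, rat b) { return rat_make(a.num * b.den + b.num * a.den, a.den * b.den); }
static rat rat_sub(rat a, rat b) { return rat_make(a.num * b.den - b.num * a.den, a.den * b.den); }
static rat rat_mul(rat a, rat b) { return rat_make(a.num * b.num, a.den * b.den); }
static rat rat_div(rat a, rat b) { return rat_make(a.num * b.den, a.den * b.num); }

/* Hypothetical two-coefficient multivector, mirroring xlm_t {m0, m1}. */
typedef struct { rat m0, m1; } mv2;

/* Same arithmetic as xlm_lambda_0 / xlm_lambda_1 applied to a zeroed destination. */
static mv2 gp(mv2 a, mv2 b) {
    mv2 r;
    r.m0 = rat_add(rat_mul(a.m0, b.m0), rat_mul(a.m1, b.m1));
    r.m1 = rat_sub(rat_mul(a.m0, b.m1), rat_mul(a.m1, b.m0));
    return r;
}

/* Inverse as in xlm_inverse: divide both coefficients by m0^2 + m1^2. */
static mv2 mv2_inverse(mv2 m) {
    rat n = gp(m, m).m0;
    mv2 r = { rat_div(m.m0, n), rat_div(m.m1, n) };
    return r;
}

/* Returns 1 when (a, b) times its inverse is the scalar 1 with a zero second coefficient. */
static int inverse_is_consistent(long long a, long long b) {
    mv2 m = { rat_make(a, 1), rat_make(b, 1) };
    mv2 p = gp(m, mv2_inverse(m));
    assert(p.m0.den == 1 && p.m1.den == 1);
    return p.m0.num == 1 && p.m1.num == 0;
}
```

For example, for the multivector (3, 4) the amplitude squared is 25, the inverse is (3/25, 4/25), and the product of the two is (1, 0).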
//
// xlg_massive_encryption.h
#ifndef xlg_massive_encryption_h
#define xlg_massive_encryption_h
#include <stdio.h>
#include "xlg.h"
void xlg_encrypt_file(char* src, char* dst, xlg_t xlg);
void xlg_decrypt_file(char* src, char* dst, xlg_t xlg);
void xlg_append_encypted_data(char* dst_path, char* data_buffer, xlg_t xlg);
void xlg_encode(xlg_t xlg);
void xlg_decode(xlg_t xlg);
#endif /* xlg_massive_encryption_h */

//
// xlg_massive_encryption.c
#include "xlg_massive_encryption.h"
int BUFFER_SIZE = 1024*10;
void xlg_encrypt_file(char* src, char* dst, xlg_t xlg){
    FILE *src_file = fopen(src, "rb");
    FILE *dst_file= fopen(dst, "wb");
    if(src_file == NULL || dst_file == NULL){
        //Could not open one of the files
        if(src_file != NULL) fclose(src_file);
        if(dst_file != NULL) fclose(dst_file);
        return;
    }
    while (!feof(src_file)) {
        //Read file
        long nread = 1;
        char buffer[BUFFER_SIZE];
        buffer[0] = 1; //Marker byte; xlg_decrypt_file skips it when writing the output
        while(nread<BUFFER_SIZE-1 && !feof(src_file)){
            int c = getc(src_file);
            if(c!= EOF){
                buffer[nread]=c;
                nread++;
            }
        }
        //Import
        xlm_t message;
        xlm_init(&message);
        xlm_import_str_w_size(&message,buffer,nread);
        gmp_printf("Message Values: %Qd + %Qd\n", message.m0, message.m1);
        //Encrypt
        xlm_t cypher_xlm;
        xlm_init(&cypher_xlm);
        xlg_encrypt(&cypher_xlm, message, xlg);
        gmp_printf("Encrypted Values: %Qd + %Qd\n", cypher_xlm.m0, cypher_xlm.m1);
        //Write to file
        xlm_out_raw(dst_file,cypher_xlm);
        //Clean up
        xlm_clear(&message);
        xlm_clear(&cypher_xlm);
    }
    fclose(src_file);
    fclose(dst_file);
}
void xlg_decrypt_file(char* src, char* dst, xlg_t xlg){
    FILE *src_file= fopen(src, "rb");
    FILE *dst_file= fopen(dst, "wb");
    if(src_file == NULL || dst_file == NULL){
        //Could not open one of the files
        if(src_file != NULL) fclose(src_file);
        if(dst_file != NULL) fclose(dst_file);
        return;
    }
    while(!feof(src_file)){
        //Read File
        xlm_t cypher_xlm;
        xlm_init(&cypher_xlm);
        size_t nread = xlm_inp_raw(&cypher_xlm, src_file);
        if(nread == 0){
            xlm_clear(&cypher_xlm);
            break;
        }
        //Decrypt
        xlm_t decrypt;
        xlm_init(&decrypt);
        xlg_decrypt(&decrypt,cypher_xlm,xlg);
        gmp_printf("Decrypted Values: %Qd + %Qd\n", decrypt.m0, decrypt.m1);
        //Export
        long size;
        char* buffer = xlm_export_str(decrypt,&size);
        //Write file, skipping the marker byte at index 0
        long nwrite = 1;
        while(nwrite<size){
            putc(buffer[nwrite++], dst_file);
        }
        //Clean up
        free(buffer);
        xlm_clear(&cypher_xlm);
        xlm_clear(&decrypt);
    }
    fclose(src_file);
    fclose(dst_file);
}
void xlg_append_encypted_data(char* dst_path, char* data_buffer, xlg_t xlg){
    FILE *dst_file= fopen(dst_path, "ab");
    char * buffer = malloc(strlen(data_buffer)+2);
    memset(buffer,0,strlen(data_buffer)+2);
    buffer[0]=1;
    buffer = strcat(buffer, data_buffer);
    //Import
    xlm_t data;
    xlm_init(&data);
    xlm_import_str_w_size(&data,buffer,strlen(data_buffer)+2);
    //Encrypt
    xlm_t cypher_xlm;
    xlm_init(&cypher_xlm);
    xlg_encrypt(&cypher_xlm, data, xlg);
    //Write to file
    xlm_out_raw(dst_file,cypher_xlm);
    //Clean up
    xlm_clear(&data);
    xlm_clear(&cypher_xlm);
    free(buffer);
    fclose(dst_file);
}
void xlg_encode(xlg_t xlg){
    while (!feof(stdin)) {
        long nread = 1;
        char buffer[BUFFER_SIZE];
        buffer[0] = 1; // THANKS HANES!!!
        while(nread<BUFFER_SIZE-1 && !feof(stdin)){
            int c = getchar();
            if(c!= EOF){
                buffer[nread]=c;
                nread++;
            }
        }
        //Import
        xlm_t message;
        xlm_init(&message);
        xlm_import_str_w_size(&message,buffer,nread);
        //Encrypt
        xlm_t cypher_xlm;
        xlm_init(&cypher_xlm);
        xlg_encrypt(&cypher_xlm, message, xlg);
        gmp_fprintf(stdout,"%Qd\n", cypher_xlm.m0);
        gmp_fprintf(stdout,"%Qd\n", cypher_xlm.m1);
        xlm_clear(&message);
        xlm_clear(&cypher_xlm);
    }
}
void xlg_decode(xlg_t xlg){
    FILE *stream;
    char *line = NULL;
    size_t len = 0;
    ssize_t read; //getline returns ssize_t, with -1 on end of input
    stream = stdin;
    if (stream == NULL)
        exit(0);
    int count = 0;
    mpq_t m0;
    mpq_t m1;
    xlm_t cypher;
    //Ciphertext arrives as pairs of lines: m0 first, then m1
    while ((read = getline(&line, &len, stdin)) != -1) {
        if(count%2 == 0){
            mpq_init(m0);
            mpq_set_str(m0,line,10);
            count++;
        }
        else if(count%2 == 1){
            mpq_init(m1);
            mpq_set_str(m1,line,10);
            count++;
            xlm_init(&cypher);
            mpq_set(cypher.m0,m0);
            mpq_set(cypher.m1,m1);
            //Decrypt
            xlm_t decrypt;
            xlm_init(&decrypt);
            xlg_decrypt(&decrypt,cypher,xlg);
            //Export
            long size;
            char* buffer = xlm_export_str(decrypt,&size);
            long nwrite = 1; //Thanks Hanes
            while(nwrite<size){
                putchar(buffer[nwrite++]);
            }
            count =0;
            xlm_clear(&cypher);
            xlm_clear(&decrypt);
            free(buffer);
            mpq_clear(m0);
            mpq_clear(m1);
        }
    }
    free(line);
    fclose(stream);
}
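The file routines above prepend a byte of value 1 to every block before importing it as an integer, and skip that byte again after decryption. A plausible reason, stated here as an assumption rather than something the listing documents, is that an integer has no leading zero bytes, so a block starting with 0x00 would otherwise come back shorter than it went in. The sketch below models only that effect with plain byte arrays; surviving_bytes, frame_block and recovered_payload_bytes are hypothetical helpers, and surviving_bytes merely mimics the length that a most-significant-byte-first integer export would produce.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: number of bytes left after a big-endian byte string is
   turned into an integer and exported again (leading zero bytes vanish). */
static size_t surviving_bytes(const unsigned char *buf, size_t n) {
    size_t lead = 0;
    while (lead < n && buf[lead] == 0) lead++;
    return n - lead;
}

/* Frame a block the way the listing does: marker byte 1, then the payload.
   Returns the framed length; out must hold n + 1 bytes. */
static size_t frame_block(unsigned char *out, const unsigned char *payload, size_t n) {
    out[0] = 1;
    memcpy(out + 1, payload, n);
    return n + 1;
}

/* Payload bytes recovered after the round trip once the marker is skipped. */
static size_t recovered_payload_bytes(const unsigned char *payload, size_t n) {
    unsigned char framed[64];
    assert(n + 1 <= sizeof framed);
    size_t framed_len = frame_block(framed, payload, n);
    return surviving_bytes(framed, framed_len) - 1;
}
```

With the three-byte payload 0x00 0x00 0x41, only one byte survives without the marker, whereas all three are recovered when the marker is prepended and later skipped.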
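Compilation:
8. Download the math libraries and install them in the VM. The -lgmp option used in the compile step needs the GMP development files, which on Debian and Ubuntu are provided by the libgmp-dev package.
sudo apt-get install libmath-mpfr-perl
sudo apt-get install libgmp-dev
9. Install the AES-Crypt executable.
wget https://www.aescrypt.com/download/v3/linux/AESCrypt-GUI-3.11-Linux-x86_64-Install.gz
gunzip AESCrypt-GUI-3.11-Linux-x86_64-Install.gz
chmod +x AESCrypt-GUI-3.11-Linux-x86_64-Install
./AESCrypt-GUI-3.11-Linux-x86_64-Install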
10. Compile the code with the following command.
gcc main.c xlg_compression.c xlm.c xlg.c xlg_massive_encryption.c -o xlg -lgmp -w
11. Use the following commands to compare RVTHE against AES-Crypt.
AES-Crypt:
time aescrypt -e -p key plaintext_file_name
time aescrypt -d -p key plaintext_file_name.aes
RVTHE:
time xlg -e -x key1 -y key2 plaintext_file_name
time xlg -d -x key1 -y key2 plaintext_file_name.xlg
time xlg -a -x key1 -y key2 "data" plaintext_file_name.xlg
Appendix D – Demonstrate RVTHE performance and improvement on cipher size
This appendix demonstrates RVTHE performance and its improvement in cipher size. Once RVTHE was designed, it was converted into an executable program similar to AES-Crypt and run on AWS VMs for file-level encryption. RVTHE and AES-Crypt are both symmetric encryption schemes, so their file-level results can be compared directly. The program listing follows.
//
// main.c
#include <stdio.h>
#include "test_xlg.h"
#include "test_xlm.h"
#include "test_xlg_massive_encryption.h"

int main(int argc, const char * argv[]) {
    encrypt_decrypt_file(argc,argv);
    return 0;
}

//
// xlg.h
// XLogos with MPQ
#ifndef xlg_h
#define xlg_h
#include <stdio.h>
#include "xlm.h"
#include <time.h>

struct xlg_t{
    xlm_t key1;
    xlm_t key2;
    xlm_t key1_inverse;
    xlm_t key2_inverse;
};
typedef struct xlg_t xlg_t;

//============================== INIT and SET ==============================
void xlg_init(xlg_t *xlg);
void xlg_generate_keys(xlg_t *xlg, int key_size);
void xlg_set_keys(xlg_t *xlg, xlm_t k1, xlm_t k2);

//============================== OPERATIONS ==============================
void xlg_encrypt(xlm_t *dest_cypher, xlm_t message, xlg_t xlg);
void xlg_decrypt(xlm_t *dest_decrypt, xlm_t cypher, xlg_t xlg);

//============================== UTILS ==============================
void xlg_clear(xlg_t *xlg);
void xlg_print(xlg_t xlg);

#endif /* xlg_h */
//
// xlg.c
#include "xlg.h"

//============================== INIT and SET ==============================
void xlg_init(xlg_t *xlg){
    xlm_init(&xlg->key1);
    xlm_init(&xlg->key2);
    xlm_init(&xlg->key1_inverse);
    xlm_init(&xlg->key2_inverse);
}

void xlg_generate_keys(xlg_t *xlg, int key_size){
    //The generator is seeded from the wall clock
    gmp_randstate_t state;
    gmp_randinit_default(state);
    time_t t;
    gmp_randseed_ui(state, time(&t));

    mpz_t key1_z;
    mpz_init(key1_z);
    mpz_urandomb(key1_z, state, key_size);
    xlm_t key1_m;
    xlm_init(&key1_m);
    xlm_set_z(&key1_m, key1_z);

    mpz_t key2_z;
    mpz_init(key2_z);
    mpz_urandomb(key2_z, state, key_size);
    xlm_t key2_m;
    xlm_init(&key2_m);
    xlm_set_z(&key2_m, key2_z);

    xlg_set_keys(xlg, key1_m, key2_m);

    //Clean up
    mpz_clear(key1_z);
    mpz_clear(key2_z);
    xlm_clear(&key1_m);
    xlm_clear(&key2_m);
    gmp_randclear(state);
}

void xlg_set_keys(xlg_t *xlg, xlm_t k1, xlm_t k2){
    //Set keys
    xlm_set(&xlg->key1,k1);
    xlm_set(&xlg->key2,k2);

    //Set inverse keys
    xlm_t key1_inverse;
    xlm_init(&key1_inverse);
    xlm_set(&key1_inverse,k1);
    xlm_inverse(&key1_inverse);
    xlm_set(&xlg->key1_inverse, key1_inverse);
    xlm_clear(&key1_inverse);

    xlm_t key2_inverse;
    xlm_init(&key2_inverse);
    xlm_set(&key2_inverse,k2);
    xlm_inverse(&key2_inverse);
    xlm_set(&xlg->key2_inverse, key2_inverse);
    xlm_clear(&key2_inverse);
}
//============================== OPERATIONS ==============================
void xlg_encrypt(xlm_t *dest_cypher, xlm_t message, xlg_t xlg){
    //Encrypt: cypher = (key1 * message) * key2
    xlm_t gp1_encryption;
    xlm_init(&gp1_encryption);
    xlm_geometric_product(&gp1_encryption,&xlg.key1,&message);
    xlm_t cypher;
    xlm_init(&cypher);
    xlm_geometric_product_bivector(&cypher,&gp1_encryption,&xlg.key2);
    xlm_set(dest_cypher,cypher);
    //Clean up
    xlm_clear(&gp1_encryption);
    xlm_clear(&cypher);
}

void xlg_decrypt(xlm_t *dest_decrypt, xlm_t cypher, xlg_t xlg){
    //Decrypt: undo key2 first, then key1
    xlm_t gp1_decryption;
    xlm_init(&gp1_decryption);
    xlm_geometric_product(&gp1_decryption,&cypher,&xlg.key2_inverse);
    xlm_t decrypt;
    xlm_init(&decrypt);
    xlm_geometric_product_bivector_vector(&decrypt,&xlg.key1_inverse,&gp1_decryption);
    xlm_set(dest_decrypt,decrypt);
    //Clean up
    xlm_clear(&gp1_decryption);
    xlm_clear(&decrypt);
}
//============================== UTILS ==============================
void xlg_clear(xlg_t *xlg){
    xlm_clear(&xlg->key1);
    xlm_clear(&xlg->key2);
    xlm_clear(&xlg->key1_inverse);
    xlm_clear(&xlg->key2_inverse);
}

void xlg_print(xlg_t xlg){
    mpz_t key1;
    mpz_init(key1);
    xlm_get_z(&key1, xlg.key1);
    mpz_t key2;
    mpz_init(key2);
    xlm_get_z(&key2, xlg.key2);
    gmp_printf("key 1 => %Zd\n", key1);
    gmp_printf("key 2 => %Zd\n", key2);
    mpz_clear(key1);
    mpz_clear(key2);
}
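To make the encrypt and decrypt steps of xlg.c easier to follow, here is a minimal round-trip sketch that uses machine integers instead of GMP rationals. It is an illustration under stated assumptions, not the listing's implementation: the names mv2, split_int, join_int, gp, gp_bv, norm2, rv_encrypt and rv_decrypt are hypothetical, the values are assumed small and non-negative so nothing overflows, and instead of multiplying by a stored inverse key the sketch divides by the key's norm after each product, which gives the same result because the inverse in the listing is the key divided by that norm.

```c
#include <assert.h>

/* Hypothetical stand-in for xlm_t with machine integers instead of mpq_t. */
typedef struct { long long m0, m1; } mv2;

/* Mirrors xlm_set_z for z >= 0: z is split into z/2 and z/2 + (z mod 2). */
static mv2 split_int(long long z) { mv2 m = { z / 2, z / 2 + z % 2 }; return m; }

/* Mirrors xlm_get_z: the integer is the sum of both coefficients. */
static long long join_int(mv2 m) { return m.m0 + m.m1; }

/* Same arithmetic as xlm_geometric_product and xlm_geometric_product_bivector. */
static mv2 gp(mv2 a, mv2 b) {
    mv2 r = { a.m0 * b.m0 + a.m1 * b.m1, a.m0 * b.m1 - a.m1 * b.m0 };
    return r;
}

/* Same arithmetic as xlm_geometric_product_bivector_vector. */
static mv2 gp_bv(mv2 a, mv2 b) {
    mv2 r = { a.m0 * b.m0 - a.m1 * b.m1, a.m1 * b.m0 + a.m0 * b.m1 };
    return r;
}

/* m0^2 + m1^2, the scalar that xlm_inverse divides by. */
static long long norm2(mv2 a) { return a.m0 * a.m0 + a.m1 * a.m1; }

/* cipher = (key1 * message) * key2, as in xlg_encrypt. */
static mv2 rv_encrypt(mv2 m, mv2 k1, mv2 k2) { return gp(gp(k1, m), k2); }

/* As in xlg_decrypt, with the division by the key norm applied after each product. */
static mv2 rv_decrypt(mv2 c, mv2 k1, mv2 k2) {
    assert(norm2(k1) != 0 && norm2(k2) != 0);
    mv2 d1 = gp(c, k2);
    d1.m0 /= norm2(k2);
    d1.m1 /= norm2(k2);
    mv2 d = gp_bv(k1, d1);
    d.m0 /= norm2(k1);
    d.m1 /= norm2(k1);
    return d;
}
```

With key1 = (3, 4), key2 = (2, 1) and the integer 11, the message splits into (5, 6), the cipher is (76, 43), and decryption returns (5, 6), which joins back to 11.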
//
// xlm.h
#ifndef xlm_h
#define xlm_h
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <gmp.h>
#include <math.h>
#include "xlg_compression.h"

struct xlm_t {
    mpq_t m0;
    mpq_t m1;
};
typedef struct xlm_t xlm_t;

//============================== INIT and SET ==============================
void xlm_init(xlm_t * dest);
void xlm_set(xlm_t *dest, xlm_t src);
void xlm_set_z(xlm_t * dest, mpz_t z);
void xlm_set_si(xlm_t * dest, signed long int si);
void xlm_import_str(xlm_t * dest,char* str);
void xlm_import_str_w_size(xlm_t * dest,char* str, long size);

//============================== XLM EXPORT ==============================
void xlm_get_z(mpz_t *dest, xlm_t xlm);
signed long int xlm_get_si(xlm_t xlm);
char* xlm_export_str(xlm_t xlm, long *buffer_size);

//============================== UTILS ==============================
void xlm_print(xlm_t m);
void xlm_clear(xlm_t * m);
void xlm_pack(mpz_t dst, xlm_t src);
void xlm_unpack(xlm_t *dst, mpz_t src);
size_t xlm_out_raw(FILE* stream, xlm_t src);
size_t xlm_inp_raw(xlm_t *dst,FILE* stream);

//============================== OPERATIONS ==============================
void xlm_geometric_product(xlm_t *dest, xlm_t * m0, xlm_t * m1);
void xlm_geometric_product_bivector(xlm_t *dest, xlm_t * m0, xlm_t * m1);
void xlm_geometric_product_bivector_vector(xlm_t *dest, xlm_t * m0, xlm_t * m1);
void xlm_clifford_conjugation(xlm_t *m);
void xlm_reverse(xlm_t *m);
void xlm_amplitude_squared(xlm_t * m);
void xlm_amplitude_squared_reversed(xlm_t * m);
void xlm_rationalize(xlm_t *m);
void xlm_scalar_div(xlm_t * m, mpq_t scalar);
void xlm_inverse(xlm_t *m);
void xlm_lambda_0(mpq_t *m0, xlm_t * mv1, xlm_t * mv2);
void xlm_lambda_1(mpq_t *m1, xlm_t * mv1, xlm_t * mv2);
void xlm_lambda_0_bivector(mpq_t *m0, xlm_t * mv1, xlm_t * mv2);
void xlm_lambda_1_bivector(mpq_t *m1, xlm_t * mv1, xlm_t * mv2);
void xlm_lambda_0_bivector_vector(mpq_t *m0, xlm_t * mv1, xlm_t * mv2);
void xlm_lambda_1_bivector_vector(mpq_t *m1, xlm_t * mv1, xlm_t * mv2);

#endif /* xlm_h */
//
// xlm.c
#include "xlm.h"

//============================== INIT and SET ==============================
void xlm_init(xlm_t * dest){
    mpq_init(dest->m0);
    mpq_init(dest->m1);
}

void xlm_set(xlm_t *dest, xlm_t src){
    mpq_set(dest->m0, src.m0);
    mpq_set(dest->m1, src.m1);
}

void xlm_set_z(xlm_t * dest, mpz_t z){
    //Init base and remainder
    mpz_t base;
    mpz_init(base);
    mpz_t reminder;
    mpz_init(reminder);
    //Compute values: base = z/2, reminder = z mod 2
    mpz_div_ui(base,z,2);
    mpz_mod_ui(reminder,z,2);
    //Get remainder in mpq
    mpq_t reminder_mpq;
    mpq_init(reminder_mpq);
    mpq_set_z(reminder_mpq,reminder);
    //m0 = base, m1 = base + remainder, so m0 + m1 = z
    mpq_set_z(dest->m0,base);
    mpq_set_z(dest->m1, base);
    mpq_add(dest->m1,dest->m1,reminder_mpq);
    //Adjust coefficients (adds 0, so the coefficients are left unchanged)
    if(mpz_cmp_ui(reminder,0) == 0){
        mpq_t mpq_1;
        mpq_init(mpq_1);
        mpq_set_ui(mpq_1,0,1);
        mpq_add(dest->m1, dest->m1, mpq_1);
        mpq_clear(mpq_1);
    }
    mpz_clear(base);
    mpz_clear(reminder);
    mpq_clear(reminder_mpq);
}

void xlm_set_si(xlm_t * dest, signed long int si){
    mpz_t z;
    mpz_init_set_si(z,si);
    xlm_set_z(dest,z);
    mpz_clear(z);
}

void xlm_import_str(xlm_t * dest,char* str){
    mpz_t z;
    mpz_init(z);
    //strlen gives the string length; sizeof(str) would only give the pointer size
    mpz_import(z,strlen(str),1,sizeof(str[0]), 0, 0,str);
    xlm_set_z(dest,z);
    mpz_clear(z);
}

void xlm_import_str_w_size(xlm_t * dest,char* str, long size){
    mpz_t z;
    mpz_init(z);
    mpz_import(z,size,1,sizeof(str[0]),0, 0,str);
    xlm_set_z(dest,z);
    mpz_clear(z);
}
//============================== XLM GET ==============================
void xlm_get_z(mpz_t *dest, xlm_t xlm){
    mpz_t mpz_m0;
    mpz_init(mpz_m0);
    mpz_t mpz_m1;
    mpz_init(mpz_m1);
    mpz_set_q(mpz_m0,xlm.m0);
    mpz_set_q(mpz_m1,xlm.m1);
    mpz_add(*dest, mpz_m0, mpz_m1);
    mpz_clear(mpz_m0);
    mpz_clear(mpz_m1);
}

signed long int xlm_get_si(xlm_t xlm){
    mpz_t z;
    mpz_init(z);
    xlm_get_z(&z,xlm);
    signed long int si = mpz_get_si(z);
    mpz_clear(z);
    return si;
}

char* xlm_export_str(xlm_t xlm, long *buffer_size){
    mpz_t z;
    mpz_init(z);
    xlm_get_z(&z,xlm);
    //Alloc memory to destination buffer
    long size =sizeof(char);
    long nail = 0;
    long numb = 8*size - nail;
    long count = (mpz_sizeinbase (z, 2) + numb-1) / numb;
    char* buffer;
    buffer = malloc(count * size);
    //Check the pointer itself, not the value it points to
    if(buffer_size != NULL){
        *buffer_size =count * size;
    }
    //Export to buffer
    mpz_export(buffer, NULL, 1, size, 0, nail, z);
    mpz_clear(z);
    return buffer;
}
//============================== UTILS ==============================
void xlm_clear(xlm_t * m){
    mpq_clear(m->m0);
    mpq_clear(m->m1);
}

void xlm_print(xlm_t m){
    gmp_printf("%+Qd e0 ", m.m0);
    gmp_printf("%+Qd e1 \n", m.m1);
}

void xlm_pack(mpz_t dst, xlm_t src){
    mpz_t m0_m1;
    mpz_t m0,m1;
    mpz_inits(m0_m1,m0,m1,NULL);
    //Get mpz values of coefficients
    mpz_set_q(m0,src.m0);
    mpz_set_q(m1,src.m1);
    //Set absolute values
    mpz_abs(m0,m0);
    mpz_abs(m1,m1);
    //Pair coefficients
    xlg_pair(dst, m0, m1);
    //Pack signs of coefficients: bit 7 for m0, bit 6 for m1
    //(mpq_sgn replaces a comparison against the invalid fraction 0/0)
    unsigned int signs = 0;
    signs = signs + ((mpq_sgn(src.m0)<0)? 128u:0u);
    signs = signs + ((mpq_sgn(src.m1)<0)? 64u:0u);
    mpz_mul_ui(dst,dst,256);
    mpz_add_ui(dst,dst,signs);
    mpz_clears(m0_m1,m0,m1,NULL);
}
void xlm_unpack(xlm_t *dst, mpz_t src){
    mpz_t m0_m1;
    mpz_t m0,m1;
    mpz_inits(m0_m1,m0,m1,NULL);
    //Get signs
    mpz_t signs_z;
    mpz_init(signs_z);
    mpz_mod_ui(signs_z,src,256);
    mpz_div_ui(src,src,256);
    unsigned long signs = mpz_get_ui(signs_z);
    //Unpair coefficients
    xlg_unpair(m0, m1, src);
    //Adjust sign: test bits 7 and 6, the positions written by xlm_pack
    if((signs & 128) > 0)
        mpz_mul_si(m0,m0,-1);
    if((signs & 64) > 0)
        mpz_mul_si(m1,m1,-1);
    //Set coefficients
    mpq_set_z(dst->m0,m0);
    mpq_set_z(dst->m1,m1);
    mpz_clear(signs_z);
    mpz_clears(m0_m1,m0,m1,NULL);
}
size_t xlm_out_raw(FILE* stream, xlm_t src){
    mpz_t blades[2];
    for (int i = 0; i < 2; i++) {
        mpz_init(blades[i]);
    }
    mpz_set_q(blades[0],src.m0);
    mpz_set_q(blades[1],src.m1);
    size_t size = 0;
    for (int i = 0; i < 2; i++) {
        size += mpz_out_raw(stream,blades[i]);
    }
    for (int i = 0; i < 2; i++) {
        mpz_clear(blades[i]);
    }
    return size;
}

size_t xlm_inp_raw(xlm_t *dst,FILE* stream){
    mpz_t blades[2];
    for (int i = 0; i < 2; i++) {
        mpz_init(blades[i]);
    }
    size_t rsize = 0;
    for (int i = 0; i < 2; i++) {
        rsize += mpz_inp_raw(blades[i],stream);
    }
    mpq_set_z(dst->m0,blades[0]);
    mpq_set_z(dst->m1,blades[1]);
    for (int i = 0; i < 2; i++) {
        mpz_clear(blades[i]);
    }
    return rsize;
}
//============================== OPERATIONS ==============================
void xlm_geometric_product(xlm_t * dest, xlm_t * m0, xlm_t * m1){
    xlm_lambda_0(&dest->m0,m0,m1);
    xlm_lambda_1(&dest->m1,m0,m1);
}

void xlm_geometric_product_bivector(xlm_t * dest, xlm_t * m0, xlm_t * m1){
    xlm_lambda_0_bivector(&dest->m0,m0,m1);
    xlm_lambda_1_bivector(&dest->m1,m0,m1);
}

void xlm_geometric_product_bivector_vector(xlm_t * dest, xlm_t * m0, xlm_t * m1){
    xlm_lambda_0_bivector_vector(&dest->m0,m0,m1);
    xlm_lambda_1_bivector_vector(&dest->m1,m0,m1);
}
//Each lambda accumulates into *m, so the destination must be zero (freshly initialized) on entry.
void xlm_lambda_0(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m0,mv2->m0);
    mpq_mul(mb,mv1->m1,mv2->m1);
    mpq_add(*m,*m,ma);
    mpq_add(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}

void xlm_lambda_1(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m0,mv2->m1);
    mpq_mul(mb,mv1->m1,mv2->m0);
    mpq_add(*m,*m,ma);
    mpq_sub(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}

void xlm_lambda_0_bivector(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m0,mv2->m0);
    mpq_mul(mb,mv1->m1,mv2->m1);
    mpq_add(*m,*m,ma);
    mpq_add(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}

void xlm_lambda_1_bivector(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m0,mv2->m1);
    mpq_mul(mb,mv1->m1,mv2->m0);
    mpq_add(*m,*m,ma);
    mpq_sub(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}

void xlm_lambda_0_bivector_vector(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m0,mv2->m0);
    mpq_mul(mb,mv1->m1,mv2->m1);
    mpq_add(*m,*m,ma);
    mpq_sub(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}

void xlm_lambda_1_bivector_vector(mpq_t *m, xlm_t * mv1, xlm_t *mv2){
    mpq_t ma;
    mpq_t mb;
    mpq_init(ma);
    mpq_init(mb);
    mpq_mul(ma,mv1->m1,mv2->m0);
    mpq_mul(mb,mv1->m0,mv2->m1);
    mpq_add(*m,*m,ma);
    mpq_add(*m,*m,mb);
    mpq_clear(ma);
    mpq_clear(mb);
}
void xlm_clifford_conjugation(xlm_t *m) {
    //All sign changes below are commented out, so m is returned unchanged
    mpq_t minus_one;
    mpq_init(minus_one);
    mpq_set_si(minus_one,-1,1); //-1/1: the denominator argument of mpq_set_si is unsigned
    //mpq_mul(m->m0,m->m0,minus_one);
    //mpq_mul(m->m1,m->m1,minus_one);
    //mpq_mul(m->m1,m->m1,minus_one);
    mpq_clear(minus_one);
}

void xlm_reverse(xlm_t *m) {
    mpq_t minus_one;
    mpq_init(minus_one);
    mpq_set_si(minus_one,-1,1); //-1/1: the denominator argument of mpq_set_si is unsigned
    //mpq_mul(m->m0,m->m0,minus_one);
    // mpq_mul(m->m0,m->m0,minus_one);
    mpq_mul(m->m1,m->m1,minus_one);
    mpq_clear(minus_one);
}
void xlm_amplitude_squared(xlm_t * m) {
    //suni gmp_printf("Input1: %Qd\n", m->m0);
    //suni gmp_printf("Input2: %Qd\n", m->m1);
    //Compute Clifford conjugation of m and store it in clifford_conj
    xlm_t clifford_conj;
    xlm_init(&clifford_conj);
    xlm_set(&clifford_conj, *m);
    xlm_clifford_conjugation(&clifford_conj);
    //suni gmp_printf("Clifford Conjugate: %Qd\n", clifford_conj.m0);
    // suni gmp_printf("Clifford Conjugate: %Qd\n", clifford_conj.m1);
    //Compute geometric product of m and clifford_conj and store it in amplitude_squared
    xlm_t amplitude_squared;
    xlm_init(&amplitude_squared);
    xlm_geometric_product(&amplitude_squared, m, &clifford_conj);
    //suni gmp_printf("Amplitude squared: %Qd\n", amplitude_squared.m0);
    //Make the pointer content equal amplitude_squared
    xlm_clear(m);
    xlm_init(m);
    xlm_set(m, amplitude_squared);
    //Clean up
    xlm_clear(&clifford_conj);
    xlm_clear(&amplitude_squared);
}
void xlm_amplitude_squared_reversed(xlm_t * m) {
    //Compute amplitude squared of m
    xlm_t amplitude_squared_reversed;
    xlm_init(&amplitude_squared_reversed);
    xlm_set(&amplitude_squared_reversed, *m);
    xlm_amplitude_squared(&amplitude_squared_reversed);
    //Reverse the amplitude squared in place
    xlm_reverse(&amplitude_squared_reversed);
    //Make the pointer content equal amplitude_squared_reversed
    xlm_clear(m);
    xlm_init(m);
    xlm_set(m, amplitude_squared_reversed);
    //Clean up
    xlm_clear(&amplitude_squared_reversed);
}
void xlm_rationalize(xlm_t *m) {
    //Compute amplitude squared of m and store it in mv_amplitude_squared
    xlm_t mv_amplitude_squared;
    xlm_init(&mv_amplitude_squared);
    xlm_set(&mv_amplitude_squared, *m);
    xlm_amplitude_squared(&mv_amplitude_squared);
    //Compute amplitude squared reversed of m and store it in mv_amplitude_squared_reversed
    xlm_t mv_amplitude_squared_reversed;
    xlm_init(&mv_amplitude_squared_reversed);
    xlm_set(&mv_amplitude_squared_reversed, *m);
    xlm_amplitude_squared_reversed(&mv_amplitude_squared_reversed);
    //Compute geometric product of mv_amplitude_squared and mv_amplitude_squared_reversed and store it in mv_geometric_product
    xlm_t mv_geometric_product;
    xlm_init(&mv_geometric_product);
    xlm_geometric_product(&mv_geometric_product,&mv_amplitude_squared,&mv_amplitude_squared_reversed);
    //Make the pointer content equal mv_geometric_product
    xlm_clear(m);
    xlm_init(m);
    xlm_set(m,mv_geometric_product);
    //Clean up
    xlm_clear(&mv_amplitude_squared);
    xlm_clear(&mv_amplitude_squared_reversed);
    xlm_clear(&mv_geometric_product);
}
void xlm_scalar_div(xlm_t *m, mpq_t scalar) {
    mpq_div(m->m0, m->m0, scalar);
    mpq_div(m->m1, m->m1, scalar);
}
void xlm_inverse(xlm_t *m){
    //Compute Clifford conjugation of m and store it in clifford_conj
    xlm_t clifford_conj;
    xlm_init(&clifford_conj);
    xlm_set(&clifford_conj, *m);
    xlm_clifford_conjugation(&clifford_conj);
    //Compute amplitude squared of m and store it in mv_amplitude_squared
    xlm_t mv_amplitude_squared;
    xlm_init(&mv_amplitude_squared);
    xlm_set(&mv_amplitude_squared, *m);
    xlm_amplitude_squared(&mv_amplitude_squared);
    //Rationalize
    xlm_t mv_rationalize;
    xlm_init(&mv_rationalize);
    xlm_set(&mv_rationalize, *m);
    xlm_rationalize(&mv_rationalize);
    //Copy m and divide it by the scalar part of the amplitude squared
    xlm_t mv_geometric_product;
    xlm_init(&mv_geometric_product);
    xlm_set(&mv_geometric_product, *m);
    xlm_scalar_div(&mv_geometric_product, mv_amplitude_squared.m0);
    //Make the pointer content equal mv_geometric_product
    xlm_clear(m);
    xlm_init(m);
    xlm_set(m, mv_geometric_product);
    //Clean up (the two temporaries below were never released in the original listing)
    xlm_clear(&clifford_conj);
    xlm_clear(&mv_amplitude_squared);
    xlm_clear(&mv_geometric_product);
    xlm_clear(&mv_rationalize);
}
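xlm_pack stores the signs of both coefficients in the low byte of the packed integer, and xlm_unpack has to read back the same bit positions for the round trip to preserve them. The following is a minimal sketch of that bookkeeping with machine integers; pack_signs and unpack_signs are hypothetical names, and the argument paired stands in for the value produced by xlg_pair, which lives in xlg_compression.c and is not shown in this appendix.

```c
#include <assert.h>
#include <stddef.h>

#define SIGN_M0 0x80u  /* bit 7: m0 is negative */
#define SIGN_M1 0x40u  /* bit 6: m1 is negative */

/* Append a sign byte below an already paired magnitude: paired * 256 + signs. */
static unsigned long long pack_signs(unsigned long long paired,
                                     int m0_negative, int m1_negative) {
    unsigned int signs = 0;
    if (m0_negative) signs |= SIGN_M0;
    if (m1_negative) signs |= SIGN_M1;
    return paired * 256u + signs;
}

/* Split the packed value again: returns the paired magnitude and reports
   the two signs through the output pointers. */
static unsigned long long unpack_signs(unsigned long long packed,
                                       int *m0_negative, int *m1_negative) {
    assert(m0_negative != NULL && m1_negative != NULL);
    unsigned int signs = (unsigned int)(packed % 256u);
    *m0_negative = (signs & SIGN_M0) != 0;
    *m1_negative = (signs & SIGN_M1) != 0;
    return packed / 256u;
}
```

Using the same named masks on both sides keeps the writer and the reader from drifting apart, which is easy to miss when the bit values are written as separate literals.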
//
// xlg_massive_encryption.h
#ifndef xlg_massive_encryption_h
#define xlg_massive_encryption_h
#include <stdio.h>
#include "xlg.h"
void xlg_encrypt_file(char* src, char* dst, xlg_t xlg);
void xlg_decrypt_file(char* src, char* dst, xlg_t xlg);
void xlg_append_encypted_data(char* dst_path, char* data_buffer, xlg_t xlg);
void xlg_encode(xlg_t xlg);
void xlg_decode(xlg_t xlg);
#endif /* xlg_massive_encryption_h */

//
// xlg_massive_encryption.c
#include "xlg_massive_encryption.h"
int BUFFER_SIZE = 1024*10;
void xlg_encrypt_file(char* src, char* dst, xlg_t xlg){
    FILE *src_file = fopen(src, "rb");
    FILE *dst_file= fopen(dst, "wb");
    if(src_file == NULL || dst_file == NULL){
        //Could not open one of the files
        if(src_file != NULL) fclose(src_file);
        if(dst_file != NULL) fclose(dst_file);
        return;
    }
    while (!feof(src_file)) {
        //Read file
        long nread = 1;
        char buffer[BUFFER_SIZE];
        buffer[0] = 1; //Marker byte; xlg_decrypt_file skips it when writing the output
        while(nread<BUFFER_SIZE-1 && !feof(src_file)){
            int c = getc(src_file);
            if(c!= EOF){
                buffer[nread]=c;
                nread++;
            }
        }
        //Import
        xlm_t message;
        xlm_init(&message);
        xlm_import_str_w_size(&message,buffer,nread);
        //suni gmp_printf("Message Values: %Qd + %Qd\n", message.m0, message.m1);
        //Encrypt
        xlm_t cypher_xlm;
        xlm_init(&cypher_xlm);
        xlg_encrypt(&cypher_xlm, message, xlg);
        //suni gmp_printf("Encrypted Values: %Qd + %Qd\n", cypher_xlm.m0, cypher_xlm.m1);
        //Write to file
        xlm_out_raw(dst_file,cypher_xlm);
        //Clean up
        xlm_clear(&message);
        xlm_clear(&cypher_xlm);
    }
    fclose(src_file);
    fclose(dst_file);
}
void xlg_decrypt_file(char* src, char* dst, xlg_t xlg){
    FILE *src_file= fopen(src, "rb");
    FILE *dst_file= fopen(dst, "wb");
    if(src_file == NULL || dst_file == NULL){
        //Could not open one of the files
        if(src_file != NULL) fclose(src_file);
        if(dst_file != NULL) fclose(dst_file);
        return;
    }
    while(!feof(src_file)){
        //Read File
        xlm_t cypher_xlm;
        xlm_init(&cypher_xlm);
        size_t nread = xlm_inp_raw(&cypher_xlm, src_file);
        if(nread == 0){
            xlm_clear(&cypher_xlm);
            break;
        }
        //Decrypt
        xlm_t decrypt;
        xlm_init(&decrypt);
        xlg_decrypt(&decrypt,cypher_xlm,xlg);
        //suni gmp_printf("Decrypted Values: %Qd + %Qd\n", decrypt.m0, decrypt.m1);
        //Export
        long size;
        char* buffer = xlm_export_str(decrypt,&size);
        //Write file, skipping the marker byte at index 0
        long nwrite = 1;
        while(nwrite<size){
            putc(buffer[nwrite++], dst_file);
        }
        //Clean up
        free(buffer);
        xlm_clear(&cypher_xlm);
        xlm_clear(&decrypt);
    }
    fclose(src_file);
    fclose(dst_file);
}
void xlg_append_encypted_data(char* dst_path, char* data_buffer, xlg_t xlg){
    //Open in append mode; "wb" would truncate the existing encrypted file
    FILE *dst_file= fopen(dst_path, "ab");
    char * buffer = malloc(strlen(data_buffer)+2);
    memset(buffer,0,strlen(data_buffer)+2);
    buffer[0]=1;
    buffer = strcat(buffer, data_buffer);
    //Import
    xlm_t data;
    xlm_init(&data);
    xlm_import_str_w_size(&data,buffer,strlen(data_buffer)+2);
    //Encrypt
    xlm_t cypher_xlm;
    xlm_init(&cypher_xlm);
    xlg_encrypt(&cypher_xlm, data, xlg);
    //Write to file
    xlm_out_raw(dst_file,cypher_xlm);
    //Clean up
    xlm_clear(&data);
    xlm_clear(&cypher_xlm);
    free(buffer);
    fclose(dst_file);
}
void xlg_encode(xlg_t xlg){
    while (!feof(stdin)) {
        long nread = 1;
        char buffer[BUFFER_SIZE];
        buffer[0] = 1; // THANKS HANES!!!
        while(nread<BUFFER_SIZE-1 && !feof(stdin)){
            int c = getchar();
            if(c!= EOF){
                buffer[nread]=c;
                nread++;
            }
        }
        //Import
        xlm_t message;
        xlm_init(&message);
        xlm_import_str_w_size(&message,buffer,nread);
        //Encrypt
        xlm_t cypher_xlm;
        xlm_init(&cypher_xlm);
        xlg_encrypt(&cypher_xlm, message, xlg);
        gmp_fprintf(stdout,"%Qd\n", cypher_xlm.m0);
        gmp_fprintf(stdout,"%Qd\n", cypher_xlm.m1);
        xlm_clear(&message);
        xlm_clear(&cypher_xlm);
    }
}

void xlg_decode(xlg_t xlg){
    FILE *stream;
    char *line = NULL;
    size_t len = 0;
    ssize_t read; // getline() returns ssize_t; -1 signals EOF or error

    stream = stdin;
    if (stream == NULL)
        exit(0);

    int count = 0;
    mpq_t m0;
    mpq_t m1;

    xlm_t cypher;
    // Ciphertext arrives as pairs of lines: m0 first, then m1
    while ((read = getline(&line, &len, stdin)) != -1) {
        if(count%2 == 0){
            mpq_init(m0);
            mpq_set_str(m0,line,10);
            count++;
        }
        else if(count%2 == 1){
            mpq_init(m1);
            mpq_set_str(m1,line,10);
            count++;

            xlm_init(&cypher);
            mpq_set(cypher.m0,m0);
            mpq_set(cypher.m1,m1);

            //Decrypt
            xlm_t decrypt;
            xlm_init(&decrypt);
            xlg_decrypt(&decrypt,cypher,xlg);

            //Export
            long size;
            char* buffer = xlm_export_str(decrypt,&size);

            long nwrite = 1; // skip the leading marker byte (thanks, Hanes)
            while(nwrite<size){
                putchar(buffer[nwrite++]);
            }
            count = 0;
            xlm_clear(&cypher);
            xlm_clear(&decrypt);
            free(buffer);
            mpq_clear(m0);
            mpq_clear(m1);
        }
    }

    free(line);
    fclose(stream);
}

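The encode and append routines in this listing set `buffer[0] = 1` before importing a block, and the decode routines start writing at index 1. The listing does not state the reason; one plausible reading is that the leading marker keeps leading zero bytes from being lost when a byte string is turned into a number and back. The sketch below illustrates that idea only. It uses a 64-bit integer in place of the GMP values, and `pack_with_sentinel` and `unpack_with_sentinel` are hypothetical helpers, not functions from the RVTHE sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack up to 7 payload bytes into a 64-bit integer,
 * big-endian, behind a leading 0x01 marker byte. */
uint64_t pack_with_sentinel(const unsigned char *payload, size_t n)
{
    assert(n <= 7);                 /* marker + 7 bytes fit in 64 bits */
    uint64_t v = 1;                 /* the marker byte */
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | payload[i];
    return v;
}

/* Hypothetical helper: recover the payload from a packed value.
 * Writes the payload to out (at least 7 bytes) and returns its length. */
size_t unpack_with_sentinel(uint64_t v, unsigned char *out)
{
    unsigned char tmp[8];
    size_t len = 0;
    while (v != 0) {                /* peel off bytes, least significant first */
        tmp[len++] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
    /* The most significant byte must be the marker. */
    assert(len >= 1 && tmp[len - 1] == 1);
    /* Reverse the remaining bytes into out, dropping the marker. */
    for (size_t i = 0; i + 1 < len; i++)
        out[i] = tmp[len - 2 - i];
    return len - 1;
}
```

Without the marker, the bytes `00 00 41` would become the number `0x41` and come back as a single byte; with the marker the packed value is `0x01000041`, so all three payload bytes survive the round trip.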
Compilation:
13. Download math libraries and install them in the VM.
sudo apt-get install libmath-mpfr-perl
14. Install AES-Crypt executable
wget https://www.aescrypt.com/download/v3/linux/AESCrypt-GUI-3.11-Linux-x86_64-Install.gz
gunzip AESCrypt-GUI-3.11-Linux-x86_64-Install.gz
chmod +x AESCrypt-GUI-3.11-Linux-x86_64-Install
./AESCrypt-GUI-3.11-Linux-x86_64-Install
15. Compile the code with the following command (-lgmp links against the GMP library; -w suppresses compiler warnings).
gcc main.c xlg_compression.c xlm.c xlg.c xlg_massive_encryption.c -o xlg -lgmp -w
16. Use the following commands to compare RVTHE against AES-Crypt.
AES-Crypt:
time aescrypt -e -p key plaintext_file_name
time aescrypt -d -p key plaintext_file_name.aes
RVTHE:
time xlg -e -x key1 -y key2 plaintext_file_name
time xlg -d -x key1 -y key2 plaintext_file_name.xlg
time xlg -a -x key1 -y key2 "data" plaintext_file_name.xlg
17. Sample Output of created encrypted file:
18. Cipher text size and contents:
19. Sample Performance Metrics:
20. Various Type Files and their encryption and decryption outputs:
.TXT: The screenshot below includes the append option.
.JPEG:
.PDF:
.DOCX:
.XLSX:
.PPTX:
All of the above file types worked with the RVTHE encryption method.
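The `time` commands in step 16 report elapsed time per run. To compare methods on files of different sizes, those times can be converted to throughput and to a relative penalty against a baseline. The following is a minimal sketch of that arithmetic; the function names are illustrative and are not part of the RVTHE sources, and 1 MB is taken as 10^6 bytes.

```c
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two clock readings (e.g. from
 * clock_gettime(CLOCK_MONOTONIC, ...)). */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Throughput in MB/s for a file of the given size, with 1 MB = 10^6 bytes. */
double throughput_mb_per_s(long long bytes, double seconds)
{
    assert(seconds > 0.0);
    return ((double)bytes / 1e6) / seconds;
}

/* Throughput penalty of one method relative to a baseline, in percent.
 * Positive means the other method is slower than the baseline. */
double penalty_percent(double baseline_mb_s, double other_mb_s)
{
    assert(baseline_mb_s > 0.0);
    return 100.0 * (baseline_mb_s - other_mb_s) / baseline_mb_s;
}
```

Calling `throughput_mb_per_s` once per method with the same input file and passing both results to `penalty_percent` gives a single percentage that can be compared across file types.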
Appendix E – Acronym List
Abbreviation    Term
HDD             Hard Disk Drive
SATA            Serial AT Attachment
SSD             Solid State Drive
FDE             Full-Disk Encryption
AES             Advanced Encryption Standard
DES             Data Encryption Standard
TDEA            Triple Data Encryption Algorithm
RSA             Rivest–Shamir–Adleman
MD5             Message-Digest Algorithm
SHA             Secure Hash Algorithm
CBC             Cipher Block Chaining
CTR             Counter
GCM             Galois/Counter Mode
OCB             Offset Codebook Mode
ECB             Electronic Codebook
OFB             Output Feedback
AWS             Amazon Web Services
NIST            National Institute of Standards and Technology
ESD             Every Stage of Data
FHE             Fully Homomorphic Encryption
RVTHE           Reduced Vector Technique Homomorphic Encryption
SSL             Secure Sockets Layer
UCCS            University of Colorado, Colorado Springs