Bahram Hakhamaneshi Project Report

  • 8/13/2019 Bahram Hakhamaneshi Project Report

    1/119

    A HARDWARE IMPLEMENTATION OF THE ADVANCED ENCRYPTION

    STANDARD (AES) ALGORITHM USING SYSTEMVERILOG

    Bahram Hakhamaneshi

    B.S., Islamic Azad University, Iran, 2004

    PROJECT

    Submitted in partial satisfaction of the requirements for the degree of

    MASTER OF SCIENCE

    in

    COMPUTER ENGINEERING

    at

    CALIFORNIA STATE UNIVERSITY, SACRAMENTO

    FALL

    2009

    A HARDWARE IMPLEMENTATION OF THE ADVANCED ENCRYPTION STANDARD (AES) ALGORITHM USING SYSTEMVERILOG

    A Project

    by

    Bahram Hakhamaneshi

    Approved by:

    __________________________________, Committee Chair

    Dr. Behnam Arad

    ____________________________

    Date

    __________________________________, Second Reader

    Dr. Isaac Ghansah

    ____________________________

    Date

    Student: Bahram Hakhamaneshi

    I certify that this student has met the requirements for format contained in the University

    format manual, and that this project is suitable for shelving in the Library and credit is to

    be awarded for the Project.

    __________________________, Graduate Coordinator ________________

    Dr. Suresh Vadhva Date

    Department of Computer Engineering

    Abstract

    of

    A HARDWARE IMPLEMENTATION OF THE ADVANCED ENCRYPTION

    STANDARD (AES) ALGORITHM USING SYSTEMVERILOG

    by

    Bahram Hakhamaneshi

    The increasing need for protecting data communication in computer networks has

    led to development of several cryptography algorithms. The Advanced Encryption

    Standard (AES) is a computer security standard issued by the National Institute of

    Standards and Technology (NIST) intended for protecting electronic data. Its

    specification is defined in Federal Information Processing Standards (FIPS) Publication

    197. The AES cryptography algorithm can be used to encrypt/decrypt blocks of 128 bits

    and is capable of using cipher keys that are 128, 192 or 256 bits wide (AES128, AES192, and

    AES256).

    The Advanced Encryption Standard can be implemented in either software or

    hardware. Hardware acceleration is the use of hardware to perform a task more

    efficiently than is possible in software. To achieve higher performance in today's

    heavily loaded communication networks, hardware accelerators are a more efficient

    choice for cryptography algorithms.

    In this project, a hardware implementation of the AES128 encryption algorithm was

    proposed. A unique feature of the proposed pipelined design is that the round keys,

    which are consumed during different iterations of encryption, are generated in parallel

    with the encryption process. This lowers the delay associated with each round of

    encryption and reduces the overall encryption delay of a plaintext block. This leads to an

    increase in the message encryption throughput.

    The proposed pipelined design was modeled and validated in SystemVerilog

    hardware description language. The testbench developed for validating the design kept

    track of Functional Coverage to make sure the design is thoroughly verified. The design

    was validated using the Synopsys VCS tool and synthesized using the Synopsys Design

    Compiler tool. The gate-level netlist generated during the synthesis phase using the

    LSI_10K technology library was capable of operating at a frequency of 40 MHz. We expect

    the timing and area of the gate level netlist to improve if a more efficient technology

    library file is used for synthesis.

    Finally, to get an estimate of the speed gain by the hardware implementation, a

    virtual system was created using the Virtutech Simics software to emulate the

    execution of a C program that implements the AES128 encryption in software. The

    Simics virtual system utilized in this project is based on Intel's x86 architecture with the

    440BX chipset and has a 2 GHz Pentium 4 processor.

    The statistics gathered from the virtual system showed that it would take more than

    30,000 CPU cycles to encrypt a block of plaintext, assuming one clock per instruction.

    The results indicate that the hardware implementation proposed in this project is at least

    60 times faster than the software implementation.

    _______________________, Committee Chair

    Dr. Behnam Arad

    _______________________

    Date

    To Mom and Dad whom I love the most in the world

    ACKNOWLEDGMENTS

    I would like to say thanks to Dr. Behnam Arad and Dr. Isaac Ghansah for their help

    with defining and concluding this project. This project could not have reached this far

    without their guidance and assistance. I also want to give special thanks to them for

    reviewing this report and proofreading it in the very short time that was left before the

    submission deadline.

    I would also like to thank my family, both those close by and those far away, for

    encouraging and supporting me during the course of this project and all my life.

    TABLE OF CONTENTS

    Dedication
    Acknowledgments
    List of Tables
    List of Figures

    Chapter

    1. INTRODUCTION
    2. ADVANCED ENCRYPTION STANDARD (AES)
        2.1 Overview
        2.2 Inputs, Outputs and the State
        2.3 Cipher Transformations
            2.3.1 SubBytes ( ) Transformation
            2.3.2 ShiftRows ( ) Transformation
            2.3.3 MixColumns ( ) Transformation
            2.3.4 AddRoundKey ( ) Transformation
        2.4 AES Key Expansion
    3. AES128 DESIGN AND IMPLEMENTATION
        3.1 Overview
        3.2 Design Hierarchy
            3.2.1 AES128 Encryption Process
            3.2.2 AES128 Round Key Generation
        3.3 AES128 Pipelined Design
    4. AES128 VERIFICATION
        4.1 Overview
        4.2 Testbench Infrastructure
        4.3 AES128_Interface
        4.4 AES128_Program
    5. AES128 SYNTHESIS
        5.1 Overview
        5.2 Synthesis Methodology
        5.3 Synthesis Timing Result
        5.4 Synthesis Area Result
        5.5 Synthesis Constraint Violators Result
    6. AES128 SOFTWARE IMPLEMENTATION
        6.1 Overview
        6.2 AES128 Software Implementation on a Simics Virtual System
    7. CONCLUSION

    Appendix A: AES128 Hardware Model Source Files
    Appendix B: AES128 Testbench Source Files
    Appendix C: AES128 Simulation Results
    Appendix D: AES128 Implementation in C Language
    References

    LIST OF TABLES

    Table 1 AES Variations
    Table 2 AES S-box
    Table 3 Simics Virtual System Statistics

    LIST OF FIGURES

    Figure 1 State Population and Results
    Figure 2 AES Cipher
    Figure 3 SubBytes Transformation
    Figure 4 ShiftRows Transformation
    Figure 5 MixColumns Transformation
    Figure 6 AddRoundKey Transformation
    Figure 7 KeyExpansion Algorithm
    Figure 8 Design Hierarchy
    Figure 9 AES128_Cipher_Top Module State Diagram
    Figure 10 AES128_Key_Expand Module State Diagram
    Figure 11 AES128_Key_Expand Module
    Figure 12 AES128_Rcon Module
    Figure 13 AES128 Pipelined Round Key Generation and Cipher Rounds
    Figure 14 AES128 Test Infrastructure
    Figure 15 AES128_Top Definition
    Figure 16 AES128_Interface Definition
    Figure 17 Class Definition in the AES128_Program
    Figure 18 AES128_Program Pseudo Code
    Figure 19 AES128_Testbench_Package Pseudo Code
    Figure 20 Sample Simulation Results
    Figure 21 AES128 Block Encryption Pseudo Code in C

    Chapter 1

    INTRODUCTION

    In today's digital world, encryption is emerging as an integral part of all

    communication networks and information processing systems, for protecting both stored

    and in-transit data. Encryption is the transformation of plain data (known as plaintext)

    into unintelligible data (known as ciphertext) through an algorithm referred to as a cipher.

    There are numerous encryption algorithms that are now commonly used in computation,

    but the U.S. government has adopted the Advanced Encryption Standard (AES) to be

    used by Federal departments and agencies for protecting sensitive information. The

    National Institute of Standards and Technology (NIST) has published the specifications

    of this encryption standard in the Federal Information Processing Standards (FIPS)

    Publication 197. [1]

    Any conventional symmetric cipher, such as AES, requires a single key for both

    encryption and decryption, which is independent of the plaintext and the cipher itself. It

    should be impractical to retrieve the plaintext solely based on the ciphertext and the

    encryption algorithm, without knowing the encryption key. Thus, the secrecy of the

    encryption key is of high importance in symmetric ciphers such as AES. Software

    implementation of encryption algorithms does not provide ultimate secrecy of the key

    since the operating system, on which the encryption software runs, is always vulnerable

    to attacks.

    There are other important drawbacks in software implementation of any encryption

    algorithm, including lack of CPU instructions operating on very large operands, word

    size mismatch on different operating systems and less parallelism in software. In

    addition, software implementation does not fulfill the required speed for time critical

    encryption applications. Thus, hardware implementation of encryption algorithms is an

    important alternative, since it provides ultimate secrecy of the encryption key, faster

    speed and more efficiency through higher levels of parallelism.

    Different versions of the AES algorithm exist today (AES128, AES192, and AES256)

    depending on the size of the encryption key. In this project, a hardware model for

    implementing the AES128 algorithm was developed using the SystemVerilog hardware

    description language. A unique feature of the design proposed in this project is that the

    round keys, which are consumed during different iterations of encryption, are generated

    in parallel with the encryption process.

    The hardware model was then completely verified using a testbench, which took

    advantage of SystemVerilog's object-oriented programming (OOP) features by

    constructing random test objects and providing them to the model. The validation

    process continued until the model was verified for a certain Functional Coverage. Then,

    the verified model was synthesized using the Synopsys Design Compiler tool to get an

    estimate of the number of gates, area and timing of the hardware model.

    In addition, the AES128 algorithm was modeled in the C language and ported to

    a Simics virtual system. The statistics of the Simics virtual system were gathered to get an

    estimate of the time it would take to encrypt a plaintext block on the virtual system.

    Finally, the performances of software and hardware implementations were compared.

    The rest of the report is organized into six chapters. Chapter 2 covers an overview

    of the AES encryption algorithm and its different versions. In this chapter, the different

    types of transformations and steps that are involved in the AES encryption process are

    introduced.

    Chapter 3 discusses the design and modeling of the hardware implementation of the

    AES128 encryption algorithm by explaining the modules used in the design hierarchy,

    their interconnections and state diagrams.

    Chapter 4 covers the verification of the hardware model. In this chapter, a test

    infrastructure is developed which fully validates the design. The testbench generates

    random input test vectors for the hardware model and validates its functionality until a

    certain Functional Coverage is met.

    Chapter 5 covers the synthesis of the hardware model using the Synopsys Design

    Compiler synthesis tool. In this chapter, a script is developed to synthesize the design

    into a gate-level netlist using the LSI_10K library file. The synthesis results, including the

    timing and area of the netlist, are presented at the end of this chapter.

    Chapter 6 covers the software implementation of the AES128 algorithm (in the C

    language) and its porting to a Simics virtual system. In addition, the software and

    hardware implementations are compared based on the time it takes to encrypt a block of

    plaintext.

    Finally, Chapter 7 summarizes the work and suggests potential improvements

    and future work for this project.

    Chapter 2

    ADVANCED ENCRYPTION STANDARD (AES)

    2.1 Overview

    This chapter is a summary of the Federal Information Processing Standards (FIPS)

    Publication 197 [1], issued by the National Institute of Standards and Technology (NIST),

    which specifies the Advanced Encryption Standard. Throughout the remainder of this

    chapter, the mathematical properties of the Advanced Encryption Standard (AES) are

    introduced using the information obtained from the AES specification.

    The AES is a subset of a much larger encryption algorithm known as Rijndael,

    which was one of many proposals submitted to the NIST in the competition to become a standard

    encryption algorithm. In October 2000, the NIST announced the Rijndael algorithm

    as the winner due to the best overall score in security, performance, efficiency,

    implementation capability and simplicity. [2]

    The AES algorithm is a symmetric cipher. In symmetric ciphers, a single secret key

    is used for both the encryption and decryption, whereas in asymmetric ciphers, there are

    two sets of keys known as private and public keys. The plaintext is encrypted using the

    public key and can only be decrypted using the private key.

    In addition, the AES algorithm is a block cipher as it operates on fixed-length

    groups of bits (blocks), whereas in stream ciphers, the plaintext bits are encrypted one at

    a time, and the set of transformations applied to successive bits may vary during the

    encryption process.

    The AES algorithm operates on blocks of 128 bits, using cipher keys with

    lengths of 128, 192 or 256 bits for the encryption process. Although the original Rijndael

    encryption algorithm was capable of processing different block sizes as well as using

    several other cipher key lengths, the NIST did not adopt these additional features in

    the AES. [1]

    2.2 Inputs, Outputs and the State

    The plaintext input and ciphertext output for the AES algorithms are blocks of

    128 bits. The cipher key input is a sequence of 128, 192 or 256 bits. In other words, the

    length of the cipher key, Nk, is either 4, 6 or 8 words, which represents the number of

    columns in the cipher key. The AES algorithm is categorized into three versions based

    on the cipher key length. The number of rounds of encryption for each AES version

    depends on the cipher key size.

    In the AES algorithm, the number of rounds is represented by Nr, where Nr = 10

    when Nk = 4, Nr = 12 when Nk = 6, and Nr = 14 when Nk = 8. The following table

    illustrates the variations of the AES algorithm. For the AES algorithm, the block size

    (Nb), which represents the number of columns comprising the State, is Nb = 4.

    AES Version    Key Length    Block Size    Number of Rounds
                   (Nk words)    (Nb words)    (Nr rounds)
    -----------    ----------    ----------    ----------------
    AES128         4             4             10
    AES192         6             4             12
    AES256         8             4             14

    Table 1 AES Variations
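
    The rows of Table 1 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name aes_parameters is hypothetical, and the relation Nr = Nk + 6 is used as a shorthand that matches the three rows of the table. It also reports the number of 32-bit words, Nb*(Nr+1), that the expanded key occupies.

```python
NB = 4  # block size in 32-bit words, fixed for AES


def aes_parameters(key_bits):
    """Return (Nk, Nb, Nr, expanded_key_words) for a key length given in bits."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192 or 256 bits long")
    nk = key_bits // 32           # key length in 32-bit words
    nr = nk + 6                   # 10, 12 or 14 rounds, as in Table 1
    return nk, NB, nr, NB * (nr + 1)


for bits in (128, 192, 256):
    print(f"AES{bits}:", aes_parameters(bits))
```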

    The basic processing unit for the AES algorithm is a byte. As a result, the plaintext,

    ciphertext and the cipher key are arranged and processed as arrays of bytes. For an input,

    an output or a cipher key denoted by a, the bytes in the resulting array are referenced as

    a_n, where n is in one of the following ranges:

    Block length = 128 bits, 0 ≤ n < 16
    Key length = 128 bits, 0 ≤ n < 16
    Key length = 192 bits, 0 ≤ n < 24
    Key length = 256 bits, 0 ≤ n < 32

    All byte values in the AES algorithm are presented as the concatenation of their

    individual bit values between braces in the order {b7, b6, b5, b4, b3, b2, b1, b0}. These

    bytes are interpreted as finite field elements using a polynomial representation:

    \[ b_7 x^7 + b_6 x^6 + b_5 x^5 + b_4 x^4 + b_3 x^3 + b_2 x^2 + b_1 x + b_0 = \sum_{i=0}^{7} b_i x^i \]

    As an example, {10001001} (or {89} in hexadecimal) identifies the polynomial

    x^7 + x^3 + 1. The arrays of bytes in the AES algorithm are represented as a_0 a_1 a_2 ... a_n.
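
    The polynomial reading of a byte can be checked mechanically. The following is a minimal sketch; the helper name byte_to_polynomial is hypothetical and not part of the project's source files. It lists the powers of x whose coefficient bit is set, highest power first.

```python
def byte_to_polynomial(value):
    """Render a byte as its GF(2^8) polynomial, highest power first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte")
    terms = []
    for power in range(7, -1, -1):
        if (value >> power) & 1:          # coefficient b_power is 1
            if power == 0:
                terms.append("1")
            elif power == 1:
                terms.append("x")
            else:
                terms.append(f"x^{power}")
    return " + ".join(terms) if terms else "0"


print(byte_to_polynomial(0b10001001))
```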

    All the AES algorithm operations are performed on a two-dimensional 4x4 array

    of bytes which is called the State, and any individual byte within the State is referred to

    as s_{r,c}, where r represents the row and c denotes the column. At the

    beginning of the encryption process, the State is populated with the plaintext. Then the

    cipher performs a set of substitutions and permutations on the State. After the cipher

    operations are conducted on the State, the final value of the State is copied to the

    ciphertext output, as shown in the following figure.

    Input bytes                  State array                    Output bytes

    in0  in4  in8   in12         s0,0  s0,1  s0,2  s0,3         out0  out4  out8   out12
    in1  in5  in9   in13   -->   s1,0  s1,1  s1,2  s1,3   -->   out1  out5  out9   out13
    in2  in6  in10  in14         s2,0  s2,1  s2,2  s2,3         out2  out6  out10  out14
    in3  in7  in11  in15         s3,0  s3,1  s3,2  s3,3         out3  out7  out11  out15

    Figure 1 State Population and Results

    At the beginning of the cipher, the input array is copied into the State according to

    the following scheme:

    s[r,c] = in[r + 4c] for 0 ≤ r < 4 and 0 ≤ c < Nb
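
    The column-by-column filling of the State can be sketched as follows. The function names are hypothetical, and the inverse mapping out[r + 4c] = s[r,c] is taken from the layout of the output bytes in Figure 1.

```python
def bytes_to_state(block):
    """Copy 16 input bytes into a 4x4 State: s[r][c] = in[r + 4c]."""
    if len(block) != 16:
        raise ValueError("AES blocks are 16 bytes long")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state):
    """Inverse mapping used for the output: out[r + 4c] = s[r][c]."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


state = bytes_to_state(bytes(range(16)))
for row in state:
    print(row)
```

    With the input bytes 0 to 15, the first row of the State holds bytes 0, 4, 8 and 12, which matches the in0, in4, in8, in12 row of Figure 1.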

    - MixColumns( )

    - AddRoundKey( )

    The AES cipher is described as pseudo code in Figure 2. [1] As shown in the

    pseudo code, all the Nr rounds are identical with the exception of the final round, which

    does not include the MixColumns transformation. The array w[] represents the round

    keys that are generated by the key expansion routine. In the following sections, the

    individual transformations that are used in each encryption round are described.

    Cipher(byte PlainText[4*Nb], byte CipherText[4*Nb], word w[Nb*(Nr+1)])
    begin
        byte state[4,Nb]

        state = PlainText
        AddRoundKey(state, w[0, Nb-1])

        for round = 1 step 1 to Nr-1
            SubBytes(state)
            ShiftRows(state)
            MixColumns(state)
            AddRoundKey(state, w[round*Nb, (round+1)*Nb-1])
        end for

        SubBytes(state)
        ShiftRows(state)
        AddRoundKey(state, w[Nr*Nb, (Nr+1)*Nb-1])

        CipherText = state
    end

    Figure 2 AES Cipher

    2.3.1 SubBytes ( ) Transformation

    SubBytes is a byte substitution operation performed on individual bytes of the

    State, as shown in Figure 3, using a substitution table called the S-box.

    s0,0  s0,1  s0,2  s0,3                  s'0,0  s'0,1  s'0,2  s'0,3
    s1,0  s1,1  s1,2  s1,3   -- S-box -->   s'1,0  s'1,1  s'1,2  s'1,3
    s2,0  s2,1  s2,2  s2,3                  s'2,0  s'2,1  s'2,2  s'2,3
    s3,0  s3,1  s3,2  s3,3                  s'3,0  s'3,1  s'3,2  s'3,3

    Figure 3 SubBytes Transformation

    The invertible S-box table is constructed by performing the following transformations on

    each byte of the State. [1]

    - Take the multiplicative inverse of the byte in the finite field GF(2^8); the byte {00}, which has no inverse, is mapped to itself.

    - Apply the following transformation to the result:

    \[ b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i \]

    Here b_i is the i-th bit of the byte and c_i is the i-th bit of a constant byte with the value {63}.

    The combination of the two transformations can be expressed in matrix form as shown

    below:

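    \[
    \begin{bmatrix} b'_0 \\ b'_1 \\ b'_2 \\ b'_3 \\ b'_4 \\ b'_5 \\ b'_6 \\ b'_7 \end{bmatrix}
    =
    \begin{bmatrix}
    1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
    1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
    1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
    1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
    1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
    0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
    0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
    0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
    \end{bmatrix}
    \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \\ b_7 \end{bmatrix}
    +
    \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
    \]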

    The S-box table shown in Table 2 is constructed by performing the two

    transformations described earlier for all possible values of a byte, ranging from {00} to

    {ff}. For example, the substitution value for {53} is determined by the

    intersection of the row with index 5 and the column with index 3, which is {ed}.

         y
    x    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
    0    63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
    1    ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
    2    b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
    3    04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
    4    09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
    5    53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
    6    d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
    7    51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
    8    cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
    9    60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
    a    e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
    b    e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
    c    ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
    d    70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
    e    e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
    f    8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16

    Table 2 AES S-box
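
    The two-step construction of the S-box can be reproduced in software as a cross-check of Table 2. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the inverse is computed by raising the byte to the power 254 (a simple but slow method), and field multiplication reduces by the AES polynomial x^8 + x^4 + x^3 + x + 1 from FIPS 197.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:            # degree 8 reached: reduce by the AES polynomial
            a ^= 0x11B
        b >>= 1
    return product


def gf_inverse(a):
    """Return a^254, which is the multiplicative inverse; {00} maps to itself."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(b):
    """Bitwise affine step: b'_i = b_i ^ b_(i+4) ^ b_(i+5) ^ b_(i+6) ^ b_(i+7) ^ c_i."""
    constant = 0x63
    out = 0
    for i in range(8):
        bit = (b >> i) & 1
        for offset in (4, 5, 6, 7):
            bit ^= (b >> ((i + offset) % 8)) & 1
        bit ^= (constant >> i) & 1
        out |= bit << i
    return out


SBOX = [affine(gf_inverse(value)) for value in range(256)]

# Row 5, column 3 of Table 2
print(format(SBOX[0x53], "02x"))
```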

    2.3.2 ShiftRows ( ) Transformation

    The ShiftRows transformation cyclically shifts the last three rows of the State by

    different offsets. The first row is left unchanged in this transformation. Each byte of the

    second row is shifted one position to the left. The third and fourth rows are shifted left

    by two and three positions, respectively. The ShiftRows transformation is illustrated in

    Figure 4.

    s0,0  s0,1  s0,2  s0,3                     s0,0  s0,1  s0,2  s0,3
    s1,0  s1,1  s1,2  s1,3   -- ShiftRows -->  s1,1  s1,2  s1,3  s1,0
    s2,0  s2,1  s2,2  s2,3                     s2,2  s2,3  s2,0  s2,1
    s3,0  s3,1  s3,2  s3,3                     s3,3  s3,0  s3,1  s3,2

    Figure 4 ShiftRows Transformation
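
    In software, the row rotation of Figure 4 reduces to list slicing. This is a minimal sketch with a hypothetical function name; the State is assumed to be a list of four rows.

```python
def shift_rows(state):
    """Rotate row r of a 4x4 State left by r positions (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


# Label each cell with its (row, column) index to make the movement visible.
labelled = [[(r, c) for c in range(4)] for r in range(4)]
for row in shift_rows(labelled):
    print(row)
```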

    2.3.3 MixColumns ( ) Transformation

    This transformation operates on the columns of the State, treating each column

    as a four-term polynomial over the finite field GF(2^8). Each column is multiplied modulo

    x^4 + 1 with a fixed four-term polynomial a(x) = {03}x^3 + {01}x^2 + {01}x + {02} over

    GF(2^8). The MixColumns transformation can be expressed as a matrix multiplication as

    shown below:

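    \[
    \begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix}
    =
    \begin{bmatrix}
    02 & 03 & 01 & 01 \\
    01 & 02 & 03 & 01 \\
    01 & 01 & 02 & 03 \\
    03 & 01 & 01 & 02
    \end{bmatrix}
    \begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}
    \]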

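    The MixColumns transformation replaces the four bytes of the processed column

    with the following values:

    \[ s'_{0,c} = (\{02\} \bullet s_{0,c}) \oplus (\{03\} \bullet s_{1,c}) \oplus s_{2,c} \oplus s_{3,c} \]
    \[ s'_{1,c} = s_{0,c} \oplus (\{02\} \bullet s_{1,c}) \oplus (\{03\} \bullet s_{2,c}) \oplus s_{3,c} \]
    \[ s'_{2,c} = s_{0,c} \oplus s_{1,c} \oplus (\{02\} \bullet s_{2,c}) \oplus (\{03\} \bullet s_{3,c}) \]
    \[ s'_{3,c} = (\{03\} \bullet s_{0,c}) \oplus s_{1,c} \oplus s_{2,c} \oplus (\{02\} \bullet s_{3,c}) \]

    The • operation corresponds to the multiplication of polynomials in GF(2^8) modulo an

    irreducible polynomial of degree 8. A polynomial is irreducible if its only divisors are

    one and itself. For the AES algorithm, the irreducible polynomial is:

    \[ m(x) = x^8 + x^4 + x^3 + x + 1 \] [1]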

    The MixColumns transformation is illustrated in Figure 5. This transformation,

    together with ShiftRows, provides substantial diffusion in the cipher, meaning that the

    result of the cipher depends on the cipher inputs in a very complex way. In other words,

    in a cipher with good diffusion, a single bit change in the plaintext will completely

    change the ciphertext in an unpredictable manner.

    [Figure: MixColumns operates on the State column by column, replacing each column (s0,c, s1,c, s2,c, s3,c) with (s'0,c, s'1,c, s'2,c, s'3,c).]

    Figure 5 MixColumns Transformation
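
    The column equations can be exercised with a short sketch. It assumes the usual software shortcut of building the {02} multiplication from a shift and a conditional reduction by m(x), and the {03} multiplication as {02} plus {01}; the function names are hypothetical. The sample column db 13 53 45 is a commonly used check value and does not come from this report.

```python
def xtime(b):
    """Multiply a byte by {02} in GF(2^8), reducing by m(x) (0x11B) on overflow."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def mul3(b):
    """Multiply a byte by {03}, using {03} = {02} xor {01}."""
    return xtime(b) ^ b


def mix_column(col):
    """Apply the four MixColumns equations to one column [s0, s1, s2, s3]."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


def mix_columns(state):
    """Apply mix_column to each column of a 4x4 State given as four rows."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


print([format(b, "02x") for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```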

    2.3.4 AddRoundKey ( ) Transformation

    During the AddRoundKey transformation, the round key values are added to the

    State by means of a simple Exclusive Or (XOR) operation. Each round key consists of

    Nb words that are generated by the KeyExpansion routine. The round key values are

    added to the columns of the State in the following way:

    \[ [s'_{0,c}, s'_{1,c}, s'_{2,c}, s'_{3,c}] = [s_{0,c}, s_{1,c}, s_{2,c}, s_{3,c}] \oplus [w_{round \cdot Nb + c}] \quad \text{for } 0 \le c < Nb \]
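
    A minimal sketch of this column-wise XOR is given below. It assumes, purely for illustration, that each round key word is held as a 32-bit integer whose most significant byte is paired with row 0 of the column; the function name add_round_key is hypothetical.

```python
def add_round_key(state, w, round_index, nb=4):
    """XOR column c of the State with the word w[round_index * nb + c]."""
    out = [row[:] for row in state]
    for c in range(nb):
        word = w[round_index * nb + c]               # assumed: byte 0 is the MSB
        for r in range(4):
            out[r][c] ^= (word >> (24 - 8 * r)) & 0xFF
    return out


zero_state = [[0] * 4 for _ in range(4)]
words = [0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F]
for row in add_round_key(zero_state, words, 0):
    print(row)
```

    Because XOR is its own inverse, applying the same round key twice returns the original State, which is why decryption can reuse this transformation unchanged.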

    According to the Federal Information Processing Standards (FIPS) Publication

    197 [1], there is no restriction on the cipher key selection, as no weak cipher key has been

    identified for the AES algorithm. The expansion of the cipher key into the round keys is

    performed by the KeyExpansion algorithm, as shown in the pseudo code in Figure 7. [1]

    KeyExpansion(byte CipherKey[4*Nk], word w[Nb*(Nr+1)], Nk)
    begin
        word temp

        i = 0
        while (i < Nk)
            w[i] = word(CipherKey[4*i], CipherKey[4*i+1], CipherKey[4*i+2], CipherKey[4*i+3])
            i = i + 1
        end while

        i = Nk
        while (i < Nb * (Nr+1))
            temp = w[i-1]
            if (i mod Nk = 0)
                temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
            else if (Nk > 6 and i mod Nk = 4)
                temp = SubWord(temp)
            end if
            w[i] = w[i-Nk] xor temp
            i = i + 1
        end while
    end

    Figure 7 KeyExpansion Algorithm

    In the above pseudo code, the array w[] represents the round keys that are generated

    by the KeyExpansion routine and Nk represents the size of the cipher key in words. Depending on

    the version of the AES algorithm, Nk = 4, 6 or 8. The first Nk words of the expanded key

    are filled with the cipher key.

    The SubWord( ) function applies the same S-box substitution to each of the four

    bytes in the word. The RotWord( ) function takes a word [a0,a1,a2,a3] as input,

    performs a cyclic shift and returns the word [a1,a2,a3,a0]. The round constant word array,

    Rcon[i], contains a 32-bit value given by [{02}^(i-1),{00},{00},{00}], where the power of {02} is computed in GF(2^8).
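
    The round constants can be generated rather than stored. The sketch below, with a hypothetical function name, repeatedly multiplies by {02} in GF(2^8) and places the result in the most significant byte of a 32-bit word; representing the word as a Python integer is an assumption made for illustration.

```python
def rcon(i):
    """Return Rcon[i] = [{02}^(i-1), {00}, {00}, {00}] as a 32-bit integer, for i >= 1."""
    if i < 1:
        raise ValueError("Rcon is indexed from 1")
    value = 1
    for _ in range(i - 1):
        value <<= 1                  # multiply by {02}
        if value & 0x100:
            value ^= 0x11B           # reduce by x^8 + x^4 + x^3 + x + 1
    return value << 24


# AES128 uses Rcon[1] through Rcon[10]
print([format(rcon(i), "08x") for i in range(1, 11)])
```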

    Every following word of the expanded key, w[i], is equal to the XOR of the previous word,

    w[i-1], and the word Nk positions earlier, w[i-Nk]. For words in positions that are a

    multiple of Nk, two transformations are first applied to the previous word, w[i-1].

    These transformations are a cyclic shift of the bytes in the previous word, followed

    by the application of the S-box table lookup to all four bytes of the word. Afterwards, an

    XOR with a round constant value, Rcon[i/Nk], is applied to the result.

    The KeyExpansion routine for the AES256 (Nk = 8) is slightly different from the

    AES128 and AES192 ones: when i-4 is a multiple of Nk, an additional SubWord function is applied to the previous

    word, w[i-1], prior to the XOR with w[i-Nk].
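
    The KeyExpansion pseudo code of Figure 7 translates almost line by line into software. The following is a sketch under stated assumptions, not the project's implementation: words are held as 32-bit integers, the S-box is rebuilt from its definition instead of being stored as a table, and all function names are hypothetical. The key used in the usage line is the example cipher key from Appendix A.1 of FIPS 197.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def build_sbox():
    """Build the S-box: inverse in GF(2^8), then the affine step with {63}."""
    sbox = []
    for value in range(256):
        inv = 1
        for _ in range(254):                  # value^254 is the inverse; 0 stays 0
            inv = gf_mul(inv, value)
        out = inv
        for shift in range(1, 5):             # affine step as XOR of left rotations
            out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(out ^ 0x63)
    return sbox


SBOX = build_sbox()


def sub_word(word):
    """Apply the S-box to each of the four bytes of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")


def rot_word(word):
    """Cyclic shift [a0,a1,a2,a3] -> [a1,a2,a3,a0]."""
    return ((word << 8) | (word >> 24)) & 0xFFFFFFFF


def key_expansion(key):
    """Expand a 16, 24 or 32 byte cipher key into Nb*(Nr+1) words (Nb = 4)."""
    nk = len(key) // 4
    if len(key) not in (16, 24, 32):
        raise ValueError("cipher key must be 16, 24 or 32 bytes")
    nr = nk + 6
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 0x01                               # {02}^0, used for Rcon[1]
    for i in range(nk, 4 * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 0x02)         # next round constant
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)             # extra step for AES256 only
        w.append(w[i - nk] ^ temp)
    return w


words = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(words), format(words[4], "08x"))
```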

    Chapter 3

    AES128 DESIGN AND IMPLEMENTATION

    3.1 Overview

    In this chapter, a hardware model for implementing the AES128 algorithm is

    introduced. The model is implemented using the SystemVerilog hardware description

    language [5]. This chapter covers the design and implementation issues of the AES128

    algorithm. In the next chapter, a test infrastructure is presented that thoroughly tests the

    functionality of the implemented model. The hardware model developed in this chapter

    is synthesizable. This means that the model provides a cycle-by-cycle RTL description

    of the circuit that a logic synthesis tool can convert to an optimized gate-level netlist. [3]

    The modeling process utilized in this project is the bottom-up approach. This

    means that the leaf components in the design hierarchy were developed first and the

    higher-level modules were constructed by instantiating their subcomponents and

    connecting them with the internal signals. All the modules in the design hierarchy were

    modeled in behavioral style, but the root module consisted of data flow modeling as well

    to implement the four major cipher transformations.

    3.2 Design Hierarchy

    The proposed AES128 hardware model is a 3-level hierarchical design as shown in

    Figure 8. The root module in the hierarchy is the AES128_cipher_top. This module

    implements the AES128 pseudo code displayed in Figure 2. It has two 128-bit inputs for

    receiving the cipher key and the plaintext. There is also a single bit input signal, Ld,

    which is used to indicate the availability of a new set of plaintext or cipher key on the

    input ports. The completion of the encryption process is indicated by asserting the

    single-bit done output.

    [Figure: the design hierarchy, from the root down: AES128_Cipher_Top, AES128_Key_Expand, AES128_Rcon. Ports shown: clk, rst, ld, plaintext (128 b), cipherkey (128 b), ciphertext (128 b), done.]

    Figure 8 Design Hierarchy

    A unique feature of the proposed design is that the AES128_Key_Expand module is

    pipelined with the AES128_cipher_top module. While the AES128_cipher_top module

    is performing an iteration of the encryption transformations on the State using the

    previously generated round keys, the AES128_Key_Expand module produces the next round's

    set of keys to be used by the root module in the next encryption iteration.
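
    The idea of producing each round key one step ahead of its use can be mimicked in a small software model. The sketch below is not the SystemVerilog design and makes several assumptions: the function names are hypothetical, the State and round keys are flat lists of 16 bytes stored column by column, and one loop iteration stands in for one clock cycle in which the cipher round and the key expander both work from previously registered values. The test values in the usage lines are the example block and key from Appendix B of FIPS 197.

```python
def xtime(b):
    """Multiply a byte by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def build_sbox():
    """Build the S-box: inverse in GF(2^8), then the affine step with {63}."""
    sbox = []
    for value in range(256):
        inv = 1
        for _ in range(254):                  # value^254 is the inverse; 0 stays 0
            inv = gf_mul(inv, value)
        out = inv
        for shift in range(1, 5):             # affine step as XOR of left rotations
            out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(out ^ 0x63)
    return sbox


SBOX = build_sbox()


def next_round_key(key, rcon):
    """Key-expander step: derive the next 16-byte round key from the current one."""
    temp = [SBOX[key[13]] ^ rcon, SBOX[key[14]], SBOX[key[15]], SBOX[key[12]]]
    out = []
    for i in range(16):
        prev = temp[i] if i < 4 else out[i - 4]
        out.append(key[i] ^ prev)
    return out


def cipher_round(state, round_key, last):
    """One cipher round on a 16-byte State stored column by column (index r + 4c)."""
    s = [SBOX[b] for b in state]                                         # SubBytes
    s = [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]   # ShiftRows
    if not last:                                                         # MixColumns
        mixed = []
        for c in range(4):
            a0, a1, a2, a3 = s[4 * c:4 * c + 4]
            mixed += [xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3,
                      a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3,
                      a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3,
                      xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3)]
        s = mixed
    return [b ^ k for b, k in zip(s, round_key)]                         # AddRoundKey


def aes128_encrypt(plaintext, key):
    """Encrypt one 16-byte block, generating each round key one step ahead."""
    state = [p ^ k for p, k in zip(plaintext, key)]     # r0: initial AddRoundKey
    round_key = next_round_key(list(key), 0x01)         # produced while r0 runs
    rcon = 0x02
    for rnd in range(1, 11):
        # Both results depend only on values held before this step, mimicking
        # two blocks of logic that operate during the same clock cycle. The key
        # generated in the last iteration is never used.
        state, round_key = (cipher_round(state, round_key, last=(rnd == 10)),
                            next_round_key(round_key, rcon))
        rcon = xtime(rcon)
    return bytes(state)


block = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
cipher_key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
print(aes128_encrypt(block, cipher_key).hex())
```

    Generating round keys on the fly in this way means that only the current round key has to be held at any time, instead of all 44 words of the expanded key.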

    3.2.1 AES128 Encryption Process

    The AES128_cipher_top module state diagram is shown in Figure 9. There are ten

    rounds of transformations represented by the r1 to r10 states. The four cipher

    transformations introduced in section 2.3 are applied in each of these states. The r0 state

    corresponds to the initial AddRoundKey transformation in Figure 2.

    After leaving the Reset state, the AES128_Cipher_Top module waits for assertion

    of the Ld signal, which indicates that a valid set of plaintext and cipher key is available

    on the input ports. After reaching the r0 state, there is a transition on every clock cycle

    for the next ten cycles, as ten rounds of encryption are applied to the State.

    After going through ten rounds of transformations, the done signal is asserted to

    indicate the completion of the cipher and the availability of the ciphertext on the corresponding

    output port.

    Figure 9 AES128_Cipher_Top Module State Diagram

    3.2.2 AES128 Round Key Generation

    The round keys used by the AES128_Cipher_Top module are generated by the

    AES128_Key_Expand and AES128_RCon modules. These two modules operate

    based on the state diagram shown in Figure 10, which is slightly different from

    the one used for the encryption process.

    [Figure 10: state diagram with the states Reset and r0 to r10. The transitions are
    labeled rst, !rst, Ld, !Ld and clk (ten clk transitions).]

    States      Outputs
    ----------  --------------------------
    R0 to R10   w0 = roundkey(Round*i)
                w1 = roundkey(Round*i+1)
                w2 = roundkey(Round*i+2)
                w3 = roundkey(Round*i+3)

    Figure 10 AES128_Key_Expand Module State Diagram

    In the state diagram shown above, the Ld signal is checked in the r0 state and if

    asserted, then the cipher key is provided to the AES128_Cipher_Top module to be used

    for the initial AddRoundKey transformation.
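The transitions described above can be sketched as a small next-state function. This is a minimal Python model rather than the project's SystemVerilog: the names `next_state` and `run`, the active-high `rst`, and the wrap-around from r10 back to r0 are assumptions read off the transition labels of Figure 10.

```python
def next_state(state, rst, ld):
    """One clock-edge transition of the controller sketched from Figure 10."""
    if rst:
        return "Reset"                 # assumed active-high reset
    if state == "Reset":
        return "r0"
    if state == "r0":
        return "r1" if ld else "r0"    # hold in r0 until Ld is asserted
    if state == "r10":
        return "r0"                    # assumed wrap-around after the last round
    return f"r{int(state[1:]) + 1}"    # r1 -> r2 -> ... -> r10


def run(inputs, state="Reset"):
    """Apply a list of (rst, ld) pairs, one per clock, and record the states."""
    trace = [state]
    for rst, ld in inputs:
        state = next_state(state, rst, ld)
        trace.append(state)
    return trace


# Reset released, one idle cycle, Ld pulsed once, then ten free-running clocks
stimulus = [(0, 0), (0, 0), (0, 1)] + [(0, 0)] * 10
trace = run(stimulus)
print(" -> ".join(trace))
```

In this model the controller stays in r0 for as long as Ld is low and needs exactly ten clocks after Ld to return to r0.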

    The AES128_Key_Expand module generates four 32-bit keys for each round of the

    encryption process, by using the cipher key. Figure 11 shows the block diagram of the

    AES128_Key_Expand module. The cipher key is passed to this module through a 128-bit

    input port, and the round keys are generated on the four output ports.

    [Figure 11: AES128_Key_Expand block. Inputs: clk, rst, ld, cipherkey (128 b).
    Outputs: w0, w1, w2, w3 (32 b each).]

    Figure 11 AES128_Key_Expand Module

    There is a 32-bit round constant value, which is used by the key expansion

    algorithm to generate the round keys. This value varies for each encryption round and,

    for rounds i = 1 to 10, is given by [{02}^(i-1), {00}, {00}, {00}]. The AES128_RCon

    module is used to generate this value as shown in Figure 12. The AES128_RCon module

    also operates based on the state diagram shown in Figure 10.
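As an illustration of the round constant formula, the following Python sketch computes {02}^(i-1) in GF(2^8) by repeated doubling. The helper names `xtime` and `rcon_words` are hypothetical, and placing the constant in the most significant byte of the 32-bit word is an assumption about the byte order of the rcon port.

```python
def xtime(b):
    """Multiply a byte by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def rcon_words(rounds=10):
    """Return the 32-bit constants [{02}^(i-1), 00, 00, 00] for i = 1..rounds."""
    words = []
    value = 0x01                       # {02}^0
    for _ in range(rounds):
        words.append(value << 24)      # assumed: constant in the top byte
        value = xtime(value)
    return words


RCON = rcon_words()
print([f"{w:08x}" for w in RCON])
```

The top bytes come out as 01, 02, 04, 08, 10, 20, 40, 80, 1b and 36 (hex), which are the round constants listed in FIPS-197 for AES-128.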

    [Figure 12: AES128_RCon block. Inputs: clk, rst, ld. Output: rcon (32 b).]

    Figure 12 AES128_Rcon Module

    3.3 AES128 Pipelined Design

    As stated earlier in this chapter, the round key generation in the proposed design is

    pipelined with the encryption rounds. The pipelined operation of the round key

    expansion and the cipher is shown in Figure 13. Each AES encryption round n (white

    cells) is pipelined with the key generation for round n+1 (gray cells).

    [Figure 13: timing chart pairing the cipher states (reset, wait for ld, r0 to r10)
    with the key generation states (reset, r0 to r10) that are active in the same cycle.]

    Figure 13 AES128 Pipelined Round Key Generation and Cipher Rounds

    The most important advantage of the pipelined design is the lower delay for each

    encryption iteration, since the round keys for each encryption iteration are present at the

    beginning of the iteration cycle. The lower delay in each encryption iteration means

    faster completion of each round of encryption. This reduces the overall encryption delay

    and allows the design to operate at higher clock frequencies. The higher clock frequency

    will increase the message encryption rate (throughput) making this design suitable for

    time critical encryption applications.
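One way to see the benefit is a simplified timing model in Python. It assumes that without pipelining the key logic and the cipher logic of a round sit in series within one clock period, while with pipelining they run side by side. The function names and the delay values are hypothetical and serve only as an illustration; they are not taken from the synthesis results.

```python
def round_delay(t_key, t_cipher, pipelined):
    """Clock period needed for one round under the simplified model.

    In series the two delays add up; side by side the slower one sets the period.
    """
    return max(t_key, t_cipher) if pipelined else t_key + t_cipher


def schedule(rounds=10):
    """List (cipher round n, round key prepared in the same cycle) per cycle."""
    return [(n, n + 1 if n < rounds else None) for n in range(rounds + 1)]


# Hypothetical delays in ns, chosen only for illustration
T_KEY, T_CIPHER = 12.0, 20.0
serial = round_delay(T_KEY, T_CIPHER, pipelined=False)
overlap = round_delay(T_KEY, T_CIPHER, pipelined=True)
print(f"serial: {serial} ns per round, pipelined: {overlap} ns per round")
for cipher_round, key_round in schedule():
    print(f"cipher round {cipher_round}, key prepared for round {key_round}")
```

With these made-up numbers the period drops from 32 ns to 20 ns; the actual gain depends on the real delays of the two blocks.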

    Chapter 4

    AES128 VERIFICATION

    4.1 Overview

    In this chapter, we describe the test infrastructure that is developed in

    SystemVerilog to verify the functionality of the model described in the previous chapter.

    The simulation was done using the Synopsys VCS tool. The testbench fully validated the

    design by constructing random cyclic test vectors for the plaintext and the cipher key,

    passing them to the model, and comparing the ciphertext to the expected result.

    4.2 Testbench Infrastructure

    There are four major steps involved in verifying a design using an HDL, including

    test vector generation, passing the test vectors to the design and capturing the design

    response, determining correctness by comparing the design response with the expected

    results, and measuring the verification coverage. The test infrastructure described in this

    chapter performs all the above steps in a systematic way.

    The AES128 test infrastructure contains several components, some of which are

    unique SystemVerilog features. These SystemVerilog features make the verification of a

    design more reliable and more structured. The test infrastructure components are

    displayed in Figure 14 as part of the AES128_Top module.

    Figure 14 AES128 Test Infrastructure

    The test infrastructure utilizes the SystemVerilog program block, which has

    multiple implicit timing regions to evaluate the design events separately from the

    testbench events. The program block is connected to the model through another unique

    feature of SystemVerilog, called Interface.

    The Interface bundles the connections between the testbench and the design while

    enforcing the synchronization and communication protocol between the two entities. [4]

    The definition of the AES128_Top module in SystemVerilog is shown in Figure 15,

    which has the high-level instantiation of the modules constructing the test infrastructure.

    [Figure 14 blocks: AES128_Top contains the Clock Generator (Clk), AES128_Program,
    AES128_Interface, and the design modules AES128_Cipher_Top, AES128_Key_Expand and AES128_rcon.]

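    module top;
      bit clk;                         // testbench clock, starts at 0
      always #5 clk = ~clk;            // toggles every 5 time units

      AES128_interface intf(clk);      // signals shared by the testbench and the design
      AES128_program prog(intf);       // testbench
      AES128_cipher_top aes(intf);     // design under test
    endmodule

    Figure 15 AES128_Top Definition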

    The AES128_Top module instantiates the design, the Interface and the Program. The

    Interface and the Program constructs are discussed in the next two sections. The clock generator

    is defined inside the AES128_Top module as well, to avoid any potential race conditions. [4]

    4.3 AES128_Interface

    As designs are becoming more complex, the number of module ports and the

    complexity of the interconnections between the modules are also increasing. The

    SystemVerilog Interface construct is the solution for properly connecting the modules as

    it provides an intelligent means of communication between several modules.

    The Interface bundles the ports together and enforces synchronization between the

    modules connected through it. The Interface can provide connectivity between design

    modules and/or testbench. The modport construct is used in an Interface to specify the

    direction of signals that are bundled together and to group the signals that are

    synchronous to a specific clock. In this project, the SystemVerilog Interface was only

    used to connect the high-level design with the testbench as shown in Figure 14. As a

    result, there were two modports declared for the Interface in this project.

    In an Interface, the signals that are synchronous to a clock are defined inside a

    Clocking Block to ensure correct timing between the testbench and the high-level design.

    This ensures that any synchronous signal is driven or sampled with respect to the clock and

    eliminates the potential race condition that exists between the testbench and a high-level

    design written in Verilog. The AES128_Interface definition is shown in Figure 16.

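    interface AES128_interface(input bit clk);
      logic rst, ld, done;
      logic [127:0] key, text_in, text_out;

      // Testbench view: directions are relative to the testbench and all
      // signals are driven or sampled with respect to posedge clk
      clocking cb @(posedge clk);
        output ld;
        output key;
        output text_in;
        input  done;
        input  text_out;
      endclocking

      // Directions as seen by the design under test
      modport dut(
        input  clk,
        input  rst,
        input  ld,
        input  key,
        input  text_in,
        output done,
        output text_out);

      // Directions as seen by the testbench; rst is driven directly,
      // outside the clocking block
      modport tb(
        input  clk,
        output rst,
        clocking cb);
    endinterface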

    Figure 16 AES128_Interface Definition

    4.4 AES128_Program

    In Verilog, a testbench is basically another module which is connected to the high-

    level design. This can cause a race condition between the testbench and the design. [4]

    The SystemVerilog hardware description language introduces a new construct called Program

    to be used as the testbench. The SystemVerilog Program, having one (or more) entry

    points, is closer to a program in C than Verilog's many small blocks of concurrently

    executing hardware [4]. It also has multiple implicit timing regions to evaluate the

    design events separately from the testbench events, eliminating any race condition

    between the design under test and the testbench.

    The testbench described in this chapter consists of a single Program, which uses the

    Object Oriented Programming feature of SystemVerilog to dynamically build random test

    vectors. This is done by defining a Class inside the AES128_Program that encapsulates

    two random cyclic variables (Properties) for generating stimulus to the high-level design.

    The class defined in the AES128_Program is shown in Figure 17.

    As stated earlier in this chapter, another important feature of a testbench is keeping

    track of the verification coverage. In other words, to make sure that a design is

    thoroughly verified, the testbench needs to test all the design features. Functional

    Coverage is a measure of which design features have been exercised by the test. [4]

    Functional Coverage is done by means of Cover Groups defined inside the

    SystemVerilog Program. Each Cover Group consists of multiple Cover Points that are

    the variables used for generating stimulus for the design under test. As shown in

    Figure 17, the class defined in the AES128_Program uses a single Cover Group to keep

    track of the 128-bit plain_text and cipher_key stimuli. Because the Synopsys

    VCS compiler limits cyclic random objects to no more than 16 bits, the 128-bit

    stimuli are broken into arrays of 16-bit elements. Each array element is declared as a

    Cover Point inside the Cover Group to be sampled together for measuring the Functional

    Coverage.

    class Transaction;
      // Random-cyclic stimulus: each 128-bit value is split into eight 16-bit elements
      randc bit [15:0] plain_text[8];
      randc bit [15:0] cipher_key[8];

      // One cover point per array element
      covergroup Coverage;
        coverpoint this.plain_text[0];
        coverpoint this.plain_text[1];
        coverpoint this.plain_text[2];
        coverpoint this.plain_text[3];
        coverpoint this.plain_text[4];
        coverpoint this.plain_text[5];
        coverpoint this.plain_text[6];
        coverpoint this.plain_text[7];
        coverpoint this.cipher_key[0];
        coverpoint this.cipher_key[1];
        coverpoint this.cipher_key[2];
        coverpoint this.cipher_key[3];
        coverpoint this.cipher_key[4];
        coverpoint this.cipher_key[5];
        coverpoint this.cipher_key[6];
        coverpoint this.cipher_key[7];
      endgroup

      function new;
        Coverage = new();   // the embedded cover group is constructed with the object
      endfunction
    endclass

    Figure 17 Class Definition in the AES128_Program

    The AES128_Program pseudo code is shown in Figure 18. This testbench verifies

    the design until the Functional Coverage is 100%. The verification procedure involves

    generating the stimuli and passing them through the AES128_Interface to the design

    under test and verifying correctness of the results obtained from the design.

    class Transaction
      // see Figure 17
    endclass

    initial begin
      // reset the design
      while (Functional_Coverage < 100) begin
        // randomize the cover points
        // populate plain_text & cipher_key using the cover points
        // calculate the expected ciphertext using the following function
        aes128_cipher(plain_text, cipher_key, expected_cipher_text);
        // pass the stimuli to the design and wait for the result
        // compare the expected result with the ciphertext generated by
        // the design to determine correctness
        // sample the Functional Coverage percentage
      end
      $finish;
    end

    Figure 18 AES128_Program Pseudo Code

    To verify the correct functionality of the design under test, a C-style function is

    developed in SystemVerilog, which takes the stimuli as input and calculates the expected

    ciphertext. This function is defined as part of a package that contains all the variables and

    routines involved in the encryption process as shown in Figure 19.

    package AES128_testbench_package;

      logic [7:0] state [4][4];

      function aes128_KeyExpansion(input bit [127:0] cipher_key);
        // generates the round keys
      endfunction

      function aes128_SubBytes();
        // performs SubBytes transformation on the state
      endfunction

      function aes128_ShiftRows();
        // performs ShiftRows transformation on the state
      endfunction

      function aes128_AddRoundKey(input int round);
        // performs AddRoundKey transformation on the state
      endfunction

      function aes128_MixColumns();
        // performs MixColumns transformation on the state
      endfunction

      /*********************************************************************/

      function aes128_cipher(input bit [127:0] plain_text, input bit [127:0] cipher_key,
                             output [127:0] expected_cipher_text);
        state = plain_text;
        aes128_KeyExpansion(cipher_key);
        aes128_AddRoundKey(0);
        for(round=1;round
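For comparison with the SystemVerilog reference function outlined in Figure 19, the same sequence of steps can be sketched in Python. This is an independent minimal model following FIPS-197, not the project's package: the helper names are hypothetical, the S-box is derived arithmetically rather than tabulated, and the State is held as a flat list of 16 bytes in column order.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def build_sbox():
    """Derive the S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """Return the 11 round keys (16 bytes each) for a 16-byte cipher key."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]   # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = gmul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def shift_rows(s):
    """Rotate row r of the column-ordered State left by r positions."""
    return [s[(i + 4 * (i % 4)) % 16] for i in range(16)]


def mix_columns(s):
    """Multiply each column by the fixed MixColumns matrix."""
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


def aes128_cipher(plain_text, cipher_key):
    """Encrypt one 16-byte block with a 16-byte key."""
    keys = expand_key(cipher_key)
    state = [p ^ k for p, k in zip(plain_text, keys[0])]    # initial AddRoundKey
    for rnd in range(1, 11):
        state = shift_rows([SBOX[b] for b in state])        # SubBytes, ShiftRows
        if rnd != 10:
            state = mix_columns(state)                      # skipped in the last round
        state = [s ^ k for s, k in zip(state, keys[rnd])]   # AddRoundKey
    return bytes(state)


key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
block = bytes.fromhex("00112233445566778899aabbccddeeff")
print(aes128_cipher(block, key).hex())
```

The key and block used here are the AES-128 example vector from FIPS-197 Appendix C.1, for which the standard lists the ciphertext 69c4e0d86a7b0430d8cdb78070b4c55a.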

    The complete simulation result of the testbench is included in Appendix C.

    Figure 20 illustrates the simulation result for the first three test cases. Each test case starts

    with randomizing the cover points to populate the plaintext and cipher key inputs to the

    design under test. Then, the expected ciphertext is calculated using the AES128_cipher

    function shown in Figure 19. After the design under test has encrypted the plaintext and

    the done signal is asserted, the ciphertext generated by the hardware model is compared

    with the expected result to catch any mismatch. The last step in each test case is gathering

    the Functional Coverage and continuing with the next test case until all design features

    are tested.

    Test# 0

    plain_text=55f529e00b1a3f14d8a746860e9b533e

    cipher_key=bbda8d5457141b255a022fee50b6461c

    expected_cipher_text:116340860130033742714813403090106826404

    intf.cb.text_out: 116340860130033742714813403090106826404

    *****+++++Match+++++*****

    Functional Coverage = %1.562500

    Test# 1

    plain_text=37500380d9d6dccbf474334e02c23ec9

    cipher_key=fd1f4dd414ec0fec5078a0a5ef328294

    expected_cipher_text:279883244544087465675915927115776104969

    intf.cb.text_out: 279883244544087465675915927115776104969

    *****+++++Match+++++*****

    Functional Coverage = %3.125000

    Test# 2

    plain_text=dd27152407a1dfc8f2c67423377b3d28

    cipher_key=e9a308df435809a059ce2b9e26b08c8b

    expected_cipher_text: 55911193611511870268248153978729662868

    intf.cb.text_out: 55911193611511870268248153978729662868

    *****+++++Match+++++*****

    Functional Coverage = %4.394531

    Figure 20 Sample Simulation Results
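The coverage percentages in Figure 20 are consistent with each 16-bit cover point being divided into 64 automatic bins (1/64 = 1.5625%), which is the SystemVerilog default for auto_bin_max. The Python sketch below assumes that binning, assumes the group coverage is the plain average over the 16 cover points, and imitates randc with a shuffled cycle of all 65536 values. The names and the seed are hypothetical, and the number of tests it reports is not the figure obtained in the project.

```python
import random

BINS = 64      # assumed automatic bins per cover point
POINTS = 16    # 8 plain_text elements + 8 cipher_key elements
WIDTH = 16     # bits per array element


def randc_stream(rng):
    """Yield 16-bit values cyclically: no value repeats within one full cycle."""
    while True:
        values = list(range(1 << WIDTH))
        rng.shuffle(values)
        yield from values


def coverage(hit_bins):
    """Average percentage of bins hit across all cover points."""
    return 100.0 * sum(len(h) for h in hit_bins) / (BINS * POINTS)


def run_until_covered(seed=1):
    """Draw one value per cover point per test until every bin has been hit."""
    rng = random.Random(seed)
    streams = [randc_stream(rng) for _ in range(POINTS)]
    hit_bins = [set() for _ in range(POINTS)]
    history = []
    while not history or history[-1] < 100.0:
        for stream, hits in zip(streams, hit_bins):
            value = next(stream)
            hits.add(value * BINS >> WIDTH)    # 1024 consecutive values per bin
        history.append(coverage(hit_bins))
    return history


history = run_until_covered()
print(f"after test 0: {history[0]:.6f}%")
print(f"tests needed for 100%: {len(history)}")
```

Under the same assumptions, the third value in Figure 20, 4.394531%, corresponds to 45 of the 16 x 64 = 1024 bins, meaning that three of the 48 new samples fell into bins that had already been hit.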

    Chapter 5

    AES128 SYNTHESIS

    5.1 Overview

    A primary objective of this project was to develop a synthesizable model for the

    AES128 encryption algorithm. Synthesis is the process of converting the register transfer

    level (RTL) representation of a design into an optimized gate-level netlist. This is a

    major step in ASIC design flow that takes an RTL model closer to a low-level hardware

    implementation.

    Synthesis consists of three main steps. The first step is translation, which

    involves converting the RTL description of a design into a non-optimized intermediate

    representation that is used by the synthesis tool. The second step is logic

    optimization, which optimizes the internal representation by removing redundant logic

    and performing Boolean logic optimizations. The third step is called technology

    mapping & optimization, which maps the internal representation to an optimized gate

    level representation using the technology library cells based on design constraints. [3]

    In this chapter, we describe how the Synopsys Design Compiler tool was utilized to

    synthesize the verified AES128 model, by using a script that was developed to perform

    the synthesis based on certain constraints. The script generates several reports about the

    synthesis outcome including timing and area estimates.

    5.2 Synthesis Methodology

    The first step in the synthesis process is to read all the components in the design

    hierarchy. There are three components in the 3-level design hierarchy that need to be

    synthesized. Since the RTL model utilizes a SystemVerilog Package, the

    synthesis tool needs to enable the semantics of a package. In addition, the synthesis tool

    needs to know if there are multiple instances of calling an automatic function in the

    design, to preserve separate values for each instance.

    The following Synopsys Design Compiler (DC) shell commands enable package and

    automatic function utilizations:

    set hdlin_sv_packages "enable"

    set hdlin_infer_function_local_latches "true"

    Then, the package and the modules in the design hierarchy are read using the following

    commands:

    read_file -format sverilog {./AES128_DUT_package.sv}

    read_file -format sverilog {./AES128_rcon.sv}

    read_file -format sverilog {./AES128_key_expand.sv}

    read_file -format sverilog {./AES128_cipher_top.sv}

    After reading the design files, they are Analyzed and Elaborated, through

    which the RTL code is converted into the Synopsys Design Compiler internal format. [6]

    The intermediate results are stored in the defined working library. The following DC

    commands are used for these steps:

    analyze -library WORK -format sverilog {./AES128_rcon.sv}

    analyze -library WORK -format sverilog {./AES128_key_expand.sv}

    analyze -library WORK -format sverilog {./AES128_cipher_top.sv}

    elaborate AES128_rcon -architecture verilog -library WORK

    elaborate AES128_key_expand -architecture verilog -library WORK

    elaborate AES128_cipher_top -architecture verilog -library WORK

    Then, the dont_touch attribute is removed from all the modules in the design

    hierarchy so that during the optimization phase the tool can modify the modules. The

    following DC command is used for this step:

    remove_attribute [find design -hierarchy] dont_touch

    After this step, a 40MHz clock signal is applied to the clock port of the root

    module, and the synthesis tool is programmed not to modify the clock tree during the

    optimization phase. In addition, an arbitrary delay of 5ns with respect to the clock

    port is applied to all input and output ports (except the clock port itself) to set a safe

    margin by considering any unintended source of delay such as the delay associated with

    the driving module or modules.

    Then, the design is constrained with hypothetical maximum area equal to zero to

    force the tool to make the gate level netlist as compact as possible. The following DC

    commands are used for these steps:

    create_clock -name clk -period 25 [find port intf_clk]
    set_dont_touch_network [find clock "clk"]
    set non_clock_ports [remove_from_collection [all_inputs] [get_ports intf_clk]]
    set_input_delay 5 $non_clock_ports -clock clk
    set_output_delay 5 [all_outputs]
    set_max_area 0

    In the next steps, the tool is programmed to consider a unique design for each cell

    instance by removing the multiply-instantiated hierarchy in the current design. Then, the

    synthesis script removes the boundaries from all the components in the design hierarchy

    and removes all levels of hierarchy.

    uniquify

    set_boundary_optimization [find design -hierarchy] true

    ungroup -all -flatten -all_instances

    Finally, the tool compiles the design with high effort and reports any warning

    related to the mapping and final optimization step. At the end, the tool generates reports for

    the optimized gate level netlist area, the worst combinational path timing, and any

    violated design constraint.

    report_attribute > ./Synthesis_Reports_Attribute.txt
    report_area > ./Synthesis_Reports_Area.txt
    report_constraints -all_violators > ./Synthesis_Reports_Constraint_Violaters.txt
    report_timing -path full -delay max -max_paths 1 -nworst 1 > ./Synthesis_Reports_Timing.txt

    5.3 Synthesis Timing Result

    The synthesis tool optimizes the combinational paths in a design. In general, four

    types of combinational paths can exist in any design: [3]

    1- Input port of the design under test to input of one internal flip-flop

    2- Output of an internal flip-flop to input of another flip-flop

    3- Output of an internal flip-flop to output port of the design under test

    4- A combinational path connecting the input and output ports of the design under test

    The last DC command in the script developed in the previous section instructs the tool

    to report the path with the worst timing. In this case, the path with the worst timing is a

    combinational path of type two. The delay associated with this path is the summation of

    delays of all combinational gates in the path plus the Clock-To-Q delay of the originating

    flip-flop, which was calculated as 24.09ns. Adding the 0.85ns setup time of the

    destination flip-flop in this path gives 24.94ns, which fits within the 25ns period of the

    40MHz clock signal. The delays of combinational gates, setup time of flip-flops

    and Clock-To-Q values are derived from the LSI_10k library file that was used for

    the mapping step during synthesis. The synthesis timing report is shown below:

    ****************************************
    Report : timing
            -path full
            -delay max
            -max_paths 1
    Design : AES128_cipher_top
    Version: Z-2007.03
    Date   : Mon Nov 16 21:25:14 2009
    ****************************************

    Operating Conditions:
    Wire Load Model Mode: top

      Startpoint: u0/w3_reg[22]
                  (rising edge-triggered flip-flop clocked by clk)
      Endpoint: u0/w2_reg[27]
                (rising edge-triggered flip-flop clocked by clk)
      Path Group: clk
      Path Type: max

      Point                                   Incr       Path
      -----------------------------------------------------------
      clock clk (rise edge)                   0.00       0.00
      clock network delay (ideal)             0.00       0.00
      u0/w3_reg[22]/CP (FD2)                  0.00       0.00 r
      u0/w3_reg[22]/Q (FD2)                   1.84       1.84 f
      U12175/Z (ND2)                          2.01       3.86 r
      U11490/Z (IVP)                          0.49       4.35 f
      U952/Z (ND2)                            1.46       5.81 r
      U11501/Z (IVP)                          0.42       6.24 f
      U11511/Z (ND2P)                         1.25       7.48 r
      U907/Z (IV)                             0.39       7.87 f
      U11489/Z (ND2)                          1.05       8.92 r
      U828/Z (NR2)                            0.37       9.29 f
      U11485/Z (NR4)                          1.58      10.87 r
      U818/Z (ND4)                            0.59      11.46 f
      U11728/Z (NR4)                          2.10      13.56 r
      U553/Z (AN3)                            0.84      14.40 r
      U542/Z (ND4)                            0.73      15.13 f
      U541/Z (AO1)                            1.50      16.63 r
      U540/Z (IV)                             0.21      16.84 f
      U534/Z (NR16)                           2.42      19.26 r
      U533/Z (EN)                             1.26      20.51 f
      U11486/Z (EN)                           1.37      21.89 r
      U118/Z (EO)                             1.13      23.01 f
      U117/Z (EON1)                           1.08      24.09 r
      u0/w2_reg[27]/D (FD2)                   0.00      24.09 r
      data arrival time                                 24.09

      clock clk (rise edge)                  25.00      25.00
      clock network delay (ideal)             0.00      25.00
      u0/w2_reg[27]/CP (FD2)                  0.00      25.00 r
      library setup time                     -0.85      24.15
      data required time                                24.15
      -----------------------------------------------------------
      data required time                                24.15
      data arrival time                                -24.09
      -----------------------------------------------------------
      slack (MET)                                        0.06
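The slack at the bottom of the report can be reproduced from three numbers: the clock frequency, the data arrival time and the library setup time. The following Python sketch shows the arithmetic; the function name `setup_check` is hypothetical and the model ignores clock skew and uncertainty, which the report lists as an ideal (zero) clock network delay.

```python
def setup_check(clock_mhz, arrival_ns, setup_ns):
    """Return (period, required time, slack) in ns for a flip-flop to flip-flop path."""
    period = 1000.0 / clock_mhz        # 40 MHz -> 25 ns
    required = period - setup_ns       # data must be stable before the setup window
    return period, required, required - arrival_ns


period, required, slack = setup_check(clock_mhz=40, arrival_ns=24.09, setup_ns=0.85)
fmax_mhz = 1000.0 / (24.09 + 0.85)     # highest clock rate this path allows
print(f"period {period:.2f} ns, required {required:.2f} ns, slack {slack:.2f} ns")
print(f"maximum frequency about {fmax_mhz:.2f} MHz")
```

The result is a required time of 24.15 ns and a slack of 0.06 ns, so the 40MHz constraint is met with very little margin; the path itself would allow roughly 40.1 MHz.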

    5.4 Synthesis Area Result

    The synthesis area report shows the total number of cells and nets in the netlist. It

    also uses the area parameter associated with each cell in the LSI_10K library file, to

    calculate the total combinational and sequential area of the netlist. The total area of the

    gate level netlist is unknown since it depends on total area of the interconnects, which

    itself is a function of the wiring load model used in physical design. The total cell area in

    the netlist is reported as 22985 units, which is the sum of the combinational and sequential

    areas. The synthesis area report is shown below:

    Information: Updating design information... (UID-85)

    ****************************************
    Report : area
    Design : AES128_cipher_top
    Version: Z-2007.03
    Date   : Mon Nov 16 21:25:14 2009
    ****************************************

    Library(s) Used:
        lsi_10k (File: /usr/pkg/syn/libraries/syn/lsi_10k.db)

    Number of ports:            388
    Number of nets:           12020
    Number of cells:          11574
    Number of references:        42

    Combinational area:       19045.000000
    Noncombinational area:     3940.000000
    Net Interconnect area:    undefined  (No wire load specified)

    Total cell area:          22985.000000
    Total area:               undefined

    5.5 Synthesis Constraint Violators Result

    To force the synthesis tool to create the most compact netlist, the area of the gate

    level netlist was constrained to zero during the synthesis process. As a result, the only

    constraint violation, which is expected, is related to the area as shown below:

    ****************************************
    Report : constraint
            -all_violators
    Design : AES128_cipher_top
    Version: Z-2007.03
    Date   : Tue Nov 10 12:50:19 2009
    ****************************************

       max_area

                              Required        Actual
       Design                   Area           Area          Slack
       -----------------------------------------------------------------
       AES128_cipher_top        0.00         22978.00      -22978.00  (VIOLATED)

    Chapter 6

    AES128 SOFTWARE IMPLEMENTATION

    6.1 Overview

    The optimized gate level netlist generated after synthesizing the hardware model by

    using the LSI_10K technology library can operate at a 40MHz clock signal. Since the

    hardware model takes ten clock cycles (for ten rounds of encryption) to encrypt a 128-bit

    block, the overall delay for encrypting a block of plaintext is 250ns.

    In order to compare the speed of the hardware implementation with that of a

    software implementation, the AES128 algorithm was modeled in C language. The C

    program was then run on a virtual system, and the statistics of the virtual system were

    gathered before and after encrypting a block of plaintext. The number of CPU cycles that

    were required on the virtual system to encrypt a block of plaintext was used to compare

    the efficiency of software and hardware implementations.

    6.2 AES128 Software Implementation on a Simics Virtual System

    Simics is a complete functional simulation tool for creating virtual platforms that

    supports single-core, multicore, multiple processor, and multiple machine configurations

    (racks, clusters, and distributed systems). [7]

    Simics supports several processor families (e.g. ARM, MIPS, PowerPC, x86) and

    runs the same binary software as the physical target system. To the target software, the

    virtualized target hardware behaves exactly the same as the physical target hardware. [8]

    In this project, the Simics software was used to create a virtual system based on

    Intel's x86 architecture and the 440BX chipset. The target virtual system consisted of a

    2GHz Pentium4 processor and ran the Red Hat 7.3 Enterprise Linux operating system.

    The C program implementing the AES128 encryption algorithm (See Appendix

    D) was ported to the Simics virtual system and then compiled to create the executable

    file (object file). The virtual system's statistics were gathered during the execution of the

    C program, before and after encrypting a block of plaintext. This was done by using

    the Simics Magic instruction that called a registered Python function for gathering the

    virtual system statistics. The portion of the C code for encrypting a block of plaintext

    is shown in Figure 20. Encrypting a block of plaintext involves copying the block to the

    state, generating the round keys from the cipher key and performing ten rounds of

    encryption on the state.

    int main() {

    ...

    MAGIC(1);

    for(i=0;i

                  # of CPU Instructions
                  User         Supervisor       Total            CPU Cycles
    Callback 1    1893584723   11589546395282   11591439980005   11591439980005
    Callback 2    1893616460   11589546395282   11591440011742   11591440011742
    Difference    31737        0                31737            31737

    Table 3 Simics Virtual System Statistics

    The User and Supervisor columns refer to the number of instructions that were

    executed in the user space and the system space, respectively. Since the clocks per

    instruction for the virtual target was assumed to be one (CPI=1), the total CPU cycles

    was equal to the total number of instructions.

    The results show that encrypting a block of plaintext in software takes more than

    30,000 CPU cycles of the virtual target system. Since the virtual system has a 2GHz

    Pentium4 processor, the encryption of a plaintext block takes more than 15µs, which is more than 60

    times slower than the proposed hardware implementation.
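The comparison above reduces to a short calculation, sketched here in Python with the figures quoted in this chapter (ten cycles at 40MHz for the hardware, 31737 cycles at 2GHz for the software). The function names are hypothetical, and the throughput line assumes that blocks are encrypted back to back with no idle cycles in between.

```python
def hardware_latency_ns(cycles=10, clock_mhz=40):
    """Latency of one block in the hardware model."""
    return cycles * 1000.0 / clock_mhz


def software_latency_ns(cpu_cycles=31737, clock_ghz=2.0):
    """Latency of one block on the virtual system (one cycle per instruction)."""
    return cpu_cycles / clock_ghz


hw = hardware_latency_ns()
sw = software_latency_ns()
print(f"hardware: {hw:.0f} ns, software: {sw:.1f} ns, ratio: {sw / hw:.1f}x")
print(f"hardware throughput: {128 / hw * 1000:.0f} Mbit/s")
```

This gives 250 ns against about 15.87 µs, a ratio of roughly 63, and a hardware throughput of 512 Mbit/s under the back-to-back assumption.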

    Chapter 7

    CONCLUSION

    In this project, a hardware accelerator for the AES128 encryption algorithm was

    designed, modeled and verified using the SystemVerilog hardware description language.

    The Synopsys VCS tool was used for simulation and verification of the model. The

    hardware model was then synthesized using the Synopsys Design Compiler tool. In

    addition, to get an estimate of the speed gain by hardware implementation, a virtual

    system was created using the Virtutech Simics software to run a C program

    implementing the AES128 encryption in software.

    The proposed pipelined design of the AES encryption algorithm reduces the delay

    associated with each round of encryption, which allows the hardware to operate at a

    much higher clock frequency than a non-pipelined design. This increases the

    message encryption throughput and makes the hardware model suitable for time critical

    encryption applications. In addition, the hardware implementation of the AES encryption

    algorithm provides better protection of the encryption key, much higher speed than a

    software implementation, and higher throughput by means of inherent hardware

    concurrency.

    The pipelined design was thoroughly validated by means of a test infrastructure,

    which utilized several unique SystemVerilog features, including Interface and Program.

    The test infrastructure utilized Interface to enforce synchronization and communication

    protocol between the design and the testbench. The SystemVerilog Program was used as

    part of the testbench to construct and provide test objects to the design, while eliminating

    any potential race condition between them. The testbench included Functional Coverage

    to measure the verification progress of the design features to make sure the design is fully

    validated.

    The gate-level netlist generated during the synthesis phase using the LSI_10K

    technology library is capable of operating at a frequency of 40 MHz, which means the

    proposed model can encrypt a block of plaintext in 250 ns, that is, in ten clock cycles. We

    expect the design to run at a higher frequency if synthesized using a more efficient

    technology library.
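    The latency figure follows directly from the clock frequency and the cycle count. The C sketch below, with hypothetical helper names, reproduces it. It also derives a throughput figure under an added assumption that the report does not state: a new 128-bit block completes every 250 ns, with no idle or load cycles between blocks.

```c
/* Latency in nanoseconds of `cycles` clock cycles at `clock_hz` hertz. */
static double latency_ns(unsigned cycles, double clock_hz) {
    return (double)cycles * 1e9 / clock_hz;
}

/* Throughput in Mbit/s if one block of `block_bits` bits completes every
   `lat_ns` nanoseconds (assumed back-to-back operation, no idle cycles). */
static double throughput_mbps(unsigned block_bits, double lat_ns) {
    /* bits per nanosecond equals Gbit/s; multiply by 1000 for Mbit/s */
    return (double)block_bits * 1e3 / lat_ns;
}
```

    Here latency_ns(10, 40e6) gives 250 ns, and throughput_mbps(128, 250.0) gives 512 Mbit/s. The second number is an upper bound under the stated assumption, not a measured result.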

    The software implementation of the AES128 algorithm (in the C language) on a Simics

    virtual system (Intel's x86 architecture and a 2 GHz Pentium 4 processor) showed that it

    would take more than 30,000 CPU cycles (15,000 ns) to encrypt a block of plaintext.

    This shows that the hardware implementation of the AES algorithm proposed in this

    project is more than 60 times faster than the software implementation.

    There are certain aspects of this project that may be explored in the future. One

    example is to add decryption capability to the design so that it can perform both

    encryption and decryption. The model can also be extended to perform encryption/

    decryption based on other versions of the AES algorithm.

    APPENDICES

    APPENDIX A

    AES128 Hardware Model Source Files

    AES128_DUT_package.sv

    `ifndef AES128_DUT_package_defined

    `define AES128_DUT_package_defined

    package AES128_DUT_package;

    typedef enum [3:0]

    {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,reset,wait_for_load} aes_rounds_t;

    function automatic [7:0] xtime (input [7:0] b);

    return {b[6:0],1'b0}^(8'h1b&{8{b[7]}});

    endfunction

    function automatic [31:0] mix_col (input [7:0] s0,s1,s2,s3);

    mix_col={xtime(s0)^xtime(s1)^s1^s2^s3,s0^xtime(s1)^xtime(s2)^s2^s3,

    s0^s1^xtime(s2)^xtime(s3)^s3,xtime(s0)^s0^s1^s2^xtime(s3)};

    endfunction

    function automatic [7:0] sbox(input [7:0] a);

    case (a)

    8'h00: return 8'h63;

    8'h01: return 8'h7c;

    8'h02: return 8'h77;

    8'h03: return 8'h7b;

    8'h04: return 8'hf2;

    8'h05: return 8'h6b;

    8'h06: return 8'h6f;

    8'h07: return 8'hc5;

    8'h08: return 8'h30;

    8'h09: return 8'h01;

    8'h0a: return 8'h67;

    8'h0b: return 8'h2b;

    8'h0c: return 8'hfe;

    8'h0d: return 8'hd7;

    8'h0e: return 8'hab;

    8'h0f: return 8'h76;

    8'h10: return 8'hca;

    8'h11: return 8'h82;

    8'h12: return 8'hc9;

    8'h13: return 8'h7d;

    8'h14: return 8'hfa;

    8'h15: return 8'h59;

    8'h16: return 8'h47;

    8'h17: return 8'hf0;

    8'h18: return 8'had;

    8'h19: return 8'hd4;

    8'h1a: return 8'ha2;

    8'h1b: return 8'haf;

    8'h1c: return 8'h9c;

    8'h1d: return 8'ha4;

    8'h1e: return 8'h72;

    8'h1f: return 8'hc0;

    8'h20: return 8'hb7;

    8'h21: return 8'hfd;

    8'h22: return 8'h93;

    8'h23: return 8'h26;

    8'h24: return 8'h36;

    8'h25: return 8'h3f;

    8'h26: return 8'hf7;

    8'h27: return 8'hcc;

    8'h28: return 8'h34;

    8'h29: return 8'ha5;

    8'h2a: return 8'he5;

    8'h2b: return 8'hf1;

    8'h2c: return 8'h71;

    8'h2d: return 8'hd8;

    8'h2e: return 8'h31;

    8'h2f: return 8'h15;

    8'h30: return 8'h04;

    8'h31: return 8'hc7;

    8'h32: return 8'h23;

    8'h33: return 8'hc3;

    8'h34: return 8'h18;

    8'h35: return 8'h96;

    8'h36: return 8'h05;

    8'h37: return 8'h9a;

    8'h38: return 8'h07;

    8'h39: return 8'h12;

    8'h3a: return 8'h80;

    8'h3b: return 8'he2;

    8'h3c: return 8'heb;

    8'h3d: return 8'h27;

    8'h3e: return 8'hb2;

    8'h3f: return 8'h75;

    8'h40: return 8'h09;

    8'h41: return 8'h83;

    8'h42: return 8'h2c;

    8'h43: return 8'h1a;

    8'h44: return 8'h1b;

    8'h45: return 8'h6e;

    8'h46: return 8'h5a;

    8'h47: return 8'ha0;

    8'h48: return 8'h52;

    8'h49: return 8'h3b;

    8'h4a: return 8'hd6;

    8'h4b: return 8'hb3;

    8'h4c: return 8'h29;

    8'h4d: return 8'he3;

    8'h4e: return 8'h2f;

    8'h4f: return 8'h84;

    8'h50: return 8'h53;

    8'h51: return 8'hd1;

    8'h52: return 8'h00;

    8'h53: return 8'hed;

    8'h54: return 8'h20;

    8'h55: return 8'hfc;

    8'h56: return 8'hb1;

    8'h57: return 8'h5b;

    8'h58: return 8'h6a;

    8'h59: return 8'hcb;

    8'h5a: return 8'hbe;

    8'h5b: return 8'h39;

    8'h5c: return 8'h4a;

    8'h5d: return 8'h4c;

    8'h5e: return 8'h58;

    8'h5f: return 8'hcf;

    8'h60: return 8'hd0;

    8'h61: return 8'hef;

    8'h62: return 8'haa;

    8'h63: return 8'hfb;

    8'h64: return 8'h43;

    8'h65: return 8'h4d;

    8'h66: return 8'h33;

    8'h67: return 8'h85;

    8'h68: return 8'h45;

    8'h69: return 8'hf9;

    8'h6a: return 8'h02;

    8'h6b: return 8'h7f;

    8'h6c: return 8'h50;

    8'h6d: return 8'h3c;

    8'h6e: return 8'h9f;

    8'h6f: return 8'ha8;

    8'h70: return 8'h51;

    8'h71: return 8'ha3;

    8'h72: return 8'h40;

    8'h73: return 8'h8f;

    8'h74: return 8'h92;

    8'h75: return 8'h9d;

    8'h76: return 8'h38;

    8'h77: return 8'hf5;

    8'h78: return 8'hbc;

    8'h79: return 8'hb6;

    8'h7a: return 8'hda;

    8'h7b: return 8'h21;

    8'h7c: return 8'h10;

    8'h7d: return 8'hff;

    8'h7e: return 8'hf3;

    8'h7f: return 8'hd2;

    8'h80: return 8'hcd;

    8'h81: return 8'h0c;

    8'h82: return 8'h13;

    8'h83: return 8'hec;

    8'h84: return 8'h5f;

    8'h85: return 8'h97;

    8'h86: return 8'h44;

    8'h87: return 8'h17;

    8'h88: return 8'hc4;

    8'h89: return 8'ha7;

    8'h8a: return 8'h7e;

    8'h8b: return 8'h3d;

    8'h8c: return 8'h64;

    8'h8d: return 8'h5d;

    8'h8e: return 8'h19;

    8'h8f: return 8'h73;

    8'h90: return 8'h60;

    8'h91: return 8'h81;

    8'h92: return 8'h4f;

    8'h93: return 8'hdc;

    8'h94: return 8'h22;

    8'h95: return 8'h2a;

    8'h96: return 8'h90;

    8'h97: return 8'h88;

    8'h98: return 8'h46;

    8'h99: return 8'hee;

    8'h9a: return 8'hb8;

    8'h9b: return 8'h14;

    8'h9c: return 8'hde;

    8'h9d: return 8'h5e;

    8'h9e: return 8'h0b;

    8'h9f: return 8'hdb;

    8'ha0: return 8'he0;

    8'ha1: return 8'h32;

    8'ha2: return 8'h3a;

    8'ha3: return 8'h0a;

    8'ha4: return 8'h49;

    8'ha5: return 8'h06;

    8'ha6: return 8'h24;

    8'ha7: return 8'h5c;

    8'ha8: return 8'hc2;

    8'ha9: return 8'hd3;

    8'haa: return 8'hac;

    8'hab: return 8'h62;

    8'hac: return 8'h91;

    8'had: return 8'h95;

    8'hae: return 8'he4;

    8'haf: return 8'h79;

    8'hb0: return 8'he7;

    8'hb1: return 8'hc8;

    8'hb2: return 8'h37;

    8'hb3: return 8'h6d;

    8'hb4: return 8'h8d;

    8'hb5: return 8'hd5;

    8'hb6: return 8'h4e;

    8'hb7: return 8'ha9;

    8'hb8: return 8'h6c;

    8'hb9: return 8'h56;

    8'hba: return 8'hf4;

    8'hbb: return 8'hea;

    8'hbc: return 8'h65;

    8'hbd: return 8'h7a;

    8'hbe: return 8'hae;

    8'hbf: return 8'h08;

    8'hc0: return 8'hba;

    8'hc1: return 8'h78;

    8'hc2: return 8'h25;

    8'hc3: return 8'h2e;

    8'hc4: return 8'h1c;

    8'hc5: return 8'ha6;

    8'hc6: return 8'hb4;

    8'hc7: return 8'hc6;

    8'hc8: return 8'he8;

    8'hc9: return 8'hdd;

    8'hca: return 8'h74;

    8'hcb: return 8'h1f;

    8'hcc: return 8'h4b;

    8'hcd: return 8'hbd;

    8'hce: return 8'h8b;

    8'hcf: return 8'h8a;

    8'hd0: return 8'h70;

    8'hd1: return 8'h3e;

    8'hd2: return 8'hb5;

    8'hd3: return 8'h66;

    8'hd4: return 8'h48;

    8'hd5: return 8'h03;

    8'hd6: return 8'hf6;

    8'hd7: return 8'h0e;

    8'hd8: return 8'h61;

    8'hd9: return 8'h35;

    8'hda: return 8'h57;

    8'hdb: return 8'hb9;

    8'hdc: return 8'h86;

    8'hdd: return 8'hc1;

    8'hde: return 8'h1d;

    8'hdf: return 8'h9e;

    8'he0: return 8'he1;

    8'he1: return 8'hf8;

    8'he2: return 8'h98;

    8'he3: return 8'h11;

    8'he4: return 8'h69;

    8'he5: return 8'hd9;

    8'he6: return 8'h8e;

    8'he7: return 8'h94;

    8'he8: return 8'h9b;

    8'he9: return 8'h1e;

    8'hea: return 8'h87;

    8'heb: return 8'he9;

    8'hec: return 8'hce;

    8'hed: return 8'h55;

    8'hee: return 8'h28;

    8'hef: return 8'hdf;

    8'hf0: return 8'h8c;

    8'hf1: return 8'ha1;

    8'hf2: return 8'h89;

    8'hf3: return 8'h0d;

    8'hf4: return 8'hbf;

    8'hf5: return 8'he6;

    8'hf6: return 8'h42;

    8'hf7: return 8'h68;

    8'hf8: return 8'h41;

    8'hf9: return 8'h99;

    8'hfa: return 8'h2d;

    8'hfb: return 8'h0f;

    8'hfc: return 8'hb0;

    8'hfd: return 8'h54;

    8'hfe: return 8'hbb;

    8'hff: return 8'h16;

    endcase

    endfunction

    endpackage

    `endif
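    As a cross-check of the xtime and mix_col functions in this package, the same GF(2^8) arithmetic can be written as a short C sketch. This is not part of the project's source. The C function names simply mirror the SystemVerilog ones, and the output array holds the four result bytes in the order in which mix_col concatenates them (most significant byte first).

```c
#include <stdint.h>

/* Multiply by x (that is, by 2) in GF(2^8), reducing by 0x1b on overflow. */
static uint8_t xtime(uint8_t b) {
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00));
}

/* MixColumns on one column: each output byte combines the inputs with
   the coefficients 2, 3, 1, 1 (rotated per row), where 3*s = xtime(s) ^ s. */
static void mix_col(const uint8_t s[4], uint8_t out[4]) {
    out[0] = (uint8_t)(xtime(s[0]) ^ xtime(s[1]) ^ s[1] ^ s[2] ^ s[3]);
    out[1] = (uint8_t)(s[0] ^ xtime(s[1]) ^ xtime(s[2]) ^ s[2] ^ s[3]);
    out[2] = (uint8_t)(s[0] ^ s[1] ^ xtime(s[2]) ^ xtime(s[3]) ^ s[3]);
    out[3] = (uint8_t)(xtime(s[0]) ^ s[0] ^ s[1] ^ s[2] ^ xtime(s[3]));
}
```

    For the input column db 13 53 45 this sketch produces 8e 4d a1 bc, which can be compared against the output of the SystemVerilog mix_col for the same bytes.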

    AES128_cipher_top.sv

    interface AES128_interface(input bit clk);

    logic ld, rst ;

    logic [127:0] key, text_in ;

    logic done;

    logic [127:0] text_out;

    clocking cb @(posedge clk);

    output ld ;

    output key;

    output text_in;

    input done;

    input text_out;

    endclocking

    modport dut(

    input clk,

    input rst,

    input ld ,

    input key,

    input text_in,

    output done,

    output text_out);

    modport tb(

    input clk,

    output rst,

    clocking cb);

    endinterface

    //****************************************************

    module AES128_cipher_top(AES128_interface.dut intf);

    import AES128_DUT_package::*;

    aes_rounds_t cs, ns;

    logic [127:0] plain_text;

    wire [31:0] w[4];

    logic [7:0] sa [4][4];

    logic [7:0] sa_next[4][4];

    wire [7:0] sa_sub[4][4];

    wire [7:0] sa_sr[4][4];

    wire [7:0] sa_mc[4][4];

    int i,j,a,b;

    AES128_key_expand u0(

    .clk( intf.clk ),

    .rst( intf.rst ),

    .kld( intf.ld ),

    .key( intf.key ),

    .w0( w[0] ),

    .w1( w[1] ),

    .w2( w[2] ),

    .w3( w[3] ));

    assign sa_sub[0][0] = sbox( sa[0][0] ) ;

    assign sa_sub[0][1] = sbox( sa[0][1] ) ;

    assign sa_sub[0][2] = sbox( sa[0][2] ) ;

    assign sa_sub[0][3] = sbox( sa[0][3] ) ;

    assign sa_sub[1][0] = sbox( sa[1][0] ) ;

    assign sa_sub[1][1] = sbox( sa[1][1] ) ;

    assign sa_sub[1][2] = sbox( sa[1][2] ) ;

    assign sa_sub[1][3] = sbox( sa[1][3] ) ;

    assign sa_sub[2][0] = sbox( sa[2][0] ) ;

    assign sa_sub[2][1] = sbox( sa[2][1] ) ;

    assign sa_sub[2][2] = sbox( sa[2][2] ) ;

    assign sa_sub[2][3] = sbox( sa[2][3] ) ;

    assign sa_sub[3][0] = sbox( sa[3][0] ) ;

    assign sa_sub[3][1] = sbox( sa[3][1] ) ;

    assign sa_sub[3][2] = sbox( sa[3][2] ) ;

    assign sa_sub[3][3] = sbox( sa[3][3] ) ;

    assign sa_sr[0][0] = sa_sub[0][0];

    assign sa_sr[0][1] = sa_sub[0][1];

    assign sa_sr[0][2] = sa_sub[0][2];

    assign sa_sr[0][3] = sa_sub[0][3];

    assign sa_sr[1][0] = sa_sub[1][1];

    assign sa_sr[1][1] = sa_sub[1][2];

    assign sa_sr[1][2] = sa_sub[1][3];

    assign sa_sr[1][3] = sa_sub[1][0];

    assign sa_sr[2][0] = sa_sub[2][2];

    assign sa_sr[2][1] = sa_sub[2][3];

    assign sa_sr[2][2] = sa_sub[2][0];

    assign sa_sr[2][3] = sa_sub[2][1];

    assign sa_sr[3][0] = sa_sub[3][3];

    assign sa_sr[3][1] = sa_sub[3][0];

    assign sa_sr[3][2] = sa_sub[3][1];

    assign sa_sr[3][3] = sa_sub[3][2];

    assign {sa_mc[0][0], sa_mc[1][0], sa_mc[2][0], sa_mc[3][0]} =

    mix_col(sa_sr[0][0],sa_sr[1][0],sa_sr[2][0],sa_sr[3][0]);

    assign {sa_mc[0][1], sa_mc[1][1], sa_mc[2][1], sa_mc[3][1]} =

    mix_col(sa_sr[0][1],sa_sr[1][1],sa_sr[2][1],sa_sr[3][1]);

    assign {sa_mc[0][2], sa_mc[1][2], sa_mc[2][2], sa_mc[3][2]} =

    mix_col(sa_sr[0][2],sa_sr[1][2],sa_sr[2][2],sa_sr[3][2]);

    assign {sa_mc[0][3], sa_mc[1][3], sa_mc[2][3], sa_mc[3][3]} =

    mix_col(sa_sr[0][3],sa_sr[1][3],sa_sr[2][3],sa_sr[3][3]);

    always_ff @(posedge intf.clk or negedge intf.rst)

    if(!intf.rst) begin

    cs

    end

    r1,r2,r3,r4,r5,r6,r7,r8,r9:begin

    intf.text_out=128'bz;

    intf.done=1'b0;

    sa_next[0][0] = sa_mc[0][0] ^ w[0][31:24];

    sa_next[0][1] = sa_mc[0][1] ^ w[1][31:24];

    sa_next[0][2] = sa_mc[0][2] ^ w[2][31:24];

    sa_next[0][3] = sa_mc[0][3] ^ w[3][31:24];

    sa_next[1][0] = sa_mc[1][0] ^ w[0][23:16];

    sa_next[1][1] = sa_mc[1][1] ^ w[1][23:16];

    sa_next[1][2] = sa_mc[1][2] ^ w[2][23:16];

    sa_next[1][3] = sa_mc[1][3] ^ w[3][23:16];

    sa_next[2][0] = sa_mc[2][0] ^ w[0][15:08];

    sa_next[2][1] = sa_mc[2][1] ^ w[1][15:08];

    sa_next[2][2] = sa_mc[2][2] ^ w[2][15:08];

    sa_next[2][3] = sa_mc[2][3] ^ w[3][15:08];

    sa_next[3][0] = sa_mc[3][0] ^ w[0][07:00];

    sa_next[3][1] = sa_mc[3][1] ^ w[1][07:00];

    sa_next[3][2] = sa_mc[3][2] ^ w[2][07:00];

    sa_next[3][3] = sa_mc[3][3] ^ w[3][07:00];

    end

    r10:begin

    sa_next[0][0] = sa_sr[0][0] ^ w[0][31:24];

    sa_next[0][1] = sa_sr[0][1] ^ w[1][31:24];

    sa_next[0][2] = sa_sr[0][2] ^ w[2][31:24];

    sa_next[0][3] = sa_sr[0][3] ^ w[3][31:24];

    sa_next[1][0] = sa_sr[1][0] ^ w[0][23:16];

    sa_next[1][1] = sa_sr[1][1] ^ w[1][23:16];

    sa_next[1][2] = sa_sr[1][2] ^ w[2][23:16];

    sa_next[1][3] = sa_sr[1][3] ^ w[3][23:16];

    sa_next[2][0] = sa_sr[2][0] ^ w[0][15:08];

    sa_next[2][1] = sa_sr[2][1] ^ w[1][15:08];

    sa_next[2][2] = sa_sr[2][2] ^ w[2][15:08];

    sa_next[2][3] = sa_sr[2][3] ^ w[3][15:08];

    sa_next[3][0] = sa_sr[3][0] ^ w[0][07:00];

    sa_next[3][1] = sa_sr[3][1] ^ w[1][07:00];

    sa_next[3][2] = sa_sr[3][2] ^ w[2][07:00];

    sa_next[3][3] = sa_sr[3][3] ^ w[3][07:00];

    intf.text_out = {sa_next[0][0], sa_next[1][0],

    sa_next[2][0], sa_next[3][0], sa_next[0][1], sa_next[1][1],

    sa_next[2][1], sa_next[3][1], sa_next[0][2], sa_next[1][2],

    sa_next[2][2], sa_next[3][2], sa_next[0][3], sa_next[1][3],

    sa_next[2][3], sa_next[3][3]};

    intf.done=1'b1;

    end

    default:begin

    intf.text_out=128'bz;

    intf.done=1'b0;

    for(a=0;a

    sa_next[a][b]=0;

    end

    endcase

    end

    always_comb begin

    case (cs)

    reset: ns = wait_for_load;

    wait_for_load: begin

    if( intf.ld ) ns=r0;

    else ns=wait_for_load;

    end

    r0: ns=r1;

    r1: ns=r2;

    r2: ns=r3;

    r3: ns=r4;

    r4: ns=r5;

    r5: ns=r6;

    r6: ns=r7;

    r7: ns=r8;

    r8: ns=r9;

    r9: ns=r10;

    r10: ns=wait_for_load;

    default:ns=wait_for_load;

    endcase

    end

    endmodule
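    The sixteen sa_sr assignments in AES128_cipher_top follow a single pattern: row r of the state is rotated left by r byte positions. A minimal C sketch of that index mapping is given here. The function name shift_rows and the array layout (first index is the row, second is the column) are illustrative choices, not taken from the project.

```c
#include <stdint.h>

/* ShiftRows: out[r][c] takes the byte that sits r columns to the right
   in the same row of the input, wrapping around at column 4. */
static void shift_rows(uint8_t in[4][4], uint8_t out[4][4]) {
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            out[r][c] = in[r][(c + r) % 4];
        }
    }
}
```

    For example, out[1][3] is taken from in[1][0] and out[3][0] from in[3][3], matching the assignments sa_sr[1][3] = sa_sub[1][0] and sa_sr[3][0] = sa_sub[3][3] in the module.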

    AES128_key_expand.sv

    module AES128_key_expand(clk, rst, kld, key, w0, w1, w2, w3);

    input clk, rst;

    input kld;

    input [127:0] key;

    output [31:0] w0, w1, w2, w3;

    logic [31:0] w0, w1, w2, w3;

    logic [31:0] w0_next,w1_next,w2_next,w3_next;

    logic [31:0] subword;

    wire [31:0] rcon;

    import AES128_DUT_package::*;

    aes_rounds_t cs, ns;

    AES128_rcon rcon0( .clk(clk), .rst(rst), .kld(kld), .out(rcon));

    always_ff @(posedge clk or negedge rst)

    if(!rst) begin

    cs

    subword[31:24] = sbox(w3[23:16]);

    subword[23:16] = sbox(w3[15:08]);

    subword[15:08] = sbox(w3[07:00]);

    subword[07:00] = sbox(w3[31:24]);

    w0_next = w0^subword^rcon;

    w1_next = w1^w0^subword^rcon;

    w2_next = w2^w1^w0^subword^rcon;

    w3_next = w3^w2^w1^w0^subword^rcon;

    end

    default:begin

    subword = 32'h0;

    w0_next = 32'h0;

    w1_next = 32'h0;

    w2_next = 32'h0;

    w3_next = 32'h0;

    end

    endcase

    end

    always_comb begin

    case(cs)

    reset:begin

    ns = r0;

    end

    r0:begin

    ns = kld ? r1 : r0 ;

    end

    r1:begin

    ns = r2;

    end

    r2:begin

    ns = r3;

    end

    r3:begin

    ns = r4;

    end

    r4:begin

    ns = r5;

    end

    r5:begin

    ns = r6;

    end

    r6:begin

    ns = r7;

    end

    r7:begin

    ns = r8;

    end

    r8:begin

    ns = r9;

    end

    r9:begin

    ns = r10;

    end

    r10:begin

    ns = r0;

    end

    default:begin

    ns = r0;

    end

    endcase

    end

    endmodule
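    The round-key update computed by AES128_key_expand (the subword assignments followed by the four w*_next expressions) can be sketched in C as follows. This is an illustrative model, not the project's code: the function names are hypothetical, and to keep the example short the S-box is computed from its definition (multiplicative inverse in GF(2^8) followed by the affine transform) instead of using the lookup table from the package.

```c
#include <stdint.h>

/* Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return p;
}

/* S-box from its definition: inverse, then affine transform with 0x63. */
static uint8_t sbox(uint8_t x) {
    uint8_t inv = 0;                      /* the inverse of 0 is taken as 0 */
    for (int c = 1; c < 256; c++) {
        if (gf_mul(x, (uint8_t)c) == 1) { inv = (uint8_t)c; break; }
    }
    uint8_t s = inv;
    for (int i = 1; i <= 4; i++) {        /* XOR in four left rotations */
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
    }
    return (uint8_t)(s ^ 0x63);
}

/* Rotate the word left by one byte, then substitute each byte. */
static uint32_t sub_rot_word(uint32_t w) {
    return ((uint32_t)sbox((uint8_t)(w >> 16)) << 24) |
           ((uint32_t)sbox((uint8_t)(w >> 8)) << 16) |
           ((uint32_t)sbox((uint8_t)w) << 8) |
            (uint32_t)sbox((uint8_t)(w >> 24));
}

/* Replace w[0..3] with the next round key, given that round's constant. */
static void next_round_key(uint32_t w[4], uint32_t rcon) {
    uint32_t t = sub_rot_word(w[3]) ^ rcon;
    w[0] ^= t;        /* w0 ^ subword ^ rcon */
    w[1] ^= w[0];     /* w1 ^ w0 ^ subword ^ rcon */
    w[2] ^= w[1];
    w[3] ^= w[2];
}
```

    Starting from the cipher key 2b7e1516 28aed2a6 abf71588 09cf4f3c with round constant 01000000, this sketch yields a0fafe17 88542cb1 23a33939 2a6c7605, which is the first round key listed in the key expansion example of FIPS 197.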

    AES128_rcon.sv

    module AES128_rcon(clk, rst, kld, out);

    input clk, rst;

    input kld;

    output [31:0] out;

    logic [31:0] out;

    import AES128_DUT_package::*;

    aes_rounds_t cs, ns;

    always_ff @(posedge clk or negedge rst)

    if(!rst) begin

    cs

    ns = r7;

    end

    r7:begin

    out = 32'h40_00_00_00;

    ns = r8;

    end

    r8:begin

    out = 32'h80_00_00_00;

    ns = r9;

    end

    r9:begin

    out = 32'h1b_00_00_00;

    ns = r10;

    end

    r10:begin

    out = 32'h36_00_00_00;

    ns = r0;

    end

    default:begin

    out = 32'h00_00_00_00;

    ns = r0;

    end

    endcase

    end

    endmodule
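    The constants that AES128_rcon outputs in each state are successive doublings of 0x01 in GF(2^8), placed in the most significant byte of a 32-bit word. The C sketch below regenerates them. The function names are hypothetical, and the loop is an alternative to the module's case statement, not a description of how the hardware computes the value.

```c
#include <stdint.h>

/* Multiply by 2 in GF(2^8), reducing by 0x1b on overflow. */
static uint8_t xtime(uint8_t b) {
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00));
}

/* Round constant byte for rounds 1..10: 0x01 doubled (round - 1) times. */
static uint8_t rcon_byte(int round) {
    uint8_t r = 0x01;
    for (int i = 1; i < round; i++) {
        r = xtime(r);
    }
    return r;
}

/* The 32-bit word form used in the key expansion: constant in the top byte. */
static uint32_t rcon_word(int round) {
    return (uint32_t)rcon_byte(round) << 24;
}
```

    For rounds 7 through 10 this gives 40000000, 80000000, 1b000000 and 36000000, matching the values assigned to out in states r7 to r10.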

    APPENDIX B

    AES128 Testbench Source Files

    AES128_Testbench_Package.sv

    `ifndef AES128_testbench_package_defined

    `define AES128_testbench_package_defined

    package AES128_testbench_package;

    logic [7:0] state [4][4];

    logic [7:0] key [16];

    logic [7:0] RoundKey[176];

    logic [7:0] Rcon[11] = '{8'h0, 8'h01, 8'h02, 8'h04, 8'h08, 8'h10,

    8'h20, 8'h40, 8'h80, 8'h1b, 8'h36};

    /*******************************************************/

    function automatic [7:0] times2 (input [7:0] b);

    return {b[6:0],1'b0}^(8'h1b&{8{b[7]}});

    endfunction

    /*******************************************************/

    function logic [7:0] getSBoxValue(input logic [7:0] num);

    logic [7:0] sbox[256] = '{

    //0 1 2 3 4 5 6 7 8 9 A B C D E F

    8'h63, 8'h7c, 8'h77, 8'h7b, 8'hf2, 8'h6b, 8'h6f, 8'hc5, 8'h30,

    8'h01, 8'h67, 8'h2b, 8'hfe, 8'hd7, 8'hab, 8'h76, //0

    8'hca, 8'h82, 8'hc9, 8'h7d, 8'hfa, 8'h59, 8'h47, 8'hf0, 8'had,

    8'hd4, 8'ha2, 8'haf, 8'h9c, 8'ha4, 8'h72, 8'hc0, //1

    8'hb7, 8'hfd, 8'h93, 8'h26, 8'h36, 8'h3f, 8'hf7, 8'hcc, 8'h34,

    8'ha5, 8'he5, 8'hf1, 8'h71, 8'hd8, 8'h31, 8'h15, //2

    8'h04, 8'hc7, 8'h23, 8'hc3, 8'h18, 8'h96, 8'h05, 8'h9a, 8'h07,

    8'h12, 8'h80, 8'he2, 8'heb, 8'h27, 8'hb2, 8'h75, //3

    8'h09, 8'h83, 8'h2c, 8'h1a, 8'h1b, 8'h6e, 8'h5a, 8'ha0, 8'h52,

    8'h3b, 8'hd6, 8'hb3, 8'h29, 8'he3, 8'h2f, 8'h84, //4

    8'h53, 8'hd1, 8'h00, 8'hed, 8'h20, 8'hfc, 8'hb1, 8'h5b, 8'h6a,

    8'hcb, 8'hbe, 8'h39, 8'h4a, 8'h4c, 8'h58, 8'hcf, //5

    8'hd0, 8'hef, 8'haa, 8'hfb, 8'h43, 8'h4d, 8'h33, 8'h85, 8'h45,

    8'hf9, 8'h02, 8'h7f, 8'h50, 8'h3c, 8'h9f, 8'ha8, //6

    8'h51, 8'ha3, 8'h40, 8'h8f, 8'h92, 8'h9d, 8'h38, 8'hf5, 8'hbc,

    8'hb6, 8'hda, 8'h21, 8'h10, 8'hff, 8'hf3, 8'hd2, //7

    8'hcd, 8'h0c, 8'h13, 8'hec, 8'h5f, 8'h97, 8'h44, 8'h17, 8'hc4,

    8'ha7, 8'h7e, 8'h3d, 8'h64, 8'h5d, 8'h19, 8'h73, //8

    8'h60, 8'h81, 8'h4f, 8'hdc, 8'h22, 8'h2a, 8'h90, 8'h88, 8'h46,

    8'hee, 8'hb8, 8'h14, 8'hde, 8'h5e, 8'h0b, 8'hdb, //9

    8'he0, 8'h32, 8'h3a, 8'h0a, 8'h49, 8'h06, 8'h24, 8'h5c, 8'hc2,

    8'hd3, 8'hac, 8'h62, 8'h91, 8'h95, 8'he4, 8'h79, //A

    8'he7, 8'hc8, 8'h37, 8'h6d, 8'h8d, 8'hd5, 8'h4e, 8'ha9, 8'h6c,

    8'h56, 8'hf4, 8'hea, 8'h65, 8'h7a, 8'hae, 8'h08, //B

    8'hba, 8'h78, 8'h25, 8'h2e, 8'h1c, 8'ha6, 8'hb4, 8'hc6, 8'he8,

    8'hdd, 8'h74, 8'h1f, 8'h4b, 8'hbd, 8'h8b, 8'h8a, //C

    8'h70, 8'h3e, 8'hb5, 8'h66, 8'h48, 8'h03, 8'hf6, 8'h0e, 8'h61,

    8'h35, 8'h57, 8'hb9, 8'h86, 8'hc1, 8'h1d, 8'h9e, //D

    8'he1, 8'hf8, 8'h98, 8'h11, 8'h69, 8'hd9, 8'h8e, 8'h94, 8'h9b,

    8'h1e, 8'h87, 8'he9, 8'hce, 8'h55, 8'h28, 8'hdf, //E

    8'h8c, 8'ha1, 8'h89, 8'h0d, 8'hbf, 8'he6, 8'h42, 8'h68, 8'h41,

    8'h99, 8'h2d, 8'h0f, 8'hb0, 8'h54, 8'hbb, 8'h16 //F

    };

    return sbox[num];

    endfunction

    /*****************************************************/

    function aes128_ShiftRows();

    logic [7:0] temp;

    // Rotate left the second row by 1 column

    temp=state[1][0];

    state[1][0]=state[1][1];

    state[1][1]=state[1][2];

    state[1][2]=state[1][3];

    state[1][3]=temp;

    // Rotate left the third row by 2 columns

    temp=state[2][0];

    state[2][0]=state[2][2];

    state[2][2]=temp;

    temp=state[2][1];

    state[2][1]=state[2][3];

    state[2][3]=temp;

    // Rotate left the fourth row by 3 columns

    temp=state[3][0];

    state[3][0]=state[3][3];

    state[3][3]=state[3][2];

    state[3][2]=state[3][1];

    state[3][1]=temp;

    endfunction

    /*********************************************