6-GB/S SERIAL LINK TRANSCEIVER FOR NOCS...6-GB/S SERIAL LINK TRANSCEIVER FOR NOCS By Safaa Ahmed...

6-GB/S SERIAL LINK TRANSCEIVER FOR NOCS By Safaa Ahmed Mohammed Abdelfattah A Thesis Submitted to the Faculty of Engineering at Cairo University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE in Electronics and Communications Engineering FACULTY OF ENGINEERING, CAIRO UNIVERSITY GIZA, EGYPT 2016


6-GB/S SERIAL LINK TRANSCEIVER FOR NOCS

By

Safaa Ahmed Mohammed Abdelfattah

A Thesis Submitted to the
Faculty of Engineering at Cairo University

in Partial Fulfillment of the
Requirements for the Degree of

MASTER OF SCIENCE
in

Electronics and Communications Engineering

FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT

2016


6-GB/S SERIAL LINK TRANSCEIVER FOR NOCS

By

Safaa Ahmed Mohammed Abdelfattah

A Thesis Submitted to the
Faculty of Engineering at Cairo University

in Partial Fulfillment of the
Requirements for the Degree of

MASTER OF SCIENCE
in

Electronics and Communications Engineering

Under the Supervision of

Prof. Serag E. D. Habib
Professor of Electronics
Electronics and Communications Engineering Department
Faculty of Engineering, Cairo University

Dr. Sameh A. Ibrahim
Assistant Professor
Electronics and Communications Engineering Department
Faculty of Engineering, Ain Shams University

FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT

2016


6-GB/S SERIAL LINK TRANSCEIVER FOR NOCS

By

Safaa Ahmed Mohammed Abdelfattah

A Thesis Submitted to the
Faculty of Engineering at Cairo University

in Partial Fulfillment of the
Requirements for the Degree of

MASTER OF SCIENCE
in

Electronics and Communications Engineering

Approved by the Examining Committee:

Prof. Serag E. D. Habib, Thesis Main Advisor

Prof. Mohamed Riad Elghoneimy, Internal Examiner

Prof. Mohamed A. Dessouky, External Examiner
Electronics and Communications Engineering Department,
Faculty of Engineering, Ain Shams University

FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT

2016


Engineer’s Name: Safaa Ahmed Mohammed Abdelfattah
Date of Birth: 22/11/1989
Nationality: Egyptian
E-mail: [email protected]
Phone: +201285436641
Address: 13 Abd El-latif street, El salam City, Cairo
Registration Date: 1/10/2012
Awarding Date: -/-/2016
Degree: Master of Science
Department: Electronics and Communications Engineering

Supervisors:
Prof. Serag E. D. Habib
Dr. Sameh A. Ibrahim (Electronics and Communications Engineering Department, Faculty of Engineering, Ain Shams University)

Examiners:
Prof. Mohamed A. Dessouky (External examiner) (Electronics and Communications Engineering Department, Faculty of Engineering, Ain Shams University)
Prof. Mohamed Riad Elghoneimy (Internal examiner)
Prof. Serag E. D. Habib (Thesis main advisor)

Title of Thesis:

6-Gb/s Serial Link Transceiver For NoCs

Key Words:

SerDes; NoC; CUSPARC

Summary:
The design of a 6-Gb/s serial link for the CUSPARC NoC is presented. The proposed SerDes consists of a serializer and a deserializer. The design targets TSMC digital 65-nm CMOS technology and a 1.2-V supply. The use of serial links reduces the interconnect area of the network on chip by 93.96% relative to the design with parallel 32-bit data links. Over the traces between the cores, the link tolerates a clock skew between the Tx and the Rx of up to ±36% of the clock period. The link consumes 6.9 mW (1.15 pJ/bit).


Acknowledgements

All praise to ALLAH for his many blessings on me and for the completion of this work.

I would like to thank my advisors

Prof. Dr. Serag E. D. Habib and Dr. Sameh A. Ibrahim

for their assistance, motivation, and guidance throughout my work.

I would like to express my gratitude to my family (especially my parents) for their love and encouragement since I was born.

I am very grateful to my dear fiancé

Hussein Mohammed

for his support, encouragement, advice, patience, and help since we met.

Thanks to Ahmed Reda for helping with the layout design.


Dedication

To my parents and my fiancé Hussein


Table of Contents

Acknowledgements
Dedication
Table of Contents
List of Tables
List of Figures
List of Symbols and Abbreviations
Abstract

1 INTRODUCTION
  1.1 Motivation
  1.2 Problem Statement
  1.3 Thesis Outline

2 BACKGROUND
  2.1 Parallel and Serial Links
  2.2 History of Serial Links
    2.2.1 Visual Method
      2.2.1.1 Smoke Signal
    2.2.2 Wire Method
      2.2.2.1 Current Loop
      2.2.2.2 Morse code
  2.3 Introduction to high speed serial links
    2.3.1 Types of SERDES
      2.3.1.1 Parallel Clock SerDes
      2.3.1.2 Embedded Clock SerDes
      2.3.1.3 8b/10b SerDes
      2.3.1.4 Bit Interleaving SerDes
      2.3.1.5 Shift Register SerDes
  2.4 Introduction to CUSPARC
    2.4.1 CUSPARC Architecture
      2.4.1.1 Processor Integer Unit (IU)
      2.4.1.2 Instruction Cache (I-Cache)
      2.4.1.3 Data Cache (D-Cache)
      2.4.1.4 Cache Controller
      2.4.1.5 Main processor bus
      2.4.1.6 Memory Controller
      2.4.1.7 Bridge
      2.4.1.8 Interrupt Controller
      2.4.1.9 Peripherals
  2.5 Introduction to NoCs
    2.5.1 Problems of parallel Buses in NoC
  2.6 Literature Review

3 PROPOSED SERDES DESIGN
  3.1 System Description (Design A)
  3.2 Transmitter Design
    3.2.1 8b/10b Encoder
    3.2.2 10:1 MUX
    3.2.3 Design of Selection Circuit
    3.2.4 Driver
    3.2.5 Frequency Divider
  3.3 Receiver Design
    3.3.1 Sampler
      3.3.1.1 SA and CML flip flops
    3.3.2 1:10 DEMUX and Retimers
    3.3.3 Design of Selection Circuit
    3.3.4 10b/8b Decoder
  3.4 Platform Layout of Design A
    3.4.1 Transmitter Layout
    3.4.2 Receiver Layout
  3.5 Design B
    3.5.1 Synchronization Process
    3.5.2 Power Reduction

4 SIMULATION RESULTS
  4.1 Circuit Level Simulation Results of Design B
    4.1.1 Simulation Results of the Transmitter
    4.1.2 Simulation Results of the Receiver
  4.2 Post Layout Results of Design A
  4.3 CUSPARC NoC Clock Skew results of Design B
  4.4 Power Distribution of Design B
  4.5 Comparison to Parallel Buses NoC Architecture
  4.6 Comparison With Related Works

5 CONCLUSION AND FUTURE WORK
  5.1 Conclusion
  5.2 Future Work

References


List of Tables

2.1 Instruction and Data Caches parameters
2.2 SerDes Comparison
2.3 WAFT and Basic SerDes Comparison
2.4 Coplanar and Microstrip Waveguides Comparison
2.5 Comparison between Area of 2 Gb/s serial link and its corresponding parallel link Proposed In [14]
2.6 Comparison between Area of 8 Gb/s serial link and its corresponding parallel link Proposed In [14]
2.7 Area Optimization for the Routers proposed in [15]
2.8 Performance of 64-bit 8-mm Serial Links in [16]
2.9 Comparison Between Different SerDes Architectures

3.1 Simulation results of the SA sampler

4.1 Data Rates
4.2 Input and encoded data of the encoder
4.3 Parallel Encoded Data
4.4 Clock Skew Test Cases
4.5 Clock Skew results for Typical Typical Corner
4.6 Clock Skew results for Slow Slow Corner
4.7 Clock Skew results for Fast Fast Corner
4.8 Power Distribution
4.9 SerDes Comparison


List of Figures

2.1 Parallel Transmission Example
2.2 Serial Transmission Example
2.3 Serializer
2.4 Deserializer
2.5 Simple Parallel Clock SerDes Example
2.6 Simple Interleaving SerDes Example
2.7 Conventional SerDes [8]
2.8 CUSPARC Architecture [1]
2.9 WAFT SerDes [8]
2.10 Prototype SerDes for on Chip signaling [13]
2.11 The Transmitter Proposed In [13]
2.12 The Receiver Proposed In [13]
2.13 2-Gb/s SerDes Proposed In [14]
2.14 8-Gb/s Quasi Serial Link [14]
2.15 SerDes With Serialization Ratio n/p [15]
2.16 Proposed Serial Link in [16]
2.17 WP CML Serializer in [17]
2.18 WP CML Deserializer in [17]

3.1 The Whole 6-Gb/s Serial Link Platform
3.2 Coding Method
3.3 10:1 Multiplexer Block Diagram
3.4 The Transmitter 5:1 Multiplexer Block Diagram
3.5 CMOS D Latch
3.6 Transmission Gate 2:1 MUX
3.7 Waveforms of the Selection Signals Used in the even 5:1 Multiplexer
3.8 Selections Circuit
3.9 JK FF Schematic
3.10 TSPC D Flip Flop
3.11 Driver Schematic
3.12 Driver with TL
3.13 Frequency Divider
3.14 StrongARM Sampler
3.15 Optimized RS Latch
3.16 The Receiver 1:10 Demultiplexer Block Diagram
3.17 D Flip Flop
3.18 The Receiver Selection Circuit
3.19 Transmitter Layout
3.20 Receiver Layout
3.21 Synchronization Process
3.22 Timing Diagram
3.23 Gate based 2:1 MUX
3.24 New 5:1 MUX

4.1 Block Diagram with Data Rates and Eye Diagrams
4.2 Encoder Data (a) Encoder Inputs (b) Encoder Outputs
4.3 D0 Encoder Output Eye Diagram
4.4 Even Output from the 5:1 MUX
4.5 Odd Output from the 5:1 MUX
4.6 Final Output from the 10:1 MUX
4.7 MUX Output Eye diagram
4.8 Output after the Driver
4.9 Driver Eye diagram
4.10 Output after the Channel
4.11 Channel output Eye diagram
4.12 Output after the Sampler
4.13 Sampler Output Eye diagram
4.14 Output after the DEMUX
4.15 Eye Diagram of D0 Output from DEMUX
4.16 Output after the Regenerative Latches
4.17 Eye Diagram of D0 Output from REG. Latches
4.18 Output after the Final FF
4.19 Eye Diagram of D0 Output from FF
4.20 Decoder Output (a) TT Corner (b) FF Corner (c) SS Corner
4.21 XOR Result (a) TT Corner (b) FF Corner (c) SS Corner
4.22 Encoded Input Parallel Data (VIN0:VIN9) and Serial Transmitted Data
4.23 Transmitted Data Eye Diagram
4.24 Received Data
4.25 CUSPARC Many-Core Mesh with Clock Tree Distribution [1]
4.26 CUSPARC Clock Paths
4.27 Area reduction [1]


List of Symbols and Abbreviations

2D Two-Dimensional

3D Three-Dimensional

AMS Analog Mixed Signal

ASIC Application-Specific Integrated Circuit

CDR Clock and Data Recovery

CML Current Mode Logic

CMOS Complementary Metal Oxide Semiconductor

CPU Central Processing Unit

CUSPARC Cairo University SPARC processor

DC Direct Current

D-Cache Data Cache

DE Delay Element

DEMUX Demultiplexer

DMA Direct Memory Access

FF Flip Flop

FF Corner Fast Fast Corner

FIFO First In First Out

FSM Finite State Machine

I2C Inter-Integrated Circuit

IBM International Business Machines

I-Cache Instruction Cache

IR Instruction Register

ISI Inter-Symbol Interference

ITRS International Technology Roadmap for Semiconductors

IU Integer Unit

LSB Least Significant Bit


MSB Most Significant Bit

MUX Multiplexer

NoC Network on Chip

PCI Express Peripheral Component Interconnect Express

PLL Phase Locked Loop

R Router

RAM Random Access Memory

RD Running Disparity

RS Reset Set

Rx Receiver

SA Strong Arm

SAFF Strong Arm Flip Flop

SATA Serial ATA

SerDes Serializer and Deserializer

SONET Synchronous Optical Networking

SPI Serial Peripheral Interface

SRAM Static Random Access Memory

SS Corner Slow Slow Corner

TL Transmission Line

TSMC Taiwan Semiconductor Manufacturing Company

TSPC True Single-Phase-Clocked

TSV Through Silicon Via

TT Corner Typical Typical Corner

Tx Transmitter

UART Universal Asynchronous Receiver/Transmitter

USB Universal Serial Bus

UTOPIA Universal Test and Operations Physical Interface for ATM

VLSI Very Large-Scale Integration

WAFT Wave-Front Train

WP Wave Pipelined


Abstract

Compared to parallel data transmission, serial transmission has the advantages of a smaller number of ports/pins, higher immunity to interference, lower power consumption, and smaller area. Concurrent with the trend to replace parallel buses with high-speed serial links, there is an ongoing trend to replace the multi-core shared-memory processor with a loosely coupled, many-core processor Network on Chip (NoC).

This work introduces a fast serial link for many-core NoCs. The proposed serial link consists of a Serializer and a Deserializer (SerDes). The serializer contains an 8b/10b encoder, a 10:1 multiplexer (MUX), and a driver. The deserializer contains a sampler, a 1:10 demultiplexer (DEMUX), regenerative latches, and a 10b/8b decoder. The design is modeled in a digital 65-nm CMOS technology with a 1.2-V supply using an Analog Mixed Signal (AMS) simulation tool. This SerDes design is integrated with the previously designed 16-core NoC based on the Cairo University SPARC processor (CUSPARC), arranged in a 2D mesh architecture.

The SerDes works at 6 Gb/s. The use of serial links reduces the interconnect area of the network on chip by 93.96% relative to the design with parallel 32-bit data links. The traces between the cores are modeled using metal layer number eight, which yields a high allowable clock skew between the transmitter and the receiver: the design can tolerate a clock skew of up to ±36% of the clock period at the TT corner. The transceiver consumes 6.9 mW (1.15 pJ/bit).


Chapter 1: Introduction

1.1 Motivation

The number of transistors on a chip almost doubles every two years, an observation famously known as Moore’s law. Advances in VLSI fabrication have produced transistors that are smaller and faster, with a higher transistor density in the same area. A faster transistor means a higher operating clock frequency and higher performance.

Over time, the operating frequency kept increasing with VLSI technology scaling until it hit power limitations. To overcome this problem, a new era began: the many-core era. Instead of a single-core processor running at a high frequency, designers place many cores on the same chip, each operating at a lower frequency. Together they achieve higher overall performance than the single-core processor at almost the same power consumption.

According to the International Technology Roadmap for Semiconductors (ITRS), the interconnect delay increases as the technology scales. The bottleneck is therefore no longer the performance of the cores but the communication between them, which limits the overall performance.

The focus should now be directed to the communication between the cores. Most on-chip communication is carried out through parallel buses. Parallel communication looks like the best solution at first glance because each bit has its own dedicated line that is available all the time. However, the interconnect scaling problem makes parallel communication inefficient in terms of speed, area, and power consumption.

In a design with a large number of bits, routing the parallel buses becomes very complicated because of the increase in the number of lines. Increased power consumption and timing errors due to jitter, skew, and crosstalk are the main drawbacks of parallel communication.

Because of these drawbacks, designers now tend to use serial communication on chip. Serial communication avoids the area, power consumption, and routing complexity penalties of parallel buses. To replace a parallel bus of N lines each working at F Gb/s, however, the serial link must operate at N × F Gb/s. Thus, serial buses have to operate at higher speeds.
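As a rough worked example of the N × F relation, the sketch below computes the rate a single serial line needs in order to match a parallel bus. The function name and the `coding_overhead` parameter are illustrative assumptions; the 10/8 factor corresponds to an 8b/10b line code, and the bus width and per-line rate are arbitrary example values, not figures from this work.

```python
def required_serial_rate_gbps(n_lines, rate_per_line_gbps, coding_overhead=1.0):
    """Rate one serial line needs to carry the traffic of n_lines parallel lines."""
    if n_lines <= 0 or rate_per_line_gbps <= 0 or coding_overhead < 1.0:
        raise ValueError("invalid bus parameters")
    return n_lines * rate_per_line_gbps * coding_overhead

# Eight lines at 0.5 Gb/s each need one 4 Gb/s serial line.
print(required_serial_rate_gbps(8, 0.5))          # 4.0
# With an 8b/10b line code, 10 bits are sent for every 8 data bits.
print(required_serial_rate_gbps(8, 0.5, 10 / 8))  # 5.0
```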

Our work aims at replacing the parallel buses of an NoC with serial buses in order to use higher data rates for transmission and to reduce the area.


1.2 Problem Statement

This work aims to design an on-chip SerDes for many-core NoCs that transmits data through high-speed serial links. Parallel interconnects used to be predominant in transmitting data between systems. However, because of their sensitivity to delay mismatches and interference, and their high cost and area, serial buses are now essential. By reducing the number of pins and interconnects, serial buses offer high-speed links that can cope with the increasing speed of Central Processing Units (CPUs).

We aim to apply this design to the CUSPARC-based many-core NoC designed by Soliman et al. [1], which employed parallel data transmission.

1.3 Thesis Outline

The thesis is organized as follows:

• Chapter 2 provides background on NoCs, the problems of parallel buses in NoCs, and a literature review on serial links.

• Chapter 3 presents the proposed 6-Gb/s SerDes design.

• Chapter 4 presents the simulation results of the proposed SerDes.

• Chapter 5 concludes this work.


Chapter 2: Background

2.1 Parallel and Serial Links

Serial communication transmits bits sequentially through a single channel. Parallel communication transmits bits simultaneously through multiple channels with a synchronous clock.

The main difference between serial and parallel transmission is the number of transmitting channels. Parallel communication uses more channels than serial communication. For example, transmitting eight bits takes eight channels in parallel communication but only one channel in serial communication, as shown in Figure 2.1 and Figure 2.2.

[Diagram: TX sends bits D0 to D7 to RX over eight parallel lines.]

Figure 2.1: Parallel Transmission Example

[Diagram: TX sends bits D0 to D7 to RX one after another over a single line.]

Figure 2.2: Serial Transmission Example
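The eight-bit example can be sketched in a few lines of Python. The sketch assumes that D0 (the least significant bit) is sent first; the function names and the example byte are illustrative and not taken from this work.

```python
def serialize(word, width=8):
    """Return the bits of `word` as a list, D0 (LSB) first."""
    return [(word >> i) & 1 for i in range(width)]

def deserialize(bits):
    """Rebuild the word from a list of bits received D0 first."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word

byte = 0b11100100
stream = serialize(byte)          # one bit per time slot on a single channel
print(stream)                     # [0, 0, 1, 0, 0, 1, 1, 1]
assert deserialize(stream) == byte
```

A parallel link would place the same eight values on eight lines in one time slot, whereas the serial link above occupies one line for eight time slots.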

There are other differences between serial and parallel buses:

• Parallel buses are simpler than serial buses, so parallel ports are easier to implement. Serial buses require an interface such as a Universal Asynchronous Receiver/Transmitter (UART) to send the data.


• Parallel buses cannot send data at high frequencies. All bits should arrive at the receiver at the same time, but this is difficult to achieve at high data rates because the signal delay is not equal for all the lines. The receiver must therefore wait until all the data arrive, which decreases the data rate and throughput.

• In parallel buses, the interference between the lines is significant. As a result, only short buses are allowed.

• Integrated circuits with a large number of pins are expensive, so integrated circuits use serial links rather than parallel links to reduce the pin count, especially if speed is not an issue.
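The skew limitation in the list above can be illustrated with a small calculation. The sketch assumes a deliberately simplified model in which the receiver samples all lines with one clock and each bit needs a minimum valid window; the function name, the delay values, and the `window_ps` margin are hypothetical and not figures from this work.

```python
def max_parallel_bit_rate_gbps(line_delays_ps, window_ps):
    """Upper bound on the per-line bit rate of a skewed parallel bus.

    Simplified model: the bit period must cover the spread between the
    fastest and slowest line plus the sampling window the receiver needs.
    """
    if not line_delays_ps or window_ps <= 0:
        raise ValueError("need at least one delay and a positive window")
    skew_ps = max(line_delays_ps) - min(line_delays_ps)
    min_period_ps = skew_ps + window_ps
    return 1000.0 / min_period_ps   # a 1000 ps period corresponds to 1 Gb/s

matched = [100, 100, 100, 100]     # all lines have equal delay
skewed = [100, 130, 180, 250]      # 150 ps between fastest and slowest line
print(max_parallel_bit_rate_gbps(matched, window_ps=50))  # 20.0
print(max_parallel_bit_rate_gbps(skewed, window_ps=50))   # 5.0
```

Under these assumed numbers, 150 ps of skew cuts the usable per-line rate by a factor of four, which is the effect the list describes qualitatively.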

Parallel buses were used in integrated circuits, memories, and other peripherals, but serial links are dominant in modern computers.

Serial transmission is usually used over long channels to overcome the cost and interference problems of parallel transmission. Serial communication is also used over short channels, especially in computer buses, because serial buses improve signal integrity.

Serial Peripheral Interface (SPI), Inter-Integrated Circuit (I2C), Peripheral Component Interconnect Express (PCI Express), Universal Serial Bus (USB), Serial ATA (SATA), and Synchronous Optical Networking (SONET) are all examples of serial buses.

2.2 History of Serial Links

There are many early methods of serial communication.

2.2.1 Visual Method

2.2.1.1 Smoke Signal

Native Americans and the Chinese developed a primitive serial communication technique using smoke signals [2]. The smoke signal is a type of visual transmission. Covering a fire with a blanket and quickly removing it produces a puff of smoke. The size, timing, and shape of the puffs can be controlled, and anyone within visual range can observe them. Stations were set up to extend the visual range and cover large distances.

2.2.2 Wire Method

2.2.2.1 Current Loop

Current loops are of two types: analog and digital. An analog current loop uses 4-20 mA of current, where 4 mA corresponds to zero output (0%) and 20 mA corresponds to full-scale output (100%). In the digital current loop, the high signal is represented by the absence of current, and the low signal is represented by the presence of current.
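The analog 4-20 mA scale can be written as a short conversion. The sketch assumes the usual linear mapping between the two endpoints stated above; the function names are illustrative.

```python
def percent_to_current_ma(percent):
    """Map 0..100 % of full scale to 4..20 mA (linear mapping assumed)."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    return 4.0 + 16.0 * percent / 100.0

def current_ma_to_percent(current_ma):
    """Inverse mapping: 4..20 mA back to 0..100 % of full scale."""
    if not 4.0 <= current_ma <= 20.0:
        raise ValueError("current must be between 4 and 20 mA")
    return (current_ma - 4.0) / 16.0 * 100.0

print(percent_to_current_ma(0))     # 4.0
print(percent_to_current_ma(50))    # 12.0
print(percent_to_current_ma(100))   # 20.0
```

Because zero output maps to 4 mA rather than 0 mA, a reading below 4 mA can be told apart from a valid zero, for example when the loop is broken.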

2.2.2.2 Morse code

In 1832, Samuel F. B. Morse developed the concept of the single-wire telegraph [3]. Morse code later became the dominant language of the telegraph. Morse was able to transmit signals over long-distance wires. Leonard Gale, a professor of chemistry at New York University, helped Morse send a message over a distance of 16 km, which was a great development.

2.3 Introduction to high speed serial links

High-speed serial links consist mainly of a Serializer and a Deserializer (SerDes). The serializer acts as a transmitter and the deserializer acts as a receiver.

The main goals of the serializer are a high bit rate, low power consumption, and low noise. It consists of a multiplexer that converts parallel data to serial data, and a pre-driver and a driver that enable the output signal to be transmitted properly [4]. Figure 2.3 shows a simple serializer.

[Block diagram: N bits, Multiplexer, Pre Driver, Driver.]

Figure 2.3: Serializer

The main goals of the deserializer are a low Bit Error Rate (BER), a high bit rate, and low power consumption. It consists of a pre-amplifier that compensates for the channel loss, a sampler (or slicer) that extracts the data from the received signal, and a DEMUX that converts serial data to parallel data [5]. Figure 2.4 shows a simple deserializer.

[Block diagram: Amp, Sampler, Demultiplexer, N bits.]

Figure 2.4: Deserializer


The deserializer can also contain a Clock and Data Recovery (CDR) block, which selects an optimal sampling point for the data by observing the data transitions.
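One way to picture how observing transitions leads to a sampling point is the oversampling sketch below. It is a simplified software illustration and not a circuit-level CDR: it assumes the receiver takes `osr` samples per bit, finds the phase at which edges most often occur, and samples half a bit later. The function names, the oversampling ratio, and the test pattern are all assumptions made for illustration.

```python
from collections import Counter

def pick_sampling_phase(samples, osr):
    """Pick a phase (0..osr-1) half a bit after the most common edge phase."""
    edges = Counter(
        i % osr for i in range(1, len(samples)) if samples[i] != samples[i - 1]
    )
    if not edges:
        raise ValueError("no data transitions observed")
    edge_phase = edges.most_common(1)[0][0]
    return (edge_phase + osr // 2) % osr

def recover_bits(samples, osr):
    """Keep one sample per bit period at the chosen phase."""
    phase = pick_sampling_phase(samples, osr)
    return samples[phase::osr]

OSR = 4                                   # samples per bit (assumed)
sent = [1, 0, 1, 1, 0, 0, 1, 0]
# Three idle-low samples model an unknown phase offset at the receiver.
samples = [0, 0, 0] + [bit for bit in sent for _ in range(OSR)]
recovered = recover_bits(samples, OSR)
print(recovered[1:] == sent)              # True: the first sample is the idle level
```

Sampling in the middle of the bit, as far as possible from the edges, is what makes the chosen point tolerant to jitter on the transitions.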

2.3.1 Types of SERDESThere are many types of SerDes design [6] [7]: parallel clock SerDes, embedded clockSerDes, 8b/10b SerDes, bit interleaving SerDes, and shift register SerDes.

2.3.1.1 Parallel Clock SerDes

This type is widely used in the Universal Test and Operations PHY Interface for ATM (UTOPIA), PCI, and processor buses. It consists of several serial links in parallel. Thus, a bank of n:1 MUXs can be used instead of one big MUX, as shown in Figure 2.5.

[Figure 2.5 blocks: Latch, PLL, three n:1 MUXs each driving a differential pair (Stream, Stream bar), and a differential clock pair (Clk, Clk bar)]

Figure 2.5: Simple Parallel Clock SerDes Example

Each MUX is responsible for serializing its section of the bus separately. The serial streams travel to the Rx in parallel with a differential clock signal pair that the Rx can use to recover the data. The skew between the clock and the data must be considered and minimized for proper operation. The parallel clock SerDes is efficient in price and performance, and it is an efficient way to transmit a wide parallel bus through long cables.

2.3.1.2 Embedded Clock SerDes

This type of SerDes serializes both data and clock onto one serial pair. The transmitted frame consists of two clock bits (a low bit and a high bit) embedded into the serial stream, a start signal, the data payload, and a stop bit at the end of the frame. After powering up, the Rx searches for the periodic embedded clock rising edge. Since the data bits change value over time while the clock bits do not, the Rx is capable of recognizing the clock bits and synchronizing to them. Consequently, the Rx can recover the data once locked. The embedded clock SerDes is commonly used in applications that transmit data together with other control signals such as parity bits, synchronization bits, and status bits. Another feature of this type of SerDes is that the Rx locks automatically to the data. This is especially useful for a remote Rx and when one Tx sends data to many receivers, since each Rx can lock to the data without the traffic being interrupted to transmit training characters.

2.3.1.3 8b/10b SerDes

It maps eight parallel bits to ten parallel bits and then serializes these ten bits onto a serial pair. The ten-bit code was developed by IBM Corporation in the early 1980s. 8b/10b encoding is used to generate a balanced number of ones and zeros in the transmitted data and to guarantee frequent transitions so that the Rx can synchronize to the incoming data stream. The Rx can locate the ten-bit code word boundaries in the stream by observing a special symbol called a comma character, which is sent by the Tx. The bit sequence in the comma character never appears in normal data. Once the Rx has aligned to the ten-bit code words, it maps them back to eight bits and flags an error if it detects an invalid ten-bit code. 8b/10b SerDes is used in many standards such as Ethernet, Fibre Channel, and InfiniBand.
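The comma-based alignment described above can be illustrated with a minimal Python sketch. It is not a full 8b/10b codec: it only uses two code groups as they are commonly tabulated for 8b/10b (K28.5 with negative running disparity, and D21.5), searches for the seven-bit comma sequence to find the word boundary, and checks run length and disparity. The function names are illustrative assumptions.

```python
# Two 10-bit code groups as commonly tabulated for 8b/10b (bit order a..j).
K28_5_RD_NEG = "0011111010"   # comma character, negative running disparity
D21_5 = "1010101010"          # data character with alternating bits
COMMA_PATTERNS = ("0011111", "1100000")  # 7-bit comma sequences


def find_word_boundary(bits: str) -> int:
    """Return the index of the first comma sequence, or -1 if none is present."""
    hits = [i for i in (bits.find(p) for p in COMMA_PATTERNS) if i >= 0]
    return min(hits) if hits else -1


def max_run_length(bits: str) -> int:
    """Length of the longest run of identical consecutive bits."""
    if not bits:
        return 0
    longest = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def disparity(bits: str) -> int:
    """Number of ones minus number of zeros."""
    return bits.count("1") - bits.count("0")


if __name__ == "__main__":
    # Three arbitrary bits precede the first code word, so the Rx starts misaligned.
    stream = "101" + K28_5_RD_NEG + D21_5 + D21_5
    start = find_word_boundary(stream)
    words = [stream[i:i + 10] for i in range(start, len(stream) - 9, 10)]
    print("boundary at bit", start, "->", words)
    print("max run:", max_run_length(stream), "disparity of K28.5:", disparity(K28_5_RD_NEG))
```

A real receiver would additionally track the running disparity across words and decode each word through the 5b/6b and 3b/4b tables, which this sketch omits.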

2.3.1.4 Bit Interleaving SerDes

It multiplexes slower SONET or 8b/10b serial streams into one faster serial stream by interleaving the bits, as shown in Figure 2.6.

[Figure 2.6 blocks: Latch and n:1 MUX; input streams A, B, C, D are interleaved into the output sequence D C B A, D C B A]

Figure 2.6: Simple Interleaving SerDes Example

The Rx demultiplexes the bits back to the slower streams. Bit interleaving SerDes is used in switches and routers to get more bandwidth.
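Bit interleaving and its inverse can be sketched in a few lines of Python. The sketch assumes equal-length input streams represented as lists of bits; the function names are illustrative.

```python
from itertools import chain


def interleave(streams):
    """Merge equal-length bit streams into one stream, taking one bit from each in turn."""
    if len({len(s) for s in streams}) > 1:
        raise ValueError("all streams must have the same length")
    return list(chain.from_iterable(zip(*streams)))


def deinterleave(fast, n):
    """Split the fast stream back into n slower streams."""
    return [fast[i::n] for i in range(n)]


if __name__ == "__main__":
    a, b, c, d = [1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]
    fast = interleave([a, b, c, d])
    print(fast)                    # one bit of A, then B, C, D, then repeat
    print(deinterleave(fast, 4))   # recovers the four slow streams
```

The fast stream carries n times as many bits per unit time as each slow stream, which is where the bandwidth gain comes from.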

2.3.1.5 Shift Register SerDes

This type is considered the fastest conventional SerDes. It consists of D flip-flops, which load and shift the data, and MUXs, which select between the output data from the D flip-flops and the parallel data, as shown in Figure 2.7.


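[Figure 2.7 blocks: serializer with parallel inputs D3 to D0, 2:1 MUXs and D flip-flops controlled by CLK, Load and En, driving OUT; deserializer with a chain of D flip-flops clocked by CLK and enabled by En, producing Q3 to Q0]

Figure 2.7: Conventional SerDes [8]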
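The load-and-shift behaviour of the shift register SerDes can be modelled at the behavioural level as follows. This is a sketch under simplifying assumptions (ideal flip-flops, one bit per clock edge, a zero shifted into the far end); the class and function names are illustrative and do not come from [8].

```python
class ShiftRegisterSerializer:
    """Parallel-load shift register: a 2:1 MUX in front of each D flip-flop."""

    def __init__(self, width):
        self.regs = [0] * width

    def clock(self, load, parallel_in=None):
        """Apply one clock edge and return the bit on OUT after the edge."""
        if load:
            # Load = 1: every MUX selects its parallel input.
            self.regs = list(parallel_in)
        else:
            # Load = 0: every MUX selects the previous flip-flop (shift toward OUT).
            self.regs = [0] + self.regs[:-1]
        return self.regs[-1]


def serialize(word):
    """Serialize a word given as [D3, D2, D1, D0]; D0 leaves first."""
    ser = ShiftRegisterSerializer(len(word))
    out = [ser.clock(load=True, parallel_in=word)]
    out += [ser.clock(load=False) for _ in range(len(word) - 1)]
    return out


def deserialize(bits):
    """Shift the serial bits into a flip-flop chain and read all outputs in parallel."""
    regs = [0] * len(bits)
    for b in bits:
        regs = [b] + regs[:-1]
    return regs


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    serial = serialize(word)
    print(serial, deserialize(serial))
```

Each serial bit costs one clock edge, so an N:1 serializer of this kind needs a clock N times faster than the parallel word rate.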

2.4 Introduction to CUSPARC

CUSPARC is the first Egyptian processor, developed at Cairo University. CUSPARC development started in 2004 as a graduation project. Since then, CUSPARC has passed through many development phases to boost its performance. The IBM CMOS 130 nm process was used to implement two versions of CUSPARC (V1 and V2) on silicon. The CUSPARC design was also ported to TSMC 65 nm technology. Soliman et al. [1] designed a 16-core NoC based on CUSPARC V1 cores.

2.4.1 CUSPARC Architecture

Figure 2.8 shows the architecture of the first generation of CUSPARC.


[Figure 2.8 blocks: IU, I-CACHE, D-CACHE, Cache Controller, Interrupt Controller, Memory Controller with Memory Interface, Bridge, UART with UART Interface, 64-bit Wishbone Bus, 8-bit Wishbone Interface, with Master and Slave ports]

Figure 2.8: CUSPARC Architecture [1]

2.4.1.1 Processor Integer Unit (IU)

The IU is a central part of CUSPARC. Fetching, decoding, and execution are done in this unit. The processor has a four-stage pipeline, and about 80 instructions are implemented as per the SPARC ISA standard. The IU additionally contains a windowed register file with four windows and 72 registers. Various control/status registers, which control the entire processor operation, are also implemented in the IU. In 2010, an integer multiplier was added to the IU to enhance its performance.

2.4.1.2 Instruction Cache (I-Cache)

A basic finite state machine (FSM) controls the instruction cache, which is used in the fetch stage. The I-Cache reads the instruction at the requested address and stores it in the Instruction Register (IR) of the decode stage. If there is a cache miss, the processor pipeline stalls until the I-Cache has obtained the required block from the main memory. Consequently, there is a cache miss state in which the I-Cache sends a request to the cache controller to read the block from the memory.

Table 2.1 shows the I-Cache parameters.


2.4.1.3 Data Cache (D-Cache)

While the I-Cache is controlled by a simple FSM, a more complex FSM controls the D-Cache because it is responsible for both read and write operations. Moreover, the D-Cache is responsible for managing the Input/Output (I/O) devices. The controller unit of the D-Cache checks the address coming from the IU to determine whether it corresponds to an I/O device. If the address belongs to an I/O device, the D-Cache sends a read or write request to the cache controller to read data from or write data to the I/O device. I/O data is not kept in the cache RAM, so that accesses always reflect the current data in the devices. If the address does not belong to an I/O device, the D-Cache compares the incoming address with the TAG memory to determine whether there is a hit or a miss in the cache.

Table 2.1 lists the D-Cache parameters.

Table 2.1: Instruction and Data Caches parameters

| Parameters | I-Cache | D-Cache |
|---|---|---|
| Miss latency | 8 Clock Cycles | 8 Clock Cycles |
| Replacement Policy | Direct Map | Direct Map |
| Write Policy | NA | Write Through |
| Size | 4 KB | 4 KB |
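The D-Cache decision flow (I/O check first, then tag comparison) can be sketched for a 4 KB direct-mapped cache as follows. The line size and the I/O base address are not given in the text; the values below are assumptions chosen only to make the example concrete, and the class name is illustrative.

```python
CACHE_SIZE = 4 * 1024        # bytes, from Table 2.1
LINE_SIZE = 32               # bytes per line: assumed, not stated in the text
NUM_LINES = CACHE_SIZE // LINE_SIZE
IO_BASE = 0x8000_0000        # hypothetical start of the I/O address region


def split_address(addr):
    """Split a byte address into (tag, index, offset) for a direct-mapped cache."""
    offset = addr % LINE_SIZE
    index = (addr // LINE_SIZE) % NUM_LINES
    tag = addr // (LINE_SIZE * NUM_LINES)
    return tag, index, offset


class DirectMappedCache:
    def __init__(self):
        # TAG memory; None marks a line that holds no valid data yet.
        self.tags = [None] * NUM_LINES

    def access(self, addr):
        """Classify a read access as 'io', 'hit' or 'miss'."""
        if addr >= IO_BASE:
            return "io"              # I/O data bypasses the cache RAM
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:
            return "hit"
        self.tags[index] = tag       # on a miss the line is refilled from memory
        return "miss"


if __name__ == "__main__":
    cache = DirectMappedCache()
    for addr in (0x100, 0x104, 0x100 + CACHE_SIZE, 0x100, IO_BASE + 0x10):
        print(hex(addr), cache.access(addr))
```

Because the cache is direct mapped, two addresses that are exactly 4 KB apart map to the same line and evict each other, which the third and fourth accesses in the example demonstrate.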

2.4.1.4 Cache Controller

A simple FSM controls the cache controller. In case of a cache miss or an I/O access, the cache controller works like a server that answers the requests from the I-Cache and the D-Cache. The cache controller is the only unit on the main bus of the processor, which makes the implementation of the bus simpler.

2.4.1.5 Main processor bus

The main bus of the CUSPARC is the 64-bit Wishbone bus. The Wishbone bus is an open-source standard maintained by the OpenCores organization [9]. The Wishbone bus makes supporting different bus widths simple and flexible. Moreover, it offers portability to different IC technologies.

There are two types of interfaces to the Wishbone bus: MASTER and SLAVE. The cores that can generate bus cycles are called MASTER interfaces, and the cores that receive the bus cycles are called SLAVE interfaces.

2.4.1.6 Memory Controller

The memory controller interfaces the flash memory and the main RAM to the processor main bus. The design of the memory controller incorporates flexible software control of the SRAM and flash memory timing signals. The memory controller consists of several blocks:


• Main Unit: receives the requests coming from the cache controller through the main processor bus (64-bit Wishbone bus) and determines whether the address belongs to the flash memory or to the RAM.

• Flash Controller: responsible for producing the control signals of the flash memory; it then waits for valid data from the memory bus.

• SRAM Controller: responsible for producing the control signals of the SRAM; it then waits for valid data from the memory bus.

• Control Registers: consist of different registers that control the units of the memory controller, such as registers holding data and instruction addresses and the memory delay counter.

• Boot Loader: a Direct Memory Access (DMA) unit responsible for moving the data and instructions from the flash memory to the SRAM at the booting stage. Before the boot loader starts, software provides the source and destination locations of the data and instructions that will be moved.

2.4.1.7 Bridge

The bridge connects the 64-bit Wishbone bus to the 8-bit Wishbone bus. In a write operation, a FIFO buffer stores the data coming from the main bus (64-bit Wishbone bus) until it is written into the devices connected to the slow bus (8-bit Wishbone bus). This FIFO buffer is not used in a read operation; instead, the fast bus simply waits for the data to come from the slow bus.
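The write path through the bridge can be pictured with the following sketch. The text does not specify how a 64-bit word is mapped onto the 8-bit bus, so the byte splitting and the most-significant-byte-first order used here are assumptions, as are the class and method names.

```python
from collections import deque


def split_word(word64):
    """Split a 64-bit word into eight bytes, most significant byte first (assumed order)."""
    return [(word64 >> shift) & 0xFF for shift in range(56, -8, -8)]


class WriteBridge:
    """Buffers writes from the fast bus until the slow bus can accept them."""

    def __init__(self):
        self.fifo = deque()

    def write_from_fast_bus(self, word64):
        # The fast bus deposits the data and is free to continue.
        self.fifo.extend(split_word(word64))

    def drain_to_slow_bus(self):
        """Return one byte per slow-bus write cycle, or None when the FIFO is empty."""
        return self.fifo.popleft() if self.fifo else None


if __name__ == "__main__":
    bridge = WriteBridge()
    bridge.write_from_fast_bus(0x0102030405060708)
    byte = bridge.drain_to_slow_bus()
    while byte is not None:
        print(hex(byte))
        byte = bridge.drain_to_slow_bus()
```

The FIFO decouples the two bus speeds for writes only; a read has no data to buffer until the slow device responds, which is why the fast bus has to wait in that case.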

2.4.1.8 Interrupt Controller

The interrupt controller handles the interrupt requests coming to the CUSPARC processor from external devices.

2.4.1.9 Peripherals

CUSPARC can communicate with slow external peripherals through the 8-bit Wishbone bus interface. These peripherals should be compatible with the Wishbone bus standard. CUSPARC can address up to 256 devices. Other peripherals include the UART and the interrupt inputs IRQ1, IRQ2, and IRQ3. Every peripheral has a specific address space.

2.5 Introduction to NoCs

As per Pollack's rule [10], the performance of a processor increases in proportion to the square root of the increase in its complexity. This leads to a slower increase in performance than in power. A multi-core processor offers a solution to this problem.
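A short numerical illustration of Pollack's rule follows. It assumes, for the sake of the example, that power grows linearly with complexity and that a multi-core workload scales ideally across cores; both are simplifications rather than claims from [10].

```python
import math


def pollack_performance(complexity_ratio: float) -> float:
    """Relative single-core performance for a given relative complexity."""
    return math.sqrt(complexity_ratio)


if __name__ == "__main__":
    budget = 4.0  # four times the complexity (and, as assumed here, power) budget
    big_core = pollack_performance(budget)          # one core with 4x complexity
    many_cores = budget * pollack_performance(1.0)  # four baseline cores, ideal scaling
    print(f"one large core  : {big_core:.2f}x performance")
    print(f"four small cores: {many_cores:.2f}x peak throughput")
```

Under these assumptions, spending a fourfold budget on one core only doubles its performance, whereas four baseline cores can approach a fourfold throughput on parallel workloads.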

Multi-core processors achieve higher performance by increasing the number of cores rather than increasing the frequency. However, as the number of cores increases, the interconnects become an obstacle to enhancing the performance. This is because of the errors and delays that result from very long wires. NoCs provide a solution to this problem. In NoCs, shorter interconnects are used between the cores, as each core communicates with the neighboring cores only, even if there are thousands of cores [11].

NoC design covers topology, routing, flow control, router design, and link design.

• Topology: determines the layout of the NoC nodes and the links between them.

• Routing: for a given topology, the routing method determines the links between nodes that a message travels through.

• Flow control: is responsible for allocating channels and buffers to messages.

• Router: consists of input buffers, router logic and state, allocators, and a switch. Pipelining is used to improve throughput.

2.5.1 Problems of parallel Buses in NoC

Although the NoC offers a solution to many multi-core challenges, as discussed in Section 2.5, it introduces challenges of its own, especially in a large NoC with parallel data transmission between cores.

Parallel buses must work at low data rates to decrease the power dissipation in the cores of the NoC. Moreover, interference exists between multiple adjacent wires. Parallel buses consume area and increase the routing complexity. The increase in routing complexity can be addressed with additional metal layers, but only up to a certain limit [11].

Serial buses result in a simpler layout and in lower area, cost, and interference between wires. However, serial buses introduce Inter-Symbol Interference (ISI) at high data rates. Proper SerDes design is needed to overcome this issue [11].

2.6 Literature Review

A lot of research is available in the field of serial links, much of which modifies the basic SerDes designs discussed in Section 2.3.1. Table 2.2 compares several designs related to on-chip NoC designs with serial communication links over the period from 2005 to 2014.

A new SerDes technique, wave-front train (WAFT), is presented in [8] to overcome the problems of the basic serializer/deserializer. The shift register SerDes is considered the fastest conventional SerDes. It contains D flip-flops, which load and shift the data, and MUXs, which select between the output data from the D flip-flops and the parallel data. However, this architecture has several problems:


Table 2.2: SerDes Comparison

| Paper number / specs | [8] | [12] | [13] | [14] | [15] | [16] | [17] |
|---|---|---|---|---|---|---|---|
| Short summary | A new serialization technique (WAFT SerDes) used in NoC | Prototype serial link employing pulsed current-mode signaling | Serial link transceiver for on-chip | Synchronous parallel 3D links TSV | Asynchronous 3D-NoC router | Self-timed semi-serial link | WP-CML SerDes structure |
| Year | 2005 | 2005 | 2009 | 2010 | 2011 | 2012 | 2014 |
| Fabrication | Yes | Yes | Yes | No | Yes | No | No |
| Technology | 0.18 µm | 0.18 µm | 0.13 µm | 90 nm | TSMC 65 nm | TSMC 65 nm | TSMC 65 nm |
| Data rate (Gb/s) | 3 | 8 | 9 | 2 | 16 | 9.09 (one link) | 12.67 |
| Energy per bit (pJ/bit) | - | - | - | - | 1.95 | 0.98 (one link) | 1.12 |
| Area of SerDes (µm²) | - | - | 177500 | 460.681 | 19000 | 33124 | - |

• Limited maximum clock frequency: the clock frequency is limited by the delay time of the D flip-flop.

• The overhead of the high-frequency system clock.

• Synchronization problem: if the channel is long, there will be a skew between the receiver and the transmitter, which must be taken into account.

The WAve-Front Train (WAFT) technique is a new method of serialization and deserialization that overcomes these problems. Table 2.3 shows the difference between the WAFT SerDes and the basic SerDes.

Table 2.3: WAFT and Basic SerDes Comparison

| Basic | WAFT |
|---|---|
| Uses clock | Uses constant delay elements (DEs) |
| Uses shifting mechanism | Uses signal propagation |
| Data recovery sampling time not embedded in the signal | Sampling time embedded in the signal |

As shown in Figure 2.9, which depicts a 4:1 WAFT circuit, when the EN signal equals zero (low), the data (D0 : D3) is loaded into the delay elements (QS0 : QS3), VDD is loaded into QP, and the ground signal is loaded into OUT, which means that the serializer is off. When EN equals 1 (high), the loaded signals (D0 : D3) are shifted through the MUXs to OUT. The transmitted data shifts through the deserializer until the VDD signal reaches the end of the receiver. The unit delay of the receiver and of the transmitter is the same and equals the delay of the delay element (DE) plus the MUX delay. Consequently, when the VDD signal reaches the stop point, the data (D0 : D3) arrives at (Q0 : Q3) at the same time. When the stop signal equals 1 (high), the output is latched.

[Figure 2.9 blocks: serializer chain of 2:1 MUXs and delay elements (DE) holding D3 to D0 in QS3 to QS0 and the VDD pilot signal in QP (MUXp, MUX0), controlled by EN and driving OUT; deserializer chain of 2:1 MUXs and DEs producing Q3 to Q0, with a STOP signal]

Figure 2.9: WAFT SerDes [8]
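The pilot-based latching of WAFT can be illustrated with a discrete-time behavioural sketch. It assumes ideal, perfectly matched unit delays in the transmitter and the receiver and models each unit delay (DE plus MUX) as one step; the function names are illustrative and the model is not the circuit of [8].

```python
def waft_serialize(data):
    """Wave-front train leaving OUT: the pilot (tied to VDD) first, then D0, D1, ..."""
    return [1] + list(data)


def waft_deserialize(train, width):
    """Propagate the train through width + 1 unit delays until the pilot reaches STOP."""
    stages = [0] * (width + 1)          # stages[-1] is the STOP tap
    for bit in train:
        stages = [bit] + stages[:-1]    # one unit delay (DE + MUX) per step
        if stages[-1] == 1:             # pilot reached the stop point: latch outputs
            return stages[:-1][::-1]    # [Q0, Q1, ..., Q(width-1)]
    raise ValueError("pilot never reached the stop point")


if __name__ == "__main__":
    data = [1, 0, 1, 1]                 # D0..D3
    train = waft_serialize(data)
    print(train, waft_deserialize(train, width=4))
```

No receiver clock appears in the model: the arrival of the pilot at the stop tap is the sampling instant, which is what "sampling time embedded in the signal" means in Table 2.3.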

The main concept of WAFT relies on having equal delays in the transmitter and the receiver, so any variation between the delays of the DEs increases the jitter at the deserializer and lowers the performance. An NoC is fabricated in [8] using 0.18-µm technology. The interconnections between switches are serial links with WAFT SerDes. The eight-to-one WAFT achieved a data rate of 3 Gb/s at a 1.8-V supply.

In [12], a prototype serial link with pulsed current-mode signaling is fabricated with a 1 GHz clock using 0.18-µm technology and achieves a data rate of 8 Gbps. On-chip wires have an impedance with an inductive part. Proper design of the interconnection wires can utilize the inductive part of the wire impedance to lower the interconnection delays. The implemented links are co-planar wave-guide links, which perform better than micro-strip wave-guides. Table 2.4 summarizes the differences between co-planar and micro-strip wave-guides.


Table 2.4: Coplanar and Microstrip Waveguides Comparison

| Coplanar | Microstrip |
|---|---|
| Consistent with layout image | Not consistent with layout image |
| Flexible to provide limited increasing of Zo | Not flexible to provide limited increasing of Zo |

The driver consists of eight current-mode drivers and the receiver consists of StrongARM sense-amplifier latches. A prototype link based on the concept of on-chip pulsed current-mode signaling was fabricated to achieve speed-of-light latency across a transmission line, using TSMC 0.18-µm technology with an interconnect length of 3 mm. The link achieves an 8-Gb/s data rate with a clock frequency of 1 GHz. The driver at the transmitter and the receiver use the same clock, although there is a clock skew between them, which can be adjusted at the start-up of operation.

On-chip global signaling with a serial link transceiver is presented and fabricated in [13]. One of the factors that leads to Inter-Symbol Interference (ISI) is the series resistance. This problem can be solved by limiting the frequency, but this makes the implementation of the link more complex. Another way of solving this problem is to use very thick lines for the interconnections. In serial transmission, the data is transmitted through only one channel, or two channels at most (differential signals). If the data consists of 8 bits, then the serial link must run eight times faster than the parallel bus. The high clock frequency is a challenge in the serial link, as the receiver may require a higher clock frequency to sample the data. The transceiver used is shown in Figure 2.10. It consists of a PLL, a Tx, an Rx, and a self error checking block.

[Figure 2.10 blocks: Clock generation, Self test (error check), Tx, transmission line (TL), Phase tuned Rx]

Figure 2.10: Prototype SerDes for on Chip signaling [13]

The transmitter shown in Figure 2.11 consists of a serializer, drivers, and a frequency divider. The data coming from the self test block is serialized and transmitted using the drivers.

[Figure 2.11 blocks: Clock generation (4.5 GHz), Divider producing 2.25 GHz and 1.125 GHz clocks, Self test (error check) supplying 8 bits at 1.125 GHz, Serializer, Driver, 9 Gbps differential output]

Figure 2.11: The Transmitter Proposed In [13]

The receiver, as shown in Figure 2.12, is composed of a phase interpolator and a filter to tune the phase, a comparator to sample the data, and a First In First Out (FIFO) buffer to deserialize the data.


[Figure 2.12 blocks: Clock generation, RC-CR filter (4.5 GHz), 4-phase interpolator with phase control, Comparator, clock domain synchronization, divide-by-2 stages (2.25 GHz and 1.125 GHz), FIFOs, Self test (error check), Recovered Data]

Figure 2.12: The Receiver Proposed In [13]

The on-chip channel length is 5.8 mm, and the technology used is 0.13 µm. The data is transmitted at a 9-Gbps data rate and the area of the chip is 3.57 mm².

In [14], a novel method to exploit the high bandwidth offered by through silicon vias (TSVs) is proposed. To get over the synchronization problem of long interconnects, the data is transmitted using parallel buses. In a 3D NoC, data can be transferred synchronously between two tiers through "3D" vias, known as Through Silicon Vias (TSVs), but using parallel buses leads to large area and cost. By using serial links, data can be sent through a limited number of TSVs, so the large bandwidth offered by the TSVs addresses both the area and the synchronization problems. Serial links, built from an 8:1 MUX and a 1:8 DEMUX, are used instead of parallel 3D links to save area and increase yield. The design used 90-nm technology and the output data rate is 2 Gbps, with an area gain of 74% relative to the case of parallel buses. The first case studied in this paper is a serial link that has a data rate of 2 Gb/s and consists of an 8:1 MUX on one tier, a 1:8 DEMUX on another tier, and TSVs, as shown in Figure 2.13.

[Figure 2.13 blocks: Tier 1 with 8-bit input, 8:1 MUX, PLL (2 GHz) and 1/8 divider (250 MHz); Signal TSV and Clock TSV (2 GHz); Tier 2 with 1:8 DEMUX, 1/2 dividers (1 GHz down to 250 MHz) and 8-bit output]

Figure 2.13: 2-Gb/s SerDes Proposed In [14]

In the 8:1 MUX, the data is propagated through eight shift registers using a 2 GHz clock. This circuit is implemented using 90 nm technology. The 1:8 DEMUX consists of several 1:2 DEMUXs. Each group of 1:2 DEMUXs operates at a different clock frequency. This DEMUX is implemented using a 90 nm kit with a data rate of 8 Gb/s. The TSVs used have a spacing and a diameter of 20 µm. Table 2.5 compares the area of the serial link with that of the parallel link.

Table 2.5: Comparison between Area of 2 Gb/s serial link and its corresponding parallel link Proposed In [14]

| Component | Area of serial link (µm²) | Area of 8-bit parallel link (µm²) |
|---|---|---|
| MUX | 440 | 0 |
| DEMUX | 443 | 0 |
| Frequency Divider | 21.2 | 0 |
| TSV number | 4xTSV | 18xTSV |
| TSV array | 6400 | 28800 |
| Total Link | 7304 | 28800 |
| Area % | 25.36% | 100% |
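The totals in Table 2.5 can be reproduced with a few lines of arithmetic. The sketch below assumes that each TSV occupies a square cell whose side equals the TSV diameter plus the spacing (20 µm + 20 µm); this cell model is an inference that happens to match the TSV array areas in the table, not a statement taken from [14].

```python
TSV_PITCH_UM = 20 + 20               # diameter + spacing, both 20 um in the text
TSV_CELL_AREA = TSV_PITCH_UM ** 2    # 1600 um^2 per TSV (assumed square cell)


def tsv_array_area(n_tsv):
    """Area of an array of n_tsv TSVs in um^2 under the square-cell assumption."""
    return n_tsv * TSV_CELL_AREA


# Serial link: MUX + DEMUX + frequency divider + 4 TSVs (values from Table 2.5).
serial_total = 440 + 443 + 21.2 + tsv_array_area(4)
# Parallel link: 18 TSVs and no SerDes logic.
parallel_total = tsv_array_area(18)
area_percent = 100 * serial_total / parallel_total

if __name__ == "__main__":
    print(f"serial link  : {serial_total:.1f} um^2")
    print(f"parallel link: {parallel_total} um^2")
    print(f"serial / parallel = {area_percent:.2f} %")
```

The comparison shows that the SerDes logic (roughly 900 µm²) is small next to the area of even a single TSV cell, so removing TSVs dominates the saving.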

The second case studied is an 8-Gb/s quasi-serial link for NoCs, which consists of a 32:4 MUX, a 4:32 DEMUX, a PLL, and some frequency dividers, as shown in Figure 2.14. Table 2.6 compares the area of the serial link with that of the parallel link.


[Figure 2.14 blocks: Tier 1 with 32-bit input, 32:4 MUX, PLL (2 GHz) and 1/8 divider (250 MHz); four Signal TSVs (D1 to D4) and a Clock TSV (2 GHz); Tier 2 with 4:32 DEMUX, 1/2 dividers (1 GHz down to 250 MHz) and 32-bit output]

Figure 2.14: 8-Gb/s Quasi Serial Link [14]

Table 2.6: Comparison between Area of 8 Gb/s serial link and its corresponding parallel link Proposed In [14]

| Component | Area of serial link (µm²) | Area of 8-bit parallel link (µm²) |
|---|---|---|
| MUX | 1282.6 | 0 |
| DEMUX | 11854.4 | 0 |
| Frequency Divider | 21.2 | 0 |
| TSV number | 12xTSV | 66xTSV |
| TSV array | 19200 | 105600 |
| Total Link | 21689.2 | 105600 |
| Area % | 20.54% | 100% |

An asynchronous 3D-NoC router with a novel serialization scheme in the vertical direction is fabricated in TSMC 65-nm technology in [15]. The 3D-NoC router presented in this work supports two vertical links. Vertical TSVs are more costly than horizontal links, which motivates reducing the number of vertical connections. A vertical serial link is used for the vertical connection and two different channels are used for the horizontal intra-tier connections. The serial links reduce the number of interconnects between dies and hence increase the bandwidth of the vertical connections. Consequently, the layout of the TSVs is simpler. The m:1 serializer, shown in Figure 2.15, consists of groups of MUXs, and the 1:m deserializer consists of a tree of DEMUXs. The MUXs and DEMUXs used are self-controlled to minimize the overhead of the controller part in the serializer and deserializer.


[Figure 2.15 blocks: groups of self-controlled MUXs, with n-bit and m-bit buses serialized onto p-bit links]

Figure 2.15: SerDes With Serialization Ratio n/p [15]

The design of the NoC router and serial link is implemented using place and route tools. Table 2.7 shows the area optimization between different routers.

The output data rate is 16 Gb/s, and the power consumed by the serial links (up and down) is 31.2 mW, with an area gain of 57% relative to the parallel bus architecture.

Table 2.7: Area Optimization for the Routers proposed in [15]

| Parameters | Parallel links Router | Serial Link with External TSV | Serial Link with Internal TSV |
|---|---|---|---|
| Router Logic | 0.170 mm² | 0.170 mm² | 0.185 mm² |
| Serial Link Logic | 0 mm² | 0.02 mm² | 0.02 mm² |
| TSV | 0.36 mm² | 0.085 mm² | 0.02 mm² |
| Total Area | 0.53 mm² | 0.275 mm² | 0.225 mm² |
| Area % | 100 % | 52 % | 43 % |

In [16], a serial link with pulse dual-rail encoding techniques is designed using TSMC 65-nm technology. The semi-serial link is composed of eight serial links. The main topic of this paper is the design of the circuits and methods used in the serializer with the dual-rail encoder. The deserializer at the receiver is composed of shift registers. There is an interface circuit between the serial link and the input of the router. The proposed serial link is shown in Figure 2.16.

[Figure 2.16 blocks: Sending Router (Req2L, Ack2L, N-bit Din), Serializer, Encoder + Differential Driver, Ack Receiver, data differential wires, Ack differential wires, Differential Rx + Decoder, Ack Driver, Deserializer, Receiving Router (Req2R, Ack2R, N-bit Dout); internal signals Ackout, SRdout, SClk, Wdout, DVIout]

Figure 2.16: Proposed Serial Link in [16]

The serializer consists of a clock generator, a shift register, a counter, and interfacing circuits. The shift register, which consists of True Single-Phase-Clocked (TSPC) flip-flops (FFs) with small delay, shifts the data using the generated clock. The TSPC flip-flop is extended so that it can load parallel data. The channel between the transmitter and the receiver is a distributed RLC model with a length of 8 mm. The deserializer is composed of a shift register (TSPC FFs) and an interfacing circuit between the receiving circuit and the deserializer. The receiving circuit is responsible for receiving and amplifying the voltage difference between the differential wires. The simulated link is composed of eight bit-serial links, each of which is an eight-bit to one-bit serial link. The simulation results are compared with those of a 64-bit to one-bit serial link. The semi-serial link has a lower energy dissipation than the bit-serial link. This is due to a couple of reasons:

• The increase in the channel bandwidth is greater than the increase in the power dissipation.

• The SerDes circuits in the semi-serial link have lower power dissipation than those in the bit-serial link because a seven-bit counter is used instead of the 63-bit counter. Moreover, there is no need to replicate the clock generator and the control circuits; buffers are used to propagate the clock and control signals instead.


The semi-serial link has better performance and is more energy efficient than a bit-serial link, especially for a long channel distance. As shown in Table 2.8, the throughput doubles each time the number of parallel links doubles, whereas the power consumption grows more slowly, so the energy per bit decreases.

Table 2.8: Performance of 64-bit 8-mm Serial Links in [16]

| No. of parallel serial links | Throughput (Gbps) | Power Consumption (mW) | Energy per bit (pJ/bit) |
|---|---|---|---|
| 1 | 9.09 | 8.931 | 0.982 |
| 2 | 18.182 | 9.020 | 0.496 |
| 4 | 36.364 | 13.890 | 0.382 |
| 8 | 72.728 | 20.852 | 0.286 |
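The energy-per-bit column of Table 2.8 follows directly from dividing the power by the throughput, as the short check below illustrates using the values from the table. The function and variable names are illustrative.

```python
# (number of parallel serial links, throughput in Gb/s, power in mW) from Table 2.8
ROWS = [
    (1, 9.09, 8.931),
    (2, 18.182, 9.020),
    (4, 36.364, 13.890),
    (8, 72.728, 20.852),
]


def energy_per_bit_pj(power_mw, throughput_gbps):
    """mW / (Gb/s) = 1e-3 W / 1e9 bit/s = 1e-12 J/bit, i.e. pJ/bit."""
    return power_mw / throughput_gbps


if __name__ == "__main__":
    for links, gbps, mw in ROWS:
        print(f"{links} link(s): {energy_per_bit_pj(mw, gbps):.3f} pJ/bit")
```

Going from one to eight links multiplies the throughput by eight but the power by only about 2.3, which is why the energy per bit drops by more than a factor of three.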

The design is simulated in Cadence Analog Spectre and achieves a 72.728-Gbps data rate and 20.852-mW power consumption for eight parallel eight-bit to one-bit serial links with an 8-mm channel.

A wave-pipelined CML (WP-CML) SerDes structure is designed in [17], with each component designed in CML mode using TSMC 65-nm technology. The simulations result in a 12.67-Gbps data rate and 14.3-mW power. The independence from the clock in the asynchronous WP SerDes is a great benefit, as the control signals guarantee the validity of data transmission and are used instead of a PLL to adjust the clocks in the Tx and Rx. However, the link must wait for the handshake signal, which slows down the operation. The use of CML makes the transitions between levels faster and requires only a small delay between stages. The implemented design is based on a wave-pipelined SerDes. The serializer consists of MUXs, Delay Elements (DEs), and CMOS to CML blocks. The deserializer consists of MUXs, FFs, DEs, and a CML to CMOS block. Figure 2.17 and Figure 2.18 show the block diagrams of the serializer and deserializer.

[Figure 2.17 blocks: inputs In7 to In0, each through a CMOS to CML block into a MUX followed by a DE; Pilot and Load signals; Ser out]

Figure 2.17: WP CML Serializer in [17]


[Figure 2.18 blocks: Ser In and Pilot feeding a chain of MUXs and DEs; FFs with level converters producing Out7 to Out0; Control Block generating Latch and FF_en]

Figure 2.18: WP CML Deserializer in [17]

The MUXs in the serializer are responsible for loading new data when the load signal is high. When the load signal is low, the DEs shift the data through the stages. The signal (Ser Out) is sent to the deserializer, which should have a structure similar to that of the serializer to guarantee correct sampling. The deserializer uses MUXs for latching the signal. The latch-configured MUX stores the output value when the transmission is stopped. To start a new data transmission, the (FF_en) signal becomes high to load the received data into the eight output FFs. This keeps the data valid until new data is transmitted and registered. The technique used in this paper is similar to the WAFT technique of [8]. Table 2.9 compares different SerDes architectures.

Table 2.9: Comparison Between Different SerDes Architectures

| Design | Technology (nm) | Speed (Gb/s) | Power (mW) | Energy per bit (pJ/bit) |
|---|---|---|---|---|
| WP-CMOS [18] | 180 | 3.9 | 2.44 | 0.62 |
| WAFT [8] | 180 | 4.3 | NA | - |
| CMOS-CML [19] | 65 | 10 | 106 | 10.6 |
| Self Timed [20] | 65 | 12 | 15.5 | 1.29 |
| WP-SR [21] | 65 | 67 | 150 | 2.23 |
| [17] | 65 | 12.67 | 14.3 | 1.12 |


Chapter 3: Proposed SERDES Design

This chapter presents our proposed 6-Gb/s serial link transceiver in detail. The proposed serial link transceiver is composed of a Tx and an Rx. The Tx consists of an encoder, a MUX, and drivers. The Rx consists of a sampler, a DEMUX, retimers and a decoder.

Two designs of our proposed SerDes are presented in this thesis: Design A and Design B. Design A works only in the TT corner and has high power consumption; it is presented in Section 3.1. Design B is a second version of Design A with reduced power that works for all corners; it is presented in Section 3.5.

3.1 System Description (Design A)

The platform for the 6-Gb/s serial link is shown in Figure 3.1. It consists of two main blocks: a transmitter and a receiver.

Figure 3.1: The Whole 6-Gb/s Serial Link Platform

3.2 Transmitter Design

The transmitter consists of an 8b/10b encoder to limit the number of consecutive ones and zeros, a 10:1 multiplexer (MUX) responsible for providing the 6-Gb/s data, and a driver to generate a signal suitable for transmission.

3.2.1 8b/10b Encoder

The 8b/10b encoder encodes the eight-bit input data into ten-bit parallel data using a 600-MHz clock, so the output data rate of this block is 600 Msymbols/s. After encoding, the serial data has the same number of zeros and ones over a given length, which is called Direct Current (DC) balanced data. The maximum number of consecutive zeros or ones in the serial data without a transition is five (the run length is at most five), which helps the receiver to extract the data correctly if clock and data recovery is to be implemented [22]. This also reduces the ISI.


The encoded data has a Running Disparity (RD), which is the difference between the number of ones and zeros in the code. The RD can take three values: +2, -2 and zero [22]. If the coded vector has a non-zero RD, then the next encoded data must have a different RD per the coding rules [22]. Hence each coded ten bits have two disparities: the current RD, which is the disparity of the encoded vector, and the next RD, which is the next disparity calculated from the current RD. RD can be divided into two categories: RD+ (RD = 1) when RD equals -2 or zero and RD- (RD = 0) when RD equals +2 or zero [22].
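The disparity bookkeeping described above can be illustrated with a short Python sketch. It is only an illustration, not the encoder used in this work: the function name `disparity` is hypothetical, and the sample code words are the encoded values that appear later in Table 4.2.

```python
# Encoded 10-bit code words taken from Table 4.2 (written as hex values).
CODE_WORDS = [0x0B9, 0x0AE, 0x0AD, 0x363, 0x354, 0x0A5]

def disparity(word, width=10):
    """Return (number of ones) - (number of zeros) of a code word."""
    ones = bin(word & ((1 << width) - 1)).count("1")
    return ones - (width - ones)

# Disparity of each code word; 8b/10b only allows -2, 0 or +2.
per_word = [disparity(w) for w in CODE_WORDS]

# Cumulative disparity after each code word of the frame.
running = []
total = 0
for d in per_word:
    total += d
    running.append(total)

print(per_word)
print(running)
```

For this frame the cumulative disparity returns to zero at the end, which is the DC-balance property mentioned in Section 3.2.1.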

The code consists of 256 data characters (Dx.y) and 12 control characters (Kx.y). The coding method divides the eight-bit data input into two groups of bits: the three most significant bits (MSB) (y) and the five least significant bits (LSB) (x), denoted by H, G, F and E, D, C, B, A (from MSB to LSB). The three-bit block is encoded into four bits named j, h, g, f. The five-bit block is encoded into six bits named i, e, d, c, b, a. The encoded four bits and six bits form the ten-bit parallel output [22], as shown in Figure 3.2.

Figure 3.2: Coding Method (8b input Dx.y: H G F E D C B A, from MSB to LSB; 10b output: a b c d e i f g h j)

3.2.2 10:1 MUX

The 10:1 MUX consists of two 5:1 MUXes and one 2:1 MUX [23],[24], as shown in Figure 3.3.

Figure 3.3: 10:1 Multiplexer Block Diagram

The first 5:1 MUX multiplexes the even bits (D0, D2, D4, D6, D8) and the second 5:1 MUX multiplexes the odd bits (D1, D3, D5, D7, D9). The output 2:1 MUX multiplexes between the two streams to generate one line carrying the ten bits (D0-D9) using a 3-GHz clock. Figure 3.4 illustrates that each 5:1 MUX consists of five CMOS D latches; Figure 3.5 depicts the D latch design. Four CMOS 2:1 MUXes (Figure 3.6) select the data using the selection lines [23],[25].

Figure 3.4: The Transmitter 5:1 Multiplexer Block Diagram

Figure 3.5: CMOS D Latch

Figure 3.6: Transmission Gate 2:1 MUX

The clocking scheme for the 10:1 MUX is shown in Figure 3.7. The selection pulse width is equal to (1/3) nsec. D8 is transferred to the output of the first D latch using the 3-GHz clock. The first MUX chooses D6 when SEL1 is high and passes D8 when it is low; the output of the first MUX is then transferred to the output of the second D latch. The second MUX chooses D4 when SEL2 is high and passes D6 and D8 when it is low. The data is then transferred to the output of the third D latch, and the third MUX chooses D2 when SEL3 is high and passes D4, D6 and D8 when it is low. SEL4, which is the last selection signal, chooses D0 when it is high and passes D2, D4, D6 and D8 when it is low.
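The selection cascade can be checked with a small behavioural model. The Python sketch below is an idealised, zero-delay abstraction (latch and MUX delays are ignored, and the slot alignment of the select signals is an assumption read from Figure 3.7); the function name `even_mux_slots` is hypothetical.

```python
def even_mux_slots(d0, d2, d4, d6, d8):
    """Idealised model of the four cascaded 2:1 MUXes in the even 5:1 MUX.

    One 10-bit symbol is split into five slots of 1/3 ns each.
    Returns, per slot, the outputs of the 1st, 2nd, 3rd and 4th MUX.
    """
    rows = []
    for slot in range(5):
        sel1 = slot < 4   # SEL1 high for 4 slots, low for 1
        sel2 = slot < 3   # SEL2 high for 3 slots, low for 2
        sel3 = slot < 2   # SEL3 high for 2 slots, low for 3
        sel4 = slot < 1   # SEL4 high for 1 slot,  low for 4
        m1 = d6 if sel1 else d8
        m2 = d4 if sel2 else m1
        m3 = d2 if sel3 else m2
        m4 = d0 if sel4 else m3
        rows.append((m1, m2, m3, m4))
    return rows

rows = even_mux_slots("D0", "D2", "D4", "D6", "D8")
first_mux = [r[0] for r in rows]
last_mux = [r[3] for r in rows]
print(first_mux)
print(last_mux)
```

Under these assumptions the last MUX emits D0, D2, D4, D6, D8 in consecutive slots, which is the even bit stream expected at the input of the output 2:1 MUX.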

Figure 3.7: Waveforms of the Selection Signals Used in the even 5:1 Multiplexer

SEL1: high for 4*(1/3) nsec, low for (1/3) nsec; 1st MUX output: D6 D6 D6 D6 D8
SEL2: high for 3*(1/3) nsec, low for 2*(1/3) nsec; 2nd MUX output: D4 D4 D4 D6 D8
SEL3: high for 2*(1/3) nsec, low for 3*(1/3) nsec; 3rd MUX output: D2 D2 D4 D6 D8
SEL4: high for (1/3) nsec, low for 4*(1/3) nsec; 4th MUX output: D0 D2 D4 D6 D8

3.2.3 Design of Selection Circuit

The selection circuit consists of a 3-bit synchronous counter with a reset input that counts up to four. The counter consists of three J-K flip-flops and some logic gates, as shown in Figure 3.8. Figure 3.9 shows the schematic of the JK FF.

Figure 3.8: Selections Circuit

Figure 3.9: JK FF Schematic

The selection signals are logic functions of the counter outputs, as shown in Figure 3.8.

Figure 3.10 illustrates the schematic of the TSPC D FF.

Figure 3.10: TSPC D Flip Flop

3.2.4 Driver

The driver is a chain of inverters in which each inverter is scaled by a factor f relative to the previous one, as shown in Figure 3.11.

Figure 3.11: Driver Schematic

The number of inverters and the scaling factor f are chosen to get minimum delay through the driver when it is loaded by the Transmission Line (TL) and the input of the receiver. The TL is modeled as a wire on the metal 8 layer (M8 layer) of the TSMC digital 65-nm CMOS technology with a length of 600 µm and a width of 0.4 µm. The TL is modeled by its S-parameters, as illustrated in Figure 3.12.
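A common first-order way to pick the number of stages and the scaling factor is the textbook tapered-buffer model, in which the chain delay is proportional to N(f + p) with f = F^(1/N), where F is the ratio of load to input capacitance and p is the inverter parasitic delay. The Python sketch below applies this model; it is not the sizing procedure used in this work, the function name is hypothetical, and the capacitance ratio of 64 and p = 1 are illustrative assumptions. The polarity of the chain (odd or even number of stages) is not considered.

```python
def size_inverter_chain(c_load, c_in, p_inv=1.0, max_stages=12):
    """Return (stages, scaling factor f, normalised delay) that minimise
    the first-order delay model N * (F**(1/N) + p_inv)."""
    fanout_total = c_load / c_in          # overall electrical effort F
    best = None
    for n in range(1, max_stages + 1):
        f = fanout_total ** (1.0 / n)     # equal scaling per stage
        delay = n * (f + p_inv)           # in units of the technology tau
        if best is None or delay < best[2]:
            best = (n, f, delay)
    return best

# Illustrative values only: a load 64 times the input capacitance.
stages, scale, delay = size_inverter_chain(c_load=64.0, c_in=1.0)
print(stages, round(scale, 3), round(delay, 3))
```

In practice the load presented by the TL and the receiver input would be extracted from simulation, and the final choice confirmed by sweeping N and f in the circuit simulator.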

Figure 3.12: Driver with TL

3.2.5 Frequency Divider

The 3-GHz clock is generated by a frequency divider, which is a T flip-flop, as shown in Figure 3.13. The T FF is a JK FF with its J port connected to its K port.
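The divide-by-two behaviour of a T flip-flop with its T input tied high can be sketched in a few lines of Python. This is a purely behavioural model with hypothetical function names; it assumes a rising-edge-triggered flip-flop and an input clock at twice the 3-GHz output frequency.

```python
def divide_by_two(clk_samples):
    """Toggle the output on every rising edge of the input clock (T = 1)."""
    q = 0
    prev = 0
    out = []
    for clk in clk_samples:
        if prev == 0 and clk == 1:   # rising edge detected
            q ^= 1                   # T flip-flop toggles
        out.append(q)
        prev = clk
    return out

def rising_edges(samples):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)

clk_in = [0, 1] * 8                  # eight input clock periods
clk_out = divide_by_two(clk_in)
print(rising_edges(clk_in), rising_edges(clk_out))
```

The output waveform has half as many rising edges as the input over the same time span, i.e. half the frequency.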

Figure 3.13: Frequency Divider

3.3 Receiver Design

The receiver contains a sampler to recover the digital signal, a 1:10 demultiplexer (DEMUX) to recover the parallel data, a retimer for proper synchronization, and a 10b/8b decoder to remove the encoding performed at the transmitter.


3.3.1 Sampler

The sampler samples the received signal to extract the correct data. The operation of the sampler is divided into two main phases:

• Track phase: the output is amplified in this phase.

• Hold Reset phase: the output is either amplified through positive feedback or the held value of the output is reset [5].

The two types of latches under consideration in this thesis are Strong Arm (SA) and Current Mode Logic (CML). To form a flip-flop from the SA latch, a Reset Set (RS) latch is cascaded after the SA latch. To form a flip-flop from the CML latch, another CML latch is cascaded after the first CML latch.

3.3.1.1 SA and CML flip flops

The sampler consists of a Strong Arm Flip Flop (SAFF), which consists of a StrongARM latch (Figure 3.14) [26] followed by an optimized Reset-Set (R-S) latch (Figure 3.15) [27].

Figure 3.14: StrongARM Sampler

Figure 3.15: Optimized RS Latch

The SAFF has advantages of:

• No static power dissipation.

• Full CMOS output levels.

The optimized RS latch has symmetric pull-up and pull-down paths, which allows equal delays for Q and Q'. Another advantage of the optimized RS latch is that during the evaluation phase only one PMOS transistor is activated to change the output data, which speeds up the operation.

The SAFF simulation results are shown in Table 3.1.

Table 3.1: Simulation results of the SA sampler

Specification                    Value
Average power at clock 6 GHz     71.46 µW
Max. speed                       24 Gb/s
Tcq at clock 6 GHz               31.2 psec
Setup time at clock 6 GHz        10 psec
Hold time at clock 6 GHz         18 psec
Input referred offset σvt        14.9 mV

3.3.2 1:10 DEMUX and Retimers

The 1:10 DEMUX consists of 12 CMOS D flip-flops. The first two extract the even samples (D0, D2, D4, D6, D8) and the odd samples (D1, D3, D5, D7, D9) using the 3-GHz clock. The other ten flip-flops select the ten samples (D0 to D9) using delayed versions of SEL4, as shown in Figure 3.16. Figure 3.17 shows the schematic of the D FF.

Figure 3.16: The Receiver 1:10 Demultiplexer Block Diagram

Figure 3.17: D Flip Flop

The output of the DEMUX consists of ten bits that are passed through regenerative latches and retimers to retime the data. The ten regenerative latches are optimized RS latches [27], and the retimers are CMOS D flip-flops that retime the data with a 600-MHz clock.


3.3.3 Design of Selection Circuit

The selection circuit on the receiver side is the same circuit that generates selection signal number four (SEL4) on the transmitter side, used with delayed versions of its output. Figure 3.18 shows the generation of the even selection signals. The odd selection signals are produced by delaying the even selection signals with buffers.

Figure 3.18: The Receiver Selection Circuit

3.3.4 10b/8b Decoder

The decoder maps the ten-bit parallel input data to eight-bit parallel output data [22] using a 600-MHz clock. The decoding process is the inverse of the encoding process performed at the transmitter.


3.4 Platform Layout of Design A

3.4.1 Transmitter Layout

The Tx layout is shown in Figure 3.19. The area based on TSMC 65-nm technology is 36×29 = 1044 µm². The metal stack is M1 to M3.

Figure 3.19: Transmitter Layout

3.4.2 Receiver Layout

The Rx layout is shown in Figure 3.20. The area based on TSMC 65-nm technology is 47×24 = 1128 µm². The metal stack is M1 to M3.


Figure 3.20: Receiver Layout

3.5 Design B

Design A works only for the TT corner because the data inputs to the MUX are not synchronized to each other. The system was therefore enhanced to Design B, which works for all corners by synchronizing the data inputs with respect to each other. Design B also reduces the power consumption relative to Design A.

3.5.1 Synchronization Process

To synchronize the data inputs with each other, delay elements are added as follows and as shown in Figure 3.21.

• A D latch has been added before (D6, D7 and SEL1) to synchronize them with the output of the first latch inside the 5:1 MUX.

• Two latches and a 2:1 MUX have been added before (D4, D5 and SEL2).

• Three latches and two 2:1 MUXs have been added before (D2, D3 and SEL3).

• Four latches and three 2:1 MUXs have been added before (D0, D1 and SEL4).
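The nominal delay inserted in each path follows directly from the element counts in the list above. The Python sketch below adds them up using the latch delay (27.7 ps) and MUX delay (6.2 ps) annotated in the timing diagram of Figure 3.22; the dictionary keys and the function name are hypothetical, and the simple sum ignores any loading or corner dependence.

```python
LATCH_DELAY_PS = 27.7   # latch delay annotated in Figure 3.22
MUX_DELAY_PS = 6.2      # 2:1 MUX delay annotated in Figure 3.22

# (number of latches, number of 2:1 MUXs) added before each input group.
ADDED_ELEMENTS = {
    "D6, D7, SEL1": (1, 0),
    "D4, D5, SEL2": (2, 1),
    "D2, D3, SEL3": (3, 2),
    "D0, D1, SEL4": (4, 3),
}

def inserted_delay_ps(latches, muxes):
    """Nominal delay of the added elements, rounded to 0.1 ps."""
    return round(latches * LATCH_DELAY_PS + muxes * MUX_DELAY_PS, 1)

delays = {name: inserted_delay_ps(*counts)
          for name, counts in ADDED_ELEMENTS.items()}
for name, delay in delays.items():
    print(f"{name}: {delay} ps")
```

Each group is delayed by one more latch and one more MUX than the previous group, mirroring the extra stage that the earlier data bits pass through inside the 5:1 MUX.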

37

Page 52: 6-GB/S SERIAL LINK TRANSCEIVER FOR NOCS...6-GB/S SERIAL LINK TRANSCEIVER FOR NOCS By Safaa Ahmed Mohammed Abdelfattah A Thesis Submitted to the Faculty of Engineering at Cairo University

D6D LatchD 6

2:1

MU

X

D Latch D LatchD 4 D4

2:1 M

UX

D Latch

2:1 M

UX

D Latch D LatchD 2 D22

:1 MU

X

D Latch

2:1 M

UX

D Latch

2:1 M

UX

D Latch D LatchD 0 D0

CLK/2CLKb/2

CLK/2CLKb/2

CLK/2CLKb/2

CLK/2CLKb/2

Figure 3.21: Synchronization Process

The corresponding timing diagram is shown in Figure 3.22.

Figure 3.22: Timing Diagram

Annotated values: symbol period = 1.66 nsec, clock = 3 GHz, latch delay = 27.7 ps, MUX delay = 6.2 ps.


3.5.2 Power Reduction

All CMOS latches inside the MUX are replaced by TSPC flip-flops, which are faster than the CMOS flip-flops and need only a single-ended clock. Every CMOS flip-flop is replaced by a TSPC flip-flop, and the transmission-gate 2:1 MUX is replaced by a 2:1 MUX built from CMOS NAND gates, which needs only single-ended selection signals, resulting in power reduction. The new 2:1 MUX is shown in Figure 3.23.

Figure 3.23: Gate based 2:1 MUX
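The logic function of a NAND-based 2:1 MUX can be checked with a short truth-table sketch in Python. The gate arrangement below is the standard three-NAND multiplexer with an internally generated inverted select; the exact gate-level netlist and the select polarity (IN1 chosen when SEL is high) are assumptions, since they cannot be read from the text.

```python
from itertools import product

def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)

def mux2_nand(in1, in2, sel):
    """2:1 MUX from NAND gates; only a single-ended SEL is supplied."""
    sel_b = nand(sel, sel)                 # NAND used as an inverter
    return nand(nand(in1, sel), nand(in2, sel_b))

# Exhaustive truth table: (in1, in2, sel) -> out
truth_table = {bits: mux2_nand(*bits) for bits in product((0, 1), repeat=3)}
for bits, out in truth_table.items():
    print(bits, out)
```

Because the complement of SEL is produced locally, only one selection wire per MUX has to be routed from the selection circuit.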

The new 5:1 MUX is shown in Figure 3.24. The delay elements discussed in Section 3.5.1 are used but are not shown in Figure 3.24 for simplicity.

Figure 3.24: New 5:1 MUX

On the receiver side, all the CMOS flip-flops inside the DEMUX are also replaced by TSPC flip-flops using single-ended clock and selection signals.


Chapter 4: Simulation Results

4.1 Circuit Level Simulation Results of Design B

An analog and digital co-simulation of the full system is carried out using the Analog Mixed Signal (AMS) simulation tool. The encoder and decoder are designed in Verilog, while the rest of the blocks are designed at the transistor level.

Each block works at a specific data rate, as shown in Figure 4.1 and listed in Table 4.1.

Figure 4.1: Block Diagram with Data Rates and Eye Diagrams

Table 4.1: Data Rates

Block Name     Data Rate
Encoder        600 MS/s
Mux. o/p       6 Gb/s
Sampler        6 Gb/s
De-Mux o/p     600 MS/s
Retimers       600 MS/s
Decoder        600 MS/s
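The rates in Table 4.1 are tied together by the 10-bit symbol width. The following Python sketch works through the arithmetic; the variable names are hypothetical, and the payload figure is simply the eight data bits per symbol that remain after the 8b/10b coding overhead.

```python
SYMBOL_RATE_HZ = 600e6      # encoder and decoder symbol rate (600 MS/s)
BITS_PER_SYMBOL = 10        # 8b/10b code word width
DATA_BITS_PER_SYMBOL = 8    # payload bits carried by each code word

line_rate_bps = BITS_PER_SYMBOL * SYMBOL_RATE_HZ          # serial link rate
payload_rate_bps = DATA_BITS_PER_SYMBOL * SYMBOL_RATE_HZ  # user data rate
unit_interval_ps = 1e12 / line_rate_bps                   # one serial bit
symbol_period_ns = 1e9 / SYMBOL_RATE_HZ                   # one 10-bit symbol

print(f"line rate      : {line_rate_bps / 1e9:.1f} Gb/s")
print(f"payload rate   : {payload_rate_bps / 1e9:.1f} Gb/s")
print(f"unit interval  : {unit_interval_ps:.2f} ps")
print(f"symbol period  : {symbol_period_ns:.3f} ns")
```

The unit interval of about 166.67 ps and the symbol period of about 1.667 ns correspond to the clock-skew sweep range used in Section 4.3 and to the symbol period annotated in the timing diagram of Figure 3.22.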

4.1.1 Simulation Results of the Transmitter

The data is read from a file into the encoder. The output data from the encoder is passed through the MUX and the driver to be transmitted. Table 4.2 shows a sequence of input data that was read into the platform.


Table 4.2: Input and encoded data of the encoder

Input Data         Encoded Data
00000000 (0x00)    0010111001 (0x0B9)
00000001 (0x01)    0010101110 (0x0AE)
00000010 (0x02)    0010101101 (0x0AD)
00000011 (0x03)    1101100011 (0x363)
00000100 (0x04)    1101010100 (0x354)
00000101 (0x05)    0010100101 (0x0A5)

Table 4.2 lists the main data frame that was sent. This frame is sent three times in succession. Figure 4.2(a) and Figure 4.2(b) show the simulated encoder inputs and outputs.

Figure 4.3 shows the eye diagram of D0 output from the encoder.

Figure 4.3: D0 Encoder Output Eye Diagram

According to Table 4.2, the even output from the first 5:1 MUX inside the 10:1 MUX should be (101000100011000100110111111000). Figure 4.4 shows the even output from the 5:1 MUX inside the 10:1 MUX.

Figure 4.2: Encoder Data (a) Encoder Inputs (b) Encoder Outputs

Figure 4.4: Even Output from the 5:1 MUX

According to Table 4.2, the odd output from the second 5:1 MUX inside the 10:1 MUX should be (011101111001110101010000100110). Figure 4.5 shows the odd output from the second 5:1 MUX inside the 10:1 MUX.

Figure 4.5: Odd Output from the 5:1 MUX

According to Table 4.2, the final output from the 10:1 MUX should be (100111010001110101001011010100110001101100101010111010010100). Figure 4.6 shows the final output from the 10:1 MUX.
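The expected even, odd and serial bit streams can be reproduced from the encoded values of Table 4.2 with a few lines of Python. This is a bit-level sketch only; the function names are hypothetical, and it assumes that D0 is the least significant bit of each encoded value and is transmitted first, which is the ordering that matches the streams quoted in this section.

```python
# Encoded frame from Table 4.2 (one 10-bit code word per input byte).
ENCODED_FRAME = [0x0B9, 0x0AE, 0x0AD, 0x363, 0x354, 0x0A5]

def word_bits(word, width=10):
    """Return [D0, ..., D9], taking D0 as the least significant bit."""
    return [(word >> i) & 1 for i in range(width)]

def serialize(words):
    """Model the two 5:1 MUXes and the output 2:1 MUX at bit level."""
    even, odd, serial = [], [], []
    for word in words:
        d = word_bits(word)
        e, o = d[0::2], d[1::2]          # D0,D2,...,D8 and D1,D3,...,D9
        even += e
        odd += o
        for e_bit, o_bit in zip(e, o):   # 2:1 MUX alternates even/odd
            serial += [e_bit, o_bit]
    to_str = lambda bits: "".join(str(b) for b in bits)
    return to_str(even), to_str(odd), to_str(serial)

even_stream, odd_stream, serial_stream = serialize(ENCODED_FRAME)
print(even_stream)
print(odd_stream)
print(serial_stream)
```

The three printed strings can be compared directly with the expected even, odd and final MUX outputs quoted above.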

Figure 4.6: Final Output from the 10:1 MUX

Figure 4.7 shows the eye diagram of the MUX output.


Figure 4.7: MUX Output Eye diagram

The output data from the MUX is passed through the driver to make it suitable for transmission. Figure 4.8 shows the output after the driver for the three corners (TT, FF and SS), respectively.

Figure 4.8: Output after the Driver

Figure 4.9 shows the eye diagram of the Driver output.


Figure 4.9: Driver Eye diagram

4.1.2 Simulation Results of the Receiver

The data passes through the transmission line. At the receiver side, the data is sampled by the SA sampler, demultiplexed, retimed and decoded. Figure 4.10 shows the output after the channel.

Figure 4.10: Output after the Channel


Figure 4.11 shows the eye diagram of the channel output.

Figure 4.11: Channel output Eye diagram

Figure 4.12 shows the output after the sampler.

Figure 4.12: Output after the Sampler

Figure 4.13 shows the eye diagram of the sampler output.


Figure 4.13: Sampler Output Eye diagram

The output from the sampler is passed through the DEMUX to recover the parallel bits. Bit number zero should be (101101), and this frame is repeated three times. Bit number one should be (010100), and so on. Figure 4.14 shows the output after the DEMUX.
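The expected per-bit DEMUX outputs follow from the serial stream of one frame. The Python sketch below mirrors the structure of Section 3.3.2 (an even/odd split followed by five-way selection in each branch) under ideal alignment; the function name is hypothetical, and the serial string is the expected 10:1 MUX output quoted in Section 4.1.1.

```python
# Expected serial stream for one frame (six 10-bit symbols, D0 first).
SERIAL_FRAME = ("1001110100" "0111010100" "1011010100"
                "1100011011" "0010101011" "1010010100")

def demux_1_to_10(serial):
    """Split a serial bit string into ten parallel lanes Out0..Out9."""
    even = serial[0::2]      # samples taken by the even flip-flop
    odd = serial[1::2]       # samples taken by the odd flip-flop
    lanes = [""] * 10
    for k in range(5):
        lanes[2 * k] = even[k::5]       # Out0, Out2, ..., Out8
        lanes[2 * k + 1] = odd[k::5]    # Out1, Out3, ..., Out9
    return lanes

lanes = demux_1_to_10(SERIAL_FRAME)
for index, lane in enumerate(lanes):
    print(f"Out{index}: {lane}")
```

Each lane holds one bit per symbol, so a six-symbol frame yields six bits per lane; Out0 and Out1 reproduce the values quoted above.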

Figure 4.14: Output after the DEMUX

Figure 4.15 shows the eye diagram for D0 output from the DEMUX.


Figure 4.15: Eye Diagram of D0 Output from DEMUX

Figure 4.16 shows the output after the regenerative latches.

Figure 4.16: Output after the Regenerative Latches

Figure 4.17 shows the eye diagram for D0 output from the regenerative latches.


Figure 4.17: Eye Diagram of D0 Output from REG. Latches

Figure 4.18 shows the output after the final flip-flop which is the input to the decoder.

Figure 4.18: Output after the Final FF

Figure 4.19 shows the eye diagram for D0 output from the FF.


Figure 4.19: Eye Diagram of D0 Output from FF

Figure 4.20 shows the outputs of the decoder for the three corners (TT, FF and SS), respectively, showing correct operation.

Figure 4.20: Decoder Output (a) TT Corner (b) FF Corner (c) SS Corner

XORing the transmitted data (input data to the encoder) with the received data (output data from the decoder) gives a zero output for the three corners, as shown in Figure 4.21.

Figure 4.21: XOR Result (a) TT Corner (b) FF Corner (c) SS Corner

4.2 Post Layout Results of Design A

The parallel encoded data input to the Tx is shown in Table 4.3. The transmitted data should therefore be (00011010110100110001), corresponding to Table 4.3. The input data and the transmitted data are shown in Figure 4.22.


Table 4.3: Parallel Encoded Data

Encoded Input Parallel Data Names    Bits
VIN0                                 00
VIN1                                 10
VIN2                                 00
VIN3                                 01
VIN4                                 11
VIN5                                 10
VIN6                                 01
VIN7                                 00
VIN8                                 01
VIN9                                 11

Figure 4.22: Encoded Input Parallel Data (VIN0:VIN9) and Serial Transmitted Data

The eye diagram of the transmitted data is shown in Figure 4.23


Figure 4.23: Transmitted Data Eye Diagram

The output data from the FF, which is the final stage in the receiver, is shown in Figure 4.24.

Figure 4.24: Received Data

4.3 CUSPARC NoC Clock Skew results of Design B

CUSPARC [28], [29] is the first Egyptian embedded intellectual-property processor. To boost its performance, a NoC based on this processor core was designed [1]. Figure 4.25 shows the NoC for 16 CUSPARC cores. The data path between any two routers (R) is shown as a solid line. The paths of the clock tree from the Phase Locked Loop (PLL) are shown as dotted lines.

Figure 4.25: CUSPARC Many-Core Mesh with Clock Tree Distribution [1]

As shown in Figure 4.26, the distance between any two neighboring cores is constant. For the CUSPARC NoC design [1] targeting the TSMC digital 65-nm CMOS technology, this distance is 600 µm. Moreover, having the PLL in the center of the NoC, as shown in Figure 4.26, results in different path lengths for the clock distribution. This necessitates proper synchronization.

Figure 4.26: CUSPARC Clock Paths

When the cores are on opposite sides of the PLL, for example CUSPARC cores (2) and (3), the paths are of equal length. Ideally, this results in zero skew between the transmitter and receiver. For the 6-Gb/s data rate, these paths are 600 µm long, resulting in a delay of 1.43 psec with no skew.

When the cores are on the same side of the PLL, for example CUSPARC cores (1) and (2), or (3) and (4), the paths are of different lengths. This results in a clock skew that should be accounted for. In our design, the paths from the PLL to cores 3 and 4 are 600 µm and 1065 µm long, and the clock delays are 1.43 psec and 5.12 psec, respectively. When designing the serial link, the receiver should sample the data correctly whether a skew is present or not.

The first case tested is when CUSPARC (2) sends data to CUSPARC (3) or vice versa. The second case is when CUSPARC (3) sends data to CUSPARC (4), or CUSPARC (2) sends to CUSPARC (1). The third case is when CUSPARC (4) sends to CUSPARC (3), or CUSPARC (1) sends to CUSPARC (2). All these cases, with the corresponding clock delays, are summarized in Table 4.4.

For the NoC of the CUSPARC processor, the proposed architecture is tested for different clock phases at the transmitter and the receiver. By shifting the clock at the receiver side by a variable delay (V) from -166.66 psec to 166.66 psec, the maximum tolerable clock skew is found to be up to ±36% in the TT corner according to Equation 4.1, where V1 and V2 are the values of the variable V for the positive and negative sweeps.


Table 4.4: Clock Skew Test Cases

Cases                       Transmitter clock delay    Receiver clock delay
CUSPARC(2) sends to (3)     1.43 psec                  1.43 psec
CUSPARC(3) sends to (4)     1.43 psec                  5.12 psec
CUSPARC(4) sends to (3)     5.12 psec                  1.43 psec

$$\text{Tolerable clock skew percentage} = \frac{V_1 + |V_2|}{2 \times 166.66667} \times 100\,\% \tag{4.1}$$
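As a quick numerical check of Equation 4.1, the following Python sketch evaluates the tolerable skew percentage for a pair of sweep limits. The function name and the ±60 psec example limits are illustrative assumptions, not values reported in this work; the 166.66667 psec constant is the sweep limit used above.

```python
# Sketch of Equation 4.1; names and example values are illustrative assumptions.
SWEEP_LIMIT_PSEC = 166.66667  # receiver clock is swept from -166.66 to +166.66 psec


def tolerable_skew_percent(v1_psec: float, v2_psec: float) -> float:
    """Return (V1 + |V2|) / (2 * 166.66667), expressed as a percentage."""
    return (v1_psec + abs(v2_psec)) / (2.0 * SWEEP_LIMIT_PSEC) * 100.0


if __name__ == "__main__":
    # Hypothetical sweep limits of +60 psec and -60 psec (not measured data).
    print(f"{tolerable_skew_percent(60.0, -60.0):.1f} %")
```

With these hypothetical limits the result is close to 36%, the order of magnitude reported for the TT corner.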

This maximum tolerable clock skew is much larger than the clock skew caused by the different wire delays from the PLL to the different routers.

The clock skew results for the typical-typical (TT) corner are shown in Table 4.5.

Table 4.5: Clock Skew Results for the Typical-Typical Corner

| Transmitter clock delay | Receiver clock delay | Clock skew percentage |
|---|---|---|
| 1.43 psec | 1.43 psec | ± 35.7 % |
| 1.43 psec | 5.12 psec | ± 36.0 % |
| 5.12 psec | 1.43 psec | ± 34.8 % |

The clock skew results for the slow-slow (SS) corner are shown in Table 4.6.

Table 4.6: Clock Skew Results for the Slow-Slow Corner

| Transmitter clock delay | Receiver clock delay | Clock skew percentage |
|---|---|---|
| 1.43 psec | 1.43 psec | ± 28.5 % |
| 1.43 psec | 5.12 psec | ± 28.2 % |
| 5.12 psec | 1.43 psec | ± 29.1 % |

The clock skew results for the fast-fast (FF) corner are shown in Table 4.7.

Table 4.7: Clock Skew Results for the Fast-Fast Corner

| Transmitter clock delay | Receiver clock delay | Clock skew percentage |
|---|---|---|
| 1.43 psec | 1.43 psec | ± 45.0 % |
| 1.43 psec | 5.12 psec | ± 44.7 % |
| 5.12 psec | 1.43 psec | ± 43.5 % |
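The margin between the wire-induced skew of Table 4.4 and the tolerable skew of Tables 4.5 to 4.7 can be illustrated with a short Python sketch. It assumes, following Equation 4.1, that the percentages are taken relative to 166.66667 psec; the function and variable names are ours, and the delays and percentages are copied from the tables.

```python
# Sketch comparing wire-induced clock skew with the tolerable skew per corner.
# Assumption: percentages are relative to 166.66667 psec, as in Equation 4.1.
REFERENCE_PSEC = 166.66667

# (transmitter clock delay, receiver clock delay) in psec, from Table 4.4.
CLOCK_DELAYS_PSEC = [(1.43, 1.43), (1.43, 5.12), (5.12, 1.43)]

# Tolerable skew (+/- %) per test case, from Tables 4.5, 4.6 and 4.7.
TOLERABLE_SKEW_PERCENT = {
    "TT": [35.7, 36.0, 34.8],
    "SS": [28.5, 28.2, 29.1],
    "FF": [45.0, 44.7, 43.5],
}


def wire_skew_percent(tx_delay_psec: float, rx_delay_psec: float) -> float:
    """Skew between the two clock paths as a percentage of the reference."""
    return abs(rx_delay_psec - tx_delay_psec) / REFERENCE_PSEC * 100.0


def worst_wire_skew_percent() -> float:
    """Largest wire-induced skew over the three test cases."""
    return max(wire_skew_percent(tx, rx) for tx, rx in CLOCK_DELAYS_PSEC)


def smallest_tolerance_percent() -> float:
    """Smallest tolerable skew over all corners and test cases."""
    return min(min(values) for values in TOLERABLE_SKEW_PERCENT.values())


if __name__ == "__main__":
    print(f"worst wire skew:    {worst_wire_skew_percent():.2f} %")
    print(f"smallest tolerance: {smallest_tolerance_percent():.1f} %")
```

Under this assumption the 3.69 psec path difference is only a few percent of the reference, well below the smallest tolerance in any corner.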

4.4 Power Distribution of Design B

The total average power consumed by the transceiver is 6.9 mW (1.15 pJ/bit), which points to the high interconnect power efficiency of our on-chip SerDes design. The power distribution among the blocks is shown in Table 4.8.


Table 4.8: Power Distribution

| Block | Power (mW) |
|---|---|
| MUX | 0.19 |
| Driver 1 | 0.07 |
| Driver 2 | 0.07 |
| Sampler | 0.07 |
| DEMUX | 0.07 |
| Total inverters | 0.02 |
| Total latches and FFs | 0.04 |
| Selection circuits and their delay blocks at Tx | 1.56 |
| Selection circuits and their delay blocks at Rx | 1.32 |
| Frequency divider at Tx | 0.55 |
| Frequency divider at Rx | 0.32 |
| Delay blocks for data | 0.27 |
| Delay blocks for Rx clock | 2.37 |
| Total power | 6.9 |
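The relation between the block powers of Table 4.8, the 6.9 mW total and the 1.15 pJ/bit figure can be checked with a small Python sketch. The dictionary keys and function names are ours; the numbers are those of Table 4.8 and the 6-Gb/s data rate.

```python
# Sketch: sum the block powers of Table 4.8 and derive the energy per bit.
BLOCK_POWER_MW = {
    "MUX": 0.19,
    "Driver 1": 0.07,
    "Driver 2": 0.07,
    "Sampler": 0.07,
    "DEMUX": 0.07,
    "Total inverters": 0.02,
    "Total latches and FFs": 0.04,
    "Selection circuits and delay blocks at Tx": 1.56,
    "Selection circuits and delay blocks at Rx": 1.32,
    "Frequency divider at Tx": 0.55,
    "Frequency divider at Rx": 0.32,
    "Delay blocks for data": 0.27,
    "Delay blocks for Rx clock": 2.37,
}
DATA_RATE_GBPS = 6.0


def total_power_mw() -> float:
    """Sum of all block powers in mW."""
    return sum(BLOCK_POWER_MW.values())


def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """mW divided by Gb/s gives pJ/bit (1e-3 W / 1e9 bit/s = 1e-12 J/bit)."""
    return power_mw / rate_gbps


def largest_block() -> tuple:
    """Name and power of the block that consumes the most."""
    name = max(BLOCK_POWER_MW, key=BLOCK_POWER_MW.get)
    return name, BLOCK_POWER_MW[name]


if __name__ == "__main__":
    print(f"sum of blocks: {total_power_mw():.2f} mW")
    print(f"energy per bit: {energy_per_bit_pj(6.9, DATA_RATE_GBPS):.2f} pJ/bit")
    print("largest block:", largest_block())
```

The block powers sum to about 6.92 mW, which rounds to the quoted 6.9 mW; the delay blocks for the Rx clock and the selection circuits dominate the budget.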

4.5 Comparison to Parallel Buses NoC Architecture

The area of a single CUSPARC core with parallel-bus transmission is 0.21 mm² [1]. In Figure 4.27, the length L1 is 0.46 mm and the length L2 is 0.032 mm. Adding the post-layout SerDes area of 2172 µm², the area of a single core becomes 0.21217 mm². Converting the 64 parallel lines between the routers (32 data inputs and 32 data outputs of each router) to four lines (the differential input and output lines of the SerDes) scales L2 by 4/64. Consequently, the routing area decreases from 0.489 mm² to 0.0295 mm², as shown in Figure 4.27, a reduction of 93.96%.
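The area figures above can be reproduced with a few lines of Python. This is only an arithmetic sketch; the function and constant names are ours, and the inputs are the areas quoted in this section.

```python
# Sketch: core area with SerDes and routing-area reduction, from the quoted values.
CORE_AREA_MM2 = 0.21             # single CUSPARC core with parallel buses [1]
SERDES_AREA_UM2 = 2172.0         # post-layout SerDes area
ROUTING_AREA_BEFORE_MM2 = 0.489  # 64 parallel lines
ROUTING_AREA_AFTER_MM2 = 0.0295  # four serial lines

UM2_PER_MM2 = 1.0e6              # 1 mm^2 = 1e6 um^2


def core_area_with_serdes_mm2() -> float:
    """Core area after adding the SerDes, in mm^2."""
    return CORE_AREA_MM2 + SERDES_AREA_UM2 / UM2_PER_MM2


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    print(f"core area with SerDes: {core_area_with_serdes_mm2():.5f} mm^2")
    reduction = percent_reduction(ROUTING_AREA_BEFORE_MM2, ROUTING_AREA_AFTER_MM2)
    print(f"routing area reduction: {reduction:.1f} %")
```

The SerDes adds about 1% to the core area, while the routing area shrinks by roughly 94%, in line with the figure quoted above.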


[Diagram: 2D mesh of 16 CUSPARC cores, each with an NA block, connected through routers (R), with the dimensions L1 and L2 marked]

Figure 4.27: Area reduction [1]

4.6 Comparison With Related Works

Table 4.9 shows a comparison between this work (Design B) and previous works.


Table 4.9: SerDes Comparison

| Work | Short summary | Year | Technology | Data rate (Gb/s) | Energy per bit (pJ/bit) | Area of SerDes (µm²) |
|---|---|---|---|---|---|---|
| [8] | A new serialization technique; WAFT SerDes used in NoC; fabricated | 2005 | 0.18 µm | 3 | - | - |
| [12] | Prototype serial link employing pulsed current-mode signaling; fabricated | 2005 | 0.18 µm | 8 | - | - |
| [13] | Serial link transceiver for on-chip; fabricated | 2009 | 0.13 µm | 9 | - | 177500 |
| [14] | Synchronous parallel 3D links, TSV | 2010 | 90 nm | 2 | - | 460.681 |
| [15] | Asynchronous 3D-NoC router; fabricated | 2011 | TSMC 65 nm | 16 | 1.95 | 19000 |
| [16] | Self-timed semi-serial link | 2012 | TSMC 65 nm | 9.09 (one link) | 0.98 (one link) | 33124 |
| [17] | WP-CML SerDes structure | 2014 | TSMC 65 nm | 12.67 | 1.12 | - |
| This work | A serial link for CUSPARC NoC processor | 2016 | TSMC 65 nm | 6 | 1.15 | 2172 |

The energy per bit of our work is 1.15 pJ/bit, which is smaller than that of [15] and nearly equal to that of [17]. Our design features a very attractive combination of low power per bit and small area (low cost).


Chapter 5: Conclusion and Future Work

5.1 Conclusion

The design of a 6-Gb/s serial link for many-core CUSPARC processors, arranged in a 2D mesh architecture, is presented. The design is composed of a Tx and an Rx. The Tx consists of an encoder, a MUX, and drivers. The Rx consists of a sampler, a DEMUX, retimers, and decoders. The design targets the TSMC digital 65-nm CMOS technology with a 1.2-V supply and is simulated using an AMS simulation tool. The simulation results show a lower NoC area, a high allowable clock skew between the transmitter and the receiver, and low power consumption. Our design features a very attractive combination of low power per bit and small area (low cost).

5.2 Future Work

Future work is to complete the layout of design B so that it works at all corners, and to fabricate it. To enhance the maximum tolerable clock skew of design B, a clock and data recovery block may be introduced.


References

[1] M. R. Soliman, H. A. H. Fahmy, and S. E. Habib, “NoC-based Many-Core Processor Using CUSPARC Architecture,” in Microelectronics (ICM), 2014 26th International Conference, pp. 84−87, December 2014.

[2] [Online]. Available: http://en.wikipedia.org/wiki/Smoke_signals.

[3] [Online]. Available: https://en.wikibooks.org/wiki/History_of_Serial_Communications.

[4] [Online]. Available: https://sites.google.com/site/asuece632sm13/handouts/03-ECE632-SM13-TX.pdf.

[5] [Online]. Available: https://sites.google.com/site/asuece632sm13/handouts/04-ECE632-SM13-RX.pdf.

[6] D. R. Stauffer et al., “High speed serdes devices and applications,” Springer Science & Business Media, 2008.

[7] D. Lewis, “DesignCon 2004 SerDes Architectures and Applications,” National Semiconductor Corporation, 2004.

[8] S.-J. Lee, K. Kim, H. Kim, N. Cho, and H.-J. Yoo, “Adaptive network-on-chip with wave-front train serialization scheme,” Digest of Technical Papers, 2005 Symposium on VLSI Circuits, pp. 104−107, 16−18 June 2005.

[9] [Online]. Available: http://opencores.org.

[10] S. Borkar, “Thousand core chips: a technology perspective,” in Proceedings of the 44th Annual Design Automation Conference, ser. DAC '07. New York, NY, USA: ACM, pp. 746−749, 2007.

[11] A. Agarwal, B. Raton, C. Iskander, H. Multisystems, and R. Shankar, “Survey of Network on Chip (NoC) Architectures and Contributions,” Journal of Engineering, Computing and Architecture, vol. 3, no. 1, pp. 21−27, 2009.

[12] A. P. Jose, G. Patounakis, and K. L. Shepard, “Near Speed-of-Light On-Chip Interconnects Using Pulsed Current-Mode Signalling,” Digest of Technical Papers, 2005 Symposium on VLSI Circuits, pp. 108−111, 16−18 June 2005.


[13] J. Y. Park, J. Kang, S. Park, and M. P. Flynn, “A 9-Gbit/s serial transceiver for on-chip global signaling over lossy transmission lines,” IEEE Trans. Circuits Syst. I, Regul. Pap., vol. 56, no. 8, pp. 1807−1817, 2009.

[14] S. Fengda, C. Alessandro, P. Athanasopoulos, and Y. Leblebici, “Design and feasibility of multi-Gb/s quasi-serial vertical interconnects based on TSVs for 3D ICs,” 18th IEEE/IFIP International Conference on VLSI and System-on-Chip, pp. 149−154, 27−29 Sept. 2010.

[15] F. Darve, A. Sheibanyrad, P. Vivet, and F. Pétrot, “Physical implementation of an asynchronous 3D-NoC router using serial vertical links,” Proc. - 2011 IEEE Comput. Soc. Annu. Symp. VLSI, vol. 2, no. 1, pp. 25−30, 4−6 July 2011.

[16] E. Nigussie, S. Tuuna, J. Plosila, J. Isoaho, and H. Tenhunen, “Semi-serial on-chip link implementation for energy efficiency and high throughput,” IEEE Trans. Very Large Scale Integr. VLSI Syst., vol. 20, no. 12, pp. 2265−2277, 2012.

[17] A. Jaiswal, D. Walk, Y. Fang, and K. Hofmann, “Low-power high-speed on-chip asynchronous Wave-pipelined CML SerDes,” 2014 27th IEEE International System-on-Chip Conference (SOCC), pp. 5−10, 2−5 Sept. 2014.

[18] B. C. Hien, S.-M. Kim, and K. Cho, “Design of a Wave-Pipelined Serializer-Deserializer with an Asynchronous Protocol for High Speed Interfaces,” 4th Asia Symposium on Quality Electronic Design, 10 July 2012.

[19] D. F. Tondo and R. R. Lopez, “A Low-Power, High-Speed CMOS/CML 16:1 Serializer,” Proceedings of the Argentine School of Micro-Nanoelectronics, Technology and Applications 2009, 1 October 2009.

[20] S. Safwat, E. E. Hussein, M. Ghoneima, and Y. Ismail, “A 12Gbps all digital low power SerDes transceiver for on-chip networking,” Circuits and Systems (ISCAS), 2011 IEEE International Symposium on, pp. 1419−1422, 15−18 May 2011.

[21] R. Dobkin, Y. Perelman, T. Liran, R. Ginosar, and A. Kolodny, “High Rate Wave-pipelined Asynchronous On-chip Bit-serial Data Link,” Asynchronous Circuits and Systems, 2007. ASYNC 2007. 13th IEEE International Symposium on, pp. 3−14, 12−14 March 2007.

[22] P. A. Franaszek and A. X. Widmer, “Byte oriented DC balanced (0,4) 8b/10b partitioned block transmission code,” U.S. Patent 4486739, December 4, 1984.

[23] E. J. Kim, K. J. Lee, and S. Kim, “A high resolution Serializer and Deserializer architecture for mobile image sensor module,” Can. Conf. Electr. Comput. Eng., pp. 1−4, May 2010.

[24] M. Fukaishi, S. Nakamura, A. Tajima, Y. Kinoshita, and Y. Suemura, “A 2.125-Gb/s BiCMOS Fiber Channel,” IEEE J. Solid-State Circuits, vol. 34, no. 9, pp. 1325−1330, September 1999.


[25] C. H. Hsiao, M. S. Kao, C. H. Jen, Y. H. Hsu, P. L. Yang, C. T. Chiu, J. M. Wu, S. H. Hsu, and Y. S. Hsu, “A 3.2 Gbit/s CML Transmitter With 20:1 Multiplexer In 0.18 µm CMOS Technology,” Mixed Design of Integrated Circuits and System, 2006. MIXDES 2006. Proceedings of the International Conference, pp. 179−183, June 2006.

[26] J. Kim, B. S. Leibowitz, J. Ren, and C. J. Madden, “Simulation and analysis of random decision errors in clocked comparators,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 56, no. 8, pp. 1844−1857, August 2009.

[27] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K. Chiu, and M. M. Leung, “Improved Sense-Amplifier-Based Flip-Flop: Design and Measurements,” IEEE Journal of Solid-State Circuits, vol. 35, no. 6, pp. 876−884, June 2000.

[28] E. E. O. Hussein et al., “CUSPARC IP processor: Design, characterization and applications,” in Microelectronics (ICM), 2010 International Conference, pp. 435−438, December 2010.

[29] A. A. Z. Suleiman, A. F. Khedr, and S. E. Habib, “ASIC Implementation of Cairo University SPARC ‘CUSPARC’ embedded processor,” in Microelectronics (ICM), 2010 International Conference, pp. 439−442, December 2010.


Summary

Compared with parallel data transmission, serial transmission offers fewer ports, high immunity to crosstalk, lower power consumption, and smaller area. Alongside the current trend of replacing parallel buses with high-speed serial links, there is another trend of replacing the shared-memory multi-core processor with a loosely-coupled many-core network on chip.

This thesis presents a high-speed serial link for networks on many-core chips. The proposed serial link consists of a serializer and a deserializer (SerDes). The serializer contains an 8b/10b encoder, a 10:1 multiplexer, and a driver. The deserializer contains a sampler, regenerative latches that latch and regenerate the signal, a 1:10 demultiplexer, and an 8b/10b decoder. The design was implemented in the TSMC 65-nm digital CMOS technology with a 1.2-V supply. The SerDes design was integrated with a previous design of a 2D mesh network on chip consisting of 16 cores, each of which is a Cairo University SPARC (CUSPARC) processor.

The presented serial link operates at 6 Gb/s. Using these serial links to build the aforementioned 2D mesh network reduced the area of the on-chip network links by 93.96% relative to the area required for 32-bit parallel links. The connections between the cores were made using metal layer 8 of the TSMC 65-nm digital CMOS technology. Simulation of this network demonstrated its ability to tolerate a clock skew of up to ±36% between the transmitter and the receiver. The serial link consumes 6.9 mW, i.e., about 1.15 pJ to transmit one bit.


Engineer: Safaa Ahmed Mohammed Abdelfattah
Date of Birth: 1191\11\22
Nationality: Egyptian
Registration Date: 2112\11\1
Awarding Date: --/--/2016
Department: Electronics and Electrical Communications Engineering
Degree: Master of Science

Supervisors:
Prof. Serag E. D. Habib
Dr. Sameh A. Ibrahim (Electronics and Electrical Communications Engineering Department, Faculty of Engineering, Ain Shams University)

Examiners:
Prof. Mohamed Amin Dessouky (External Examiner) (Electronics and Electrical Communications Engineering Department, Faculty of Engineering, Ain Shams University)
Prof. Mohamed Riad El-Ghonemy (Internal Examiner)
Prof. Serag E. D. Habib (Thesis Main Advisor)

Title of Thesis: 6-Gb/s Serial Link Transceiver for NoCs

Key Words: Serializer/Deserializer (SerDes), Networks on Chip, Cairo University SPARC processor (CUSPARC)

Summary:
A 6-Gb/s serial link suitable for networks on chip was designed. These links were used in a 2D mesh network containing many cores, each of which is a Cairo University SPARC processor. The design was implemented in 65-nm CMOS technology with a 1.2-V supply. Using serial links reduced the area of the on-chip network links by 93.96% compared with the area required for 32-bit parallel data links. Simulation results for these serial links showed that they tolerate a clock skew between the transmitter and the receiver of up to ±36% of the clock period. The serial link consumes 6.9 mW, i.e., about 1.15 pJ per transmitted bit.


6-Gb/s Serial Link Transceiver for NoCs

By

Safaa Ahmed Mohammed Abdelfattah

A Thesis Submitted to the Faculty of Engineering at Cairo University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electronics and Electrical Communications Engineering

Approved by the Examining Committee:

Prof. Serag E. D. Habib, Thesis Main Advisor

Prof. Mohamed Riad El-Ghonemy, Internal Examiner

Prof. Mohamed Amin Dessouky, External Examiner (Electronics and Electrical Communications Engineering Department, Faculty of Engineering, Ain Shams University)

Faculty of Engineering, Cairo University, Giza, Egypt

2016


6-Gb/s Serial Link Transceiver for NoCs

By

Safaa Ahmed Mohammed Abdelfattah

A Thesis Submitted to the Faculty of Engineering at Cairo University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electronics and Electrical Communications Engineering

Under the Supervision of

Prof. Serag E. D. Habib, Professor, Electronics and Electrical Communications Engineering Department, Faculty of Engineering, Cairo University

Dr. Sameh A. Ibrahim, Assistant Professor, Electronics and Electrical Communications Engineering Department, Faculty of Engineering, Ain Shams University

Faculty of Engineering, Cairo University, Giza, Egypt

2016


6-Gb/s Serial Link Transceiver for NoCs

By

Safaa Ahmed Mohammed Abdelfattah

A Thesis Submitted to the Faculty of Engineering at Cairo University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electronics and Electrical Communications Engineering

Faculty of Engineering, Cairo University, Giza, Egypt

2016