Performance Analysis of TCP/IP Data Send/Receive Processing Under UNIX Operating Systems

Hyok Kim
High Performance Computing & Communication Research Laboratory
http://www.hallym.ac.kr/~hkim
12/11/1997


Talk Outline

- Project overview
- Performance analysis of TCP/IP protocol
  - Performance analysis of Parallel TCP/IP
  - Bottlenecks in processing TCP/IP
  - Performance analysis techniques
  - Measurement tool and performance metrics
  - Empirical results
- Future and ongoing work
- Concluding remarks


Project Overview

- H/W implementation of TCP/IP protocol
  - Handling ATM traffic (155 Mbps or higher)
  - ATM interfacing
- Specification
  - Design of TCP/IP protocol processor
  - ATM interfacing
  - PCI/AMBA interfacing
  - API implementation for TCP/IP H/W
- Joint project with
  - Hallym U. & Pusan National U. (major institute)
  - Kwangwoon U. & Kyungpook National U.


Internet Layering and Peer Model

Layer        Client side       Peer protocol       Server side
application  FTP client        FTP protocol        FTP server
transport    TCP               TCP protocol        TCP
network      IP                IP protocol         IP
link         data link driver  data link protocol  data link driver

Both data link drivers attach to the shared medium.


Bandwidth delivery by TCP/IP

[Figure: the application's bandwidth requirement sits above TCP/IP; the bandwidth supply of ATM, Fast Ethernet, and FDDI sits below it.]

Reasonable bandwidth delivery?


Coarse Grain Architecture of Parallel TCP/IP


Parallel Architecture of TCP Data Receiver

[Figure: block diagram between the IP layer and the application. Blocks: connection name check, TCP checksum, TCP error check, security check, flag test, status check, ACK check, window check, window sizing, segment re-assembly, urgent request, and queue, sharing TCP control information and TCP connection information.]


Performance of TCP data receiver

[Figure: "Performance of TCP Receive", bar chart; y-axis: count (0 to 3500); x-axis groups: Cycle, Inst., Data Read, Data Write; series: Con. Name Search, TCP checksum, Rcv. Wnd. Check, ACK check, Sequencing, Data Rcv.]


Performance of Parallel TCP/IP

Estimated speed-up against sequential execution

Module       Speed-up
IP Send      1.06
IP Receive   2.55
TCP Send     1.14
TCP Receive  1.02


Bottlenecks in TCP/IP Processing

- data copies
  - between user space and kernel space
  - between kernel space and network device
- checksum calculation
- memory/timer management
- interaction between protocol and OS
  - NOT the protocol itself
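To make the "checksum calculation" item concrete, the following is a minimal Python sketch of the 16-bit one's-complement Internet checksum used by TCP and IP (in the style of RFC 1071). The function name `internet_checksum` is chosen here for illustration and is not taken from the slides; the sketch only shows why the cost grows with packet size, since every byte of the segment must be read once.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Add each big-endian 16-bit word to the running sum.
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold any carry bits back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
    print(hex(internet_checksum(sample)))
```

A receiver can validate a segment by running the same function over the data with the checksum field included; a result of zero indicates no detected error.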


Performance Measurement (I)

- S/W based measurement
  - unacceptable perturbation due to interrupt handling or memory swapping
- H/W based measurement
  - specially designed H/W or logic analyzer
  - limited flexibility
  - data acquisition only on execution time
  - ex) MultiKron chip (project) by NIST
- Probabilistic analysis: queueing theory


Performance Measurement (II)

- Our measurement
  - using counters in Intel Pentium processor
  - time resolution is the same as system clock tick
    - 166 MHz -> 6 ns
    - 200 MHz -> 5 ns
  - provides additional information
    - memory access counts (memory bandwidth)
    - number of H/W interrupts
    - mis-aligned data memory references
    - branches
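The resolution figures above follow from the clock period, which is the reciprocal of the clock frequency. The short Python sketch below reproduces that arithmetic and converts a cycle count into elapsed time; the helper names and the 10000-cycle sample are illustrative assumptions, not values from the measurements.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def cycles_to_us(cycles: int, clock_mhz: float) -> float:
    """Elapsed time in microseconds for a given cycle count."""
    return cycles / clock_mhz


if __name__ == "__main__":
    for mhz in (166, 200):
        print(f"{mhz} MHz -> {cycle_time_ns(mhz):.2f} ns per cycle")
    # Hypothetical example: 10000 counted cycles on a 200 MHz processor.
    print(f"{cycles_to_us(10000, 200):.1f} us")
```

Rounded to whole nanoseconds, the two periods come out as 6 ns and 5 ns, which matches the resolutions quoted on the slide.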


Performance Measurement Setup - sender's part -

[Figure: the system under measurement (user process, socket, TCP, IP, data link) is connected to the communicating party over an isolated 10BaseT Ethernet. The user process performs socket initialization, connection, write, and disconnect. Steps (1) to (4) run down the stack; steps (5) to (7) run back up.]

Legends:
(1) memory allocation and data copy
(2) TCP processing
(3) IP processing
(4) data send to media
(5) ACK arrives at datalink layer
(6) ACK processing at IP
(7) ACK processing at TCP


Performance Measurement Setup - receiver's part -

[Figure: the system under measurement (user process, socket, TCP, IP, data link) is connected to the communicating party over an isolated 10BaseT Ethernet. The user process calls socket(), bind(), listen(), accept(), read(), and disconnect(). Steps (1) to (4) run up the stack; steps (5) to (7) run back down.]

Legends:
(1) Frame arrives at data link layer
(2) IP processing
(3) TCP processing
(4) data copy from kernel space to user space
(5) ACK construction at TCP
(6) IP processing
(7) data send to media
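The user-level call sequence in the two setups (connect, write, and disconnect on the sender; socket, bind, listen, accept, read, and disconnect on the receiver) can be sketched with Python's standard `socket` module. This is only an illustration of the call order: it runs both parties on the loopback interface of one machine rather than over the isolated 10BaseT Ethernet used in the measurements, and the function names `receive_all` and `run_transfer` are assumptions made for this sketch.

```python
import socket
import threading


def receive_all(server: socket.socket, sink: list) -> None:
    """Receiver side: accept one connection and read until the peer closes."""
    conn, _ = server.accept()
    chunks = []
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # empty read means the sender disconnected
                break
            chunks.append(chunk)
    sink.append(b"".join(chunks))


def run_transfer(payload: bytes) -> bytes:
    """Send payload over a loopback TCP connection and return what arrived."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    received = []
    receiver = threading.Thread(target=receive_all, args=(server, received))
    receiver.start()

    # Sender side: connect, write, disconnect.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)

    receiver.join()
    server.close()
    return received[0]


if __name__ == "__main__":
    data = b"x" * 1440  # largest message size used in the charts
    print(len(run_transfer(data)))
```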


Empirical Result (I)

Cycle counts in TCP/IP send processing

[Figure: bar chart; x-axis: message size (100, 300, 500, 1440 bytes); y-axis: cycles (0 to 100000); series: Data copy, TCP output, IP output, Ethernet output, Ethernet input, IP input, TCP input.]

Bar labels: 401, 757, 861, and 10707 (each at all four sizes); 1592, 1930, 3623, 4451, 7171, 8413, 8597, 10111, 13606, 26015, 36984, 91015.


Empirical Result (II)

Dynamic instruction counts in TCP/IP send processing

Stage            100 bytes  300 bytes  500 bytes  1440 bytes
Data copy               37        254        254         254
TCP output             917       1016       1098        1458
IP output              262        262        262         262
Ethernet output        577        593        593         593
Ethernet input          68         69         69          69
IP input               163        163        163         163
TCP input             1224       1224       1224        1224


Empirical Result (III)

Memory access counts in TCP/IP send processing

Stage            100 bytes  300 bytes  500 bytes  1440 bytes
Data copy              203        330        430         911
TCP output             652        710        766        1029
IP output              139        139        139         139
Ethernet output        460        571        671        1141
Ethernet input          47         47         47          47
IP input                72         72         72          72
TCP input              724        724        724         724


Empirical Result (IV)

Cycle counts in TCP/IP receive processing

Stage            100 bytes  300 bytes  500 bytes  1440 bytes
Ethernet input        3660       5471       5744        6028
IP input               401        401        401         401
TCP input             3107       3599       4144        6434
Socket append          948       1031       1111        1471
Socket wakeup         1086       1086       1086        1086
Data copy             2932       4196       5196       10986
TCP output            6771       6771       6771        6771
IP output              861        861        861         861
Ethernet output       7313       7313       7313        7313
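One way to read the receive-side cycle chart is to compute each stage's share of the total for a given message size. The Python sketch below does this for the 1440-byte case. It assumes that the bar labels map to stages in legend order, which is an interpretation of the extracted chart rather than something the slide states, and the names `RECEIVE_CYCLES_1440` and `stage_shares` are chosen here for illustration.

```python
# 1440-byte receive cycle counts, assuming bar labels follow the legend order.
RECEIVE_CYCLES_1440 = {
    "Ethernet input": 6028,
    "IP input": 401,
    "TCP input": 6434,
    "Socket append": 1471,
    "Socket wakeup": 1086,
    "Data copy": 10986,
    "TCP output": 6771,
    "IP output": 861,
    "Ethernet output": 7313,
}


def stage_shares(cycles: dict) -> dict:
    """Return each stage's fraction of the total cycle count."""
    total = sum(cycles.values())
    return {stage: count / total for stage, count in cycles.items()}


if __name__ == "__main__":
    shares = stage_shares(RECEIVE_CYCLES_1440)
    for stage, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{stage:16s} {share:6.1%}")
```

Under that assumption, the copy from kernel space to user space is the single largest stage at this message size, which is consistent with data copies being listed among the bottlenecks.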


Empirical Result (V)

Dynamic instruction counts in TCP/IP receive processing

Stage            100 bytes  300 bytes  500 bytes  1440 bytes
Ethernet input         371       1311       1315        1557
IP input               163        163        163         163
TCP input              654        830        998        1797
Socket append           92        156        220         508
Socket wakeup          116        116        116         116
Data copy              472        944       1416        3540
TCP output             834        834        834         834
IP output              262        262        262         262
Ethernet output        465        465        465         465


Empirical Result (VI)

Memory access counts in TCP/IP receive processing

Stage            100 bytes  300 bytes  500 bytes  1440 bytes
Ethernet input         252        826        918        1026
IP input                72         72         72          72
TCP input              358        438        517         887
Socket append           63        101        139         310
Socket wakeup           81         81         81          81
Data copy              372        796       1220        3148
TCP output             612        612        612         612
IP output              139        139        139         139
Ethernet output        339        339        339         339


Memory Bandwidth Requirement (I)

"By matching the memory to the special needs of packet processing, one could achieve high performance at an acceptable cost", by V. Jacobson.


Memory Bandwidth Requirement (II)

- Then, how many memory accesses occur?
  - we measured it

\[
\text{max. BW (bytes/sec)} = \frac{\text{bus\_width (bits)} \times \text{\#\_of\_memory\_accesses}}{\text{Time\_required}} \times \frac{1}{8}
\]
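The bandwidth formula on this slide multiplies the bus width by the number of memory accesses, divides by the time required, and converts bits to bytes with the factor 1/8. A minimal Python sketch of that calculation follows. The function name and the sample inputs (a 64-bit bus, 1000 accesses, 100 microseconds) are hypothetical and chosen only to show the units; they are not measurements from the talk.

```python
def max_memory_bandwidth(bus_width_bits: int,
                         memory_accesses: int,
                         time_required_s: float) -> float:
    """Upper bound on memory bandwidth in bytes per second.

    Assumes every access transfers a full bus width of data,
    which is why the result is a maximum.
    """
    bits_moved = bus_width_bits * memory_accesses
    return bits_moved / time_required_s / 8


if __name__ == "__main__":
    # Hypothetical inputs: 64-bit bus, 1000 accesses in 100 microseconds.
    bw = max_memory_bandwidth(64, 1000, 100e-6)
    print(f"{bw / 1e6:.1f} Mbytes/sec")
```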


Pure TCP/IP performance

                           Send processing                      Receive processing
Case                       Mbps  Memory BW required (Mbytes/sec.)  Mbps  Memory BW required (Mbytes/sec.)
Including ACK processing     56    61                                62   172
Excluding ACK processing     87    61                                85   172

* Calculation on 1440 bytes packet

- not considering data link latency
- considering data send/receive and ACK segment send/receive time only in TCP/IP layer
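The Mbps figures relate packet size to per-packet processing time: throughput is the number of bits in the packet divided by the time spent processing it. The Python sketch below shows that relation in both directions for the 1440-byte packet used in the table. The helper names are illustrative, and the sketch makes no claim about which processing stages were included in the reported times.

```python
def throughput_mbps(packet_bytes: int, seconds_per_packet: float) -> float:
    """Throughput in Mbps when each packet takes the given processing time."""
    return packet_bytes * 8 / seconds_per_packet / 1e6


def time_per_packet_us(packet_bytes: int, mbps: float) -> float:
    """Per-packet processing time in microseconds implied by a throughput."""
    return packet_bytes * 8 / mbps  # bits / (bits per microsecond)


if __name__ == "__main__":
    # 56 Mbps is the send-side figure including ACK processing.
    t_us = time_per_packet_us(1440, 56)
    print(f"{t_us:.1f} us per 1440-byte packet")
    print(f"{throughput_mbps(1440, t_us * 1e-6):.1f} Mbps")
```

For example, 56 Mbps on 1440-byte packets corresponds to roughly 206 microseconds of processing per packet.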


From Empirical Results

- To enhance the performance of TCP/IP
  - design of efficient interface between protocol stack and OS is required
- And how?


Future and Ongoing Work

- Feasibility study of ATM internetworking
  - Analysis of
    - AAL5 traffic
    - signaling protocols
    - commercial SAR chips & bus interfaces
  - Internetworking technology
    - LANE, IP over ATM, Multiprotocol over ATM
    - Next Hop Resolution Protocol, etc.
- Development of TCP/IP H/W module
  - now, Ethernet-based implementation


Overview of TCP/IP H/W Implementation

[Figure: TCP timer module, checksum module, and memory management unit attached to the ARM target system.]

- ARM target system
  - ARM7TDMI RISC processor
  - AMBA expansion connectors
- FPGA implementation


Timer Management Module

[Figure: block diagram. Blocks: Duration Register, ID Manager, Lookup Table, State Scheduler, Expired & Reference Time Generator, Timing Scheduler, CAM, Timer Record Memory, Stack Manager, Zero Detect.]


Conclusion

- OS overheads play a major role in high-performance TCP/IP processing
- Measurement of memory access counts
  - estimation of memory bandwidth requirement
- H/W implementation is needed for time-consuming modules