CS 61C: Great Ideas in Computer Architecture
Transcript of CS 61C: Great Ideas in Computer Architecture
CS 61C: Great Ideas in Computer Architecture
Performance and Floating-Point Arithmetic
Instructors: Krste Asanović & Randy H. Katz
http://inst.eecs.berkeley.edu/~cs61c/fa17
10/24/17 Fall 2017 -- Lecture #17

Outline
• Defining Performance
• Floating-Point Standard
• And in Conclusion…

What is Performance?
• Latency (or response time or execution time)
  – Time to complete one task
• Bandwidth (or throughput)
  – Tasks completed per unit time

Cloud Performance: Why Application Latency Matters
• Key figure of merit: application responsiveness
  – The longer the delay, the fewer the user clicks, the less the user happiness, and the lower the revenue per user

CPU Performance Factors
• A program executes instructions
• CPU Time/Program
  = Clock Cycles/Program x Clock Cycle Time
  = Instructions/Program x Average Clock Cycles/Instruction x Clock Cycle Time
• 1st term called Instruction Count
• 2nd term abbreviated CPI for average Clock Cycles Per Instruction
• 3rd term is 1/Clock rate

Restating Performance Equation

$$\text{Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

What Affects Each Component? Instruction Count, CPI, Clock Rate

Hardware or software component?   Affects What?
Algorithm                         Instruction Count, CPI
Programming Language              Instruction Count, CPI
Compiler                          Instruction Count, CPI
Instruction Set Architecture      Instruction Count, Clock Rate, CPI

Peer Instruction
• Which computer has the highest performance for a given program?

Computer   Clock frequency   Clock cycles per instruction   # instructions per program
RED        1.0 GHz           2                              1000
GREEN      2.0 GHz           5                              800
ORANGE     0.5 GHz           1.25                           400
           5.0 GHz           10                             2000
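
The performance equation can be applied directly to the rows of this table. The following is a minimal sketch, not part of the original slides: the function name `cpu_time_seconds` and the placeholder key `"(unlabeled)"` (for the row whose color label did not survive in this transcript) are illustrative assumptions.

```python
def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """CPU time = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi / clock_hz

# Rows copied from the table as (instructions, CPI, clock rate in Hz).
# "(unlabeled)" stands in for the row whose label is missing here.
machines = {
    "RED": (1000, 2, 1.0e9),
    "GREEN": (800, 5, 2.0e9),
    "ORANGE": (400, 1.25, 0.5e9),
    "(unlabeled)": (2000, 10, 5.0e9),
}

times = {name: cpu_time_seconds(*row) for name, row in machines.items()}
fastest = min(times, key=times.get)

for name, t in times.items():
    print(f"{name:12s} {t * 1e6:.2f} microseconds")
print("lowest execution time:", fastest)
```

Note that the highest clock frequency does not give the lowest execution time here: instruction count and CPI matter just as much as clock rate.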

Peer Instruction
• Which system is faster?

System   Rate (Task 1)   Rate (Task 2)
A        10              20
B        20              10

RED: System A
GREEN: System B
ORANGE: Same performance
Unanswerable question!

Workload and Benchmark
• Workload: Set of programs run on a computer
  – Actual collection of applications run or made from real programs to approximate such a mix
  – Specifies both programs and relative frequencies
• Benchmark: Program selected for use in comparing computer performance
  – Benchmarks form a workload
  – Usually standardized so that many use them

SPEC (System Performance Evaluation Cooperative)
• Computer Vendor cooperative for benchmarks, started in 1989
• SPEC CPU2006
  – 12 Integer Programs
  – 17 Floating-Point Programs
• Often turn into number where bigger is faster
• SPECratio: reference execution time on old reference computer divided by execution time on new computer, reported as speed-up
• (SPEC 2017 just released)

SPECINT2006 on AMD Barcelona

Description                         Instruction Count (B)   CPI     Clock cycle time (ps)   Execution Time (s)   Reference Time (s)   SPEC-ratio
Interpreted string processing       2,118                   0.75    400                     637                  9,770                15.3
Block-sorting compression           2,389                   0.85    400                     817                  9,650                11.8
GNU C compiler                      1,050                   1.72    400                     724                  8,050                11.1
Combinatorial optimization          336                     10.0    400                     1,345                9,120                6.8
Go game                             1,658                   1.09    400                     721                  10,490               14.6
Search gene sequence                2,783                   0.80    400                     890                  9,330                10.5
Chess game                          2,176                   0.96    400                     837                  12,100               14.5
Quantum computer simulation         1,623                   1.61    400                     1,047                20,720               19.8
Video compression                   3,102                   0.80    400                     993                  22,130               22.3
Discrete event simulation library   587                     2.94    400                     690                  6,250                9.1
Games/path finding                  1,082                   1.79    400                     773                  7,020                9.1
XML parsing                         1,058                   2.70    400                     1,143                6,900                6.0

Summarizing SPEC Performance
• Varies from 6x to 22x faster than reference computer
• Geometric mean of ratios: N-th root of product of N ratios
  – Geometric Mean gives same relative answer no matter what computer is used as reference
• Geometric Mean for Barcelona is 11.7
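
The quoted geometric mean can be reproduced from the execution and reference times in the SPECINT2006 table. This is a small sketch under the assumption that each SPECratio is reference time divided by execution time, as defined on the SPEC slide; the helper name `geometric_mean` is illustrative.

```python
import math

# (execution time, reference time) in seconds, copied from the
# SPECINT2006 table for the AMD Barcelona.
times = [
    (637, 9770), (817, 9650), (724, 8050), (1345, 9120),
    (721, 10490), (890, 9330), (837, 12100), (1047, 20720),
    (993, 22130), (690, 6250), (773, 7020), (1143, 6900),
]

def geometric_mean(values):
    """N-th root of the product of N values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# SPECratio = reference time / execution time (bigger is faster).
ratios = [ref / run for run, ref in times]

print(f"min ratio = {min(ratios):.1f}, max ratio = {max(ratios):.1f}")
print(f"geometric mean = {geometric_mean(ratios):.1f}")
```

Summing logarithms instead of multiplying the ratios directly avoids building a very large intermediate product; the result is mathematically the same N-th root.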

Administrivia
• Midterm 2 in one week!
  – Guerrilla Session tonight 7-9pm in Cory 293
  – ONE double sided cheat sheet
  – Stay tuned for room assignments
  – Review session: Saturday 10-12pm in Hearst Annex A1
• Homework 4 released!
  – Caches and Floating Point
  – Due after the midterm
  – Still good cache practice!
• Proj 2-Part 2 released
  – Due after the midterm, but good to do before!

Outline
• Defining Performance
• Floating-Point Standard
• And in Conclusion…

Quick Number Review
• Computers deal with numbers
• What can we represent in N bits?
  – $2^N$ things, and no more! They could be…
• Unsigned integers: 0 to $2^N - 1$ (for N = 32, $2^N - 1$ = 4,294,967,295)
• Signed Integers (Two’s Complement): $-2^{(N-1)}$ to $2^{(N-1)} - 1$ (for N = 32, $2^{(N-1)}$ = 2,147,483,648)
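
As a quick check of these ranges, here is a short sketch; the function names are illustrative and not from the lecture.

```python
def unsigned_range(n_bits):
    """Smallest and largest unsigned integers that fit in n_bits."""
    return 0, 2**n_bits - 1

def twos_complement_range(n_bits):
    """Smallest and largest two's complement integers that fit in n_bits."""
    return -(2**(n_bits - 1)), 2**(n_bits - 1) - 1

print(unsigned_range(32))          # (0, 4294967295)
print(twos_complement_range(32))   # (-2147483648, 2147483647)
```

Both ranges contain exactly $2^{32}$ values; the bits are the same, only the interpretation differs.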

Other Numbers
1. Very large numbers? (seconds/millennium)
   ⇒ $31{,}556{,}926{,}000_\text{ten}$ ($3.1556926_\text{ten} \times 10^{10}$)
2. Very small numbers? (Bohr radius)
   ⇒ $0.0000000000529177_\text{ten}$ m ($5.29177_\text{ten} \times 10^{-11}$)
3. Numbers with both integer & fractional parts?
   ⇒ 1.5
• First consider #3
• …our solution will also help with #1 and #2

Goals for Floating Point
• Standard arithmetic for reals for all computers
  – Like two’s complement
• Keep as much precision as possible in formats
• Help programmer with errors in real arithmetic
  – +∞, -∞, Not-A-Number (NaN), exponent overflow, exponent underflow
• Keep encoding that is somewhat compatible with two’s complement
  – E.g., 0 in Fl. Pt. is 0 in two’s complement
  – Make it possible to sort without needing to do floating-point comparison

Representation of Fractions
• “Binary Point”, like decimal point, signifies boundary between integer and fractional parts:

    xx.yyyy   with bit weights   $2^{1}$ $2^{0}$ . $2^{-1}$ $2^{-2}$ $2^{-3}$ $2^{-4}$

• Example 6-bit representation: $10.1010_\text{two} = 1 \times 2^{1} + 1 \times 2^{-1} + 1 \times 2^{-3} = 2.625_\text{ten}$
• If we assume “fixed binary point”, range of 6-bit representations with this format: 0 to 3.9375 (almost 4)
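
The positional weights can be checked with a few lines of code. This is a minimal sketch for unsigned numerals only; `fixed_point_value` is a hypothetical helper name, not something defined in the lecture.

```python
def fixed_point_value(bits):
    """Value of an unsigned binary numeral such as '10.1010'."""
    int_part, _, frac_part = bits.partition(".")
    value = int(int_part, 2) if int_part else 0
    # The i-th bit after the binary point has weight 2^-i.
    for i, bit in enumerate(frac_part, start=1):
        value += int(bit) * 2.0**-i
    return value

print(fixed_point_value("10.1010"))   # 2.625
print(fixed_point_value("11.1111"))   # 3.9375, the largest 6-bit value in this format
```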

Fractional Powers of 2

i     $2^{-i}$             Fraction
0     1.0                  1
1     0.5                  1/2
2     0.25                 1/4
3     0.125                1/8
4     0.0625               1/16
5     0.03125              1/32
6     0.015625
7     0.0078125
8     0.00390625
9     0.001953125
10    0.0009765625
11    0.00048828125
12    0.000244140625
13    0.0001220703125
14    0.00006103515625
15    0.000030517578125

Representation of Fractions with Fixed Pt.
What about addition and multiplication?

Addition is straightforward:

      01.100   $1.5_\text{ten}$
    + 00.100   $0.5_\text{ten}$
      10.000   $2.0_\text{ten}$

Multiplication a bit more complex:

        01.100   $1.5_\text{ten}$
      x 00.100   $0.5_\text{ten}$
         00000
        00000
       01100
      00000
     00000
    0000110000

Where’s the answer, 0.11? (e.g., 0.5 + 0.25; Need to remember where point is!)
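
The bookkeeping for the binary point can be made explicit by storing each fixed-point number as a plain integer plus a known count of fraction bits. The sketch below assumes the slide's format with 3 fraction bits; `FRAC_BITS`, `to_fixed`, and `from_fixed` are illustrative names.

```python
FRAC_BITS = 3  # the slide's xx.yyy format keeps 3 bits after the binary point

def to_fixed(x):
    """Encode x as an integer holding x * 2^FRAC_BITS."""
    return round(x * (1 << FRAC_BITS))

def from_fixed(raw, frac_bits=FRAC_BITS):
    """Decode an integer that carries frac_bits fraction bits."""
    return raw / (1 << frac_bits)

a = to_fixed(1.5)   # bits 01100
b = to_fixed(0.5)   # bits 00100

total = a + b       # a sum keeps the same 3 fraction bits
product = a * b     # a product carries 3 + 3 = 6 fraction bits

print(format(total, "05b"), from_fixed(total))
print(format(product, "010b"), from_fixed(product, 2 * FRAC_BITS))
```

The raw product bits are 0000110000; only after placing the point six positions from the right does the answer read as 0.11 in binary, that is 0.75.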

Representation of Fractions
• Our examples used a “fixed” binary point. What we really want is to “float” the binary point to make the most effective use of limited bits
• Example: put $0.1640625_\text{ten}$ into binary. Represent with 5 bits, choosing where to put the binary point.

    … 000000.001010100000 …

  – Store these bits (10101) and keep track of the binary point 2 places to the left of the MSB
  – Any other solution would lose accuracy!
• With floating-point representation, each numeral carries an exponent field recording the whereabouts of its binary point
• Binary point can be outside the stored bits, so very large and small numbers can be represented
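
The same split into "significant bits" and "where the point is" can be seen with `math.frexp` from the Python standard library, which returns a mantissa in [0.5, 1) and a power-of-two exponent. This is only an illustration of the idea, not the IEEE 754 encoding itself.

```python
import math

x = 0.1640625
mantissa, exponent = math.frexp(x)    # x == mantissa * 2**exponent, 0.5 <= mantissa < 1
stored_bits = int(mantissa * 2**5)    # the five significant bits as an integer

print(format(stored_bits, "05b"), exponent)
```

The exponent of -2 is the "2 places to the left of the MSB" from the slide: the value is 0.10101 (binary) shifted two more places to the right.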

Scientific Notation (in Decimal)

    $6.02_\text{ten} \times 10^{23}$

  – mantissa: 6.02 (containing the decimal point)
  – radix (base): 10
  – exponent: 23
• Normalized form: no leading 0s (exactly one digit to left of decimal point)
• Alternatives to representing 1/1,000,000,000
  – Normalized: $1.0 \times 10^{-9}$
  – Not normalized: $0.1 \times 10^{-8}$, $10.0 \times 10^{-10}$

Scientific Notation (in Binary)

    $1.01_\text{two} \times 2^{-1}$

  – mantissa: 1.01 (containing the “binary point”)
  – radix (base): 2
  – exponent: -1
• Computer arithmetic that supports it is called floating point, because it represents numbers where the binary point is not fixed, as it is for integers
  – Declare such variable in C as float
  – double for double precision

Floating-Point Representation (1/4)
• 32-bit word has $2^{32}$ patterns, so must be approximation of real numbers ≥ 1.0, < 2
• IEEE 754 Floating-Point Standard:
  – 1 bit for sign (s) of floating point number
  – 8 bits for exponent (E)
  – 23 bits for fraction (F) (get 1 extra bit of precision if leading 1 is implicit)

    $(-1)^{s} \times (1 + F) \times 2^{E}$

• Can represent from $2.0 \times 10^{-38}$ to $2.0 \times 10^{38}$

Floating-Point Representation (2/4)
• Normal format: $+1.xxx \ldots x_\text{two} \times 2^{yyy \ldots y_\text{two}}$
• Multiple of Word Size (32 bits)

    bit 31   bits 30-23   bits 22-0
    S        Exponent     Significand
    1 bit    8 bits       23 bits

• S represents Sign
• Exponent represents y’s
• Significand represents x’s
• Represent numbers as small as $2.0_\text{ten} \times 10^{-38}$ to as large as $2.0_\text{ten} \times 10^{38}$
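
To see the three fields of a real value, one can reinterpret a single-precision number as a 32-bit integer and mask out each field. The sketch below uses the standard `struct` module; `float32_fields` is an illustrative helper name, and the exponent it returns is the raw 8-bit field exactly as stored.

```python
import struct

def float32_fields(x):
    """Split x, rounded to single precision, into (sign, exponent, significand) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # bit 31
    exponent = (bits >> 23) & 0xFF     # bits 30-23
    significand = bits & 0x7FFFFF      # bits 22-0
    return sign, exponent, significand

s, e, f = float32_fields(-3.0)
print(s, format(e, "08b"), format(f, "023b"))
```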
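
Floating-Point Representation (3/4)
• What if result too large? ($> 2.0 \times 10^{38}$, $< -2.0 \times 10^{38}$)
  – Overflow! ⇒ Exponent larger than represented in 8-bit Exponent field
• What if result too small? ($> 0$ & $< 2.0 \times 10^{-38}$, $< 0$ & $> -2.0 \times 10^{-38}$)
  – Underflow! ⇒ Negative exponent larger than represented in 8-bit Exponent field
• What would help reduce chances of overflow and/or underflow?

[Figure: number line marked at $-2 \times 10^{38}$, -1, $-2 \times 10^{-38}$, 0, $2 \times 10^{-38}$, 1, $2 \times 10^{38}$; overflow lies beyond both ends, underflow lies around 0]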

Floating-Point Representation (4/4)
• What about bigger or smaller numbers?
• IEEE 754 Floating-Point Standard: Double Precision (64 bits)
  – 1 bit for sign (s) of floating-point number
  – 11 bits for exponent (E)
  – 52 bits for fraction (F) (get 1 extra bit of precision if leading 1 is implicit)

    $(-1)^{s} \times (1 + F) \times 2^{E}$

• Can represent from $2.0 \times 10^{-308}$ to $2.0 \times 10^{308}$
• 32-bit format called Single Precision

Which is Less? (i.e., closer to -∞)
• 0 vs. $1 \times 10^{-127}$?
• $1 \times 10^{-126}$ vs. $1 \times 10^{-127}$?
• $-1 \times 10^{-127}$ vs. 0?
• $-1 \times 10^{-126}$ vs. $-1 \times 10^{-127}$?

Floating Point: Representing Very Small Numbers
• Zero: Bit pattern of all 0s is encoding for 0.000
  ⇒ But 0 in exponent should mean most negative exponent (want 0 to be next to smallest real)
  ⇒ Can’t use two’s complement ($10000000_\text{two}$)
• Bias notation: subtract bias from exponent
  – Single precision uses bias of 127; DP uses 1023
• 0 uses $00000000_\text{two}$ => 0 - 127 = -127; ∞, NaN uses $11111111_\text{two}$ => 255 - 127 = +128
  – Smallest SP real can represent: $1.00 \ldots 00 \times 2^{-126}$
  – Largest SP real can represent: $1.11 \ldots 11 \times 2^{+127}$
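
These two extremes correspond to exponent fields 1 and 254, since 0 and 255 are reserved. A small sketch that builds both bit patterns and reads them back as single-precision numbers follows; `float32_from_bits` is an illustrative helper built on the standard `struct` module.

```python
import struct

def float32_from_bits(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision number."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

smallest_normal = float32_from_bits(0x00800000)   # exponent field 1, fraction all 0s
largest_normal = float32_from_bits(0x7F7FFFFF)    # exponent field 254, fraction all 1s

print(smallest_normal, largest_normal)
```

In decimal these come out near $1.2 \times 10^{-38}$ and $3.4 \times 10^{38}$, so the $2.0 \times 10^{\pm 38}$ figures quoted on the earlier slides are order-of-magnitude approximations.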

IEEE 754 Floating-Point Standard (1/3)
Single Precision (Double Precision similar):

    bit 31   bits 30-23   bits 22-0
    S        Exponent     Significand
    1 bit    8 bits       23 bits

• Sign bit: 1 means negative, 0 means positive
• Significand in sign-magnitude format (not 2’s complement)
  – To pack more bits, leading 1 implicit for normalized numbers
  – 1 + 23 bits single, 1 + 52 bits double
  – always true: 0 ≤ Significand < 1 (for normalized numbers)
• Note: 0 has no leading 1, so reserve exponent value 0 just for number 0

IEEE 754 Floating-Point Standard (2/3)
• IEEE 754 uses “biased exponent” representation
  – Designers wanted FP numbers to be used even if no FP hardware; e.g., sort records with FP numbers using integer compares
  – Wanted bigger (integer) exponent field to represent bigger numbers
  – 2’s complement poses a problem (because negative numbers look bigger)
• Use just magnitude and offset by half the range
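
The "sort with integer compares" goal can be demonstrated on non-negative values: because the biased exponent sits above the significand, a larger float has a larger bit pattern when read as an unsigned integer. The sketch below uses arbitrary sample values chosen for illustration; negative numbers need extra handling because the format is sign-magnitude.

```python
import struct

def float32_bits(x):
    """Bit pattern of x, rounded to single precision, as an unsigned integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

# Arbitrary non-negative sample values, chosen for illustration.
values = [3.0, 0.0, 0.15625, 1.0e10, 2.5, 1.0e-10]

by_float_compare = sorted(values)
by_integer_compare = sorted(values, key=float32_bits)

print(by_float_compare == by_integer_compare)
```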

IEEE 754 Floating-Point Standard (3/3)
• Called Biased Notation, where bias is number subtracted to get final number
  – IEEE 754 uses bias of 127 for single precision
  – Subtract 127 from Exponent field to get actual exponent value
• Summary (single precision):

    bit 31   bits 30-23   bits 22-0
    S        Exponent     Significand
    1 bit    8 bits       23 bits

    $(-1)^{S} \times (1 + \text{Significand}) \times 2^{(\text{Exponent} - 127)}$

• Double precision identical, except with exponent bias of 1023 (half, quad similar)
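
Bias Notation (+127)
[Figure: exponent field values, showing how each is encoded and how it is interpreted; labels mark ∞/NaN at one end, Zero at the other, and the direction “Getting closer to zero”]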

Peer Instruction
• Guess this Floating Point number: 11000000010000000000000000000000
• RED: $-1 \times 2^{128}$
• GREEN: $+1 \times 2^{-128}$
• ORANGE: $-1 \times 2^{1}$
• $-1.5 \times 2^{1}$
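
The single-precision formula can be applied mechanically to this bit pattern. The following is a sketch for normalized numbers only (exponent field 1 to 254); `decode_float32` is a hypothetical helper name.

```python
def decode_float32(bit_string):
    """Apply (-1)^S x (1 + Significand) x 2^(Exponent - 127) to a normalized pattern."""
    assert len(bit_string) == 32
    sign = int(bit_string[0], 2)                 # 1 bit
    exponent = int(bit_string[1:9], 2)           # 8 bits, biased by 127
    fraction = int(bit_string[9:], 2) / 2**23    # 23 bits, value in [0, 1)
    assert 1 <= exponent <= 254, "this sketch handles normalized numbers only"
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)

print(decode_float32("11000000010000000000000000000000"))   # -3.0
```

Sign is 1, the exponent field is 128 (so the actual exponent is 1), and the fraction is 0.5, giving $-1.5 \times 2^{1} = -3.0$.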

More Floating Point
• What about 0?
  – Bit pattern all 0s means 0, so no implicit leading 1
• What if divide 1 by 0?
  – Can get infinity symbols +∞, -∞
  – Sign bit 0 or 1, largest exponent (all 1s), 0 in fraction
• What if do something stupid? (∞ – ∞, 0 ÷ 0)
  – Can get special symbols NaN for “Not-a-Number”
  – Sign bit 0 or 1, largest exponent (all 1s), not zero in fraction
• What if result is too big? ($2 \times 10^{308} \times 2 \times 10^{2}$)
  – Get overflow in exponent, alert programmer!
• What if result is too small? ($2 \times 10^{-308} \div 2 \times 10^{2}$)
  – Get underflow in exponent, alert programmer!

Representation for ±∞
• In FP, divide by 0 should produce ±∞, not overflow
• Why?
  – OK to do further computations with ∞. E.g., X/0 > Y may be a valid comparison
• IEEE 754 represents ±∞
  – Most positive exponent reserved for ∞
  – Significand all zeroes
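
The ∞ encodings can be built by hand and checked. One caveat about the language used for this sketch: Python itself raises `ZeroDivisionError` for `1.0 / 0.0` rather than returning ∞, so the example constructs ∞ from its bit pattern and from an overflowing multiplication instead. `float32_from_bits` is an illustrative helper.

```python
import math
import struct

def float32_from_bits(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision number."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

pos_inf = float32_from_bits(0x7F800000)   # sign 0, exponent all 1s, significand 0
neg_inf = float32_from_bits(0xFF800000)   # sign 1, exponent all 1s, significand 0

print(pos_inf, neg_inf)
print(pos_inf > 1.0e38)      # comparisons against infinity are well defined
print(1.0e308 * 10.0)        # a double-precision product that overflows yields inf
```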

Representation for 0
• Represent 0?
  – Exponent all zeroes
  – Significand all zeroes
  – What about sign? Both cases valid
    +0: 0 00000000 00000000000000000000000
    -0: 1 00000000 00000000000000000000000
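
Both zero encodings can be observed directly. In this short sketch, `float32_bits` is an illustrative helper built on the standard `struct` module.

```python
import math
import struct

def float32_bits(x):
    """Bit pattern of x, rounded to single precision, as an unsigned integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

print(format(float32_bits(0.0), "032b"))
print(format(float32_bits(-0.0), "032b"))
print(0.0 == -0.0)                 # the two zeros compare equal
print(math.copysign(1.0, -0.0))    # yet the sign bit is still observable
```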

Special Numbers
• What have we defined so far? (Single Precision)

Exponent   Significand   Object
0          0             0
0          nonzero       ???
1-254      anything      +/- fl. pt. #
255        0             +/- ∞
255        nonzero       ???

Representation for Not-a-Number
• What do I get if I calculate sqrt(-4.0) or 0/0?
  – If ∞ not an error, these shouldn’t be either
  – Called Not a Number (NaN)
  – Exponent = 255, Significand nonzero
• Why is this useful?
  – Hope NaNs help with debugging?
  – They contaminate: op(NaN, X) = NaN
  – Can use the significand to identify which! (e.g., quiet NaNs and signaling NaNs)
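
NaN behavior can be explored with a few lines. Python's `math.sqrt(-4.0)` and `0.0 / 0.0` raise exceptions instead of returning NaN, so this sketch produces a NaN from ∞ - ∞; `float32_fields` is an illustrative helper.

```python
import math
import struct

def float32_fields(x):
    """Split x, rounded to single precision, into (sign, exponent, significand) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

nan = math.inf - math.inf          # infinity minus infinity has no meaningful value
_, exponent, significand = float32_fields(nan)

print(exponent, significand != 0)  # exponent field all 1s, significand nonzero
print(math.isnan(nan + 1.0))       # NaN contaminates later operations
print(nan == nan)                  # NaN compares unequal even to itself
```

Because of that last property, `math.isnan` is the reliable way to test for NaN; an equality check never succeeds.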
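
Representation for Denorms (1/2)
• Problem: There’s a gap among representable FP numbers around 0
  – Smallest representable positive number:
    $a = 1.0 \ldots_\text{two} \times 2^{-126} = 2^{-126}$
  – Second smallest representable positive number:
    $b = 1.000 \ldots 1_\text{two} \times 2^{-126} = (1 + 0.00 \ldots 1_\text{two}) \times 2^{-126} = (1 + 2^{-23}) \times 2^{-126} = 2^{-126} + 2^{-149}$
  – $a - 0 = 2^{-126}$
  – $b - a = 2^{-149}$
• Normalization and implicit 1 is to blame!

[Figure: number line around 0 from - to +, marking 0, a, and b, with the gaps between 0 and a labeled “Gaps!”]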

Representation for Denorms (2/2)
• Solution:
  – We still haven’t used Exponent = 0, Significand nonzero
  – DEnormalized number: no (implied) leading 1, implicit exponent = -126
  – Smallest representable positive number: $a = 2^{-149}$ (i.e., $2^{-126} \times 2^{-23}$)
  – Second-smallest representable positive number: $b = 2^{-148}$ (i.e., $2^{-126} \times 2^{-22}$)

[Figure: number line around 0 from - to +]
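
The denorm values can be confirmed by constructing the bit patterns with exponent field 0 and reading them back. This is a small sketch; `float32_from_bits` is an illustrative helper.

```python
import struct

def float32_from_bits(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision number."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

smallest_denorm = float32_from_bits(0x00000001)   # exponent 0, significand 00...01
second_denorm = float32_from_bits(0x00000002)     # exponent 0, significand 00...10
largest_denorm = float32_from_bits(0x007FFFFF)    # exponent 0, significand 11...11
smallest_normal = float32_from_bits(0x00800000)   # exponent 1, significand 0

print(smallest_denorm == 2.0**-149)
print(second_denorm == 2.0**-148)
print(smallest_normal - largest_denorm == 2.0**-149)
```

The last line shows that the step from the largest denorm to the smallest normalized number is the same $2^{-149}$ spacing as between denorms, so the gap next to 0 is filled evenly.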

Special Numbers Summary
• Reserve exponents, significands:

Exponent   Significand   Object
0          0             0
0          nonzero       Denorm
1-254      anything      +/- fl. pt. number
255        0             +/- ∞
255        nonzero       NaN
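
The summary table translates almost line for line into a classifier over 32-bit patterns. This is a sketch; `classify_float32` and the sample patterns are illustrative choices.

```python
def classify_float32(bits):
    """Name the kind of object a 32-bit single-precision pattern encodes."""
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if significand == 0 else "denorm"
    if exponent == 255:
        return "infinity" if significand == 0 else "NaN"
    return "normalized"   # exponent field 1-254, any significand

for pattern in (0x00000000, 0x00000001, 0xC0400000, 0x7F800000, 0x7FC00000):
    print(f"{pattern:#010x} {classify_float32(pattern)}")
```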

Saving Bits
• Many applications in machine learning, graphics, signal processing can make do with lower precision
• IEEE “half-precision” or “FP16” uses 16 bits of storage
  – 1 sign bit
  – 5 exponent bits (exponent bias of 15)
  – 10 significand bits
• Microsoft “BrainWave” FPGA neural net computer uses proprietary 8-bit and 9-bit floating-point formats
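
Python's `struct` module supports the IEEE half-precision format through the `e` format character, which makes the 1/5/10 field split easy to inspect. In this sketch, `float16_bits` is an illustrative helper name.

```python
import struct

def float16_bits(x):
    """Bit pattern of x rounded to IEEE 754 half precision."""
    (bits,) = struct.unpack(">H", struct.pack(">e", x))
    return bits

bits = float16_bits(1.0)
sign = bits >> 15                 # 1 sign bit
exponent = (bits >> 10) & 0x1F    # 5 exponent bits, bias 15
significand = bits & 0x3FF        # 10 significand bits

print(format(bits, "016b"), sign, exponent - 15, significand)

# Round-tripping 0.1 through half precision shows how much precision is lost.
(approx,) = struct.unpack(">e", struct.pack(">e", 0.1))
print(approx)
```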

Outline
• Defining Performance
• Floating Point Standard
• And in Conclusion…

And In Conclusion, …
• Time (seconds/program) is measure of performance

$$\text{Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

• Floating-point representations hold approximations of real numbers in a finite number of bits
  – IEEE 754 Floating-Point Standard is most widely accepted attempt to standardize interpretation of such numbers (Every desktop or server computer sold since ~1997 follows these conventions)
  – Single Precision:

    bit 31   bits 30-23   bits 22-0
    S        Exponent     Significand
    1 bit    8 bits       23 bits