chomes.ieu.edu.tr/skumova/makaleler/E11.pdf · ICAST is supported by GRASIUS Project at...


ICAST is supported by the GRASIUS Project (Graduate School Action Scheme for Internationalization of University Students)

The 4th International Student Conference on Advanced Science and Technology

May 25-26, 2010, Ege University, Izmir, Turkey

ICAST 2010 Izmir will focus on the following fields: Mathematics and Applied Mathematics; Physics and Applied Physics; Chemistry, Applied Chemistry and Chemical Engineering; Biology and Life Science; Geology; Materials Science; Computer, Electrical and Electronics Engineering; Civil Engineering, Energy and Environment; Industrial Applications.

http://icast2010.ege.edu.tr


ICAST 2010 Izmir

The Fourth International Student Conference

on Advanced Science and Technology

May 25-26, 2010 Ege University

Program & Papers

Organizer Graduate School of Science and Technology

Kumamoto University Japan

In Association with Ege University

Turkey

ICAST 2010 is a part of the GRASIUS (Graduate School Action Scheme for Internationalization of University Students) Project of the Graduate School of Science and Technology, Kumamoto University, supported by the Grant-in-aid from MEXT to promote innovative education at graduate schools.

Members of Committees for ICAST 2010 Izmir

Organizing Committee

Chairs

Taniguchi Isao, President of Kumamoto University &

Candeğer Yılmaz, Rector of Ege University

Co-Chairs

Tadao Nishiyama, Dean of GSST, Kumamoto University

Atilla Silkü, Vice-Rector, Ege University

Nalan Kabay, Ege University

Mitsuyo Kishida, Director of GJEC International Division of GSST

Members

Kumamoto University

Kazuki Takashima, Jun Otani, Takashi Hiyama, Akira Yoshiasa

Ege University

Süer Anaç, Azmi Telefoncu, RabeDo Kuryel, Nadide Kazancı, M. Bülent Özkan, İsmail Türkan, Semih Ötleş, Levent Ballice, Şenay Şanlıer, Radosveta Sokullu, Engin Karatepe, Ninel Alver, Yalçın Alver, Figen İnan

Programme Committee

Kumamoto University

Tadao Nishiyama, Mitsuyo Kishida, Jun Otani, Kazuki Takashima, Takashi Hiyama, Akira Yoshiasa

Ege University

Atilla Silkü, Süer Anaç, Nalan Kabay, İsmail Türkan, Semih Ötleş, Levent Ballice, Şenay Şanlıer, Yusuf Özbel, İsmet Gürhan, Saim Selvi, Zekerya Dursun, Radosveta Sokullu, Engin Karatepe, Ninel Alver, Yalçın Alver, Sinan Güngör

External Scientific Committee

Mehtap Yüksel Eğrilmez, Dokuz Eylül University

Uğur Ünal, Koç University

Ö. Kutlu-Oral, Sabancı University

Murat Sağlam, Ludwig-Maximilians-Universität

Neval Yılmaz, Kumamoto University

Members of Project Support Section of GSST, Kumamoto University

Yuki Noguchi, Miho Shinmoto

Web & Publication Team

Y. Alver, M. M. Mutlu, S. Güngör, M. Arda, İ. İpek, T. Madenoğlu, Ö. Arar, D. Gökkaya, A. Akkaş, P. Köseoğlu, F. Uğranlı, S. Kutlu

Registration Desk & Poster-Social Committee Members

M. Arda, Ö. Arar, İ. İpek, T. Madenoğlu, A. Akkaş, K. V. Özdokur, P. Köseoğlu, D. Gökkaya, A. Onaç, D. Yapıcı, F. Uğranlı


Application and Performance Analysis of Cache Effected Merge Sort Algorithms

Fatih Tekbacak*,1, İlker Korkmaz1, Orhan Dağdeviren1 and Senem Kumova Metin2

1Dept. of Comp. Engineering, Izmir Institute of Technology
2Faculty of Eng. and Comp. Sciences, Izmir University of Economics

Abstract- One of the most significant factors that affect the run time of a program is its cache behavior. In particular, efficient use of the cache is very important in programs that process huge data in an iterative manner. In this paper, our objective is to measure the cache effectiveness of sorting algorithms. To test the sorting algorithms, a suitable cache configuration is determined and cache effected merge sort algorithms are run in the Valgrind simulator. Our motivation is to investigate algorithm performance by running the algorithms and comparing them on Level-1 and Level-2 caches.

Index Terms- merge sort, cache effectiveness, simulation.

I. INTRODUCTION

SORTING is a very common operation in computer science. The most familiar example is binary search, which finds key data within a sorted list. While the run time complexity of the binary search algorithm is O(log n), search algorithms on an unsorted list have O(n)-like complexities. If the search operation is the primary concern, a sorted list is a necessity. However, the sorting operation also has a storage cost. Although the cost finally considered is generally complexity, the implementation of a sorting algorithm can be based on a few different paradigms, depending on the system. One of the important parameters in algorithms running with huge data sets is cache usage. In this study, our aim is to run cache effected solutions for merge sort algorithms on a Pentium 4 architecture and compare their performances on cache use.
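The complexity contrast drawn in the introduction can be illustrated with a minimal sketch. The function names and the sample data are hypothetical and are not taken from the paper; the binary search relies on Python's standard `bisect` module.

```python
from bisect import bisect_left


def linear_search(items, key):
    """Scan the list in order; O(n) comparisons in the worst case."""
    for index, value in enumerate(items):
        if value == key:
            return index
    return -1


def binary_search(sorted_items, key):
    """Find key in an already sorted list with O(log n) comparisons."""
    index = bisect_left(sorted_items, key)
    if index < len(sorted_items) and sorted_items[index] == key:
        return index
    return -1


# Hypothetical sample data, chosen only for illustration.
data = [42, 7, 19, 3, 88, 61]
sorted_data = sorted(data)  # one-off sorting cost that enables binary search
```

The sorted copy is what makes the logarithmic search possible, which is why the cost of the sorting step itself matters.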

II. RELATED WORK

Sorting algorithms have a great effect on the run time of the operations that use them, so improving these algorithms usually decreases run time. Because one of the most effective ways of decreasing run time is to decrease the number of instructions, the primary concern of most researchers has traditionally been to rearrange algorithms in a way that reduces the instruction count. However, previous research showed that decreasing the instruction count, although it has a positive effect on run time, does not provide enough improvement.

In contrast to this traditional approach, the role of cache usage in sorting algorithms was first stated by LaMarca and Ladner [1], based on their mergesort, tiled mergesort, and multi mergesort implementations. Xiao, Zhang, and Kubricht [2] contributed to the literature with their tiled mergesort with padding and multi mergesort with TLB padding algorithms. An algorithm that considers cache properties and cache size will reduce the memory access rate or the cache miss rate. Basic mergesort is a comparison sort algorithm with O(n log n) complexity, designed on the divide and conquer paradigm. Tiled mergesort places the initial data in separate subarrays, sorts them, and then merges these arrays with each other. Multi mergesort, unlike tiled mergesort, merges all subarrays in one operation. Tiled mergesort with padding reorganizes data locations and aims to decrease the collision miss rate. Multi mergesort with TLB padding aims to reduce the TLB misses created by the multi mergesort algorithm. These algorithms can be simulated with Valgrind [3], a program designed to detect memory management and threading defects automatically, and with Cachegrind, a Valgrind tool that reports the miss rates of a program by simulating Level-1 and Level-2 caches.
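As a rough illustration of the first two variants named above, the following sketch shows a basic recursive mergesort and a two-phase tiled mergesort. It is a simplified reading of the general idea, not the implementation evaluated in this paper or in [1]: the `tile_size` parameter is a hypothetical stand-in for the number of elements that fit in the cache, and Python lists do not reproduce the memory layout of a C array.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list (stable)."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def base_mergesort(items):
    """Divide and conquer mergesort with O(n log n) comparisons."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(base_mergesort(items[:mid]), base_mergesort(items[mid:]))


def tiled_mergesort(items, tile_size=4):
    """Phase 1 sorts each tile; phase 2 merges the sorted tiles pairwise."""
    # tile_size is an assumed parameter standing in for the cache capacity.
    runs = [base_mergesort(items[start:start + tile_size])
            for start in range(0, len(items), tile_size)]
    while len(runs) > 1:
        next_runs = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                next_runs.append(merge(runs[k], runs[k + 1]))
            else:
                next_runs.append(runs[k])  # odd run carried to the next pass
        runs = next_runs
    return runs[0] if runs else []
```

The point of the first phase is that each tile is sorted while it is still resident in the cache; only the second phase streams over data larger than the cache.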

III. EXPERIMENTS

A. Method

In order to understand cache performance on sorting algorithms, the basic variations of merge based algorithms, namely base mergesort, tiled mergesort, multi mergesort, and tiled mergesort with padding, are examined in the Valgrind simulator. In our tests, the two-level cache structure of the Northwood core Pentium 4 processor architecture [4] is simulated. The Level-1 cache is configured as 4-way associative with a total capacity of 8 K and a line size of 64 Bytes; the Level-2 cache is configured as 8-way associative with a capacity of 256 K and a line size of 128 Bytes. Random data sets of 1 K to 4096 K are used as inputs to the implementations. Each experiment is conducted 5 times and the average results are recorded for the sake of accuracy.
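The arithmetic behind the cache configuration described above can be sketched as follows. The helper name is hypothetical, and the sketch assumes that "K" denotes 1024 bytes, which the text does not state explicitly.

```python
def cache_geometry(capacity_bytes, associativity, line_bytes):
    """Return (number of lines, number of sets) of a set-associative cache."""
    lines = capacity_bytes // line_bytes
    sets = lines // associativity
    return lines, sets


# Assumption: "8 K" and "256 K" are read as 8 * 1024 and 256 * 1024 bytes.
l1_lines, l1_sets = cache_geometry(8 * 1024, 4, 64)
l2_lines, l2_sets = cache_geometry(256 * 1024, 8, 128)

print(l1_lines, l1_sets)  # 128 32
print(l2_lines, l2_sets)  # 2048 256
```

Under these assumptions the Level-1 data cache holds only 128 lines, which is why inputs in the stated size range exceed it quickly.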

The data miss rates of the Level-1 and Level-2 caches are compared for each algorithm to understand cache performance. Moreover, the running times of the implementations are measured. Although all mergesort implementations have a time complexity of O(n log n), more misses in the last level of the cache mean more accesses to main memory, and therefore more actual running time.

B. Simulation Results and Evaluation

All data misses in both cache levels are measured, since any miss in a data cache affects performance. In the experiments, the total number of misses over the whole sorting process of an input data set is measured, and the miss rate is calculated as the number of misses per data element. The miss rates in the Level-1 and Level-2 data caches for different input sizes are depicted in Fig. 1 and Fig. 2, respectively.


The misses in the instruction cache are not considered in this study.
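The miss-rate metric used here (misses per data element) can be made concrete with a toy simulator. This is only a sketch under stated assumptions: a single cache level with LRU replacement, 4-byte elements, and a plain sequential scan rather than a sort. It is not Cachegrind and does not reproduce the measurements of this paper; the class and function names are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy LRU set-associative data cache that only counts misses."""

    def __init__(self, capacity_bytes, associativity, line_bytes):
        self.line_bytes = line_bytes
        self.associativity = associativity
        self.num_sets = capacity_bytes // (line_bytes * associativity)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, address):
        """Touch one byte address; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        ways = self.sets[line % self.num_sets]
        if line in ways:
            ways.move_to_end(line)      # mark as most recently used
            return True
        self.misses += 1
        if len(ways) >= self.associativity:
            ways.popitem(last=False)    # evict the least recently used line
        ways[line] = True
        return False


def misses_per_element(cache, element_count, element_bytes=4):
    """Scan an array once and return misses divided by element count."""
    for i in range(element_count):
        cache.access(i * element_bytes)
    return cache.misses / element_count


# Assumed Level-1 geometry: 8 * 1024 bytes, 4-way, 64-byte lines.
l1 = SetAssociativeCache(8 * 1024, 4, 64)
rate = misses_per_element(l1, 4096)
print(rate)  # 0.0625, one cold miss per 16 four-byte elements
```

A sequential scan gives the best case of one miss per line; a sorting algorithm that revisits data after it has been evicted will show a higher figure for the same input size.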

[Figure: miss rate plotted against Data Size (bit); legend: Mergesort, Tiled Mergesort, Multi Mergesort, Tiled Mergesort with Padding]

Fig. 1. Level-1 cache miss rates (number of misses per one element) of different sizes of input data for different merge sort algorithms.

[Figure: miss rate plotted against Data Size (bit); legible legend entries: Mergesort, Tiled Mergesort, Tiled Mergesort with Padding]

Fig. 2. Level-2 cache miss rates (number of misses per one element) of different sizes of input data for different merge sort algorithms.

As seen in Fig. 1, the multi mergesort algorithm performs better for data sizes less than 1 M. The reason is that it gains the advantage of temporal locality due to the use of a priority queue utility structure. However, tiled mergesort is more scalable. On the other hand, any miss in the Level-1 cache leads to an access to the Level-2 cache, which is slower but larger than Level-1. As seen in Fig. 2, multi mergesort handles the Level-2 accesses with better miss rates.
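The priority queue mentioned above can be illustrated with a minimal sketch of a single-pass multi-way merge. It uses Python's standard `heapq` module; the `run_size` parameter is a hypothetical stand-in for a cache-sized run, and the code is a simplification rather than the multi mergesort implementation measured in this paper.

```python
import heapq


def multi_mergesort(items, run_size=4):
    """Sort fixed-size runs, then merge all runs in one heap-driven pass."""
    # run_size is an assumed parameter standing in for the cache capacity.
    runs = [sorted(items[start:start + run_size])
            for start in range(0, len(items), run_size)]
    # Each heap entry is (value, run index, position inside that run).
    heap = [(run[0], r, 0) for r, run in enumerate(runs)]
    heapq.heapify(heap)
    output = []
    while heap:
        value, r, pos = heapq.heappop(heap)
        output.append(value)
        if pos + 1 < len(runs[r]):
            heapq.heappush(heap, (runs[r][pos + 1], r, pos + 1))
    return output
```

The heap holds one entry per run, so it stays small and is touched on every step, which is consistent with the temporal locality argument. The runs themselves are read from many different memory regions at once, which is one plausible reason the behavior changes as the input grows.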

Fig. 3 shows the running times (in seconds) of the algorithms. Although they have similar results for inputs smaller than 256 K, multi mergesort has the worst time once the data exceeds the 256 K threshold. The reason is the high total miss rate (Level-1 plus Level-2) of the multi mergesort algorithm for input sizes larger than 256 K. Any miss means an access to a storage area in a slower level, and so any miss adds extra time to the process. Therefore, depending on the input data size, either the multi or the tiled variations of the merge sort algorithm may be chosen as the winner in terms of performance.

[Figure: running time in seconds (axis from 0 to 400) plotted against Data Size (bit); legend: Mergesort, Tiled Mergesort, Multi Mergesort, Tiled Mergesort with Padding]

Fig. 3. Running times of different merge sort implementations for the same two-level cache architecture.

IV. DISCUSSION AND FUTURE WORK

In this paper, the performance gains of the merge sort algorithms previously detailed in [1] and [2] are simulated on the Valgrind [3] platform, and the experimental results are presented. As depicted in Fig. 3, the best run time performance was obtained by tiled mergesort with padding.

Cache design parameters include cache size, line size, associativity, and multi-level structure. With respect to these parameters, the quicksort algorithm also has to be covered. Different input data distributions can cause different miss rates. For example, tiled mergesort has better performance than multi mergesort except for the Poisson distribution. We could not include such experimental results here due to the lack of space in this paper. As a future plan, different distributions may be examined for the quicksort algorithm and the results can be compared with the ones presented in our work. Quicksort needs an analysis with respect to cache parameters, since we know that quicksort has a better practical running time than mergesort.

V. ACKNOWLEDGMENT

The authors gratefully acknowledge the contributions of Prof. Dr. Mehmet Emin Dalkılıç for his advice on studying the cache performance subject.

VI. REFERENCES

[1] A. LaMarca and R. E. Ladner, "The influence of caches on the performance of sorting", Journal of Algorithms 31(1) (1999), pp. 66-104.

[2] L. Xiao, X. Zhang, and S. A. Kubricht, "Improving memory performance of sorting algorithms", ACM Journal of Experimental Algorithmics 5(3) (2000), pp. 1-22.

[3] http://valgrind.org/

[4] http://en.wikipedia.org/wiki/Pentium_4#Northwood
