Datawarehouse Document




  • 7/31/2019 Datawarehouse Document


Data Warehouse and OLAP Technology: An Overview
Lecturer: Assoc. Prof. Dr. Le Hoai Bac
Student: Nguyen Thi An Nhan (0712023)

2010

Faculty of Information Technology, University of Science, Ho Chi Minh City
6/1/2010


Data Warehouse and OLAP Technology: An Overview

Data Mining

DATA MINING AND APPLICATIONS

CLASS CNTN07, ACADEMIC YEAR 2009-2010

TOPIC REPORT

AN OVERVIEW OF DATA WAREHOUSES AND OLAP TECHNOLOGY


TABLE OF CONTENTS

1. WHAT IS A DATA WAREHOUSE?
1.1 DIFFERENCES BETWEEN OPERATIONAL DATABASE SYSTEMS AND DATA WAREHOUSES
1.2 WHY HAVE A SEPARATE DATA WAREHOUSE?
2. THE MULTIDIMENSIONAL DATA MODEL
2.1 FROM TABLES AND SPREADSHEETS TO DATA CUBES
2.2 STARS, SNOWFLAKES, AND FACT CONSTELLATIONS: SCHEMAS FOR MULTIDIMENSIONAL DATABASES
2.3 EXAMPLES FOR DEFINING STAR, SNOWFLAKE, AND FACT CONSTELLATION SCHEMAS
2.4 MEASURES: CATEGORIZATION AND COMPUTATION
2.5 CONCEPT HIERARCHIES
2.6 OLAP OPERATIONS IN THE MULTIDIMENSIONAL DATA MODEL
2.7 A STARNET QUERY MODEL FOR QUERYING MULTIDIMENSIONAL DATABASES
3. DATA WAREHOUSE ARCHITECTURE
4. DATA WAREHOUSE IMPLEMENTATION
5. FROM DATA WAREHOUSING TO DATA MINING
6. SUMMARY


INTRODUCTION

Data warehouses generalize and consolidate data in a multidimensional space. Building a data warehouse involves data cleaning and data transformation, and can be viewed as an important preprocessing step for data mining. In addition, data warehouses provide online analytical processing (OLAP) tools for the interactive analysis of multidimensional data, which makes data generalization and data mining more effective. Many other data mining functions, such as association, classification, prediction, and clustering, can be integrated with OLAP operations to enhance the mining of knowledge at multiple levels of abstraction. Data warehouses are therefore becoming increasingly important for data analysis, and online analytical processing provides an effective platform for it. As a result, data warehousing and OLAP form an essential step in the knowledge discovery process. This report presents an overview of data warehouses and OLAP technology, which is needed for an overall understanding of data mining and the knowledge discovery process.

The report begins with a widely accepted definition of the data warehouse and explains why more and more organizations are building data warehouses for data analysis. In particular, it covers data cubes, the multidimensional data model for data warehouses and OLAP, and OLAP operations such as roll-up, drill-down, slicing, and dicing. It also examines data warehouse architecture, including the steps for designing and constructing a data warehouse. The overview of data warehouse implementation looks at strategies for efficient data cube computation, OLAP indexing, and OLAP query processing. Finally, the report looks at online analytical mining, a powerful paradigm that integrates data warehouse and OLAP technology with data mining.


1. WHAT IS A DATA WAREHOUSE?

Data warehouses provide architectures and tools that help business executives organize, understand, and use their data to make strategic decisions. Data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars building large data warehouses. Many people feel that, with competition mounting in every industry, data warehousing is the latest must-have weapon in the marketplace. It helps retain customers by revealing more about their needs.

"So, what exactly is a data warehouse?" Data warehouses have been defined in many ways, so a precise definition is hard to give. Loosely speaking, a data warehouse is a database that is maintained separately from an organization's operational databases. Data warehouse systems allow a variety of application systems to be integrated. They support information processing by providing a solid platform of consolidated, historical data for analysis.

According to William H. Inmon, a leading architect in the construction of data warehouse systems, "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." In short, the four keywords subject-oriented, integrated, time-variant, and nonvolatile are the features that distinguish data warehouses from other data repository systems.

Subject-oriented: A data warehouse is organized around major subjects, such as customer, supplier, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on modeling and analyzing data for decision making. Hence, data warehouses typically provide a simple and concise view of particular subjects by excluding data that are not useful in the decision-making process.

Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and online transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.

Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.

Nonvolatile: A data warehouse is always a physically separate store of data from the data used in day-to-day transaction processing. Because of this separation, a data warehouse does not require transaction processing, recovery, or concurrency control mechanisms. It usually requires only two operations on the data: loading the data and refreshing the data.

In sum, a data warehouse is a semantically consistent data store that serves decision making. It provides and stores the information an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources.

Based on this, we view data warehousing as the process of constructing and using data warehouses. Constructing a data warehouse requires data cleaning, data integration, and data consolidation. Using a data warehouse often requires a collection of decision support technologies. These allow knowledge workers (e.g., managers, analysts, and executives) to [...]


1.1 DIFFERENCES BETWEEN OPERATIONAL DATABASE SYSTEMS AND DATA WAREHOUSES

Most people are familiar with commercial relational database systems, so the easiest way to understand a data warehouse is to compare the two kinds of systems. The major task of operational database systems is to perform online transaction and query processing. These systems are called online transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, registration, and accounting. Data warehouse systems, on the other hand, serve users or knowledge workers in the role of data analysis and decision making. Such systems can organize and present data in various formats to accommodate the diverse needs of different users. These systems are known as online analytical processing (OLAP) systems.

The major distinguishing features of OLTP and OLAP are summarized as follows:

Users and system orientation: An OLTP system is customer-oriented and is used by clerks, clients, and information technology professionals for transaction and query processing. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

Data contents: An OLTP system manages current data that are typically too detailed to be used easily for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use for decision making.


Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or a snowflake model and a subject-oriented database design.

View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or to data in other organizations. In contrast, an OLAP system often spans multiple versions of a database schema because of the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations and integrate information from many data sources.

Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. In contrast, accesses to OLAP systems are mostly read-only operations, because most data warehouses store historical rather than up-to-date information, although many of these queries can be complex.

Table 1: Comparison of OLTP and OLAP systems

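| Feature        | OLTP                                   | OLAP                                                     |
|----------------|----------------------------------------|----------------------------------------------------------|
| Characteristic | Transaction processing                 | Information processing                                   |
| Orientation    | Transaction                            | Analysis                                                 |
| User           | Clerk, DBA, database professional      | Knowledge worker                                         |
| Function       | Day-to-day operations                  | Long-term informational requirements, decision support   |
| DB design      | ER-based, application-oriented         | Star/snowflake, subject-oriented                         |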


[...] involve computations over large groups of data at summarized levels, and may require special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases can substantially degrade the performance of transaction processing.

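Moreover, an operational database supports the concurrent processing of multiple transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query, by contrast, often needs only read-only access to data records for summarization and aggregation. If concurrency control and recovery mechanisms were applied to such OLAP operations, they could jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.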

Finally, data warehouses are kept separate from operational databases because the data in the two systems differ in structure, content, and use. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, are usually far from what decision making needs. Decision support also requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, which produces high-quality, clean, and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which must be consolidated before analysis. Because the two systems provide quite different functionality and require different kinds of data, it is necessary to maintain the two kinds of databases separately. However, many vendors of operational database management systems are beginning to optimize their systems to support OLAP queries. If this trend continues, the separation between OLTP and OLAP systems is likely to decrease.

2. THE MULTIDIMENSIONAL DATA MODEL

Data warehouses and OLAP tools are based on a multidimensional data model, which views data in the form of a data cube. This section shows how data cubes model n-dimensional data. It also covers concept hierarchies and how basic OLAP operations use them to allow mining at multiple levels of abstraction.

2.1 FROM TABLES AND SPREADSHEETS TO DATA CUBES

"What is a data cube?" A data cube allows data to be modeled and viewed in multiple dimensions. It is defined by dimensions and measures (facts). In general, dimensions are the perspectives or entities with respect to which an organization wants to keep records. For example, AllElectronics may create a sales data warehouse to keep records of the store's sales with respect to the dimensions time, item, branch, and location. These dimensions allow the store to keep track of things like the monthly sales of items and the branches and locations at which the items were sold.


Table 2: A 2-D view of sales data for AllElectronics according to the dimensions time and item, for the branch located in Vancouver.

Each dimension may have a table associated with it, called a dimension table, which further describes the dimension. For example, a dimension table for item may contain the attributes item_name, brand, and type. Dimension tables can be specified by users or experts, or automatically generated and adjusted based on data distributions.

A multidimensional data model is typically organized around a central theme, such as sales. This theme is represented by a fact table. Facts are numerical measures. Think of them as the quantities by which we want to analyze relationships between dimensions. Examples of facts for a sales data warehouse include dollars_sold, units_sold, and amount_budgeted. The fact table contains the names of the facts, or measures, as well as keys to each of the related dimension tables. The multidimensional schemas later in this section give a clearer picture of how this works.

Although we usually think of cubes as 3-D geometric structures, in data warehousing a data cube can be n-dimensional. To better understand data cubes and the multidimensional data model, let's start with a simple 2-D data cube, which is in fact a table of sales data from AllElectronics. In particular, we look at the AllElectronics sales data for items sold per quarter in the city of Vancouver. These data are shown in Table 2. In this 2-D representation, the sales for Vancouver are shown with respect to the time and item dimensions. The fact, or measure, displayed is dollars_sold. Now suppose we want to view the sales data with a third dimension. For instance, we may want to view the data by time and item, and also by location, for the cities Chicago, New York, Toronto, and Vancouver. These 3-D data are shown in Table 3, where they are represented as a series of 2-D tables. Conceptually, we may also represent the same data in the form of a 3-D data cube, as in Figure 3.1.

Table 3


Suppose we now want to view the sales data with an additional fourth dimension, such as supplier. Viewing things in 4-D becomes tricky. However, we can think of a 4-D cube as a series of 3-D cubes, as shown in Figure 3.2. Continuing in this way, we can display any n-D data as a series of (n - 1)-D cubes. The data cube is a metaphor for multidimensional data storage, and the actual physical storage of such data may differ from its logical representation. The important point is that data cubes are n-dimensional and do not confine data to 3-D.
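The idea that an n-D cube can be viewed as a series of (n - 1)-D cubes, independent of how the cells are physically stored, can be illustrated with a small Python sketch. It assumes a cube stored as a dictionary that maps (time, item, location, supplier) tuples to dollars_sold. The cells, the supplier names SUP1 to SUP3, and the helper split_by_last_dimension() are hypothetical and are not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical 4-D cube: (time, item, location, supplier) -> dollars_sold.
cube_4d = {
    ("Q1", "computer", "Vancouver", "SUP1"): 825.0,
    ("Q1", "computer", "Vancouver", "SUP2"): 400.0,
    ("Q2", "phone", "Toronto", "SUP1"): 150.0,
    ("Q2", "phone", "Toronto", "SUP3"): 75.0,
}

def split_by_last_dimension(cube):
    """Split an n-D cube into a series of (n-1)-D cubes, one per value
    of its last dimension."""
    series = defaultdict(dict)
    for key, value in cube.items():
        *rest, last = key
        series[last][tuple(rest)] = value
    return dict(series)

cubes_3d = split_by_last_dimension(cube_4d)
for supplier, sub_cube in sorted(cubes_3d.items()):
    print(supplier, sub_cube)
```

Applying the same helper to each 3-D cube would in turn give a series of 2-D tables. Storing only non-empty cells in a hash map is just one possible physical layout for the same logical cube.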


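The tables above show the data at different levels of summarization. In data warehousing, each such data cube is referred to as a cuboid. Given a set of dimensions, we can generate a cuboid for each possible subset of those dimensions. Together these cuboids form a lattice, each cuboid showing the data at a different level of summarization, or group-by. The lattice of cuboids is then referred to as a data cube. Figure 3.3 shows a lattice of cuboids forming a data cube for the dimensions time, item, location, and supplier.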
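The lattice has one cuboid per subset of the dimensions. A data cube over n dimensions, ignoring concept hierarchies, therefore has 2^n cuboids. The following Python sketch uses a hypothetical helper, cuboid_lattice(), to enumerate the lattice for the four dimensions of Figure 3.3. It lists only the group-by sets and computes no measures.

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")

def cuboid_lattice(dims):
    """Return every cuboid (group-by set) for the given dimensions, from the
    0-D apex cuboid up to the n-D base cuboid."""
    lattice = []
    for k in range(len(dims) + 1):
        lattice.extend(combinations(dims, k))
    return lattice

cuboids = cuboid_lattice(dimensions)
print(len(cuboids))   # 16, i.e. 2**4 cuboids
print(cuboids[0])     # () : the apex cuboid, "all"
print(cuboids[-1])    # ('time', 'item', 'location', 'supplier') : the base cuboid
```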

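The cuboid that holds the lowest level of summarization is called the base cuboid. For example, the 4-D cuboid in Figure 3.2 is the base cuboid for the dimensions time, item, location, and supplier, while the 3-D cuboid for time, item, and location is summarized over all suppliers. The 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. In our example, this is the total sales, or dollars_sold, summarized over all four dimensions. The apex cuboid is typically denoted by all.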


2.2 STARS, SNOWFLAKES, AND FACT CONSTELLATIONS: SCHEMAS FOR MULTIDIMENSIONAL DATABASES

The entity-relationship data model is commonly used in the design of relational databases, where a database schema consists of a set of entities and the relationships between them. Such a data model suits online transaction processing. A data warehouse, however, requires a concise, subject-oriented schema that facilitates online data analysis.

The most popular data model for a data warehouse is the multidimensional model. Such a model can take the form of a star schema, a snowflake schema, or a fact constellation schema. Let's look at each of these schema types.

Star schema: The star schema is the most common modeling paradigm. The data warehouse contains (1) a large central table (the fact table) that holds the bulk of the data with no redundancy, and (2) a set of smaller attendant tables (dimension tables), one for each dimension.

Example 3.1 Star schema. A star schema for AllElectronics sales is shown in Figure 3.4. Sales are considered along four dimensions: time, item, branch, and location. The schema contains a central fact table for sales that holds keys to each of the four dimensions, along with two measures: dollars_sold and units_sold. To minimize the size of the fact table, dimension identifiers (such as time_key and item_key) are system-generated identifiers.

Notice that in the star schema, each dimension is represented by only one table, and each table contains a set of attributes. For example, the location dimension table contains the attribute set {location_key, street, city, province_or_state, country}. This constraint may introduce some redundancy. For example, "Vancouver" and "Victoria" are both cities in the Canadian province of British Columbia. Entries for such cities in the location dimension table create redundancy among the attributes province_or_state and country, that is, (..., Vancouver, British Columbia, Canada) and (..., Victoria, British Columbia, Canada). Moreover, the attributes within a dimension table may form either a hierarchy (total order) or a lattice (partial order).
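As an illustration, the following sketch builds a cut-down version of the star schema of Example 3.1 in an in-memory SQLite database, using Python's standard sqlite3 module. Only the location and item dimension tables are created, and the time and branch tables are omitted for brevity. All rows are hypothetical. The sketch shows the typical star join, in which the fact table joins directly to each dimension table. It also shows the province_or_state/country redundancy described above.

```python
import sqlite3

# Cut-down star schema: one fact table plus two dimension tables (toy data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY,
    street TEXT, city TEXT, province_or_state TEXT, country TEXT);
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY,
    item_name TEXT, brand TEXT, type TEXT, supplier_type TEXT);
CREATE TABLE sales (
    time_key INTEGER, item_key INTEGER, branch_key INTEGER,
    location_key INTEGER REFERENCES location(location_key),
    dollars_sold REAL, units_sold INTEGER);
INSERT INTO location VALUES
    (1, 'Main St', 'Vancouver', 'British Columbia', 'Canada'),
    (2, 'Oak St', 'Victoria', 'British Columbia', 'Canada');
INSERT INTO item VALUES (10, 'laptop', 'BrandA', 'computer', 'wholesale');
INSERT INTO sales VALUES (1, 10, 100, 1, 1200.0, 2), (1, 10, 101, 2, 600.0, 1);
""")

# Star join: each dimension table joins the fact table directly.
rows = conn.execute("""
    SELECT l.province_or_state, SUM(s.dollars_sold), SUM(s.units_sold)
    FROM sales s
    JOIN location l ON s.location_key = l.location_key
    JOIN item i ON s.item_key = i.item_key
    WHERE i.type = 'computer'
    GROUP BY l.province_or_state
""").fetchall()
print(rows)  # [('British Columbia', 1800.0, 3)]

# Redundancy: 'British Columbia' / 'Canada' is repeated in every city row.
redundant_count = conn.execute(
    "SELECT COUNT(*) FROM location WHERE province_or_state = 'British Columbia'"
).fetchone()[0]
print(redundant_count)  # 2
```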

Snowflake schema: The snowflake schema is a variant of the star schema model in which some dimension tables are normalized, which splits the data further into additional tables. The resulting schema graph forms a shape similar to a snowflake.

The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model may be kept in normalized form to reduce redundancy. Such tables are easy to maintain and save storage space. However, this saving of space is negligible compared with the typical magnitude of the fact table. Furthermore, the snowflake structure can reduce the effectiveness of browsing, since more joins are needed to execute a query, and system performance may suffer as a result. Hence, although the snowflake schema reduces redundancy, it is not as popular as the star schema in data warehouse design.

Example 3.2 Snowflake schema. A snowflake schema for AllElectronics sales is given in Figure 3.5. Here, the sales fact table is identical to that of the star schema in Figure 3.4. The main difference between the two schemas is in the definition of the dimension tables. The single dimension table for item in the star schema is normalized in the snowflake schema, resulting in new item and supplier tables. For example, the item dimension table now contains the attributes item_key, item_name, brand, type, and supplier_key. The supplier_key links to the supplier dimension table, which contains the supplier_key and supplier_type information. Similarly, the single dimension table for location in the star schema can be normalized into two new tables: location and city. The city_key in the new location table links to the city dimension. Notice that province_or_state and country in the snowflake schema of Figure 3.5 can be normalized further, if desired.
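The normalization of the location dimension described in Example 3.2 can be sketched in a few lines of Python. The rows and the helper snowflake_location() are hypothetical, and assigning city_key values sequentially is just an illustrative choice. Each distinct (city, province_or_state, country) combination is stored once in the city table. The price is one extra join at query time (location to city).

```python
# Hypothetical star-schema location rows:
# (location_key, street, city, province_or_state, country)
star_location = [
    (1, "Main St", "Vancouver", "British Columbia", "Canada"),
    (2, "Oak St", "Victoria", "British Columbia", "Canada"),
    (3, "Pine St", "Vancouver", "British Columbia", "Canada"),
]

def snowflake_location(rows):
    """Split one star-schema location table into a location table
    (location_key, street, city_key) and a city table
    (city_key, city, province_or_state, country)."""
    city_keys = {}  # (city, province_or_state, country) -> city_key
    location, city = [], []
    for location_key, street, city_name, province, country in rows:
        key = (city_name, province, country)
        if key not in city_keys:
            city_keys[key] = len(city_keys) + 1
            city.append((city_keys[key], city_name, province, country))
        location.append((location_key, street, city_keys[key]))
    return location, city

location_table, city_table = snowflake_location(star_location)
print(location_table)  # [(1, 'Main St', 1), (2, 'Oak St', 2), (3, 'Pine St', 1)]
print(city_table)
```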


Fact constellation: Sophisticated applications may require multiple fact tables to share dimension tables. This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact constellation.

Example 3.3 Fact constellation. A fact constellation schema is shown in Figure 3.6. This schema specifies two fact tables, sales and shipping. The sales table definition is identical to that of the star schema. The shipping table has five dimensions, or keys: item_key, time_key, shipper_key, from_location, and to_location. It also has two measures: dollars_cost and units_shipped. A fact constellation schema allows dimension tables to be shared between fact tables. For example, the dimension tables for time, item, and location are shared between the sales and shipping fact tables.

In data warehousing, there is a distinction between a data warehouse and a data mart. A data warehouse collects information about subjects that span the entire organization, such as customers, items, sales, assets, and personnel, so its scope is enterprise-wide. For data warehouses, the fact constellation schema is commonly used, since it can model multiple, interrelated subjects. A data mart, on the other hand, is a subset of the data warehouse that focuses on selected subjects, so its scope is limited to one department. For data marts, the star and snowflake schemas are more common, because a department usually focuses on a single subject.

2.3 EXAMPLES FOR DEFINING STAR, SNOWFLAKE, AND FACT CONSTELLATION SCHEMAS

A data mining query language can be used to specify data mining tasks. In particular, this section shows how to define data warehouses and data marts in an SQL-based data mining query language called DMQL.

Data warehouses and data marts can be defined with two language primitives, one for cube definition and one for dimension definition. The cube definition statement has the following syntax:

    define cube <cube_name> [<dimension_list>]: <measure_list>

The dimension definition statement has the following syntax:

    define dimension <dimension_name> as (<attribute_or_dimension_list>)

The following examples use DMQL to define the star, snowflake, and fact constellation schemas of Examples 3.1 to 3.3.

Example 3.4 Star schema definition. The star schema of Example 3.1 and Figure 3.4 is defined in DMQL as follows:

    define cube sales_star [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, province_or_state, country)

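The define cube statement defines a data cube called sales_star, which corresponds to the central sales fact table of Example 3.1. This statement specifies the dimensions and the two measures, dollars_sold and units_sold. The data cube has four dimensions, namely time, item, branch, and location. A define dimension statement is used to define each of the dimensions.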

Example 3.5 Snowflake schema definition. The snowflake schema of Example 3.2 and Figure 3.5 is defined in DMQL as follows:

    define cube sales_snowflake [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type,
        supplier (supplier_key, supplier_type))
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street,
        city (city_key, city, province_or_state, country))

This definition is similar to that of sales_star (Example 3.4), except that the item and location dimension tables are normalized. For instance, the item dimension of the sales_star data cube is normalized in the sales_snowflake cube into two dimension tables, item and supplier. Note that the dimension definition for supplier is nested inside the definition for item. Defining supplier in this way implicitly creates a supplier_key in the item dimension table definition. Similarly, the location dimension of the sales_star data cube is normalized in the sales_snowflake cube into two dimension tables, location and city. The dimension definition for city is nested inside the definition for location, so a city_key is implicitly created in the location dimension table definition.

Example 3.6 Fact constellation schema definition. The fact constellation schema of Example 3.3 and Figure 3.6 is defined in DMQL as follows:

    define cube sales [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, province_or_state, country)

    define cube shipping [time, item, shipper, from_location, to_location]:
        dollars_cost = sum(cost_in_dollars), units_shipped = count(*)
    define dimension time as time in cube sales
    define dimension item as item in cube sales
    define dimension shipper as (shipper_key, shipper_name,
        location as location in cube sales, shipper_type)
    define dimension from_location as location in cube sales
    define dimension to_location as location in cube sales

The define cube statements define data cubes for sales and shipping, corresponding to the two fact tables of the schema of Example 3.3. Note that the time, item, and location dimensions of the sales cube are shared with the shipping cube. For the time dimension, for example, this is indicated by the statement "define dimension time as time in cube sales", which appears under the define cube statement for shipping.

2.4 MEASURES: CATEGORIZATION AND COMPUTATION

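A multidimensional point in the data cube space can be defined by a set of dimension-value pairs, one value for each dimension. A data cube measure is a numerical function that can be evaluated at each point in the data cube space. The measure value at a given point is computed by aggregating the data that correspond to the dimension-value pairs defining that point.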

Measures can be organized into three categories, distributive, algebraic, and holistic, based on the kind of aggregate function used.

Distributive: An aggregate function is distributive if it can be computed in a distributed manner, as follows. Suppose the data are partitioned into n sets. We apply the function to each partition, resulting in n aggregate values. The function is distributive if combining these n values gives the same result as applying the function to the entire, unpartitioned data set. For example, count(), sum(), min(), and max() are distributive (for count(), the partial counts are combined by summing them). A measure is distributive if it is obtained by applying a distributive aggregate function.

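Algebraic: An aggregate function is algebraic if it can be computed by an algebraic function with M arguments (where M is a bounded positive integer), each of which is obtained by applying a distributive aggregate function. For example, avg() can be computed as sum()/count(). Similarly, min_N() and max_N(), which return the N smallest and N largest values in a set, are algebraic aggregate functions. A measure is algebraic if it is obtained by applying an algebraic aggregate function.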
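The difference between distributive and algebraic measures can be checked with a few lines of Python. In this sketch the values and the three partitions (standing in for subcubes) are hypothetical. sum(), count(), and max() are combined from per-partition results. avg() is rebuilt from the two distributive sub-aggregates sum() and count() (so M = 2), whereas simply averaging the partition averages gives a different, wrong answer.

```python
# Hypothetical measure values split into three partitions (e.g., subcubes).
partitions = [[4, 8, 15], [16, 23], [42]]
all_values = [v for part in partitions for v in part]

# Distributive: compute per partition, then combine the partial results.
partial_sums = [sum(p) for p in partitions]
partial_counts = [len(p) for p in partitions]
partial_maxes = [max(p) for p in partitions]

assert sum(partial_sums) == sum(all_values)     # sum of partial sums
assert sum(partial_counts) == len(all_values)   # partial counts are summed
assert max(partial_maxes) == max(all_values)    # max of partial maxes

# Algebraic: avg() from M = 2 distributive sub-aggregates (sum, count).
avg = sum(partial_sums) / sum(partial_counts)
naive_avg_of_avgs = sum(s / c for s, c in zip(partial_sums, partial_counts)) / len(partitions)
print(avg)                # 18.0
print(naive_avg_of_avgs)  # 23.5, not the true average
```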

Holistic: An aggregate function is holistic if there is no constant bound on the storage size needed to describe a subaggregate. Examples include median(), mode(), and rank(). A measure is holistic if it is obtained by applying a holistic aggregate function.

Most large data cube applications require efficient computation of distributive and algebraic measures, and many techniques exist for this. In contrast, holistic measures are difficult to compute efficiently, although methods exist for approximating some of them. For example, rather than computing median() exactly, the equation in Chapter 2 can be used to estimate the median of a large data set. In many cases, such techniques are good enough to overcome the difficulty of computing holistic measures.
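The following Python sketch illustrates both points. First, median() is holistic: the median of per-partition medians is generally not the overall median. Second, if values are pre-binned into sorted intervals, an approximate median can be interpolated from the bin frequencies alone. We assume that the "equation in Chapter 2" is the standard grouped-data interpolation formula given in the docstring. The partitions, the bins, and the helper approximate_median() are hypothetical. Per-partition histograms with the same interval boundaries can be merged by adding their frequencies, so only a fixed-size summary needs to be kept per subcube.

```python
import statistics

# Part 1: median() is holistic. Hypothetical values in two partitions.
part_a = [1, 2, 3]
part_b = [4, 5, 6, 7, 8, 9, 10]
exact_median = statistics.median(part_a + part_b)            # 5.5
median_of_medians = statistics.median(
    [statistics.median(part_a), statistics.median(part_b)])  # median of [2, 7] = 4.5

# Part 2: approximate median from binned data (grouped-data interpolation).
def approximate_median(bins):
    """Estimate the median from sorted (lower, upper, frequency) intervals:
    median ~ L1 + ((N/2 - cum_below) / freq_median) * width,
    where L1 and width belong to the interval containing the N/2-th value
    and cum_below is the total frequency of the intervals below it."""
    n = sum(freq for _, _, freq in bins)
    if n == 0:
        raise ValueError("no data in bins")
    cum_below = 0
    for lower, upper, freq in bins:
        if cum_below + freq >= n / 2:
            return lower + ((n / 2 - cum_below) / freq) * (upper - lower)
        cum_below += freq

# Hypothetical histogram with equal-width intervals.
bins = [(0, 10, 200), (10, 20, 450), (20, 30, 300), (30, 40, 50)]
print(exact_median, median_of_medians)  # 5.5 4.5
print(approximate_median(bins))         # about 16.67
```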

Example: Many measures of a data cube can be computed by relational aggregation operations. Figure 3.4 shows a star schema for AllElectronics sales with two measures, dollars_sold and units_sold. In Example 3.4, the sales_star data cube corresponding to that schema was defined with DMQL statements. "How are these statements interpreted to generate the specified data cube?" Suppose that the relational database schema of AllElectronics is the following:

    time(time_key, day, day_of_week, month, quarter, year)
    item(item_key, item_name, brand, type, supplier_type)
    branch(branch_key, branch_name, branch_type)
    location(location_key, street, city, province_or_state, country)
    sales(time_key, item_key, branch_key, location_key, number_of_units_sold, price)

The DMQL statements of Example 3.4 are translated into the following SQL query, which generates the sales_star data cube. Here, the sum() aggregate function is used to compute both dollars_sold and units_sold:

    select s.time_key, s.item_key, s.branch_key, s.location_key,
           sum(s.number_of_units_sold * s.price),
           sum(s.number_of_units_sold)
    from time t, item i, branch b, location l, sales s
    where s.time_key = t.time_key and s.item_key = i.item_key
      and s.branch_key = b.branch_key and s.location_key = l.location_key
    group by s.time_key, s.item_key, s.branch_key, s.location_key

The cube created by this query is the base cuboid of the sales_star data cube. It contains all of the dimensions specified in the data cube definition. The join keys are the keys that link the fact table to the dimension tables. The fact table associated with a base cuboid is often referred to as the base fact table.

By changing the group by clause, we can generate other cuboids of the sales_star data cube. For example, instead of grouping by s.time_key, we can group by t.month, which sums up the measures of each group by month. Likewise, removing s.branch_key from the group by clause (and from the select list) generates a higher-level cuboid, in which sales are summed over all branches rather than broken down per branch. Suppose we modify the SQL query above by removing the group by clause entirely, together with the dimension keys in the select list. The result is the total of dollars_sold and the total of units_sold over the given data. This zero-dimensional cuboid is the apex cuboid of the sales_star data cube. Other cuboids can also be generated by applying selection and/or projection operations to the base cuboid, which yields the lattice of cuboids described in Section 2.1. Each cuboid corresponds to a different level of summarization of the given data.

Most current data cube technology confines the measures of multidimensional databases to numerical data. However, measures can also be applied to other kinds of data, such as text, multimedia, and spatial data.
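The effect of changing the group by clause can be reproduced with Python's standard sqlite3 module. This sketch uses a reduced version of the schema above. Only the sales and time tables are created, with time holding just time_key and month, because month is the only dimension attribute needed. The rows are hypothetical. The helper cuboid() is illustrative. It builds the query from trusted column names, so plain string formatting is acceptable here but not for user input.

```python
import sqlite3

# Reduced sales_star schema with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time (time_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE sales (time_key INTEGER, item_key INTEGER, branch_key INTEGER,
                    location_key INTEGER, number_of_units_sold INTEGER,
                    price REAL);
INSERT INTO time VALUES (1, 'Jan'), (2, 'Feb');
INSERT INTO sales VALUES
    (1, 10, 100, 1, 2, 50.0),
    (1, 10, 101, 1, 1, 50.0),
    (2, 11, 100, 2, 4, 25.0);
""")

MEASURES = "sum(s.number_of_units_sold * s.price), sum(s.number_of_units_sold)"

def cuboid(group_cols):
    """Return the cuboid for the given group-by columns (apex if empty)."""
    select_list = ", ".join(list(group_cols) + [MEASURES])
    sql = f"select {select_list} from sales s join time t on s.time_key = t.time_key"
    if group_cols:
        cols = ", ".join(group_cols)
        sql += f" group by {cols} order by {cols}"
    return conn.execute(sql).fetchall()

base = cuboid(["s.time_key", "s.item_key", "s.branch_key", "s.location_key"])
no_branch = cuboid(["s.time_key", "s.item_key", "s.location_key"])  # branches summed
by_month = cuboid(["t.month"])                                      # grouped by month
apex = cuboid([])                                                   # 0-D apex cuboid
print(base, no_branch, by_month, apex, sep="\n")
```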

2.5 CONCEPT HIERARCHIES

A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. Consider a concept hierarchy for the dimension location. City values for location include Vancouver, Toronto, New York, and Chicago. Each city can be mapped to the province or state to which it belongs. For example, Vancouver can be mapped to British Columbia, and Chicago to Illinois. The provinces and states can in turn be mapped to the country to which they belong, such as Canada or the USA. These mappings form a concept hierarchy for the dimension location, mapping a set of low-level concepts (cities) to higher-level, more general concepts (countries). This concept hierarchy is illustrated in Figure 3.7.
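A concept hierarchy of this kind is essentially a chain of lookup tables. The Python sketch below encodes the location hierarchy city < province_or_state < country with hypothetical mapping dictionaries and a hypothetical helper, generalize(), that maps a city value up to a requested level.

```python
# Hypothetical mapping tables for city < province_or_state < country.
city_to_province = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "Chicago": "Illinois",
    "New York": "New York",
}
province_to_country = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Illinois": "USA",
    "New York": "USA",
}

def generalize(city, level):
    """Map a city value up the location hierarchy to the requested level."""
    if level == "city":
        return city
    province = city_to_province[city]
    if level == "province_or_state":
        return province
    if level == "country":
        return province_to_country[province]
    raise ValueError(f"unknown level: {level}")

print(generalize("Vancouver", "province_or_state"))  # British Columbia
print(generalize("Chicago", "country"))              # USA
```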


Many concept hierarchies are implicit in the database schema. For example, suppose that the dimension location is described by the attributes number, street, city, province_or_state, zipcode, and country. These attributes are related by a total order, forming a concept hierarchy such as street < city < province_or_state < country. This hierarchy is shown in Figure 3.8(a). Alternatively, the attributes of a dimension may be organized in a partial order, forming a lattice. An example of a partial order for the time dimension is day [...]


There may be more than one concept hierarchy for a given attribute or dimension, depending on the user's viewpoint. For instance, a user may prefer to organize price by defining ranges for inexpensive, moderately_priced, and expensive.
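Such range-based groupings are easy to express in code. In the Python sketch below, the price boundaries, the second user's labels ("consumer", "professional"), and the helper make_price_hierarchy() are all hypothetical. The point is that two different hierarchies can be defined over the same attribute. It uses bisect_right from the standard library to find the range that contains each price.

```python
from bisect import bisect_right

def make_price_hierarchy(boundaries, labels):
    """Build a range-based grouping for price from sorted boundaries.
    A price equal to a boundary falls into the higher range."""
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more label than boundaries")
    def group(price):
        return labels[bisect_right(boundaries, price)]
    return group

# Two hypothetical user viewpoints on the same attribute.
user_a = make_price_hierarchy([100.0, 500.0],
                              ["inexpensive", "moderately_priced", "expensive"])
user_b = make_price_hierarchy([1000.0], ["consumer", "professional"])

prices = [49.99, 250.0, 500.0, 1500.0]
print([user_a(p) for p in prices])  # ['inexpensive', 'moderately_priced', 'expensive', 'expensive']
print([user_b(p) for p in prices])  # ['consumer', 'consumer', 'consumer', 'professional']
```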

Concept hierarchies may be provided manually by system users, domain experts, knowledge engineers, or data analysts. The automatic generation of concept hierarchies is discussed in Chapter 2.

Concept hierarchies allow data to be handled at varying levels of abstraction, as the following section shows.

2.6 OLAP OPERATIONS IN THE MULTIDIMENSIONAL DATA MODEL

"How are concept hierarchies useful in OLAP?" In the multidimensional model, data are organized into multiple dimensions, and each dimension contains multiple levels of abstraction defined by concept hierarchies. This organization lets users view data from different perspectives. A number of OLAP data cube operations exist to materialize these different views, allowing interactive querying and analysis of the data at hand. Hence, OLAP provides a user-friendly environment for interactive data analysis.

Example 3.8 OLAP operations. Let's look at some typical OLAP operations for multidimensional data. Each of the operations described below is illustrated in Figure 3.10. At the center of the figure is a data cube for AllElectronics sales. The cube contains the dimensions location, time, and item, where location is aggregated at the city level, time at the quarter level, and item at the level of item types. To aid the explanation, we refer to this cube as the central cube. The measure displayed is dollars_sold. The data shown are for the cities Chicago, New York, Toronto, and Vancouver.

Roll-up: The roll-up operation performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction. Figure 3.10 shows the result of a roll-up operation performed on the central cube by climbing up the concept hierarchy for location given in Figure 3.7. This hierarchy was defined as the total order street < city < province_or_state < country. The roll-up shown aggregates the data by ascending the location hierarchy from the level of city to the level of country. In other words, rather than grouping the data by city, the resulting cube groups the data by country.

When roll-up is performed by dimension reduction, one or more dimensions are removed from the given cube. For example, consider a sales data cube containing only the two dimensions location and time. Roll-up may be performed by removing the time dimension, resulting in an aggregation of the total sales by location, rather than by location and time.
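Both forms of roll-up come down to re-keying cells and summing. In the Python sketch below, the cells and the city_to_country mapping are hypothetical. The mapping collapses the city < province_or_state < country steps into one lookup for brevity. The helper roll_up() is illustrative and handles only additive measures such as dollars_sold.

```python
from collections import defaultdict

# Hypothetical central-cube cells: (city, quarter) -> dollars_sold.
cells = {
    ("Vancouver", "Q1"): 100.0, ("Toronto", "Q1"): 80.0,
    ("Chicago", "Q1"): 90.0, ("New York", "Q1"): 120.0,
    ("Vancouver", "Q2"): 60.0, ("Chicago", "Q2"): 70.0,
}
city_to_country = {"Vancouver": "Canada", "Toronto": "Canada",
                   "Chicago": "USA", "New York": "USA"}

def roll_up(cells, key_fn):
    """Sum dollars_sold into the coarser cells produced by key_fn."""
    result = defaultdict(float)
    for key, value in cells.items():
        result[key_fn(key)] += value
    return dict(result)

# Climbing the location hierarchy from city to country.
by_country = roll_up(cells, lambda k: (city_to_country[k[0]], k[1]))
# Dimension reduction: remove the time dimension entirely.
by_city = roll_up(cells, lambda k: (k[0],))
print(by_country)
print(by_city)
```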

Drill-down: Drill-down is the reverse of roll-up. It navigates from less detailed data to more detailed data. Drill-down can be realized either by stepping down a concept hierarchy for a dimension or by introducing additional dimensions. Figure 3.10 shows the result of a drill-down operation performed on the central cube by stepping down the concept hierarchy for time, defined as day < month < quarter < year. The drill-down descends the time hierarchy from the level of quarter to the more detailed level of month. The resulting data cube details the total sales per month rather than summarizing them by quarter.


    Because a drill-down adds more detail to the given data, it can also be performed by adding new dimensions to a cube. For example, a drill-down on the central cube of Figure 3.10 can be performed by introducing an additional dimension, such as customer_group.
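    Drill-down only works if the finer-grained data are still available. The hypothetical sketch below assumes that monthly dollars_sold values for a single city are kept, so each quarter can be expanded back into its months. The month-to-quarter rule follows the time hierarchy day < month < quarter < year, and the amounts are made up.

```python
from collections import defaultdict

# Hypothetical monthly dollars_sold for one city; keys are month numbers 1..12.
monthly = {1: 150.0, 2: 100.0, 3: 150.0, 4: 200.0, 5: 250.0, 6: 300.0}

def quarter_of(month):
    """One step of the time hierarchy: month < quarter."""
    return f"Q{(month - 1) // 3 + 1}"

def quarterly_view(monthly_cells):
    """The summarized view shown before drilling down: totals per quarter."""
    out = defaultdict(float)
    for month, value in monthly_cells.items():
        out[quarter_of(month)] += value
    return dict(out)

def drill_down(monthly_cells, quarter):
    """Descend from one quarter to its months (requires the finer data)."""
    return {m: v for m, v in monthly_cells.items() if quarter_of(m) == quarter}
```

    For example, `drill_down(monthly, "Q2")` returns the April, May and June values. Their sum equals the Q2 total in `quarterly_view(monthly)`.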

    Slice and dice: The slice operation performs a selection on one dimension of the given cube, resulting in a subcube. Figure 3.10 shows a slice operation in which the sales data are selected from the central cube for the dimension time using the criterion time = "Q1". The dice operation defines a subcube by performing a selection on two or more dimensions. Figure 3.10 shows a dice operation on the central cube with selection criteria on three dimensions: (location = "Toronto" or "Vancouver") and (time = "Q1" or "Q2") and (item = "home entertainment" or "computer").
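    Slice and dice are both selections over the cells of a cube. The minimal sketch below applies them to a hypothetical 3-D cube stored as a dict keyed by (location, time, item); the cell values are made up. One design choice is an assumption: the sliced dimension is dropped from the key, since it has a single fixed value in the result.

```python
# Cells of a hypothetical 3-D sales cube: (location, time, item) -> dollars_sold.
cube = {
    ("Toronto", "Q1", "computer"): 395.0,
    ("Toronto", "Q3", "computer"): 120.0,
    ("Vancouver", "Q2", "home entertainment"): 605.0,
    ("Chicago", "Q1", "phone"): 440.0,
    ("Vancouver", "Q1", "security"): 50.0,
}
DIMS = ("location", "time", "item")

def slice_cube(cells, dim, value):
    """Slice: select on one dimension; that dimension is dropped from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}

def dice_cube(cells, criteria):
    """Dice: select on two or more dimensions, each with a set of allowed values."""
    idx = {DIMS.index(d): allowed for d, allowed in criteria.items()}
    return {k: v for k, v in cells.items()
            if all(k[i] in allowed for i, allowed in idx.items())}

q1 = slice_cube(cube, "time", "Q1")
sub = dice_cube(cube, {
    "location": {"Toronto", "Vancouver"},
    "time": {"Q1", "Q2"},
    "item": {"home entertainment", "computer"},
})
```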

    Pivot (rotate): Pivot, also called rotate, is a visualization operation. It rotates the data axes in view to provide an alternative presentation of the data. Figure 3.10 shows a pivot operation in which the item and location axes of a 2-D slice are rotated. Other examples include rotating the axes of a 3-D cube, or transforming a 3-D cube into a series of 2-D planes.
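    For a 2-D slice, pivoting simply swaps the row and column axes. The sketch below assumes the slice is stored as a nested dict {row: {column: value}}, with hypothetical values. Pivoting twice gives back the original slice.

```python
def pivot(table):
    """Swap the row and column axes of a 2-D slice stored as {row: {col: value}}."""
    rotated = {}
    for row, cols in table.items():
        for col, value in cols.items():
            rotated.setdefault(col, {})[row] = value
    return rotated

# Hypothetical 2-D slice: rows are item types, columns are locations.
slice_2d = {
    "computer": {"Toronto": 395.0, "Vancouver": 605.0},
    "phone": {"Toronto": 14.0},
}
by_location = pivot(slice_2d)  # rows are now locations, columns are item types
```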

    Other OLAP operations: Some OLAP systems offer additional drilling operations. For example, drill-across executes queries involving more than one fact table. The drill-through operation uses SQL facilities to drill through the bottom level of a data cube down to its back-end relational tables. Other OLAP operations include ranking the top N or bottom N items in lists, as well as computing moving averages, growth rates, interest, and so on. OLAP offers analytical modeling capabilities, including a calculation engine for deriving ratios, variances, and so on, and for computing measures across multiple dimensions. It can generate aggregations and hierarchies across the dimensions. OLAP also supports functional models for forecasting, trend analysis, and statistical analysis. In this context, an OLAP engine is a powerful data analysis tool.
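    Two of the calculations named above, top-N/bottom-N ranking and moving averages, are easy to sketch with the Python standard library. The function names and the sample data are hypothetical. The moving average is a plain trailing window over a time-ordered list.

```python
import heapq

def top_n(cells, n):
    """Top-N ranking: the n cells with the largest measure value."""
    return heapq.nlargest(n, cells.items(), key=lambda kv: kv[1])

def bottom_n(cells, n):
    """Bottom-N ranking: the n cells with the smallest measure value."""
    return heapq.nsmallest(n, cells.items(), key=lambda kv: kv[1])

def moving_average(series, window):
    """Trailing moving average over a time-ordered list of measure values."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

# Hypothetical dollars_sold per item.
sales_by_item = {"TV": 900.0, "laptop": 1500.0, "phone": 300.0, "camera": 450.0}
```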

    COMPARING OLAP SYSTEMS WITH STATISTICAL DATABASES

    Many characteristics of OLAP systems also exist in statistical databases (SDBs). These include the use of a multidimensional data model and concept hierarchies, the association of measures with dimensions, and the roll-up and drill-down operations. A statistical database is a database system designed to support statistical applications. The similarities between the two types of systems are rarely discussed, mainly because of differences in terminology and application domains.

    OLAP and SDB systems nevertheless differ in several ways. SDBs tend to focus on socioeconomic applications, whereas OLAP targets business applications. In SDBs, privacy concerns can determine whether users may view the low-level details of a hierarchy.

    Finally, unlike SDBs, OLAP systems are designed to handle huge amounts of data efficiently.


    2.7 THE STARNET MODEL FOR QUERYING MULTIDIMENSIONAL DATABASES

    Queries on multidimensional databases can be based on a starnet model. A starnet model consists of radial lines emanating from a central point, where each line represents a concept hierarchy for one dimension. Each abstraction level in a hierarchy is called a footprint. Footprints support OLAP operations such as drill-down and roll-up.

    Example 3.9. Starnet. A starnet query model for the AllElectronics data warehouse is shown in Figure 3.11. This starnet consists of four radial lines, representing concept hierarchies for the dimensions time, item, location, and customer, respectively. Users can roll up along the time dimension from month to quarter, or drill down along the location dimension from country to city. Concept hierarchies can be used to generalize data, by replacing low-level values (for example, day) with higher-level abstractions (for example, year). They can also be used to specialize data, by replacing higher-level abstractions with lower-level values.
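    Generalization along one line of the starnet can be sketched as replacing a day-level value by its ancestor at a chosen footprint. The sketch assumes the time line has the footprints day < month < quarter < year. The string formats ("2003-10", "2003-Q4") and the helper names are hypothetical choices for illustration.

```python
from datetime import date

# Footprints on the time line of the starnet, most detailed first.
TIME_LEVELS = ("day", "month", "quarter", "year")

def generalize(d: date, level: str) -> str:
    """Replace a day-level value by its ancestor at the requested footprint."""
    if level == "day":
        return d.isoformat()
    if level == "month":
        return f"{d.year}-{d.month:02d}"
    if level == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if level == "year":
        return str(d.year)
    raise ValueError(f"unknown level {level!r}")

def roll_up_one(level: str) -> str:
    """Move one footprint toward the more general end of the time line."""
    i = TIME_LEVELS.index(level)
    return TIME_LEVELS[min(i + 1, len(TIME_LEVELS) - 1)]
```

    For example, `generalize(date(2003, 10, 15), "quarter")` yields `"2003-Q4"`, and `roll_up_one("month")` yields `"quarter"`.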


    3. DATA WAREHOUSE ARCHITECTURE

    This section discusses issues of data warehouse architecture. Section 3.1 gives a general account of how to design and construct a data warehouse. Section 3.2 describes the three-tier data warehouse architecture. Section 3.3 describes back-end tools and utilities for data warehouses. Section 3.4 describes the metadata repository. Section 3.5 presents the different types of data warehouse servers for OLAP.

    3.1 STEPS FOR THE DESIGN AND CONSTRUCTION OF DATA WAREHOUSES

    The design of a data warehouse: a business analysis framework

    "What can business analysts gain from a data warehouse?" First, a data warehouse may provide a competitive advantage. It presents information about the performance of the business, from which appropriate adjustments can be made to win over competitors. Second, a data warehouse can enhance business productivity, because it helps gather information quickly and efficiently. Third, a data warehouse facilitates customer relationship management, because it provides a consistent view of customers and items across all lines of business, all departments, and all markets. Finally, a data warehouse can reduce costs by tracking trends, patterns, and exceptions over long periods in a consistent and reliable manner.

    To design an effective data warehouse, we need to understand and analyze business needs and construct a business analysis framework. Building such a large and complex information system can be compared to constructing a large and complex building, for which the owner, architect, and builder have different views. These views are combined to form a framework. The framework represents the top-down, business-driven or owner's perspective, as well as the bottom-up, builder-driven or implementor's view of the information system.

    Four different views regarding the design of a data warehouse must be considered: the top-down view, the data source view, the data warehouse view, and the business query view.

    The top-down view allows the selection of the relevant information necessary for the data warehouse. This information matches the current and future needs of the business.

    The data source view exposes the information being captured, stored, and managed by operational systems. This information may be documented at various levels of detail and accuracy, from individual data source tables to integrated data source tables. Data sources are often modeled with traditional data modeling techniques, such as the entity-relationship model or CASE tools.

    The data warehouse view includes fact tables and dimension tables. It represents the information stored inside the data warehouse, including precomputed totals and counts. It also includes information about the source, date, and time of origin, which provides historical context.

    Finally, the business query view is the perspective of the data in the data warehouse from the viewpoint of the end user.

    Building and using a data warehouse is a complex task, because it requires business skills, technology skills, and program management skills.

    Regarding business skills, building a data warehouse involves understanding how such systems store and manage their data. It also involves knowing how to build extractors that transfer data from the operational system to the data warehouse, and how to build warehouse refresh software. Using a data warehouse involves understanding the significance of the data it contains. It also involves understanding the business requirements and translating them into queries that the data warehouse can answer.

    Regarding technology skills, data analysts need to understand how to make assessments from quantitative information, and how to derive facts from conclusions based on the historical information in the data warehouse. These skills include the ability to discover patterns and trends, to extrapolate trends from historical information, to look for solutions and strategies, and to present recommendations based on such analysis.

    Finally, program management skills involve working with many technologies and vendors to deliver results in a timely and cost-effective manner.

    THE DATA WAREHOUSE DESIGN PROCESS


    A data warehouse can be built using a top-down approach, a bottom-up approach, or a combination of both.

    The top-down approach starts with the overall design and planning. It is useful when the technology is mature and well known, and when the business problems to be solved are clear and well understood.

    The bottom-up approach starts with experiments and prototypes. It is useful in the early stages of business modeling and technology development. It lets an organization move forward at considerably less expense and evaluate the benefits of the technology before making significant commitments.

    In the combined approach, an organization can exploit the planned and strategic nature of the top-down approach while retaining the rapid implementation of the bottom-up approach.

    From the software engineering point of view, designing and constructing a data warehouse may consist of the following steps: planning, requirements study, problem analysis, warehouse design, data integration and testing, and finally deployment of the data warehouse.

    Large software systems can be developed using two methodologies: the waterfall method and the spiral method. The spiral method involves the rapid generation of increasingly functional systems, with short intervals between successive releases. It is considered a good choice for data warehouse development, especially for data marts, for three reasons: the turnaround time is short, modifications can be made quickly, and new designs and technologies can be adopted in a timely manner.

    In general, the data warehouse design process consists of the following steps:

    1. Choose a business process to model, for example orders, shipments, account administration, or sales. If the business process is organizational and involves multiple complex object collections, build a data warehouse. If instead the process is departmental and focuses on the analysis of one kind of business process, build a data mart.

    2. Choose the grain of the business process. The grain is the fundamental, atomic level of data to be represented in the fact table for this process, for example individual transactions or daily snapshots.

    3. Choose the dimensions that will apply to each fact table record. Typical dimensions are time, item, customer, supplier, warehouse, transaction type, and status.

    4. Choose the measures that will populate each fact table record. Typical measures are numeric quantities such as dollars_sold and units_sold.
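    The four design choices can be recorded as a small declarative structure and used to check candidate fact records. The sketch below is hypothetical: the FactTableDesign class, the grain wording, and the chosen dimensions are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FactTableDesign:
    """The four design choices: business process, grain, dimensions, measures."""
    business_process: str
    grain: str
    dimensions: tuple
    measures: tuple

    def validate(self, record: dict) -> bool:
        """A fact record needs a key for every dimension and a number for every measure."""
        has_keys = all(d in record for d in self.dimensions)
        numeric = all(isinstance(record.get(m), (int, float)) for m in self.measures)
        return has_keys and numeric

# Hypothetical design for the sales process.
sales_design = FactTableDesign(
    business_process="sales",
    grain="one row per individual sales transaction",
    dimensions=("time", "item", "customer", "branch"),
    measures=("dollars_sold", "units_sold"),
)
```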

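    Because building a data warehouse is a difficult and long-term task, its implementation scope should be clearly defined. The goals of an initial data warehouse implementation should be specific, achievable, and measurable. This involves determining several things: the time and budget allocations, the subset of the organization to be modeled, the number of data sources selected, and the number and types of departments to be served.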

    Once a data warehouse has been designed and constructed, its initial deployment includes initial installation, roll-out planning, training, and orientation. Upgrades and maintenance must also be considered.

    Data warehouse administration includes data refreshment, data source synchronization, planning for disaster recovery, managing access control and security, managing data growth, managing database performance, and warehouse enhancement and extension.

    Scope management includes controlling the number and range of queries, dimensions, and reports. It can also include limiting the size of the data warehouse, or limiting the schedule, budget, or resources.


    Various kinds of data warehouse design tools are available.

    Data warehouse development tools provide functions to define and edit metadata repository contents (such as schemas, scripts, or rules), answer queries, and output reports. They can also ship metadata to and from relational database system catalogues.

    Planning and analysis tools study the impact of schema changes. They also study the impact on refresh performance when the refresh rate or time window changes.

    3.2 A THREE-TIER DATA WAREHOUSE ARCHITECTURE

    Data warehouses often adopt a three-tier architecture, as shown in Figure 3.12.

    1. The bottom tier is a warehouse database server, which is almost always a relational database system. Back-end tools and utilities feed data into the bottom tier from operational databases or external sources. These tools and utilities extract, clean, and transform the data, and load and refresh it in the warehouse. The data are extracted using application programs called gateways. A gateway is supported by the underlying DBMS and allows client programs to generate SQL code to be executed at a server. Examples of gateways include ODBC, OLE DB, and JDBC. This tier also contains the metadata repository.

    2. The middle tier is an OLAP server. It is typically implemented in one of two ways. The first is a relational OLAP (ROLAP) model: an extended relational DBMS that maps operations on multidimensional data to standard relational operations. The second is a multidimensional OLAP (MOLAP) model: a special-purpose server that directly implements multidimensional data and operations. OLAP servers are discussed in Section 3.5.

    3. The top tier is the front-end client layer. It contains query and reporting tools, analysis tools, and data mining tools.


    Enterprise warehouse: An enterprise warehouse collects all of the information about subjects spanning the entire organization. It provides data integrated from many sources, usually one or more operational systems or external information providers. It typically contains detailed data as well as summarized data. Its size can range from a few gigabytes to hundreds of gigabytes, terabytes, or beyond. An enterprise data warehouse may be implemented on mainframes, superservers, or parallel systems. It requires extensive business modeling and may take years to design and build.


    Data mart: A data mart contains a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific selected subjects. For example, a data mart may confine its subjects to customer, item, and sales. The data contained in data marts tend to be summarized.

    Data marts are usually implemented on low-cost departmental servers based on UNIX/Linux or Windows. The implementation cycle of a data mart is measured in weeks rather than months or years. However, it may involve complex integration in the long run if its design and planning were not enterprise-wide.

    Depending on the source of data, data marts can be categorized as independent or dependent. Independent data marts are sourced from one of two kinds of data. The first is data captured from one or more operational systems or external information providers. The second is data generated locally within a particular department. Dependent data marts are sourced directly from the enterprise data warehouse.

    Virtual warehouse: A virtual warehouse is a set of views over operational databases. For efficient query processing, only some of the possible summary views may be materialized. A virtual warehouse is easy to build, but it requires excess capacity on the operational database servers.

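    Pros and cons of the top-down and bottom-up approaches to data warehouse development.

    The top-down development of an enterprise warehouse serves as a systematic solution and minimizes integration problems. However, it is expensive, takes a long time to develop, and lacks flexibility. The lack of flexibility comes from the difficulty of achieving consistency and consensus on a common data model for the entire organization.

    The bottom-up approach designs, develops, and deploys independent data marts. It provides flexibility, low cost, and a rapid return on investment. However, it can lead to problems when integrating the various disparate data marts into a consistent enterprise data warehouse.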

    A recommended method for developing data warehouse systems is to implement the warehouse in an incremental and evolutionary manner, as shown in Figure 3.13.

    First, a high-level corporate data model is defined within a short period. It provides a corporate-wide, consistent, integrated view of the data across different subjects and potential applications. This high-level model will need to be refined as enterprise data warehouses and departmental data marts are developed further, but it greatly reduces future integration problems.

    Second, independent data marts can be implemented in parallel with the enterprise warehouse, based on the same corporate data model.

    Third, distributed data marts can be constructed to integrate the different data marts via hub servers.

    Finally, a multitier data warehouse is constructed. In it, the enterprise warehouse is the sole custodian of the warehouse data and distributes the data to the various dependent data marts.

    3.3 DATA WAREHOUSE BACK-END TOOLS AND UTILITIES

    Data warehouse systems use back-end tools and utilities to populate and refresh their data. These tools and utilities include the following functions:

    Data extraction: gathering data from multiple different sources.

    Data cleaning: detecting errors in the data and rectifying them when possible.

    Data transformation: converting data from their original format to the warehouse format.

    Load: sorting, summarizing, computing views, checking integrity, and building partitions.

    Refresh: propagating changes in the source data to the data in the warehouse.

    In addition, data warehouse systems usually provide a good set of data warehouse management tools.

    Data cleaning and data transformation are important steps in improving the quality of the data and, in turn, the quality of the data mining results.
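    The five back-end functions can be chained into a toy pipeline. This is only a sketch under strong assumptions. The "sources" are in-memory lists of dicts, cleaning means dropping rows with a missing amount and trimming whitespace, and "load" means summing dollars_sold per city into a dict. All names and data are hypothetical.

```python
# Hypothetical in-memory stand-ins for operational sources.
source_a = [{"city": " Toronto", "amount": "100"}, {"city": "Vancouver", "amount": None}]
source_b = [{"city": "Toronto", "amount": 50}]

def extract(sources):
    """Data extraction: gather rows from several sources into one list."""
    return [row for source in sources for row in source]

def clean(rows):
    """Data cleaning: drop rows with a missing amount and trim stray whitespace."""
    return [{**r, "city": r["city"].strip()} for r in rows if r.get("amount") is not None]

def transform(rows):
    """Data transformation: convert source values to the warehouse format."""
    return [{"city": r["city"], "dollars_sold": float(r["amount"])} for r in rows]

def load(warehouse, rows):
    """Load: summarize by city and merge the totals into the warehouse table."""
    for r in rows:
        warehouse[r["city"]] = warehouse.get(r["city"], 0.0) + r["dollars_sold"]
    return warehouse

def refresh(warehouse, new_rows):
    """Refresh: push only newly arrived source rows through the same steps."""
    return load(warehouse, transform(clean(extract([new_rows]))))

warehouse = load({}, transform(clean(extract([source_a, source_b]))))
refresh(warehouse, [{"city": "Vancouver", "amount": "20"}])
```

    The refresh step here is incremental. It processes only the new rows instead of reloading everything, which mirrors the "propagating changes" description above.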

    3.4 METADATA REPOSITORY

    Metadata are data about data. In a data warehouse, metadata are the data that define warehouse objects. Figure 3.12 shows a metadata repository in the bottom tier of the data warehouse architecture. Metadata are created and captured to record three things: the time at which data were extracted, the source of the extracted data, and missing fields that were added by data cleaning and integration.

    A metadata repository may contain:

    A description of the structure of the data warehouse. This includes the warehouse schema, views, dimensions, hierarchies, and derived data definitions, as well as data mart locations and contents.

    Operational metadata. These include the history of migrated data and the sequence of transformations applied to it, the currency of the data, and monitoring information (warehouse usage statistics, error reports).

    The algorithms used for summarization. These include measure and dimension definition algorithms, predefined queries, and reports.

    The mapping from the operational environment to the data warehouse. This includes source databases and their contents, gateway descriptions, data partitions, data extraction, cleaning, and transformation rules, data refresh, and data security.

    Business metadata. These include business terms and definitions, data ownership information, and policies.

    A data warehouse contains different levels of summarization, of which metadata is one type. Other types include current detailed data (almost always kept on disk), older detailed data, highly summarized data, and lightly summarized data.

    Metadata play a very different role from other data warehouse data and are important for many reasons. Metadata serve three purposes:

    As a directory that helps locate the contents of the data warehouse.

    As a guide to the mapping of data when the data are transformed from the operational environment to the data warehouse environment.

    As a guide to the algorithms used for summarization, both between current detailed data and lightly summarized data, and between lightly summarized data and highly summarized data.

    Metadata should be stored and managed consistently.

    3.5 TYPES OF OLAP SERVERS: ROLAP, MOLAP, HOLAP

    Logically, OLAP servers present business users with multidimensional data from data warehouses or data marts, without concern for how or where the data are stored. However, the physical architecture and implementation of OLAP servers must consider data storage issues. Implementations of a warehouse server for OLAP processing include the following:

    Relational OLAP (ROLAP) servers: These are intermediate servers that stand between a relational back-end server and client front-end tools. They use a relational DBMS to store and manage warehouse data, and OLAP middleware to supply the missing functionality. ROLAP servers include optimizations for each DBMS back end, an implementation of aggregation logic, and additional tools and services.

    Multidimensional OLAP (MOLAP) servers: These servers support multidimensional views of data through array-based multidimensional storage engines. They map multidimensional views directly to data cube array structures. The advantage of using a data cube is that it allows fast indexing to precomputed summarized data.

    With multidimensional data stores, storage utilization may be low if the data set is sparse. In such cases, sparse matrix compression techniques should be explored (Chapter 4). Many MOLAP servers adopt a two-level storage representation to handle dense and sparse data. Denser subcubes are identified and stored as array structures, whereas sparse subcubes use compression technology for efficient storage utilization.

    Hybrid OLAP (HOLAP) servers: The hybrid approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP. For example, a HOLAP server may store large volumes of detailed data in a relational database, while aggregations are kept in a separate MOLAP store. Microsoft SQL Server 2000 supports a hybrid OLAP server.

    Specialized SQL servers: Some database system vendors implement specialized SQL servers to meet the growing demand for OLAP processing in relational databases. These servers provide advanced query language and query processing support for SQL queries over star and snowflake schemas in a read-only environment.


    "How are data actually stored in ROLAP and MOLAP architectures?"

    ROLAP uses relational tables to store data for online analytical processing. The fact table associated with the base cuboid is referred to as the base fact table. The base fact table stores data at the abstraction level indicated by the join keys in the schema of the given data cube.

    Aggregated data can also be stored in fact tables, referred to as summary fact tables. Some summary fact tables store both base fact table data and aggregated data, as in Example 3.10. Alternatively, a separate summary fact table can be used for each level of abstraction, storing only aggregated data.

    Example 3.10. A ROLAP data store. Table 3.4 shows a summary fact table that contains both base fact data and aggregated data. In the table's schema, day, month, quarter, and year define the date of sale, and dollars_sold is the sales amount.

    Consider the tuples with RIDs 1001 and 1002. They are at the level of the base fact table, with dates of sale of October 15, 2003, and October 23, 2003, respectively.

    Now consider the tuple with RID 5001. It is at a more general level of abstraction than tuples 1001 and 1002. Its day value has been generalized to all, so the corresponding time value is October 2003. That is, its dollars_sold value is an aggregate for the whole of October 2003.
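    The mixed summary fact table of Example 3.10 can be imitated with a dict keyed by RID. The dollars_sold amounts below are hypothetical, since Table 3.4's values are not reproduced here. Other columns such as item are omitted. The month_summary helper is an illustrative assumption: it only aggregates the base rows present in the sketch.

```python
ALL = "all"  # the generalized value of an attribute that has been aggregated away

# Base fact rows keyed by RID; the dollars_sold amounts are hypothetical.
base_rows = {
    1001: {"day": 15, "month": 10, "quarter": "Q4", "year": 2003, "dollars_sold": 250.0},
    1002: {"day": 23, "month": 10, "quarter": "Q4", "year": 2003, "dollars_sold": 175.0},
}

def month_summary(rows, month, year, rid):
    """Build a summary row whose day attribute is generalized to 'all'."""
    total = sum(r["dollars_sold"] for r in rows.values()
                if r["month"] == month and r["year"] == year)
    quarter = f"Q{(month - 1) // 3 + 1}"
    return rid, {"day": ALL, "month": month, "quarter": quarter, "year": year,
                 "dollars_sold": total}

# One table holding both base rows and an aggregated row, as in Example 3.10.
summary_fact_table = dict(base_rows)
rid, row = month_summary(base_rows, month=10, year=2003, rid=5001)
summary_fact_table[rid] = row
```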

    MOLAP uses multidimensional array structures to store data for online analytical processing. This structure is discussed in the following section on data warehouse implementation, and in more detail in Chapter 4.

    Most data warehouse systems adopt a client-server architecture. A relational data store always resides at the data warehouse or data mart server site. A multidimensional data store can reside at either the database server site or the client site.


    4. DATA WAREHOUSE IMPLEMENTATION

    Data warehouses contain huge volumes of data. OLAP servers demand that decision support queries be answered within seconds. It is therefore crucial for data warehouse systems to support efficient techniques for cube computation, data access, and query processing. This section gives an overview of methods for implementing data warehouse systems efficiently.

    4.1 EFFICIENT COMPUTATION OF DATA CUBES

    At the core of multidimensional data analysis is the efficient computation of aggregations across many sets of dimensions. In SQL terms, these aggregations are referred to as group-by's. Each group-by can be represented by a cuboid, and the set of group-by's forms a lattice of cuboids that defines a data cube. This section explores issues related to the efficient computation of data cubes.

    THE COMPUTE CUBE OPERATOR AND THE CURSE OF DIMENSIONALITY

    One approach to cube computation extends SQL with a compute cube operator. The compute cube operator computes aggregates over all subsets of the dimensions specified in the operation. This can require excessive storage space, especially for large numbers of dimensions. We start with an intuitive look at what is involved in computing data cubes efficiently.

    Example 3.11. A data cube is a lattice of cuboids. Suppose you want to create a data cube for AllElectronics sales that contains the following: city, item, year, and sales_in_dollars. You would like to analyze the data with queries such as the following:

    Compute the sum of sales, grouping by city and item.

    Compute the sum of sales, grouping by city.

    Compute the sum of sales, grouping by item.

    "What is the total number of cuboids, or group-by's, that can be computed for this data cube?" Take the three attributes city, item, and year as the dimensions of the data cube, and sales_in_dollars as the measure. The total number of cuboids is then 2^3 = 8. The possible group-by's are {(city, item, year), (city, item), (city, year), (item, year), (city), (item), (year), ()}, where () means that the group-by is empty (no dimension is grouped). These group-by's form a lattice of cuboids for the data cube, shown in Figure 3.14.

    The base cuboid contains all three dimensions, city, item, and year. It can return the total sales for any combination of the three dimensions. The apex cuboid, or 0-D cuboid, corresponds to the empty group-by. It contains the total sum of all sales. The base cuboid is the least generalized cuboid. The apex cuboid is the most generalized cuboid and is often denoted as all.

    Starting at the apex cuboid and exploring downward in the lattice is equivalent to drilling down within the data cube. Starting at the base cuboid and exploring upward is akin to rolling up.
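    The lattice of Example 3.11 can be enumerated with itertools.combinations: every subset of the dimensions is one cuboid. The sketch also lists the cuboids one step up from a given cuboid (one dimension fewer), which is the roll-up direction in the lattice. Function names are hypothetical.

```python
from itertools import combinations

def cuboid_lattice(dims):
    """All 2^n group-by sets, ordered from the base cuboid down to the apex ()."""
    n = len(dims)
    return [combo for k in range(n, -1, -1) for combo in combinations(dims, k)]

def roll_up_targets(cuboid):
    """Cuboids one level up in the lattice: drop exactly one dimension."""
    return [tuple(d for d in cuboid if d != removed) for removed in cuboid]

lattice = cuboid_lattice(("city", "item", "year"))
# [('city', 'item', 'year'), ('city', 'item'), ('city', 'year'), ('item', 'year'),
#  ('city',), ('item',), ('year',), ()]
```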

    An SQL query containing no group-by, such as "compute the sum of total sales", is a zero-dimensional operation. An SQL query containing one group-by, such as "compute the sum of sales, group by city", is a one-dimensional operation. A cube operator on n dimensions is equivalent to a collection of group-by statements, one for each subset of the n dimensions. The cube operator is therefore the n-dimensional generalization of the group-by operator.

    Based on the DMQL syntax introduced in Section 2.3, the data cube in Example 3.11 could be defined as

        define cube sales_cube [city, item, year]: sum(sales_in_dollars)

    For a cube with n dimensions, there are 2^n cuboids in total, including the base cuboid. A statement such as

        compute cube sales_cube

    would explicitly instruct the system to compute the sales aggregate cuboids for all 8 subsets of the set {city, item, year}, including the empty subset. The cube computation operator was first proposed and studied by Gray et al.
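    The effect of `compute cube sales_cube` can be imitated by a naive Python sketch that runs one SUM group-by per subset of {city, item, year}. It assumes the base data fit in memory as a list of dicts, and the rows are hypothetical. It shows full materialization in its simplest form, without the shared-computation optimizations of real cube algorithms. In SQL, many systems expose the same idea as `GROUP BY CUBE (...)`.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("city", "item", "year")

def compute_cube(rows, dims=DIMS, measure="sales_in_dollars"):
    """Full materialization: one SUM group-by per subset of dims (2^n cuboids)."""
    cube = {}
    for k in range(len(dims) + 1):
        for cuboid in combinations(dims, k):
            agg = defaultdict(float)
            for r in rows:
                agg[tuple(r[d] for d in cuboid)] += r[measure]
            cube[cuboid] = dict(agg)
    return cube

# Hypothetical base data, for illustration only.
rows = [
    {"city": "Toronto", "item": "TV", "year": 2003, "sales_in_dollars": 100.0},
    {"city": "Toronto", "item": "PC", "year": 2003, "sales_in_dollars": 300.0},
    {"city": "Vancouver", "item": "TV", "year": 2004, "sales_in_dollars": 200.0},
]
cube = compute_cube(rows)
```

    `cube[()]` is the apex cuboid (grand total), and `cube[("city", "item", "year")]` is the base cuboid.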


    Online analytical processing may need to access different cuboids for different queries. It may therefore be a good idea to compute all, or at least some, of the cuboids in a data cube in advance. Precomputation leads to fast response times and avoids some redundant computation. Most OLAP products resort to some degree of precomputation of multidimensional aggregates.

    The major challenge of precomputation, however, is that the required storage space may explode if all of the cuboids in a data cube are precomputed. This is especially true when the cube has many dimensions. The storage requirements are even more excessive when many of the dimensions have associated concept hierarchies. This problem is referred to as the curse of dimensionality. Its extent is illustrated below.

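    As we have seen, an n-dimensional data cube has 2^n cuboids. If the dimensions have hierarchies, for example time organized as day < month < quarter < year, the number of cuboids grows further, because each level of each hierarchy can serve as a group-by attribute. The size of each cuboid also depends on the number of distinct values of its dimensions. For example, if every city sold every item, there could be |city| × |item| tuples in the city-item group-by alone. As the number of dimensions and concept hierarchies increases, the storage space required for the group-by's will grossly exceed the size of the input relation.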
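    The growth can be quantified with a formula that is standard in the data cube literature, although it is not stated in the text above. Suppose dimension i has L_i hierarchy levels. Adding one virtual level all per dimension gives T = ∏(L_i + 1) cuboids, which reduces to 2^n when every dimension has a single level. The level counts below are illustrative assumptions.

```python
from math import prod

def total_cuboids(levels_per_dim):
    """T = prod(L_i + 1): each dimension contributes its L_i hierarchy levels
    plus the virtual top level 'all' (the dimension is aggregated away)."""
    return prod(levels + 1 for levels in levels_per_dim)

no_hierarchies = total_cuboids([1, 1, 1])   # 2^3 = 8, as in Example 3.11
time_hierarchy = total_cuboids([1, 1, 4])   # time as day < month < quarter < year
ten_dims = total_cuboids([4] * 10)          # 10 dimensions with 4 levels each: 5^10
```

    Giving one dimension a four-level time hierarchy already raises the count from 8 to 20. With ten such dimensions the count reaches 5^10 = 9,765,625 cuboids.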

    It is therefore unrealistic to precompute and materialize all of the cuboids that can be generated for a data cube (or from a base cuboid). If there are many cuboids and they are large, a more reasonable option is partial materialization: materializing only some of the possible cuboids.

    Partial materialization: selected computation of cuboids

    1. No materialization: Do not precompute any of the non-base cuboids. Aggregates must then be computed during query processing, which can be very slow.

    2. Full materialization: Precompute all of the cuboids. The resulting lattice of computed cuboids is referred to as the full cube. This choice requires a huge amount of storage space to hold all of the precomputed cuboids.

    3. Partial materialization: Selectively compute only some of the cuboids. Alternatively, we may compute a subset of the cube that contains only those cells satisfying some user-specified criterion, for example a tuple count above some threshold. We use the term subcube for the latter case, where only some of the cells may be precomputed for the various cuboids. Partial materialization represents a trade-off between response time and storage space.

    Partial materialization of cuboids or subcubes should consider three factors:

    (1) identifying the subset of cuboids or subcubes to materialize;

    (2) exploiting the materialized cuboids during query processing;

    (3) efficiently updating the materialized cuboids during load and refresh.

    The selection of cuboids or subcubes to materialize should take into account the queries, their frequencies, and their access costs. It should also consider workload characteristics, update costs, and the total storage required. The selection must also be made in the broad context of physical database design, such as the generation and selection of indices.

    Several OLAP products use heuristic approaches to select cuboids and subcubes. A popular approach is to materialize the set of cuboids on which other frequently used cuboids are based. Alternatively, we can compute an iceberg cube: a data cube that stores only those cells whose aggregate value is above some minimum threshold. Another approach is a shell cube, which precomputes the cuboids for only a small number of dimensions of the data cube. Queries on other combinations of dimensions are computed on the fly.

    Because the aim of this chapter is an introduction and overview of data warehousing for data mining, we defer a detailed discussion to Chapter 4.
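    An iceberg cube can be sketched by computing each cuboid's cell counts and keeping only the cells that reach a threshold. This is a naive illustration under stated assumptions. The aggregate is a tuple count, a cell is kept when its count is at least min_count, and the data are hypothetical. Real iceberg algorithms prune while computing instead of filtering afterwards.

```python
from collections import Counter
from itertools import combinations

def iceberg_cube(rows, dims, min_count):
    """Keep only the cells, in every cuboid, whose tuple count is >= min_count.
    Naive: every cell is counted first and filtered afterwards."""
    result = {}
    for k in range(len(dims) + 1):
        for cuboid in combinations(dims, k):
            counts = Counter(tuple(r[d] for d in cuboid) for r in rows)
            kept = {cell: c for cell, c in counts.items() if c >= min_count}
            if kept:
                result[cuboid] = kept
    return result

# Hypothetical sales tuples (one per sale).
rows = [
    {"city": "Toronto", "item": "TV"},
    {"city": "Toronto", "item": "TV"},
    {"city": "Toronto", "item": "PC"},
    {"city": "Vancouver", "item": "TV"},
]
ice = iceberg_cube(rows, ("city", "item"), min_count=2)
```

    Only 4 of the 8 non-empty cells survive here: the apex, ("Toronto",), ("TV",) and ("Toronto", "TV"). On sparse real data this pruning is what keeps an iceberg cube small.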

    Once the selected cuboids have been materialized, it is important to use them during query processing. This raises several issues:

    How to determine the relevant cuboids from among the candidate materialized cuboids.

    How to use the index structures available on the materialized cuboids.

    How to transform the OLAP operations onto the selected cuboids.

    These issues are discussed in Section 4.3 of Chapter 4.


    Finally, during load and refresh, the materialized cuboids should be updated efficiently. Parallelism and incremental update techniques for this process should be explored.

    4.2 INDEXING OLAP DATA

    To facilitate efficient data access, most data warehouse systems support index structures and materialized views (using cuboids). General methods for selecting cuboids to materialize were discussed in the previous section. This section examines how to index OLAP data with bitmap indexing and join indexing.

    The bitmap indexing method is popular in OLAP products because it allows quick searching in data cubes. A bitmap index is an alternative representation of the record_ID (RID) list. In the bitmap index for a given attribute, there is a distinct bit vector, Bv, for each value v in the domain of the attribute. If the domain of the attribute consists of n values, then n bits are needed for each entry in the bitmap index (that is, there are n bit vectors). If the attribute has the value v in a given row of the data table, the bit representing that value is set to 1 in the corresponding row of the bitmap index. All other bits in that row are set to 0.

    Example 3.12. Bitmap indexing. In the AllElectronics data warehouse, suppose the dimension item at the top level has four values (representing item types): home entertainment, computer, phone, and security. Each value is represented by a bit vector in the bitmap index table for item. Suppose the cube is stored as a relational table with 100,000 rows. Because the domain of item consists of four values, the bitmap index table requires four bit vectors, each with 100,000 bits. Figure 3.15 shows a base data table containing the dimensions item and city, and its mapping to a bitmap index table for each dimension.


    Bitmap indexing is advantageous compared with hash and tree indices. It is especially useful for low-cardinality domains, because comparison, join, and aggregation operations are reduced to bit arithmetic, which substantially reduces processing time. Bitmap indexing also leads to significant reductions in space and I/O, because a string of characters can be represented by a single bit. For higher-cardinality domains, the method can be adapted using compression techniques.
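    A bitmap index can be sketched with Python integers serving as arbitrarily long bit vectors: bit i of the vector for value v is 1 exactly when row i has value v. The 5-row table below is hypothetical and is not the data of Figure 3.15. The point is that a conjunctive selection becomes a single bitwise AND.

```python
def build_bitmap_index(column):
    """One bit vector per distinct value; bit i is 1 iff row i has that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def rows_of(bits):
    """Decode a bit vector back into the list of matching row ids."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]

# Hypothetical base table with the dimensions item and city (5 rows).
item = ["home entertainment", "computer", "phone", "computer", "security"]
city = ["Vancouver", "Vancouver", "Toronto", "Toronto", "Vancouver"]

item_idx = build_bitmap_index(item)  # 4 bit vectors, one per item type
city_idx = build_bitmap_index(city)

# item = "computer" AND city = "Toronto" becomes one bitwise AND.
hits = rows_of(item_idx["computer"] & city_idx["Toronto"])  # -> [3]
```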

    The join indexing method became popular through its use in relational database query processing. Traditional indexing maps a value in a given column to the list of rows that have that value. In contrast, join indexing records the joinable rows of two relations in a relational database.

    For example, suppose two relations R(RID, A) and S(B, SID) are joined on the attributes A and B. The join index records then contain the pairs (RID, SID), where RID and SID are the record identifiers from R and S, respectively. The join index records can therefore identify joinable tuples without performing costly join operations.

    Join indexing is especially useful for maintaining the relationship between a foreign key and its matching primary key in the joinable relation.


    The star schema model of data warehouses makes join indexing attractive for cross-table search. This is because the link between the fact table and each dimension table consists of a foreign key of the fact table and the primary key of the dimension table. Join indexing maintains relationships between the attribute values of a dimension and the corresponding rows of the fact table. Join indices may span multiple dimensions to form composite join indices. Join indices can be used to identify subcubes of interest.

    Example 3.13. Join indexing. In Example 3.4, we defined a star schema for AllElectronics of the form "sales_star [time, item, branch, location]: dollars_sold = sum(sales_in_dollars)". Figure 3.16 shows an example of join index relationships between the sales fact table and the dimension tables for location and item. For example, the value "Main Street" in the location dimension table joins with tuples T57, T238, and T884 of the sales fact table. Similarly, the value "Sony-TV" in the item dimension table joins with tuples T57 and T459 of the sales fact table. The corresponding join index tables are shown in Figure 3.17.
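    The join indices of Example 3.13 can be held as maps from a dimension value to the set of fact tuple ids it joins with. A composite lookup then intersects these sets without touching the fact table. The composite_lookup helper is a hypothetical illustration, and "Elm Street" is a made-up value with no entry.

```python
# Join indices from Example 3.13: dimension value -> ids of joinable fact tuples.
location_join_index = {"Main Street": {"T57", "T238", "T884"}}
item_join_index = {"Sony-TV": {"T57", "T459"}}

def composite_lookup(**conditions):
    """Intersect per-dimension join indices; no join against the fact table."""
    indices = {"location": location_join_index, "item": item_join_index}
    result = None
    for dim, value in conditions.items():
        tids = indices[dim].get(value, set())
        result = set(tids) if result is None else result & tids
    return result or set()

matching = composite_lookup(location="Main Street", item="Sony-TV")  # -> {"T57"}
```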


    Suppose that the sales_star data cube has 360 time values, 100 items, 50 branches, 30 locations, and 10 million sales tuples. If the sales fact table records sales for only 30 items, the remaining 70 items do not participate in joins. Without join indices, additional I/O must be performed to join the fact table and the dimension tables.

    To further speed up query processing, the join indexing and bitmap indexing methods can be integrated to form bitmapped join indices.

    4.3 EFFICIENT PROCESSING OF OLAP QUERIES

    The purpose of materializing cuboids and constructing OLAP index structures is to speed up query processing in data cubes. Given materialized views, query processing should proceed as follows:

    1. Determine which operations should be performed on the available cuboids: this involves transforming the selection, join, roll-up, and drill-down operations in the query


    … the abstraction levels for the item and location dimensions in these cuboids are finer than brand and province_or_state, respectively.

    How would the costs of each cuboid compare if used to process the query? Using cuboid 1 is likely to cost the most, because both item_name and city are at a lower level than the brand and province_or_state concepts specified in the query. Suppose there are not many year values associated with items in the cube, but there are several item_names for each brand. Then cuboid 3 will be smaller than cuboid 4, and so cuboid 3 should be chosen to process the query. However, if efficient indices are available for cuboid 4, then cuboid 4 may be a better choice. Therefore, some cost-based estimation is required to decide which set of cuboids should be selected for query processing.
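    The cost-based choice described above can be sketched in Python. This is a simplified illustration, not the book's algorithm. The cuboid definitions belong to a page that is not reproduced here, so the four cuboids below are assumptions shaped to fit the discussion. The query is also assumed: it groups by brand and province_or_state with the selection year = 2004. The only cost measure is a made-up row count, est_rows.

    A cuboid qualifies in either of two cases:
    - every dimension the query needs is stored at the same or a finer level of its concept hierarchy, or
    - the cuboid was materialized with exactly the query's selection.

```python
# Concept hierarchies, finest level first (assumed for the example).
HIERARCHIES = {
    "time": ["day", "month", "quarter", "year"],
    "item": ["item_name", "brand", "type"],
    "location": ["street", "city", "province_or_state", "country"],
}


def at_least_as_fine(dimension, stored_level, needed_level):
    """True if data stored at stored_level can be rolled up to needed_level."""
    levels = HIERARCHIES[dimension]
    return levels.index(stored_level) <= levels.index(needed_level)


def can_answer(cuboid, query):
    """Check the grouping levels and the selection against one cuboid."""
    for dim, level in query["group_by"].items():
        stored = cuboid["levels"].get(dim)
        if stored is None or not at_least_as_fine(dim, stored, level):
            return False  # dimension missing or only available at a coarser level
    for dim, condition in query["where"].items():
        if cuboid["where"].get(dim) == condition:
            continue  # selection already applied when the cuboid was materialized
        stored = cuboid["levels"].get(dim)
        if stored is None or not at_least_as_fine(dim, stored, condition[0]):
            return False
    return True


# Hypothetical materialized cuboids; est_rows values are invented.
CUBOIDS = [
    {"name": "cuboid 1", "levels": {"time": "year", "item": "item_name", "location": "city"},
     "where": {}, "est_rows": 5_000_000},
    {"name": "cuboid 2", "levels": {"time": "year", "item": "brand", "location": "country"},
     "where": {}, "est_rows": 2_000},
    {"name": "cuboid 3", "levels": {"time": "year", "item": "brand", "location": "province_or_state"},
     "where": {}, "est_rows": 40_000},
    {"name": "cuboid 4", "levels": {"item": "item_name", "location": "province_or_state"},
     "where": {"time": ("year", 2004)}, "est_rows": 300_000},
]

QUERY = {"group_by": {"item": "brand", "location": "province_or_state"},
         "where": {"time": ("year", 2004)}}

candidates = [c for c in CUBOIDS if can_answer(c, QUERY)]
best = min(candidates, key=lambda c: c["est_rows"])
print([c["name"] for c in candidates])  # ['cuboid 1', 'cuboid 3', 'cuboid 4']
print(best["name"])                     # cuboid 3
```

    Cuboid 2 is rejected because country is coarser than province_or_state, so finer data cannot be recovered from it. A real optimizer would also weigh available indices, which is why the text notes that cuboid 4 could still be the better choice.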

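    Because the storage model of a MOLAP server is an n-dimensional array, front-end multidimensional queries are mapped directly to server storage structures, which provide direct addressing capabilities. The straightforward array representation of the data cube has good indexing properties, but storage utilization is poor when the data are sparse. For efficient storage and processing, sparse matrix and data compression techniques should therefore be applied. Details of several cube computation methods are presented in Chapter 4.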

    The storage structures used by dense and sparse arrays may differ. This makes it advantageous to adopt a two-level approach to MOLAP query processing: use array structures for dense arrays, and sparse matrix structures for sparse arrays. Two-dimensional dense arrays can be indexed by B-trees.

    To process a query in MOLAP, the dense one- and two-dimensional arrays must first be identified. Indices are then built on these arrays using traditional indexing structures. The two-level approach improves storage utilization without sacrificing direct addressing capabilities.
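    The two-level storage idea can be sketched as follows. The chunk class and the density threshold of 0.5 are hypothetical choices for illustration. Dense chunks are kept as flat row-major arrays, where a cell's position is computed directly from its coordinates. Sparse chunks keep only their non-empty cells in a dictionary keyed by coordinates. A real MOLAP server would also index these chunks (for example with B-trees, as mentioned above) and compress them; that part is omitted.

```python
DENSITY_THRESHOLD = 0.5  # hypothetical cut-off between dense and sparse storage


class Chunk2D:
    """One 2-D chunk of a cube, stored as a dense array or a sparse map."""

    def __init__(self, n_rows, n_cols, cells):
        # cells: {(row, col): measure} for the non-empty cells only
        self.n_rows, self.n_cols = n_rows, n_cols
        self.dense = len(cells) / (n_rows * n_cols) >= DENSITY_THRESHOLD
        if self.dense:
            # Dense: flat row-major list; offset = row * n_cols + col.
            self.values = [0] * (n_rows * n_cols)
            for (row, col), measure in cells.items():
                self.values[row * n_cols + col] = measure
        else:
            # Sparse: store only non-empty cells, keyed by coordinates.
            self.values = dict(cells)

    def get(self, row, col):
        if self.dense:
            return self.values[row * self.n_cols + col]  # direct addressing
        return self.values.get((row, col), 0)            # empty cell -> 0


dense_chunk = Chunk2D(2, 2, {(0, 0): 5, (0, 1): 3, (1, 1): 7})
sparse_chunk = Chunk2D(100, 100, {(42, 7): 9})
print(dense_chunk.dense, dense_chunk.get(1, 1))                              # True 7
print(sparse_chunk.dense, sparse_chunk.get(42, 7), sparse_chunk.get(0, 0))  # False 9 0
```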

    Are there other strategies for answering queries quickly? Some strategies for answering queries quickly concentrate on giving users immediate feedback. For example, in on-line aggregation, a data mining system can display what it has computed so far instead of waiting until the query is fully processed.
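    A simplified Python sketch of on-line aggregation for an AVG query follows. A generator (the name online_average is hypothetical) reports a running estimate every report_every rows and ends with the exact value. This lets the user see an approximate answer before the scan finishes. Real on-line aggregation systems typically read rows in random order and attach confidence intervals to each estimate; both are omitted here.

```python
def online_average(values, report_every):
    """Yield (rows_seen, running_mean) pairs while scanning values."""
    total, count = 0.0, 0
    for value in values:
        total += value
        count += 1
        if count % report_every == 0:
            yield count, total / count  # intermediate estimate shown to the user
    if count % report_every:
        yield count, total / count      # final, exact answer


for rows_seen, estimate in online_average([10, 20, 30, 40, 50], report_every=2):
    print(rows_seen, estimate)
# 2 15.0
# 4 25.0
# 5 30.0
```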

    Another approach is to implement top N queries. Suppose, for instance, that we are searching for the best-selling items among millions of items sold at AllElectronics. We would rather not wait for a list of all items sorted in ascending or descending order of sales. Instead, query processing can be optimized to return only the top N items. This gives a faster response time while improving interactivity and reducing wasted resources.

    The purpose of this section was to provide an overview of data warehouse implementation. Chapter 4 discusses this topic in more depth. It examines data cube computation and OLAP query processing further and provides more detailed algorithms.
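    The top N strategy described earlier in this section can be sketched with Python's heapq.nlargest. It keeps only N candidates in a heap instead of sorting every item. The sales figures below are invented for illustration.

```python
import heapq

# Hypothetical total sales per item; a real system would read these from the cube.
item_sales = {
    "Sony-TV": 920,
    "Panasonic-TV": 870,
    "Philips-DVD": 310,
    "Canon-Camera": 640,
    "HP-Printer": 150,
}

# Keep a heap of size N rather than sorting all M items: roughly O(M log N).
top_3 = heapq.nlargest(3, item_sales.items(), key=lambda pair: pair[1])
print(top_3)  # [('Sony-TV', 920), ('Panasonic-TV', 870), ('Canon-Camera', 640)]
```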

    5. FROM DATA WAREHOUSING TO DATA MINING

    How do data warehousing and OLAP relate to data mining?

    In this section, we study the use of data warehouses for information processing, analytical processing, and data mining. We also introduce on-line analytical mining (OLAM), a powerful paradigm that integrates OLAP with data mining technology.

    5.1 DATA WAREHOUSE USAGE

    Data warehouses and data marts are used in a wide range of applications. Business executives use the data in data warehouses and data marts to perform data analysis and make strategic decisions. In many firms, data warehouses are an integral part of a plan-execute-assess system for enterprise management.

    Data warehouses are used extensively in banking and financial services, consumer goods, and retail distribution. Typically, the longer a data warehouse has been in use, the more it will have evolved, and this evolution takes place through a number of phases:
    - Initially, the data warehouse is used mainly to generate reports and answer predefined queries.
    - Progressively, it is used to analyze summarized and detailed data, with the results presented as reports and charts.
    - Later, the data warehouse is used for strategic purposes, performing multidimensional analysis and sophisticated slice-and-dice operations.

    Finally, the data warehouse may be used for knowledge discovery and strategic decision making through data mining tools. In this context, the tools for data warehousing can be categorized into access and retrieval tools, database reporting tools, data analysis tools, and data mining tools.

    Business users need the means to know the following:
    - what exists in the data warehouse (through metadata);
    - how to access the contents of the data warehouse;
    - how to examine those contents using analysis tools;
    - how to present the results of such analysis.

    There are three kinds of data warehouse applications: information processing, analytical processing, and data mining.

    Information processing supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs. A current trend in data warehouse information processing is to build low-cost Web-based access tools that are then integrated with Web browsers.

    Analytical processing supports basic OLAP operations, including slice-and-dice, drill-down, roll-up, and pivoting. It generally operates on historical data in both summarized and detailed forms. The major strength of on-line analytical processing over information processing is the multidimensional analysis of data warehouse data.

    Data mining supports knowledge discovery. It finds hidden patterns and associations, constructs analytical models, performs classification and prediction, and presents the mining results using visualization tools.

    "How does data mining relate to information processing and on-line analytical processing?" Information processing, based on queries, can find useful information. However, the answers to such queries reflect information stored directly in databases or computable by aggregate functions. They do not reflect sophisticated patterns or regularities hidden in the database. Therefore, information processing is not data mining.

    On-line analytical processing comes a step closer to data mining, because it can derive summarized information from user-specified subsets of a data warehouse. Such descriptions are equivalent to the class/concept descriptions discussed in Chapter 1. Data mining systems can also mine generalized class/concept descriptions, which raises some interesting questions: "Do OLAP systems perform data mining? Are OLAP systems actually data mining systems?"

    The functionalities of OLAP and data mining can be viewed as disjoint. OLAP is a data summarization/aggregation tool that helps simplify data analysis. Data mining, by contrast, allows the automated discovery of implicit patterns and interesting knowledge hidden in large amounts of data. OLAP tools aim to simplify and support interactive data analysis. Data mining tools aim to automate as much of the process as possible while still allowing users to guide it. In this sense, data mining goes one step beyond traditional on-line analytical processing.

    An alternative and broader view of data mining may be adopted, in which data mining covers both data description and data modeling. OLAP systems can present general descriptions of data from data warehouses, so OLAP functions are essentially for data summarization and comparison (by drill-down, pivoting, slicing, dicing, and other operations). These are data mining functions, though limited ones. Under this view, however, data mining covers a much broader range than simple OLAP operations. It performs not only data summarization and comparison but also association, classification, prediction, clustering, time-series analysis, and other data analysis tasks.

    Data mining is not confined to the analysis of data stored in data warehouses. It can analyze data at a more detailed level. It can also analyze transactional, spatial, textual, and multimedia data that are difficult to model with current multidimensional database technology. In this context, data mining covers a broader spectrum than OLAP, both in mining functionality and in the complexity of the data handled.

    Because data mining involves more automated and deeper analysis than OLAP, it is expected to have broader applications. Data mining can help business managers find and reach more suitable customers. It can also help them gain critical business insights that may increase market share and raise profits. In addition, data mining can help managers understand the characteristics of customer groups and develop suitable pricing strategies. It supports correct item bundling based not on intuition but on actual item groups derived from customers' purchase patterns, which helps reduce advertising spending.

    5.2 FROM ON-LINE ANALYTICAL PROCESSING TO ON-LINE ANALYTICAL MINING

    In the field of data mining, substantial research has been done on data mining over various platforms. These include transaction databases, relational databases, spatial databases, text databases, time-series databases, flat files, data warehouses, and so on.

    On-line analytical mining (OLAM), also called OLAP mining, integrates on-line analytical processing (OLAP) with data mining and knowledge mining in multidimensional databases. Among the many paradigms and architectures of data mining systems, OLAM is particularly important for the following reasons:

    High quality of data in data warehouses: Most data mining tools need to work on integrated, consistent, and cleaned data. This requires costly data cleaning, data integration, and data transformation as preprocessing steps. A data warehouse built through such preprocessing serves as a valuable source of high-quality data for OLAP as well as for data mining. Note that data mining can also serve as a valuable tool for data cleaning and data integration.

    Available information processing infrastructure surrounding data warehouses: Comprehensive information processing and data analysis infrastructures have been built around data warehouse systems. They include the following:
    - accessing, integration, consolidation, and transformation of multiple heterogeneous databases;
    - ODBC/OLE DB connections;
    - Web access and service facilities;
    - reporting and OLAP analysis tools.

    It is better to make use of this available infrastructure than to build everything from scratch.

    OLAP-based exploratory data analysis: Effective data mining needs exploratory data analysis. A user will often want to traverse a database, select portions of relevant data, analyze them at different granularities, and present knowledge/results in different forms. On-line analytical mining provides facilities for data mining on different subsets of data and at different levels of abstraction. It does so by drill-down, pivoting, filtering, dicing, and slicing on a data cube and on some intermediate data mining results. This, together with data/knowledge visualization tools, greatly enhances the power and flexibility of exploratory data mining.

    On-line selection of data mining functions: Often a user may not know what kinds of knowledge they want to mine. On-line analytical mining integrates OLAP with multiple data mining functions. This gives users the flexibility to select the desired data mining functions and to switch data mining tasks dynamically.

    ARCHITECTURE OF ON-LINE ANALYTICAL MINING SYSTEMS

    An OLAM server performs analytical mining in data cubes in much the same way as an OLAP server performs on-line analytical processing. Figure 3.18 shows an integrated OLAM and OLAP architecture. In it, the OLAM and OLAP servers both accept users' on-line queries through a graphical user interface API. Both work with the data cube during data analysis through a cube API. A metadata directory is used to guide access to the data cube.

    The data cube can be built in two ways:
    - by accessing and/or integrating multiple databases through an MDDB API;
    - by filtering a data warehouse through a database API that may support OLE DB or ODBC connections.

    An OLAM server may perform multiple data mining tasks, such as concept description, association, classification, prediction, clustering, and time-series analysis. For this reason, it usually consists of multiple integrated data mining modules and is more sophisticated than an OLAP server.

    Chapter 4 describes data warehouses at a finer level by exploring implementation issues such as data cube computation, OLAP query answering strategies, and methods of generalization. The chapters that follow are devoted to the study of data mining techniques. As we have seen, the introduction to data warehousing and OLAP technology in this chapter is essential to the study of data mining. Data warehousing provides users with large amounts of clean, organized, and summarized data, which greatly facilitates data mining. For example, rather than storing the details of each sales transaction, a data warehouse may store a summary of the transactions per item type for each branch. At a higher level, it may store the summary for each country. OLAP's ability to provide multiple views of summarized data in a data warehouse lays a solid foundation for successful data mining.


    Moreover, we also believe that data mining should be a human-centered process. Rather than asking a data mining system to generate patterns and knowledge automatically, users will often need to interact with the system to perform exploratory data analysis. OLAP sets a good example for interactive data analysis and provides the necessary preparation for exploratory data mining.

    Consider, for example, the discovery of association patterns. Instead of mining associations at a primitive (low) data level among transactions, users should be allowed to specify roll-up operations along any dimension. For example, a user may want to roll up on the item dimension. This moves from viewing the data for the particular TV sets that were purchased to viewing the brands of these TVs, such as Sony or Panasonic. Users may also navigate from the transaction level to the customer level or the customer-type level in the search for interesting associations. Such an OLAP style of data mining is characteristic of OLAP mining. In our study of the principles of data mining in this book, OLAP mining is emphasized, that is, mining based on the integration of data mining and OLAP technology.
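    The roll-up example can be made concrete with a small Python sketch. It counts how many transactions contain each pair of values, first at the item level and then after rolling transactions up the item dimension to brand. The item names, the item-to-brand mapping, and the transactions are all hypothetical. In this toy data, no item pair occurs more than once. The (Panasonic, Sony) brand pair, however, occurs in two transactions, which is the kind of association that appears only at a higher level of abstraction.

```python
from collections import Counter
from itertools import combinations

# Hypothetical item_name -> brand concept hierarchy.
ITEM_TO_BRAND = {
    "Sony-TV-32": "Sony",
    "Sony-TV-40": "Sony",
    "Sony-DVD": "Sony",
    "Panasonic-DVD": "Panasonic",
}

# Hypothetical transactions at the primitive (item_name) level.
TRANSACTIONS = [
    {"Sony-TV-32", "Panasonic-DVD"},
    {"Sony-TV-40", "Panasonic-DVD"},
    {"Sony-TV-32", "Sony-DVD"},
]


def pair_counts(baskets):
    """Count the number of baskets containing each unordered pair of values."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


item_level = pair_counts(TRANSACTIONS)
# Roll up on the item dimension: replace each item by its brand (duplicates merge).
brand_level = pair_counts({ITEM_TO_BRAND[item] for item in t} for t in TRANSACTIONS)

print(max(item_level.values()))            # 1
print(brand_level[("Panasonic", "Sony")])  # 2
```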

    6. SUMMARY

    A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data organized to support management decision making. Several factors distinguish data warehouses from operational databases. The two systems provide quite different functionalities and require different kinds of data, so data warehouses must be maintained separately from operational databases.

    A multidimensional data model is used to design corporate data warehouses and departmental data marts. Such a model can adopt a star schema, a snowflake schema, or a fact constellation schema. The core of the multidimensional model is the data cube, which consists of a large set of facts (or measures) and a number of dimensions. Dimensions are the entities or perspectives with respect to which an organization wants to keep records, and they are hierarchical in nature.

    A data cube consists of a lattice of cuboids, each corresponding to a different level of summarization of the multidimensional data.
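    The lattice can be illustrated with a short Python sketch that enumerates every cuboid of a cube with three dimensions (time, item, location), ignoring concept hierarchies. From the 0-D apex cuboid to the 3-D base cuboid there are 2**3 = 8 cuboids. When dimension i has a hierarchy with L_i levels, the count becomes the product of (L_i + 1), since each dimension may also be aggregated to "all". That growth is what makes full materialization expensive. The level counts used below (4 for time, 3 for item, 4 for location) are assumptions.

```python
from itertools import combinations
from math import prod


def cuboid_lattice(dimensions):
    """All group-by combinations, from the apex () to the base cuboid."""
    return [combo for k in range(len(dimensions) + 1)
            for combo in combinations(dimensions, k)]


def cuboid_count_with_hierarchies(levels_per_dimension):
    """Number of cuboids when dimension i has L_i hierarchy levels (plus 'all')."""
    return prod(levels + 1 for levels in levels_per_dimension)


lattice = cuboid_lattice(("time", "item", "location"))
print(len(lattice))             # 8
print(lattice[0], lattice[-1])  # () ('time', 'item', 'location')
# Assumed hierarchies: 4 time levels, 3 item levels, 4 location levels.
print(cuboid_count_with_hierarchies([4, 3, 4]))  # 100
```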

    Concept hierarchies organize the values of attributes or dimensions into gradual levels of abstraction. They are useful for mining at multiple levels of abstraction.

    On-line analytical processing (OLAP) can be performed in data warehouses/data marts using the multidimensional data model. Typical OLAP operations include roll-up, drill-down, slice-and-dice, and pivot (rotate). They also include statistical operations such as ranking and computing moving averages and growth rates. OLAP operations can be implemented efficiently using the data cube structure.

    Data warehouses often adopt a three-tier architecture. The bottom tier is a warehouse database server, which is typically a relational database system. The middle tier is an OLAP server, and the top tier is a client containing query and reporting tools.

    A data warehouse contains back-end tools and utilities for populating and refreshing the warehouse. They cover data extraction, data cleaning, data transformation, loading, refreshing, and warehouse management.

    Data warehouse metadata are data that define the warehouse objects. A metadata repository provides details on the following:
    - the warehouse structure and data history;
    - the algorithms used for summarization;
    - mappings from the source data to the warehouse form;
    - system performance;
    - business terms and issues.

    OLAP servers may use relational OLAP (ROLAP), multidimensional OLAP (MOLAP), or hybrid OLAP (HOLAP). A ROLAP server uses an extended relational DBMS that maps OLAP operations on multidimensional data to standard relational operations. A MOLAP server maps multidimensional data views directly to array structures. A HOLAP server combines ROLAP and MOLAP. For example, it may use ROLAP for historical data while keeping frequently accessed data in a separate MOLAP store.
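    How a ROLAP server maps an OLAP operation onto standard relational operations can be sketched with Python's built-in sqlite3 module. Here a roll-up on location from city to country becomes a join between the fact table and the location dimension table, followed by GROUP BY. The table layout and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE sales (location_key INTEGER REFERENCES location, dollars_sold REAL);
INSERT INTO location VALUES (1, 'Vancouver', 'Canada'), (2, 'Toronto', 'Canada'),
                            (3, 'Chicago', 'USA');
INSERT INTO sales VALUES (1, 100.0), (1, 50.0), (2, 200.0), (3, 75.0);
""")

# Roll-up on location (city -> country) as a relational join plus GROUP BY.
rows = conn.execute("""
    SELECT l.country, SUM(s.dollars_sold)
    FROM sales AS s JOIN location AS l ON s.location_key = l.location_key
    GROUP BY l.country
    ORDER BY l.country
""").fetchall()
print(rows)  # [('Canada', 350.0), ('USA', 75.0)]
conn.close()
```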

    Full materialization refers to computing all of the cuboids in the lattice that defines a data cube. It typically requires an excessive amount of storage space, especially as the number of dimensions and the size of the associated concept hierarchies grow. This problem is known as the curse of dimensionality. Partial materialization, by contrast, is the selective computation of a subset of the cuboids or subcubes in the lattice. For example, an iceberg cube is a data cube that stores only those cube cells whose aggregate value (e.g., count) is above some minimum support threshold.

    OLAP query processing can be made more efficient by using indexing techniques. In bitmap indexing, each attribute has its own bitmap index table. Bitmap indexing reduces join, aggregation, and comparison operations to bit arithmetic. Join indexing registers the joinable rows of two or more relations from a relational database, reducing the overall cost of OLAP join operations. Bitmapped join indexing combines the bitmap and join index methods and can be used to further speed up OLAP query processing.
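    The iceberg cube from the partial-materialization point can be illustrated with a deliberately naive Python sketch. It counts every cell of every cuboid and keeps only cells whose count reaches min_support, writing "*" for dimensions that have been aggregated away. The sales rows and names are hypothetical. Practical iceberg-cube algorithms, such as those covered in Chapter 4, prune cells below the threshold early instead of computing the full cube first.

```python
from collections import Counter
from itertools import combinations

DIMENSIONS = ("item", "city", "year")

# Hypothetical base facts, one row per sale.
SALES = [
    ("Sony-TV", "Vancouver", 2004),
    ("Sony-TV", "Vancouver", 2004),
    ("Sony-TV", "Toronto", 2004),
    ("Panasonic-TV", "Toronto", 2003),
]


def iceberg_cube(rows, min_support):
    """Return {cell: count} for every cell of every cuboid with count >= min_support."""
    n = len(DIMENSIONS)
    cube = {}
    for k in range(n + 1):
        for kept in combinations(range(n), k):
            # Project each row onto the kept dimensions; others become "*".
            counts = Counter(
                tuple(row[d] if d in kept else "*" for d in range(n)) for row in rows
            )
            cube.update({cell: c for cell, c in counts.items() if c >= min_support})
    return cube


cube = iceberg_cube(SALES, min_support=2)
print(len(cube))                             # 9
print(cube[("Sony-TV", "Vancouver", 2004)])  # 2
print(("Panasonic-TV", "*", "*") in cube)    # False
```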

    Data warehouses are used for three purposes:
    - information processing (querying and reporting);
    - analytical processing (which lets users navigate through summarized and detailed data with OLAP operations);
    - data mining (which supports knowledge discovery).

    OLAP-based data mining is referred to as OLAP mining, or on-line analytical mining (OLAM), which emphasizes the interactive and exploratory nature of OLAP mining.


    REFERENCES

    Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Morgan Kaufmann, San Francisco, 2006.