Overview of NoSQL

Scalable SQL and NoSQL Data Stores. Supervisor (GVHD): Assoc. Prof. Dr. Đồng Thị Bích Thủy. Presented by (HVTH): Huỳnh Thị Thu Nga, 14 12 007

Why NoSQL databases are needed; their characteristics, advantages, and disadvantages; and the types of NoSQL database models.

Transcript of Overview of NoSQL

  • Why NoSQL? *Maximal Objects and the Semantics of Universal Relation Databases* RDBMSs are not well suited to the Internet era.

  • Why NoSQL? Data conflicts. Diagram: Post A has ID: 1234; Post A has ID: 1234; Post B has ID: 1234 (the same ID ends up assigned to two different posts).

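A minimal Python sketch of the ID conflict above. It relies on assumptions the slide does not state: each master assigns IDs from its own auto-increment counter, both counters start from the same replicated value, and the link between the masters is down when the two posts are written. `MasterNode` and `find_conflicts` are hypothetical names.

```python
class MasterNode:
    """Hypothetical master server that assigns IDs from its own auto-increment counter."""

    def __init__(self, name: str, next_id: int) -> None:
        self.name = name
        self.next_id = next_id
        self.posts: dict[int, str] = {}

    def insert_post(self, title: str) -> int:
        post_id = self.next_id
        self.next_id += 1
        self.posts[post_id] = title
        return post_id


def find_conflicts(a: MasterNode, b: MasterNode) -> list[int]:
    """IDs used by both masters for different posts, found when they resynchronize."""
    return sorted(i for i in a.posts.keys() & b.posts.keys() if a.posts[i] != b.posts[i])


# Both masters start from the same replicated counter, then the link between them fails.
server_1 = MasterNode("server-1", next_id=1234)
server_2 = MasterNode("server-2", next_id=1234)

id_a = server_1.insert_post("Post A")  # written on server-1
id_b = server_2.insert_post("Post B")  # written on server-2 while disconnected

print(find_conflicts(server_1, server_2))  # [1234]
```

Real master-master setups usually avoid this with per-node ID ranges or globally unique IDs. The sketch only shows why an interrupted link makes duplicate IDs easy to produce.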

  • Why NoSQL?
    Limitations of RDBMS: slow reads/writes; limited storage; hard to scale; high operating cost.
    Needs of the era: fast reads/writes; storage for very large data (Big Data); easy to scale; low operating cost.

  • NoSQL: Non-Relational. Timeline: 1998 (term first used), 2009 (term reintroduced).

  • Identifying characteristics of NoSQL: schema-free; easy to scale out; a simple API; eventual consistency, with transactions limited to individual data items; no fixed limit on data volume.

  • Some NoSQL concepts (RDBMS → NoSQL):

    Columns → Fields
    Row → Document
    Table → Collection
    Query: SQL → Query: using an API
    Foreign keys → No foreign keys
    Schema → Free schema (schema-free)

  • Characteristics of NoSQL: typically no full ACID guarantees. Instead, NoSQL follows the BASE properties, which contrast with ACID:
    - Basically Available
    - Soft state
    - Eventually consistent

  • Advantages of NoSQL: high performance; pagination support; open source; scalability; different NoSQL databases suit different projects; economical.

  • Disadvantages of NoSQL: non-relational data structure; uneven open-source support across enterprises; limited business intelligence; lack of built-in intelligence; compatibility issues.

  • Types of NoSQL databases

  • Key Value Store: a simple API (Application Programming Interface):
        void Put(string key, byte[] data);
        byte[] Get(string key);
        void Remove(string key);

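The three calls above can be sketched in Python as a thin wrapper over an in-memory dictionary. This illustrates the interface only, not any particular product. The snake_case method names, the `bytes` values, and returning `None` for a missing key are assumptions.

```python
from typing import Optional


class KeyValueStore:
    """In-memory sketch of the three-call API: Put, Get, Remove."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Inserts or overwrites; the store never looks inside the value.
        self._data[key] = bytes(data)

    def get(self, key: str) -> Optional[bytes]:
        # Assumption: a missing key returns None (the slide does not specify this case).
        return self._data.get(key)

    def remove(self, key: str) -> None:
        # Removing a missing key is a no-op in this sketch.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42", b"Huynh")
print(store.get("user:42"))  # b'Huynh'
store.remove("user:42")
print(store.get("user:42"))  # None
```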

  • Key Value Store: values are retrieved, deleted, and updated only through their corresponding key. Values are stored as BLOBs (Binary Large Objects). Good performance. Simple to build and easy to scale. Serves as the foundation for other NoSQL database types. Example: the Amazon shopping cart (Amazon Dynamo).

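Here is a hypothetical shopping-cart sketch in Python. It illustrates values that are opaque BLOBs reached only through their key. The key format `cart:<session id>` and the choice of JSON for serialization are assumptions; the slide does not say how Amazon Dynamo encodes carts.

```python
import json

# Stand-in for the key-value store: it only ever sees keys and opaque bytes (BLOBs).
carts: dict[str, bytes] = {}


def save_cart(session_id: str, items: dict[str, int]) -> None:
    # The whole cart is serialized into a single value under a single key.
    carts[f"cart:{session_id}"] = json.dumps(items, sort_keys=True).encode("utf-8")


def load_cart(session_id: str) -> dict[str, int]:
    blob = carts.get(f"cart:{session_id}")
    return {} if blob is None else json.loads(blob)


save_cart("s1", {"book": 1, "pen": 3})
cart = load_cart("s1")
cart["pen"] += 1        # the application modifies the decoded cart...
save_cart("s1", cart)   # ...and writes the whole value back under the same key
print(load_cart("s1"))  # {'book': 1, 'pen': 4}
```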

  • Key Value Store: some popular kinds of key-value stores:
    - Key/value cache in RAM: memcached, Citrusleaf database, Velocity, Redis, Tuple space...
    - Key/value saved on disk: Memcachedb, Berkeley DB, Tokyo Cabinet, Redis...
    - Eventually consistent key-value store: Amazon Dynamo, Voldemort, Dynomite, KAI, Cassandra, Hibari, Project Voldemort
    - Ordered key-value store: NMDB, Memcachedb, Berkeley DB...
    - Distributed systems: Apache River, MEMBASE, Azure Table Storage, Amazon Dynamo...

  • Column Families / Wide Column Store: a column-family database is a distributed database that provides random, real-time access while storing very large amounts of structured data.
    Column families: a column family describes how data is stored on disk. All data in a column family is stored together in the same file. A column family can contain super columns or columns.

  • Column Families / Wide Column Store
    Column: a column is a tuple of name, value, and timestamp (usually only the key-value part matters).
    Super column: a super column can be used like a dictionary. It is a column that can contain other columns (but not other super columns).

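The column and super column definitions above map naturally onto nested dictionaries. The Python sketch below is a simplified model built on three assumptions: one column family whose entries are all super columns, string values, and a wall-clock timestamp per column. The `put`/`get` helpers and the sample rows are hypothetical. Real systems add sorting, persistence, and distribution.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Column:
    """A column: a (name, value, timestamp) triple."""
    name: str
    value: str
    timestamp: float = field(default_factory=time.time)


# Hypothetical layout: row key -> column family -> super column -> column name -> Column.
# A super column behaves like a dictionary of ordinary columns (no nested super columns).
Store = dict[str, dict[str, dict[str, dict[str, Column]]]]


def put(store: Store, row: str, family: str, super_col: str, name: str, value: str) -> None:
    columns = store.setdefault(row, {}).setdefault(family, {}).setdefault(super_col, {})
    columns[name] = Column(name, value)  # overwrites, with a fresh timestamp


def get(store: Store, row: str, family: str, super_col: str, name: str) -> str:
    return store[row][family][super_col][name].value


users: Store = {}
put(users, "user:1", "profile", "address", "city", "Ho Chi Minh City")
put(users, "user:1", "profile", "address", "street", "5 Oak St.")
put(users, "user:2", "profile", "name", "first", "Jonathan")  # rows need not share columns

print(get(users, "user:1", "profile", "address", "city"))  # Ho Chi Minh City
```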

  • Document Database: basically, a document database is a key-value store whose value is in a known format: XML, YAML, JSON, BSON, or a binary format. The key is a simple string, such as a URI or a path.

  • Document Database

    Document 1:
    { FirstName: "Bob", Address: "5 Oak St.", Hobby: "sailing" }

    Document 2:
    { FirstName: "Jonathan", Address: "15 Wanamassa Point Road",
      Children: [ { Name: "Michael", Age: 10 }, { Name: "Jennifer", Age: 8 },
                  { Name: "Samantha", Age: 5 }, { Name: "Elena", Age: 2 } ] }

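The two documents above can be held as Python dictionaries in one list that stands in for a collection. The `find` and `names_with_field` helpers are hypothetical, not a real database API. They show that documents with different fields can sit side by side and still be queried by content.

```python
# The two documents from the slide, stored in one collection (a plain list here).
collection = [
    {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    {
        "FirstName": "Jonathan",
        "Address": "15 Wanamassa Point Road",
        "Children": [
            {"Name": "Michael", "Age": 10},
            {"Name": "Jennifer", "Age": 8},
            {"Name": "Samantha", "Age": 5},
            {"Name": "Elena", "Age": 2},
        ],
    },
]


def find(coll: list[dict], **criteria) -> list[dict]:
    """Documents whose fields equal every given value; a missing field never matches."""
    return [doc for doc in coll if all(k in doc and doc[k] == v for k, v in criteria.items())]


def names_with_field(coll: list[dict], name: str) -> list[str]:
    """FirstName of every document that has the given field at all."""
    return [doc["FirstName"] for doc in coll if name in doc]


print(find(collection, Hobby="sailing")[0]["FirstName"])  # Bob
print(names_with_field(collection, "Children"))           # ['Jonathan']
```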

  • Document Database: map (project) a document's data into another format; run aggregate computations over a set of documents; update part of a document; easy to distribute.

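The operations listed on this slide can be illustrated on the second document from the slides. This Python sketch works on a plain dictionary rather than a real document database. The new address value is hypothetical, and the aggregate is computed over the embedded child sub-documents.

```python
from statistics import mean

doc = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    "Children": [
        {"Name": "Michael", "Age": 10},
        {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5},
        {"Name": "Elena", "Age": 2},
    ],
}

# 1. Projection: map the document into another shape containing only the needed fields.
summary = {"name": doc["FirstName"], "children": [c["Name"] for c in doc["Children"]]}

# 2. Aggregation over a set of (sub)documents.
average_age = mean(c["Age"] for c in doc["Children"])

# 3. Partial update: change one field in place without rewriting the others.
doc["Address"] = "20 Ocean Ave."  # hypothetical new value

print(summary["children"])  # ['Michael', 'Jennifer', 'Samantha', 'Elena']
print(average_age)          # 6.25
```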

  • Graph Database: a graph database is a database designed specifically to store graph information such as edges, nodes, and properties. A graph database can be viewed as a document database with special document types and relationships.

  • Graph Database

  • Graph Database: graph databases are typically used to solve network problems. They are hard to scale because independent subgraphs are hard to find. Representative graph database products include Neo4j, Sones, AllegroGraph, Core Data, DEX, FlockDB, InfoGrid, OpenLink Virtuoso, ...

  • CONCLUSION

  • ACID properties
    Atomicity: this property guarantees that each transaction is a single unit that is either executed completely or not executed at all. If any error occurs in the transaction, it is rolled back to its initial state. When you group several statements into one transaction (between BEGIN TRAN and COMMIT), only two outcomes are possible: either all of the statements are executed or none of them are. SQL Server also guarantees atomicity for individual statements: for example, if an INSERT of 10 records fails after 5 records have been added, the system cancels it and no records are added. If the statement fires a trigger, an error in the trigger also cancels the statement. When you issue ROLLBACK, all statements already executed are undone and the transaction returns to the state it was in before it began.
    Consistency: SQL Server ensures that data is always consistent, meaning it satisfies the defined constraints (for example, a date field must contain a date, and a sales record must have a valid product code). After a transaction runs, the updated data must also be in a consistent state. If a transaction violates a data constraint, the system stops it and cancels the entire transaction.
    Isolation: like other server systems, SQL Server can serve many concurrent requests, but each transaction is guaranteed to run in its own separate context, unaffected by other transactions. When two transactions update the same data, SQL Server makes them run one after the other so they do not interfere with each other.
    Durability: once a transaction completes (is committed), its updates become permanent and the data stays that way. If the system fails unexpectedly, the recovery process restores the data of every committed transaction.

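The atomicity example in the notes, a 10-row INSERT that fails partway through, can be reproduced with Python's built-in `sqlite3` module. SQLite replaces SQL Server here only because it needs no server, so transaction details differ. The `sales` table, its columns, and the failing sixth row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, product_code TEXT NOT NULL)")

# Ten rows, but the sixth violates the NOT NULL constraint (no valid product code).
rows = [(i, None if i == 6 else f"P{i}") for i in range(1, 11)]

try:
    with conn:  # the block runs as one transaction: commit on success, rollback on an exception
        conn.executemany("INSERT INTO sales (id, product_code) VALUES (?, ?)", rows)
except sqlite3.IntegrityError as exc:
    print("rolled back:", exc)

count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(count)  # 0: the five rows inserted before the error were undone as well
```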

    Authors: David Maier (SUNY at Stony Brook) and Jeffrey D. Ullman (Stanford University)
    *Atomicity, Consistency, Isolation, and Durability. For the past 40 years, SQL and relational database management systems (RDBMS) have been the trusted choice for data storage, and the ACID properties are the core strength of the relational data model. Since the arrival of the Internet, however, and especially with the Web 2.0 boom, this very strength has become the model's biggest weakness in the Internet environment. SQL and the relational data model can no longer keep up with the growth of the Internet.

    *In a connected world, large companies need data centers in many places around the world to guarantee fast access, and these data centers must stay synchronized with each other. A distributed system built on RDBMSs requires continuous, uninterrupted connections between the database servers. If a connection between database servers fails, duplicate data can easily arise, which causes data conflicts and violates the consistency of the relational data model. The servers are connected in a Master-Master configuration or use the 2PC (two-phase commit) protocol.
    *Today's Internet applications have hundreds of millions or even billions of users, so servers must execute an enormous number of reads and writes at the same time. Server systems built on relational databases, with their many interdependent constraints, process queries slowly and can no longer meet these demands.
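
Below is a simplified Python sketch of the two-phase commit (2PC) protocol mentioned above. It assumes a single coordinator and ignores timeouts, logging, and crash recovery. `Participant`, `two_phase_commit`, and the node names are hypothetical. The sketch shows why 2PC needs every server to be reachable: a single missing "yes" vote aborts the transaction everywhere.

```python
class Participant:
    """Hypothetical database server taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local changes can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1 (voting): ask every participant to prepare; all must vote yes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit everywhere, or abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


ok = two_phase_commit([Participant("hanoi"), Participant("saigon")])
print(ok)  # True

nodes = [Participant("hanoi"), Participant("saigon", can_commit=False)]
result = two_phase_commit(nodes)
print(result, [p.state for p in nodes])  # False ['aborted', 'aborted']
```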

    In recent years, the growth of the Internet and modern technology has brought a wave of new data formats, especially media data. RDBMSs were designed to store data of at most a few hundred megabytes, while the new data formats reach several gigabytes, and ultra-high-resolution images can now reach terabytes. Alongside these comes a range of unstructured data such as geographic location data (GIS), user session data, hardware activity data, aircraft engine data, sensor data, and so on. These kinds of data are collectively called Big Data, and they exceed the processing capabilities of SQL and relational databases.

    Requirements: the ability to handle millions of reads/writes at high speed (low latency).

    *NoSQL originally meant Non-Relational; today, however, it is usually read as Not Only SQL. It is an umbrella term for database systems that do not use the relational data model.

    The term NoSQL was first used in 1998 as the name of a small open-source relational database that did not use SQL for queries. In 2009, Eric Evans, a Rackspace employee, reintroduced the term NoSQL at an event on distributed open-source databases. The term NoSQL marks the arrival of a new generation of databases: distributed + non-relational.

    **Fields: equivalent to columns in SQL.
    Document: replaces the notion of a row in SQL. This concept is also what sets NoSQL apart from SQL: a document has a variable number of fields, whereas the number of columns in a row is fixed in advance.
    Collection: equivalent to a table in SQL. A collection is a set of documents; notably, one collection can contain completely different documents.
    Key-value: a key-value pair used to store data in NoSQL.
    Cursor: the object used to retrieve data from the database.
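
A cursor can be pictured as a lazy iterator that fetches documents from the server in batches. The Python generator below is a hypothetical stand-in: real drivers expose their own cursor APIs, and here each slice of a list plays the role of one round trip to the database.

```python
from typing import Iterator


def cursor(collection: list[dict], batch_size: int = 2) -> Iterator[dict]:
    """Hypothetical cursor: yields documents one by one, fetching batch_size at a time."""
    for start in range(0, len(collection), batch_size):
        batch = collection[start:start + batch_size]  # stands in for one round trip to the server
        yield from batch


people = [{"FirstName": n} for n in ["Bob", "Jonathan", "Michael", "Jennifer", "Elena"]]
for doc in cursor(people):
    print(doc["FirstName"])
```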

    *Eventual consistency: data consistency does not have to be guaranteed immediately after every write. A distributed system accepts that changes spread by propagation: after some period (not instantly), a change reaches every point in the system, so eventually the data across the system returns to a consistent state.
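
Here is a minimal Python sketch of eventual consistency across three replicas. The conflict rule (last write wins by timestamp) and the single full propagation round are simplifying assumptions. Real systems such as Amazon Dynamo use other mechanisms, for example vector clocks. `Replica` and `propagate` are hypothetical names.

```python
from typing import Optional


class Replica:
    """Hypothetical replica storing key -> (timestamp, value)."""

    def __init__(self) -> None:
        self.data: dict[str, tuple[int, str]] = {}

    def write(self, key: str, value: str, ts: int) -> None:
        # Last-write-wins: keep whichever version has the higher timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def propagate(replicas: list[Replica]) -> None:
    """One full anti-entropy round: every replica pushes its versions to every other one."""
    for src in replicas:
        for dst in replicas:
            if dst is not src:
                for key, (ts, value) in list(src.data.items()):
                    dst.write(key, value, ts)


a, b, c = Replica(), Replica(), Replica()
a.write("cart:s1", "book", ts=1)  # the write lands on one replica only
stale = b.read("cart:s1")         # None: b has not seen the update yet
propagate([a, b, c])
print({r.read("cart:s1") for r in (a, b, c)})  # {'book'}: all replicas have converged
```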

    *NoSQL databases typically use clusters of inexpensive commodity servers to handle rapidly growing data and transaction volumes, whereas RDBMSs tend to rely on expensive proprietary servers and storage systems. As a result, the cost per GB or per transaction per second can be many times lower for NoSQL than for an RDBMS, so you can store and process more data at a much lower price.
    ***API (Application Programming Interface)
    *Building a key/value store is very simple, and scaling it out is also easy.
    *Data can be held in tables with billions of rows, each of which can contain millions of columns. A deployment of a few hundred to thousands of nodes on commodity hardware can store petabytes of data while still delivering high performance.
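
A key/value store is easy to scale out partly because each key can be placed on a node by hashing it. The Python sketch below uses consistent hashing, the placement scheme Amazon Dynamo is known for. As simplifying assumptions, it has no virtual nodes and no replication. `HashRing` and the node names are hypothetical. Adding a node moves only the keys that now fall to it, instead of reshuffling every key.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # Stable position on the ring; Python's built-in hash() for str changes between runs.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing: a key belongs to the first node clockwise from its position."""

    def __init__(self, nodes: list[str]) -> None:
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (ring_position(node), node))

    def node_for(self, key: str) -> str:
        positions = [pos for pos, _ in self._ring]
        # First node strictly after the key's position, wrapping around to index 0.
        i = bisect.bisect_right(positions, ring_position(key)) % len(self._ring)
        return self._ring[i][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(len(moved), "of", len(keys), "keys moved, all of them to node-d")
```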

    **The central concept of a document database is the document. Document databases differ in their implementation details, but all of them package and encode data in documents using some standard format or encoding. Encodings in use include XML, YAML, JSON, and BSON, as well as binary formats such as PDF and Microsoft Office documents (MS Word, Excel, ...). In practice, nearly all document databases use JSON (or BSON) or XML. Each document in a document database is identified by a unique key. This key is usually a simple string, and in some cases it can be a URI or a path. We can use the key to retrieve the document from the database. The database usually also keeps an index on document keys so that documents can be found quickly. In addition, the database provides an API or query language for retrieving documents by content, for example, querying for documents that have particular fields with particular values.

    *Basically, a document database is a key-value store whose value is in a known format. The two documents above share some information and differ in other information. In a traditional relational database, every record (row) has the same set of fields (columns), and unused fields may be stored empty. A document database, by contrast, has no empty fields in a document. It also lets new information be added without declaring it explicitly.

    *An important benefit of a document database is that you work directly with documents: there is little or no impedance mismatch between objects and documents. This makes storing data in a document database much easier than in an RDBMS when the data has a complex structure. Designing the physical data model in an RDBMS is often hard, because the way data is laid out in the database differs completely from the way we think about it in the application. Moreover, an RDBMS has a schema, and changing the schema is genuinely difficult when the system is deployed across many nodes. Documents do not support relationships, so each document is independent. This makes data easier to distribute than in an RDBMS: we do not need to keep all related data on the same shard of the system, and we do not need to support joins across a distributed system.
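
This small Python sketch makes the "little or no impedance mismatch" point concrete using a shortened version of the Jonathan document from the slides. A nested object becomes one JSON document as-is, while a relational layout has to split it into two tables joined by a foreign key. The `Person`/`Child` classes and the row tuples are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


# Field names follow the document on the slides rather than Python naming style.
@dataclass
class Child:
    Name: str
    Age: int


@dataclass
class Person:
    FirstName: str
    Address: str
    Children: list[Child] = field(default_factory=list)


jonathan = Person("Jonathan", "15 Wanamassa Point Road", [Child("Michael", 10), Child("Elena", 2)])

# Document store: the nested object becomes one document as-is; no join is needed to read it back.
document = json.dumps(asdict(jonathan))

# Relational store: the same object is split across two tables linked by a foreign key.
person_row = (1, jonathan.FirstName, jonathan.Address)        # persons(id, first_name, address)
child_rows = [(1, c.Name, c.Age) for c in jonathan.Children]  # children(person_id, name, age)
print(document)
```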

    **A typical example is a social network, as shown in the figure: the example has 4 documents and 3 relationships. In a graph database, a relationship means more than a plain pointer. A relationship can be one-way or two-way, but more importantly, relationships are typed. A person can be connected to another person in many ways: as a customer, as a family member, and so on. The relationship itself can carry information; in the example we simply store the relationship type and how close it is (friend, family member, romantic partner). Graph databases are typically used to solve network problems. In practice, most social networking sites use some form of graph database for familiar features such as adding friends and friends of friends.
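
Friend-of-friend lookups like the ones described above come down to following typed edges two hops out. The Python sketch below models a graph with 4 people and 3 typed relationships, echoing the shape of the example. The names, relationship types, and helper functions are hypothetical, and relationships are treated as two-way for simplicity.

```python
from collections import defaultdict

# Hypothetical social graph: nodes are people, edges carry a relationship type.
edges = [
    ("An", "Binh", "friend"),
    ("An", "Chi", "family"),
    ("Binh", "Dung", "friend"),
]

graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
for a, b, kind in edges:
    # Store each relationship in both directions (treating it as two-way).
    graph[a].append((b, kind))
    graph[b].append((a, kind))


def friends(person: str) -> set[str]:
    return {other for other, kind in graph[person] if kind == "friend"}


def friends_of_friends(person: str) -> set[str]:
    """People two 'friend' hops away, excluding the person and their direct friends."""
    direct = friends(person)
    return {fof for f in direct for fof in friends(f)} - direct - {person}


print(friends_of_friends("An"))  # {'Dung'}
```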

    *One problem with scaling a graph database is that it is very hard to find an independent subgraph, which means it is very hard to split a graph database into multiple shards.
