UPL628050451174393135 Abiteboul Cours 14 Mars Relationnel


  • 8/2/2019 UPL628050451174393135 Abiteboul Cours 14 Mars Relationnel

    The Relational Model
    Serge Abiteboul
    INRIA Saclay, Collège de France, and ENS Cachan
    20/03/2012

    The principles
      Abstraction
      Universality
      Independence
    Abstraction: the relational model
    Universality: main functionalities
    Independence: the views revisited
    Optimization
    Complexity and expressiveness
    Conclusion

    The principles

    Goal: the management of large amounts of data
      Large amounts of data: a database
      Software that does this: DBMS
    Characteristics of the data
      Size (giga, tera, etc.)
      Shared among many users and programs
      May be distributed geographically
      Heterogeneous storage: hard disk, network

    The data management system acts as a mediator between intelligent users and objects that store information
    [Figure: the question "Where and at what time can I see a film ...?", a first-order formula ending in "... Bogart) ∧ Séance(t, s, h))", and a fragment of a low-level program: int get(int key) { int hash = (key % T...S); while (table[hash] != NULL && table[hash]->getKey() != key) hash = (hash ...]
    The questions are translated into first-order logic and then into programs with precise and unambiguous syntax and semantics
    Alice does not want to write this program; she does not have to

    1st principle: Abstraction
    Data model
      Definition language for describing the data
      Manipulation language (queries and updates)
    Simple data structure
      Relations
      Trees
      Graphs
    Formal language for queries
      Logics
      Declarative vs. procedural
      Graphical languages
    [Figure: complex graphical queries with MS Access]

    The relational model: Codd 1970
    Data are represented as tables
    Queries are expressed in relational calculus: declarative
    In practice, a richer language: SQL
    Very successful both scientifically and industrially
      Commercial systems such as Oracle, IBM's DB2
      DBMS on personal computers such as MS Access

    2nd principle: Universality
    DBMSs are designed to capture all data in the world, for all kinds of applications
      Rich functionalities: see further
    In reality
      Very demanding applications require specialized software
      Today, more and more specialized systems

    We need services such as
      Concurrency and transactions
      Reliability and security
      Data distribution
    Scaling
      Volume of data
      Volume of requests
    Performance
      Response time: the time per operation
      Throughput: the number of operations per time unit

    Large variety of applications with important needs for data management
    Two main classes
    OLTP: Online Transaction Processing
      Transactional
      E-commerce, banking, etc.
      Simple transactions, known in advance
    OLAP: Online Analytical Processing
      Decision making
      Business intelligence queries
      Often very complex queries involving aggregate functions
      Multidimensional queries: e.g., date, country, product
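
To make the OLAP bullet concrete, here is a minimal sketch of a multidimensional aggregate query, run through Python's built-in sqlite3 module. The `sales` table, its columns (`day`, `country`, `product`, `amount`), the rows, and the helper `totals_by` are all hypothetical; the slide only names the dimensions date, country, and product.

```python
import sqlite3

# Hypothetical fact table: one row per sale, with three dimensions and a measure.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (day TEXT, country TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('2012-03-19', 'France', 'book', 10),
        ('2012-03-19', 'France', 'film', 5),
        ('2012-03-20', 'France', 'book', 7),
        ('2012-03-20', 'Italy',  'book', 3);
""")

def totals_by(connection, *dimensions):
    """Sum the amounts along the chosen dimensions.

    Column names are interpolated directly, so they must come from trusted code.
    """
    columns = ", ".join(dimensions)
    query = (
        f"SELECT {columns}, SUM(amount) FROM sales "
        f"GROUP BY {columns} ORDER BY {columns}"
    )
    return connection.execute(query).fetchall()

by_country = totals_by(conn, "country")
by_country_product = totals_by(conn, "country", "product")
```

Adding or dropping a dimension in the call changes the granularity of the aggregate, which is the kind of query an OLAP workload issues repeatedly.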

    3rd principle: Independence physical/logical/external
    Separation into three levels
      Physical level: physical organization of data on disk, disk management, schemas, indexes, transaction, log
      Logical level: logical organization of data in a schema, query and update processing
      External level: views, API, programming environments
    Independence
      Physical: we can change the physical organization without changing the logical level
      Logical: we can evolve the logical level without modifying the applications
      External: we can change or add views without affecting the logical level

    Abstraction
    The relational model


    $q_{HB} = \{\, s, h \mid \exists d, t\, (\text{Film}(t, d, \text{"Humphrey Bogart"}) \land \text{Séance}(t, s, h)) \,\}$
    In practice, using a syntax that is easier to understand:
      select salle, heure
      from Film, Séance
      where Film.titre = Séance.titre and acteur = "Humphrey Bogart"
    (titre = title, salle = theater, heure = time, acteur = actor)
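
The query on this slide can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the column name `directeur`, the unaccented table name `Seance`, and the helper `bogart_showings` are assumptions made for illustration; only the shape of the query comes from the slide.

```python
import sqlite3

# Hypothetical sample data; the table "Seance" drops the accent for portability.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Film   (titre TEXT, directeur TEXT, acteur TEXT);
    CREATE TABLE Seance (titre TEXT, salle TEXT, heure TEXT);
    INSERT INTO Film VALUES
        ('Casablanca', 'Michael Curtiz', 'Humphrey Bogart'),
        ('Casablanca', 'Michael Curtiz', 'Ingrid Bergman'),
        ('Vertigo', 'Alfred Hitchcock', 'James Stewart');
    INSERT INTO Seance VALUES
        ('Casablanca', 'Salle 1', '20:00'),
        ('Vertigo', 'Salle 2', '18:00');
""")

def bogart_showings(connection):
    """Return the (salle, heure) pairs of films featuring Humphrey Bogart."""
    rows = connection.execute("""
        SELECT salle, heure
        FROM Film, Seance
        WHERE Film.titre = Seance.titre AND acteur = 'Humphrey Bogart'
    """)
    return sorted(rows.fetchall())

showings = bogart_showings(conn)
```

The user states what is wanted (theater and time) and never says how to scan or join the two tables; that choice is left to the system.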

    Calculus queries are translated into algebraic expressions that can be evaluated efficiently

    Trees
      IMS, IBM, late 60s, 70s
      Still widely used
      A hierarchy of records with keys
      Supplier(sno, sname, sadd)
        Part(pno, pname, qty, price)
    Graphs
      Codasyl
      A graph of records with keys
      Supplier(sno, sname, sadd)
      Part(pno, pname)
      Order(ono, qty, price)
    Little abstraction
    Languages
      Navigational
      Procedural
      Record at a time

    Trees
      XML
      Exchange format for the Web
      Standard
      Query languages: XPath, XQuery
      Developing very fast
    Graphs
      Semantic Web & RDF
      Format for representing knowledge
      Standard
      Query language: SPARQL
      Developing very fast
    Abstraction
    High-level languages
    We will discuss them

    Universality: functionalities
    Here with a very relational viewpoint

    The core of the problem
    Be able to support
      Terabytes of data
      Millions of requests per day
    For this, two main tools
      ...
      Parallelism

    Laws about the data
      To protect data
      To design schemas
      To optimize queries
      To explain data
    Examples
      Inclusion dependency: $\text{Séance}[\text{titre}] \subseteq \text{Film}[\text{titre}]$
      Functional dependency: $\text{Séance}: \text{salle}\ \text{heure} \rightarrow \text{titre}$
        Only one movie is shown at a time in a theater
      tgds: $\forall t, s, h\, (\text{Séance}(t, s, h) \rightarrow \exists d, a\, \text{Film}(t, d, a))$
      egds: $\forall t, t', s, h\, (\text{Séance}(t, s, h) \land \text{Séance}(t', s, h) \rightarrow t = t')$
    Some of the most sophisticated developments in db theory
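
The two example dependencies can be checked mechanically on a given instance. The sketch below is one straightforward way to do so in Python; the function names `satisfies_fd` and `satisfies_inclusion`, the dictionary representation of tuples, and the sample rows are assumptions made for illustration.

```python
def satisfies_fd(rows, lhs, rhs):
    """Check the functional dependency lhs -> rhs on a list of dict rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is recorded; any later mismatch violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

def satisfies_inclusion(rows_r, attrs_r, rows_s, attrs_s):
    """Check the inclusion dependency R[attrs_r] included in S[attrs_s]."""
    projected_s = {tuple(row[a] for a in attrs_s) for row in rows_s}
    return all(tuple(row[a] for a in attrs_r) in projected_s for row in rows_r)

# Hypothetical instance of Film and Séance.
films = [
    {"titre": "Casablanca", "directeur": "Michael Curtiz", "acteur": "Humphrey Bogart"},
    {"titre": "Vertigo", "directeur": "Alfred Hitchcock", "acteur": "James Stewart"},
]
seances = [
    {"titre": "Casablanca", "salle": "Salle 1", "heure": "20:00"},
    {"titre": "Vertigo", "salle": "Salle 2", "heure": "20:00"},
]

fd_holds = satisfies_fd(seances, ["salle", "heure"], ["titre"])
inclusion_holds = satisfies_inclusion(seances, ["titre"], films, ["titre"])

# Two different movies in the same theater at the same time break the FD.
clash = seances + [{"titre": "Vertigo", "salle": "Salle 1", "heure": "20:00"}]
fd_holds_with_clash = satisfies_fd(clash, ["salle", "heure"], ["titre"])
```

A DBMS that knows these laws can reject the clashing insertion instead of storing inconsistent data.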

    Uses: from simple dependencies up to complex semantic data models

    Person   Child   Car
    John     Toto    BMW
    John     Toto    2 chevaux
    John     Zaza    BMW
    John     Zaza    2 chevaux
    Sue      Lulu

    Update anomalies
    Null values
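
The table above repeats every child of John with every car of John, and Sue's row needs a null. One standard remedy, which the slide itself does not spell out, is to split the table into two smaller ones. The sketch below illustrates this with Python sets; the variable names are hypothetical.

```python
# The rows of the slide; None stands for the missing (null) car of Sue.
rows = [
    ("John", "Toto", "BMW"),
    ("John", "Toto", "2 chevaux"),
    ("John", "Zaza", "BMW"),
    ("John", "Zaza", "2 chevaux"),
    ("Sue", "Lulu", None),
]

# Decomposition: children and cars depend on the person independently.
person_child = {(person, child) for person, child, _car in rows}
person_car = {(person, car) for person, _child, car in rows if car is not None}

# Joining the two projections on the person restores the null-free rows.
rejoined = {
    (person, child, car)
    for person, child in person_child
    for owner, car in person_car
    if person == owner
}
original_without_nulls = {row for row in rows if row[2] is not None}
```

After the split, adding a car for John is a single insertion into `person_car` rather than one row per child, and Sue no longer needs a null.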

    Atomicity: the sequence of operations is indivisible; in case of failure, either all operations are completed or all are canceled
    Consistency: the consistency property ensures that any transaction the database performs will take it from one consistent state to another. (So, consistency states that only consistent data will be written to the database.)
    Isolation: when two transactions A and B are executed at the same time, the changes made by A are not visible to B until transaction A is completed and validated (commit)
    Durability: once validated, the state of the database must be permanent, and no technical problem should lead to cancelling the operations of the transaction
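
Atomicity is the easiest of the four properties to observe in a few lines. The sketch below uses Python's built-in sqlite3 module; the `account` table, its CHECK constraint, and the helper `transfer` are hypothetical and only serve to provoke a failure in the middle of a transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(connection, source, target, amount):
    """Move amount from source to target; either both updates happen or neither."""
    try:
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM account"))

too_large = transfer(conn, "A", "B", 500)   # debit would make A negative
after_failure = balances(conn)
small = transfer(conn, "A", "B", 30)
after_success = balances(conn)
```

In the failed transfer the credit to B is executed first and then undone by the rollback, so no partial state is ever committed.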

    The DBMS must be resilient to failures
    A variety of techniques
      Journal
      Backup copies
      Hot standby: second system running simultaneously
    ... as reasonable for an application

    Typically the case
      When integrating several data sources
        Organizations with many branches
        Activities involving several companies
      When using distribution to get better performance
    Data localization & global query optimization
    Data fragmentation
      Typically horizontal partitioning
    Distributed transactions
      Two-phase commit
      Typically too heavy for Web applications

    Security
      Protect content against unauthorized users (humans or programs)
      Confidentiality: access control, authentication, authorization
    Data cleaning
    Data mining
    Data streaming
    Spatiotemporal data
    Etc.

    Independence: views

    Definition: a function $f$: Database $\rightarrow$ View
    One of the most fundamental topics in databases
    [Figure: database states (db1, db2, db3, db4, db6) mapped to view states (v1)]

    Classical query
      Define view
    Implicit definition and recursion
      Datalog
      Dependencies (tgds)
    Mix between ... XML
    [Figure: an XML tree (Colorado; resort nodes named Aspen and Lake Tahoe; values 2 meters and 1 meter) with the calls Unisys.com/snow(Aspen) and Yahoo.com/GetHotels(Aspen)]

    Intensional
      Update: do nothing
      Query: complex
    Materialized
      Update: propagate
        Base → view: costly view maintenance
        View → base: ambiguous
      Query: simple
    Query vs. update: the database tradeoff
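
The tradeoff between the two kinds of view shows up as soon as the base data changes. The sketch below uses Python's built-in sqlite3 module: a `CREATE VIEW` plays the role of the view that is recomputed at query time, and a table filled by `CREATE TABLE ... AS SELECT` plays the role of the materialized one. The table and view names, the rows, and the helper `count_rows` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Seance (titre TEXT, salle TEXT, heure TEXT);
    INSERT INTO Seance VALUES ('Casablanca', 'Salle 1', '20:00');

    -- Recomputed from the base table each time it is queried.
    CREATE VIEW v_virtual AS SELECT titre, salle FROM Seance;

    -- Computed once and stored; it must be maintained by hand.
    CREATE TABLE v_materialized AS SELECT titre, salle FROM Seance;
""")

def count_rows(connection, name):
    """Count the rows of a table or view (name must come from trusted code)."""
    return connection.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]

# Update the base: nothing to do for the virtual view.
conn.execute("INSERT INTO Seance VALUES ('Vertigo', 'Salle 2', '18:00')")
virtual_after_update = count_rows(conn, "v_virtual")
stale_materialized = count_rows(conn, "v_materialized")

# View maintenance: propagate the base update to the stored copy.
conn.execute("INSERT INTO v_materialized VALUES ('Vertigo', 'Salle 2')")
fresh_materialized = count_rows(conn, "v_materialized")
```

The virtual view pays at query time, the stored copy pays at update time; here the propagation is written by hand, whereas a real system would automate it.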

    Intensional: mediator
      Queries are complex
    Materialized: warehouse
      Updates are complex
    Definitions
      Global as view: $v = \varphi(db_1, \ldots, db_n)$
      Local as view: $db_i = \varphi_i(v)$ for each $i$
      Arbitrarily complex constraints between the database and the views, sometimes called alignments between them

    Optimization

    The queries are based on relational calculus, a logical language, simple and understandable by people, especially in its variants
    A calculus query can easily be translated into an expression of relational algebra
    Relational algebra is a limited model of computation (it does not allow computing arbitrary functions). That is why it is possible to optimize the evaluation of algebraic expressions
    Finally, for this language, parallelism allows scaling to very large databases (class AC0)

    (a) For each s in Séance do ...: complexity in $n^2$
    (b) If few tuples pass the selection: complexity in $n$
    (c) Using the index: constant complexity
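
The three cost regimes can be made visible by counting steps. The sketch below is a simplified illustration in Python, not the plan shown on the slide: `nested_loop` compares every showing with every film, while `select_then_probe` first selects on the actor and then looks titles up in a hash index. The function names, the step counters, and the sample tuples are assumptions.

```python
def nested_loop(films, seances, actor):
    """(a) Compare every showing with every film: about n * n steps."""
    steps, result = 0, []
    for titre_s, salle, heure in seances:
        for titre_f, _director, acteur in films:
            steps += 1
            if titre_f == titre_s and acteur == actor:
                result.append((salle, heure))
    return result, steps

def select_then_probe(films, seances, actor):
    """(b) and (c): one pass to select, then index lookups on titre."""
    index = {}
    for titre_s, salle, heure in seances:  # in a DBMS the index already exists
        index.setdefault(titre_s, []).append((salle, heure))
    steps, result = 0, []
    for titre_f, _director, acteur in films:  # selection: about n steps
        steps += 1
        if acteur == actor:
            # Hash lookup: expected constant time per selected film.
            result.extend(index.get(titre_f, []))
    return result, steps

films = [
    ("Casablanca", "Michael Curtiz", "Humphrey Bogart"),
    ("Casablanca", "Michael Curtiz", "Ingrid Bergman"),
    ("Vertigo", "Alfred Hitchcock", "James Stewart"),
]
seances = [("Casablanca", "Salle 1", "20:00"), ("Vertigo", "Salle 2", "18:00")]

slow_result, slow_steps = nested_loop(films, seances, "Humphrey Bogart")
fast_result, fast_steps = select_then_probe(films, seances, "Humphrey Bogart")
```

Both plans return the same answer; only the number of steps differs, which is exactly what an optimizer exploits when it chooses between equivalent algebraic expressions.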


    Using access structures
      Hash
    Using sophisticated algorithms
    Cost evaluation to select an execution plan
    Technique: rewrite queries based on heuristics, so as to explore only part of the space of plans

    These problems can greatly benefit from parallelism
      Typically: divide the data
    This is not true for all problems
    [Figure: a filter (f) applied to each part of the data]
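
"Divide the data" can be sketched as follows: split the tuples into fragments, filter each fragment independently, and concatenate the results. The example uses `ThreadPoolExecutor` from Python's standard library; the helper names `split` and `parallel_filter` are hypothetical. In CPython, threads will not speed up a CPU-bound predicate, so the sketch shows the structure of the computation rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor

def split(rows, parts):
    """Horizontally partition rows into `parts` fragments (round-robin)."""
    return [rows[i::parts] for i in range(parts)]

def parallel_filter(rows, predicate, parts=4):
    """Apply the same filter to every fragment, then merge the results."""
    fragments = split(rows, parts)

    def filter_fragment(fragment):
        return [row for row in fragment if predicate(row)]

    with ThreadPoolExecutor(max_workers=parts) as pool:
        filtered = list(pool.map(filter_fragment, fragments))
    return [row for fragment in filtered for row in fragment]

rows = list(range(20))
evens = sorted(parallel_filter(rows, lambda r: r % 2 == 0))
```

A filter needs no communication between fragments, which is why it parallelizes so well; an operation such as transitive closure does not decompose this way.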

    Complexity and expressivity

    Complexity
    http://www.cs.rice.edu/~vardi/papers/sigmod08.pdf
    Complexity: for a fixed query q, testing, given (I, t), whether t is in q(I), as a function of the size of I
    Focus on Boolean queries, so as not to depend on the output size
    Data complexity: as a function of the size of the data (query fixed)
    Query complexity: as a function of the size of the query (data fixed)
    Very different; if mixed, the dependency on the query typically hides the dependency on the data

    Relational calculus is in logspace
      The test can be performed using space logarithmic in the size of the data
      This is primarily because the arity of tables is fixed, so a tuple uses logspace
    $\text{logspace} \subseteq \text{NC}\ (\subseteq \text{ptime})$
      Good potential for parallelization; see further

    Query complexity: the complexity is pspace
    Intuition: an intermediary result may be very large if it is the join of many relations
    Depends more on the number of variables used in the query than on its actual size
      Naive evaluation of $\pi_A(R \bowtie S)$ requires more space than that of $\pi_A(R) \cap \pi_A(S)$
    Polynomial for queries of bounded tree width
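
The space argument can be checked on a small instance. The sketch below assumes relations R(A, B) and S(A, C) that share only the attribute A, which is the case in which the two expressions are equivalent; the relation contents and variable names are hypothetical.

```python
# R(A, B) and S(A, C) as sets of pairs; A is the only shared attribute.
R = {(a, b) for a in range(3) for b in range(100)}        # A in {0, 1, 2}
S = {(a, c) for a in range(1, 4) for c in range(100)}     # A in {1, 2, 3}

# Naive plan: materialize the whole join, then project on A.
join = {(a, b, c) for (a, b) in R for (a2, c) in S if a == a2}
naive = {a for (a, _b, _c) in join}

# Rewritten plan: project first, then intersect two small sets.
projection_r = {a for (a, _b) in R}
projection_s = {a for (a, _c) in S}
rewritten = projection_r & projection_s

intermediate_naive = len(join)
intermediate_rewritten = len(projection_r) + len(projection_s)
```

Both plans give the same answer, but the naive one builds 20000 intermediate tuples while the rewritten one never holds more than 6 values.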

    Data complexity: constant parallel time
    AC0
      A complexity class used in circuit complexity
      The problems that may be solved with circuits of constant depth and polynomial size, with unlimited fan-in AND gates and OR gates

    One cannot compute transitive closure
    Add a fixed point
      Inflationary: fixpoint
      Or not: while
    Vardi theorem: with an order on the domain, while = pspace
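
To see what the added fixed point buys, here is a minimal sketch of transitive closure computed by an inflationary iteration: tuples are only ever added, and the loop stops when an iteration adds nothing new. The function name `transitive_closure` and the sample edges are assumptions made for illustration.

```python
def transitive_closure(edges):
    """Inflationary fixpoint: keep adding derivable pairs until none is new."""
    closure = set(edges)
    while True:
        # One step: compose the current relation with itself (a join and a projection).
        derived = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        new_pairs = derived - closure
        if not new_pairs:
            return closure
        closure |= new_pairs

edges = {(1, 2), (2, 3), (3, 4)}
closure = transitive_closure(edges)
```

Each iteration is an ordinary relational query; what relational calculus lacks is the loop, whose number of rounds depends on the data.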

    One cannot test whether a relation has an even number of tuples
    Abiteboul-Vianu
      Characterization of what can be computed with fixpoint and while
      Theorem: fixpoint = while iff ptime = pspace

    Conclusion

    And then: always question everything
    Revisit the models, languages, principles
    Why?
      To scale to ever more data and queries
      To support extreme applications that cannot be supported by standard technology: Visa transactions
      To facilitate application development
      To offer more in terms of performance, reliability, security, etc.

    Relational model                          Beyond
    Entries in relations = atomic values      Entries are sets of values
                                              Missing data, probabilistic data
    ACID                                      Weaker concurrency
    Universal                                 Specialized: noSQL
    Data are persistent                       Queries on data flows
    Data are static                           Data & behavior: object databases
                                              Active databases
    Constraints are static (FDs, etc.)        Triggers

    Thank you!