
S. Abiteboul – INRIA Saclay

Trees, semistructured data, and other strange ways to go beyond tables

Serge Abiteboul, INRIA & ENS Cachan
PODS 30th Anniversary, 2011

Luc Véro

Another one of these No-SQL talks?

IMS, hierarchical model, V-relations, Jacobs’s calculus, Hardgrave’s broom, nested relations, format model, complex objects, logical data model, object databases, lambda calculus, regular trees, F-logic, NF1F, NF2, COL, IFO, LDL, IQL, SGML, HTML, ASN.1, XML, YAML, JSON…

serge
Major successes: 10 min. The future: a segment of about 20 min. The Web revolutions and their risks: in response to a question from the moderator, Serge could give a first perspective, followed by 2 or 3 questions from the audience, which are specified to relate to the moderator's question (a segment of about 20 min).

Trees are useless

A tree is a tree. How many more do you have to look at?

Ronald Reagan, governor of California, opposing the expansion of Redwood National Park (1966)

We don’t need anything beyond relations. These things are useless. Reject!

Anonymous referee (circa 1990)

Knowledge lives in trees

But of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die.
Genesis 2:17

Introduction

Theorem: Information lives in trees and not in relations

Proof: the Bible does not say « But of the two-dimensional table of knowledge of good and evil … »


Organization

Introduction

Hierarchical data model 60s

Nested relations 80s

Complex objects early 90s

Semistructured data & unranked labeled trees late 90s

Unranked labeled ordered trees, aka XML early 00s

Evolving trees, aka Active XML mid 00s

Cycles 90s to now

Conclusion

More or less chronological


For lack of time, we will ignore IMS and the hierarchical model
• The language was purely navigational anyway

We will also ignore early work such as that of Makinouchi, Jacobs, or Hardgrave

We will start with N1NF
• François Bancilhon in France
• Hans Schek in Germany
• PhD thesis of Nicole Bidoit


Non-First-Normal-Form (N1NF)

Data live in 1NF relations (DB101):

Name   Child  Car
Alice  Toto   Jaguar
Alice  Lulu   2CV
Bob    Mimi   Mustang
Bob    Zaza   Prius

A quarter on tables. Now what? Trees!

Data would prefer to live in infamous nested relations, aka V-relations, aka N1NF relations, aka NF2 relations:

Name   Child         Car
Alice  {Toto, Lulu}  {Jaguar, 2CV}
Bob    {Mimi, Zaza}  {Mustang, Prius}


The devil is in the details

V-relations (A is a key): no new power. The V-relation

A  B       C
1  {1, 2}  {1}
2  {2, 3}  {}
3  {1, 3}  {3, 4}

carries the same information as the flat relations

A  B
1  1
1  2
2  2
2  3
3  1
3  3

A  C
1  1
3  3
3  4

A
1
2
3

N1NF relations (A is not a key): the size is now possibly exponential in the size of the domain.

A  B
1  {}
1  {1}
1  {2}
1  {3}
1  {1, 2}
1  {1, 3}
1  {2, 3}
1  {1, 2, 3}
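Why "no new power" when A is a key, and why the blow-up when it is not, can be checked on the slide's own data. The sketch below assumes B and C are independent set-valued attributes; the names `V`, `decompose`, `recompose` and `subsets` are hypothetical, chosen for illustration only.

```python
from itertools import chain, combinations

# The V-relation of the slide: A is a key; B and C are set-valued attributes.
V = {1: ({1, 2}, {1}), 2: ({2, 3}, set()), 3: ({1, 3}, {3, 4})}


def decompose(v):
    """Split a V-relation keyed on A into flat relations R(A, B), S(A, C), T(A)."""
    r = {(a, b) for a, (bs, _) in v.items() for b in bs}
    s = {(a, c) for a, (_, cs) in v.items() for c in cs}
    t = set(v)  # needed for keys whose sets are empty (A = 2 has C = {})
    return r, s, t


def recompose(r, s, t):
    """Rebuild the V-relation by grouping B and C values under their key A."""
    return {a: ({b for (x, b) in r if x == a}, {c for (x, c) in s if x == a})
            for a in t}


def subsets(domain):
    """All subsets of a finite domain, as frozensets (2 ** len(domain) of them)."""
    items = sorted(domain)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, k)
                                         for k in range(len(items) + 1))]


# A not a key: the same A value may be paired with every subset of the domain.
N = {(1, x) for x in subsets({1, 2, 3})}

r, s, t = decompose(V)
print(sorted(r))                # [(1, 1), (1, 2), (2, 2), (2, 3), (3, 1), (3, 3)]
print(recompose(r, s, t) == V)  # True
print(len(N))                   # 8
```

The round trip shows that a keyed V-relation is just a compact presentation of flat relations; the last relation has one tuple per subset, hence 2^|D| tuples for a domain D.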


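Complex object model: tuple and set constructors used freely

Figure: a complex-object tree in which * marks a set node. Families is a set of tuples [Name, Cars, Children]; Cars is a set of tuples [Name, Year]; Children is a set of tuples [Name, Sex]. The data shown:
Families = { [Name: Peter, Cars: {[Name: BMW, Year: 2010]}, Children: {[Name: Toto, Sex: M]}],
             [Name: Peter, Cars: {[Name: 2CV, Year: 1976]}, Children: {[Name: Mimi, Sex: F], [Name: Zaza, Sex: F]}] }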


A logic and algebra for complex objects

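Logic: the main novelty is set variables, so the logic is not first-order

Example: the Abou Banat query ("father of daughters": fathers all of whose children are girls)

$\{\, T.\mathrm{Father} \mid \mathrm{Families}(T) \wedge \forall X \in T.\mathrm{Children}\ (X.\mathrm{Sex} = \mathrm{F}) \,\}$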
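To make the role of the set variable concrete, here is a minimal Python sketch of the Abou Banat query, assuming families are tuples with a `father` field and a set-valued `children` field. The classes `Child` and `Family`, the function `abou_banat`, and the sample names are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Set


class Child(NamedTuple):
    name: str
    sex: str


class Family(NamedTuple):
    father: str
    children: FrozenSet[Child]  # a set-valued attribute of the complex object


def abou_banat(families: Set[Family]) -> Set[str]:
    """{ T.Father | Families(T) and for all X in T.Children (X.Sex = F) }"""
    return {t.father for t in families
            if all(x.sex == "F" for x in t.children)}


# Hypothetical instance (names are illustrative only).
families = {
    Family("Peter", frozenset({Child("Toto", "M")})),
    Family("Paul", frozenset({Child("Mimi", "F"), Child("Zaza", "F")})),
    Family("Jacques", frozenset()),  # no children: the "for all" holds vacuously
}
print(sorted(abou_banat(families)))  # ['Jacques', 'Paul']
```

Note the childless father: the universal quantifier over an empty set is true, which is the standard semantics and a classic source of surprise with this query.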

Algebra: powerset operation, unnest/nest

Name  Child         Car
Bob   {Mimi, Zaza}  Mustang
Bob   {Lulu}        Prius

Unnesting Child gives the flat relation

Name  Child  Car
Bob   Mimi   Mustang
Bob   Zaza   Mustang
Bob   Lulu   Prius

Name  Child               Car
Bob   {Mimi, Zaza, Lulu}  {Mustang, Prius}
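The nest and unnest operators can be sketched in a few lines of Python, assuming tuples are encoded as sorted sequences of (attribute, value) pairs and sets as frozensets. The helper names `row`, `nest` and `unnest` are hypothetical; the data are the Bob tuples of the slide.

```python
from collections import defaultdict


def row(**attrs):
    """A tuple as a hashable, order-independent sequence of (attribute, value) pairs."""
    return tuple(sorted(attrs.items()))


def unnest(rel, attr):
    """One output tuple per member of the set-valued attribute `attr`."""
    out = set()
    for t in rel:
        d = dict(t)
        for v in d[attr]:
            out.add(row(**{**d, attr: v}))
    return out


def nest(rel, attr):
    """Group tuples that agree on all other attributes; collect `attr` into a set."""
    groups = defaultdict(set)
    for t in rel:
        d = dict(t)
        rest = row(**{k: v for k, v in d.items() if k != attr})
        groups[rest].add(d[attr])
    return {row(**dict(rest), **{attr: frozenset(vs)}) for rest, vs in groups.items()}


nested = {row(Name="Bob", Child=frozenset({"Mimi", "Zaza"}), Car="Mustang"),
          row(Name="Bob", Child=frozenset({"Lulu"}), Car="Prius")}
flat = unnest(nested, "Child")
print(len(flat))                      # 3
print(nest(flat, "Child") == nested)  # True

# Nest after unnest is not always the identity: two tuples get merged.
split = {row(Name="Bob", Child=frozenset({"Mimi"}), Car="Mustang"),
         row(Name="Bob", Child=frozenset({"Zaza"}), Car="Mustang")}
print(nest(unnest(split, "Child"), "Child") == split)  # False
```

Unnest followed by nest recovers the Bob relation here, but not in general, which is one reason the nested algebra needs care.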


Results

Equivalence theorem: the algebra and the logic have the same expressive power

Remark: one can compute transitive closure (TC) using the algebra/logic (wow! cool!)

Also studied: fixpoint, datalog, while…

Complexity: each new level of nesting introduces one more exponential

Need to control the use of powerset

$2^n,\ 2^{2^n},\ \ldots$
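The TC remark and the powerset warning fit together: one way to get transitive closure with powerset is to intersect every transitive relation over the active domain that contains the input. The Python sketch below renders that idea directly (it is not the algebra itself); `powerset`, `is_transitive` and `tc_via_powerset` are hypothetical names.

```python
from itertools import chain, combinations


def powerset(s):
    """Every subset of s; there are 2 ** len(s) of them."""
    items = list(s)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)))


def is_transitive(rel):
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)


def tc_via_powerset(r):
    """Intersect every transitive relation over the active domain that contains r."""
    dom = {x for pair in r for x in pair}
    pairs = {(a, b) for a in dom for b in dom}
    result = frozenset(pairs)  # D x D itself is transitive and contains r
    for s in powerset(pairs):  # 2 ** (|D| ** 2) candidates
        if r <= s and is_transitive(s):
            result &= s
    return set(result)


print(sorted(tc_via_powerset({(1, 2), (2, 3)})))  # [(1, 2), (1, 3), (2, 3)]
```

With only three domain values the loop already inspects 512 candidate relations, a small-scale illustration of why the use of powerset needs to be controlled.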


From complex objects to semistructured data

Figure: the Families complex-object tree shown earlier (Name, Cars, Children; * marks set nodes).


Revolution 1: more flexibility

Figure: the Families tree again, now with extra nodes labeled Annotations and Trash.


Revolution 2: Remove some nodes; name all

Figure: the same tree, including the Ann. and Trash nodes; the set (*) nodes are removed and every remaining node is named, the anonymous tuple nodes becoming Family, Car, and Child.


Unranked labeled trees

Figure: the resulting tree, with no * nodes: Families has two Family children, each with Name, Cars (containing a Car with Name and Year) and Children (containing a Child with Name and Sex), plus the Ann. and Trash nodes.
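The passage from complex objects to unranked labeled trees (drop the set nodes, name every node) can be sketched as a small recursive transformation. This is an illustration under stated assumptions: tuples are Python dicts, sets are lists (dicts are not hashable), and the naming table `MEMBER_LABEL` and function `to_labeled_tree` are hypothetical, with labels read off the figure.

```python
# Hypothetical naming of set members, read off the figure's Family/Car/Child labels.
MEMBER_LABEL = {"Families": "Family", "Cars": "Car", "Children": "Child"}


def to_labeled_tree(label, value):
    """Turn a complex object into an unranked labeled tree (label, children-or-text)."""
    if isinstance(value, dict):   # tuple node: one child per attribute
        return (label, [to_labeled_tree(k, v) for k, v in value.items()])
    if isinstance(value, list):   # set node (*): dropped; each member gets a name
        member = MEMBER_LABEL[label]
        return (label, [to_labeled_tree(member, v) for v in value])
    return (label, str(value))    # atomic value: a leaf carrying text


families = [
    {"Name": "Peter",
     "Cars": [{"Name": "BMW", "Year": 2010}],
     "Children": [{"Name": "Toto", "Sex": "M"}]},
]
tree = to_labeled_tree("Families", families)
print(tree[1][0][0])  # Family
```

Every node of the output carries a label, and a node may have any number of children, which is exactly the unranked labeled tree shape.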


This is better adapted to a Web context

Self-describing data: no separation between schema and data

Flexibility

Not such a big deal

Maybe the main contribution is the format?

<Families><Family><Name>Peter</Name><Cars><Car><Name>BMW</Name><Year>2010</Year></Car></Cars><Children><Child> …

Plus ça change, plus c’est la même chose
The more things change, the more they stay the same
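Self-description means a program can discover the structure from the labels in the data, with no separate schema. The sketch below uses Python's standard `xml.etree.ElementTree`; the document `DOC` is a hypothetical completion in the shape of the slide's snippet, using values from the Families figure, with an assumed extra `Annotations` element that no schema foresaw.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in the shape of the slide's snippet.
DOC = """
<Families>
  <Family>
    <Name>Peter</Name>
    <Cars><Car><Name>BMW</Name><Year>2010</Year></Car></Cars>
    <Children><Child><Name>Toto</Name><Sex>M</Sex></Child></Children>
  </Family>
  <Family>
    <Name>Peter</Name>
    <Cars><Car><Name>2CV</Name><Year>1976</Year></Car></Cars>
    <Children><Child><Name>Zaza</Name><Sex>F</Sex></Child></Children>
    <Annotations>free-form note, not foreseen by any schema</Annotations>
  </Family>
</Families>
"""

root = ET.fromstring(DOC.strip())
for family in root.findall("Family"):
    # The labels travel with the data, so the structure can be read off each node.
    print([child.tag for child in family],
          [year.text for year in family.findall("Cars/Car/Year")])
```

The same loop handles both families even though only the second has an `Annotations` child: that is the flexibility, and also why a schema-based optimizer has less to work with.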


What else? The trees are unbounded

Like nested relations, trees are unbounded in width

Unlike nested relations, they are unbounded in depth

One can simulate 2-counter machines with 2 branches
• Do applications simulate 2-counter machines with XML documents?
• I am still looking for one
• XML documents are rarely deep

But even for bounded trees there are fun questions: e.g., is the equivalence of monadic datalog programs decidable for bounded data trees?

Figure: a tree with root r and two long branches of nodes (labels a, b and $), illustrating the two-branch encoding.
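The idea behind "2 counters, 2 branches" is that unbounded depth can store unbounded numbers. The toy sketch below stores each counter of a Minsky (2-counter) machine as the length of one branch under a root `r`; the class `Node`, the functions `inc`, `dec`, `run`, and the program `ADD` are hypothetical, the labels are borrowed from the figure, and this is not the encoding of any specific paper.

```python
class Node:
    def __init__(self, label):
        self.label = label
        self.children = []


def depth(branch):
    """Counter value = number of nodes hanging below the branch head."""
    n = 0
    while branch.children:
        branch = branch.children[0]
        n += 1
    return n


def inc(branch):
    while branch.children:
        branch = branch.children[0]
    branch.children.append(Node("a"))  # grow the branch by one node


def dec(branch):
    """Remove the deepest node; callers check depth(branch) > 0 first."""
    while branch.children[0].children:
        branch = branch.children[0]
    branch.children.pop()


def run(program, c0, c1):
    """Instructions: ("inc", k, next), ("jzdec", k, if_zero, otherwise), ("halt",)."""
    root = Node("r")
    branches = [Node("$"), Node("$")]  # one branch per counter
    root.children.extend(branches)
    for branch, value in zip(branches, (c0, c1)):
        for _ in range(value):
            inc(branch)
    pc = 0
    while program[pc][0] != "halt":
        op = program[pc]
        if op[0] == "inc":
            inc(branches[op[1]])
            pc = op[2]
        elif depth(branches[op[1]]) == 0:
            pc = op[2]
        else:
            dec(branches[op[1]])
            pc = op[3]
    return depth(branches[0]), depth(branches[1])


# Move counter 0 into counter 1, i.e. compute c1 := c0 + c1.
ADD = [("jzdec", 0, 2, 1), ("inc", 1, 0), ("halt",)]
print(run(ADD, 3, 2))  # (0, 5)
```

Since 2-counter machines are Turing-complete, static questions over trees of unbounded depth quickly become undecidable; with bounded depth this encoding breaks down.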


What else? The trees are ordered
Unranked labeled ordered trees = XML

Ignore order: classical optimization

Respect order: a totally new ball game; bring in tree automata

Reconcile

Order is often painful for optimization
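"Bring in tree automata": for unranked ordered trees, a common formulation is a bottom-up automaton in which the allowed sequence of child states under each label is a regular language (hedge automata). The Python sketch below is one such deterministic automaton; the state letters, `RULES`, and the helper names are hypothetical, and the tree shape comes from the Families example.

```python
import re

# A tree is (label, [children]); states are single characters.
# Each rule: (label, regular expression over the children's states, resulting state).
RULES = [
    ("Name", "", "n"), ("Year", "", "y"), ("Sex", "", "s"),
    ("Car", "ny", "c"), ("Cars", "c*", "C"),
    ("Child", "ns", "k"), ("Children", "k*", "K"),
    ("Family", "nCK", "f"),  # order matters: Name, then Cars, then Children
    ("Families", "f*", "F"),
]
ACCEPTING = {"F"}


def state(tree):
    """Bottom-up run: children first, then match the parent's horizontal language."""
    label, children = tree
    word = "".join(state(c) for c in children)
    for rule_label, regex, q in RULES:
        if rule_label == label and re.fullmatch(regex, word):
            return q
    return "?"  # no rule applies: a rejecting sink state


def accepts(tree):
    return state(tree) in ACCEPTING


def T(label, *children):
    return (label, list(children))


car = T("Car", T("Name"), T("Year"))
kid = T("Child", T("Name"), T("Sex"))
ordered = T("Families", T("Family", T("Name"), T("Cars", car), T("Children", kid)))
swapped = T("Families", T("Family", T("Cars", car), T("Name"), T("Children", kid)))
print(accepts(ordered), accepts(swapped))  # True False
```

The swapped tree has the same nodes but is rejected: respecting order changes which documents are equal, which is why it complicates classical optimization.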


Selling argument is the Web…

The move from relations to trees is interesting, but so is the move from centralized to distributed, which has been much less investigated

Where the fun is:
• Scale is beyond what we thought was thinkable
• Machines are totally autonomous
• Schema replaced by numerous ontologies
• True/false logic replaced by inconsistency, probabilities, trust, belief…


And the trees are evolving (aka Active XML)

An old idea from object databases: mix data and computation

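Figure: an Active XML document. Resorts contains a Resort with Name Aspen and State Colorado. Its snowcond element embeds the service call !Unisys.com/snow(“Aspen”), whose result, a snow element with Unit Meter and Depth 1, appears in the tree; its hotels element embeds the call !Yahoo.com/GetHotels(<city name=“Aspen”/>).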
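Mixing data and computation can be sketched by treating certain elements as embedded service calls and materializing them in place. Everything below is an assumption for illustration: the `<call service=... arg=...>` encoding is hypothetical (not Active XML's actual syntax), and `snow` and `get_hotels` are local stubs rather than real Web services.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: a <call> element stands for an embedded service call.
DOC = """<Resorts><Resort>
  <Name>Aspen</Name><State>Colorado</State>
  <snowcond><call service="snow" arg="Aspen"/></snowcond>
  <hotels><call service="GetHotels" arg="Aspen"/></hotels>
</Resort></Resorts>"""


def snow(city):
    """Local stub for the snow service; returns the values shown in the figure."""
    result = ET.Element("snow")
    ET.SubElement(result, "Unit").text = "Meter"
    ET.SubElement(result, "Depth").text = "1"
    return [result]


def get_hotels(city):
    """Local stub: the slide gives no hotel data, so it returns nothing."""
    return []


SERVICES = {"snow": snow, "GetHotels": get_hotels}


def materialize(root):
    """Invoke every embedded call and add its results next to the call, which is kept."""
    for parent in list(root.iter()):  # snapshot: newly added results are not revisited
        for call in parent.findall("call"):
            for result in SERVICES[call.get("service")](call.get("arg")):
                parent.append(result)
    return root


doc = materialize(ET.fromstring(DOC))
print(doc.find("Resort/snowcond/snow/Depth").text)  # 1
```

Keeping the call next to its results is a design choice of this sketch: the call can be invoked again later, so the tree keeps evolving.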


And there are cycles

For lack of time, I will not mention the network model [Codasyl 1969]
• The language was purely navigational anyway

If I added references to XML, I would get cycles

Lots of models for graph data, e.g., IQL

Some fun results: e.g., a copy-elimination problem arises when trying to obtain Chandra-Harel completeness for IQL
• Similar issue for unordered trees [recent result with Vianu]

Figure: a Person with Name Adam whose Spouse is a Person with Name Eve, who in turn has a Spouse: references turn the tree into a cyclic graph.

Paris C. Kanellakis
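How references yield cycles can be seen by serializing objects to XML with identifiers. In this sketch, Eve's spouse is assumed to point back to Adam; the `id`/`ref` attribute names and the functions `to_xml` and `follow_spouse` are hypothetical, built on Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


class Person:
    def __init__(self, name):
        self.name = name
        self.spouse = None


def to_xml(people):
    """Serialize objects as a tree; references become id/ref attributes."""
    ids = {id(p): f"p{i}" for i, p in enumerate(people)}
    root = ET.Element("People")
    for p in people:
        elem = ET.SubElement(root, "Person", id=ids[id(p)])
        ET.SubElement(elem, "Name").text = p.name
        if p.spouse is not None:
            ET.SubElement(elem, "Spouse", ref=ids[id(p.spouse)])
    return root


def follow_spouse(root, start, steps):
    """Navigate Spouse references; on a cycle this walk never reaches an end."""
    by_id = {e.get("id"): e for e in root.findall("Person")}
    current = by_id[start]
    for _ in range(steps):
        current = by_id[current.find("Spouse").get("ref")]
    return current.find("Name").text


adam, eve = Person("Adam"), Person("Eve")
adam.spouse, eve.spouse = eve, adam  # assumed: each is the other's spouse
doc = to_xml([adam, eve])
print(follow_spouse(doc, "p0", 1), follow_spouse(doc, "p0", 1000))  # Eve Adam
```

The document itself is still a tree; the cycle lives in the references, so any query that follows them must guard against infinite navigation.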


Conclusion

Is this a good time to do research on trees in databases?

The best time to plant a tree was 20 years ago. 

The next best time is now. 

Chinese Proverb


Advertisement

Book on Web data management, to appear from Cambridge University Press: http://webdam.inria.fr/Jorge