Post on 20-Dec-2015

Algorithmic Aspects of Searching in the Past

Thomas Ottmann

Institut für Informatik, Universität Freiburg, Germany

ottmann@informatik.uni-freiburg.de

Advanced Algorithms & Data Structures

(Lecture 13: Persistence and Oblivious Data Structures)

2

Overview

• Motivation: Oblivious and persistent structures

• Examples: Arrays, search trees, Z-stratified search trees, relaxation

• Making structures persistent: Structure-copying, path-copying, and DSST methods

• Application: Point location

• Application: Time-evolving data: Capture and replay of whiteboard data, in particular handwriting traces

• Oblivious structures: Randomized and uniquely represented structures, c-level jump lists

3

Motivation

A structure storing a set of keys is called oblivious if it is not possible to infer its generation history from its current shape.

A structure is called persistent if it supports access to multiple versions.

Partially persistent: All versions can be accessed, but only the newest version can be modified.

Fully persistent: All versions can be accessed and modified.

Confluently persistent: Two or more old versions can be combined into one new version.

4

Example: Arrays

Array:

2 4 8 15 17 43 47 ……

Uniquely represented structure, hence oblivious!

Access: In time O(log n) by binary search.

Update (insertion, deletion): Θ(n)

Caution: The storage structure may still depend on the generation history!
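The sorted-array example can be sketched as follows. This is a minimal illustration, assuming a Python list as the array; the helper names `contains`, `insert`, and `delete` are chosen here and do not come from the lecture.

```python
from bisect import bisect_left, insort

def contains(keys, x):
    """Binary search in a sorted list: O(log n) comparisons."""
    pos = bisect_left(keys, x)
    return pos < len(keys) and keys[pos] == x

def insert(keys, x):
    """Insert x at its sorted position; shifting elements costs O(n)."""
    if not contains(keys, x):
        insort(keys, x)

def delete(keys, x):
    """Remove x if present; closing the gap costs O(n)."""
    pos = bisect_left(keys, x)
    if pos < len(keys) and keys[pos] == x:
        del keys[pos]

# Two different generation histories for the same key set.
a, b = [], []
for x in [2, 4, 8, 15, 17, 43, 47]:
    insert(a, x)
for x in [47, 2, 43, 4, 17, 8, 15]:
    insert(b, x)
```

Both histories end in the same list contents, which is the sense in which the sorted array is uniquely represented. Where the list lives in memory is outside this sketch and may still depend on the history.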

5

Example: Natural search trees

Only partially oblivious!

• Insertion history can sometimes be reconstructed.

• Deleted keys are not visible.

Access, insertion, and deletion of keys may take time Θ(n)

[Figure: the natural search trees generated by the insertion sequences 1, 3, 5, 7 and 5, 1, 3, 7]
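The two insertion sequences from the figure can be replayed with a small sketch of a natural (unbalanced) search tree. Nodes are represented as nested tuples `(key, left, right)`; this representation and the helper names are assumptions made for illustration only.

```python
def insert(node, key):
    """Insert key into a natural binary search tree of (key, left, right) tuples."""
    if node is None:
        return (key, None, None)
    k, left, right = node
    if key < k:
        return (k, insert(left, key), right)
    if key > k:
        return (k, left, insert(right, key))
    return node  # key already present

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

def height(node):
    return 0 if node is None else 1 + max(height(node[1]), height(node[2]))

t1 = build([1, 3, 5, 7])   # degenerates into a path
t2 = build([5, 1, 3, 7])   # root 5 with subtrees on both sides
```

The same key set yields two different shapes, so the shape leaks information about the insertion order.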

6

Example: Balanced search tree

Problem: Updates come in sudden bursts (example: recording ink traces from pen input). There is not enough time to serialize insertions and rebalancing transformations.

Solution: Relaxed balancing: Carry out updates and rebalancing transformations concurrently!

[Figure: example of a balanced search tree]

7

Stratified search trees

[Figure: the strata of a stratified search tree]

8

Example

9

Example

10

Insertion

Insert the new key among the leaves at the expected position and deposit a "push-up request".

[Figure: stratified search tree with a new leaf x and a push-up request p]

11

Iterative sequence of insertions

12

Handling of push-up-requests (1)

• A push-up request either leads to a local structural change and halt, which can be carried out in time O(1) (Case 1),

• or (exclusively) to a recursive shift of the push-up request to the next higher stratum without any structural change (Case 2).

Case 1 [There is still room on the next higher stratum]

[Figure: local transformations on subtrees with 3 and 4 leaves]

13

Handling of push-up-requests (2)

Case 2 [Next higher stratum is full]

Append a new apex if a node is pushed over the topmost stratum border.

[Figure: shifting a push-up request across a full stratum with 5 leaves]

14

Deletion

Locate x among the leaves.

Deposit a removal request at x.

Handle the removal request.

[Figure: stratified search tree with a removal request at leaf x]

15

Handling removal requests

Case 1 [Enough nodes at bottommost stratum]

Case 2 [Bottommost stratum too sparse]

Deposit a "pull-down request"

[Figure: removal of leaf q and the resulting pull-down request p]

16

Handling of pull-down-requests (1)

Case 1 [There are enough nodes on the next higher stratum]

Finite structural change and halt!

[Figure: local transformations that resolve a pull-down request p on subtrees with 3 and 4 leaves]

17

Handling of pull-down-requests (2)

Case 2 [Not enough nodes on the next higher stratum]

Recursively shift the pull-down request to the next higher stratum, but no structural change!

[Figure: pull-down request p moving from node q to the next higher stratum]

18

Z-stratified search trees: Observations

Insertions, deletions, and rebalancing transformations (removal of push-up and pull-down requests) can be arbitrarily interleaved.

The amortized restructuring costs per insertion or deletion are constant.

The generation history of a current version may be partially reconstructed (the sequence of insertions and deletions is partially visible).

But:

• Update operations are always applied to the current version

• Z-stratified search trees are not persistent

20

Simple methods for making structures persistent

• Structure-copying method: Copy the structure and apply the update operation to the copy. This yields full persistence at the price of Θ(n) time per update and Θ(m·n) space for m updates applied to structures of size n.

• Do nothing, but store a log file of all updates! To access version i, first carry out i updates, starting with the initial structure, and generate version i. Θ(i) time per access, Θ(m) space for m operations.

• Hybrid method: Store the complete sequence of updates and additionally every k-th version for a suitably chosen k. Result: Time and space requirements increase by a factor of at least √m!
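The log-file and hybrid methods can be sketched for a plain set of keys. The class `LoggedSet`, its method names, and the `("insert" | "delete", key)` update format are assumptions made for this illustration; the checkpoint interval `k` corresponds to the "suitably chosen k" above.

```python
def apply_update(state, op, key):
    """Apply one logged update to a mutable set."""
    if op == "insert":
        state.add(key)
    else:
        state.discard(key)

class LoggedSet:
    def __init__(self, k):
        self.k = k                           # store every k-th version
        self.log = []                        # log[j] is update number j + 1
        self.checkpoints = {0: frozenset()}  # version number -> full copy
        self.current = set()

    def update(self, op, key):
        self.log.append((op, key))
        apply_update(self.current, op, key)
        version = len(self.log)
        if version % self.k == 0:
            self.checkpoints[version] = frozenset(self.current)

    def version(self, i):
        """Rebuild version i from the nearest earlier checkpoint."""
        base = (i // self.k) * self.k
        state = set(self.checkpoints[base])
        for op, key in self.log[base:i]:
            apply_update(state, op, key)
        return state

s = LoggedSet(k=2)
for op, key in [("insert", 1), ("insert", 3), ("insert", 5),
                ("delete", 3), ("insert", 7)]:
    s.update(op, key)
```

With `k` larger than the number of updates, this degenerates to the pure log-file method; with `k = 1` it behaves like structure copying.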

Are there any better methods? …. for search trees….

21

Persistent search trees (1)

Path-copying method

[Figure: version 0, a search tree with root 5 and the keys 1, 3, 5, 7]

22

Persistent search trees (1)

Path-copying method

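Version 1: Insert(2)

[Figure: versions 0 and 1; the nodes 5, 1, 3 on the search path are copied, the new leaf 2 is attached to the copy, and node 7 is shared]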

23

Persistent search trees (1)

Path-copying method

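Version 1: Insert(2)

Version 2: Insert(4)

[Figure: versions 0, 1, and 2; each insertion copies the nodes 5, 1, 3 on its search path, and node 7 is shared by all versions]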

24

Persistent search trees (1)

Path-copying method

Restructuring costs: O(log n) per update operation

Version 1: Insert(2)

Version 2: Insert(4)

[Figure: versions 0, 1, and 2 with three copies of the path 5, 1, 3 and the shared node 7]
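A minimal sketch of the path-copying method for an unbalanced search tree, replaying the versions from the figures. The `Node` class and function names are illustrative assumptions; in a balanced tree the copied path has length O(log n).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(node, key):
    """Return the root of a new version; only nodes on the search path are copied."""
    if node is None:
        return Node(key)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)
    if key > node.key:
        return Node(node.key, node.left, insert(node.right, key))
    return node  # key already present, share the whole subtree

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

v0 = None
for key in [5, 1, 7, 3]:
    v0 = insert(v0, key)
v1 = insert(v0, 2)   # version 1: Insert(2)
v2 = insert(v1, 4)   # version 2: Insert(4)
```

Old roots stay valid, so every version remains accessible, and because a new version can be derived from any old root the result is even fully persistent. Untouched subtrees such as node 7 are shared between versions.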

25

Persistent search trees (2)

DSST-method: Extend each node by a time-stamped modification box

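Modification boxes

• are initially empty

• are filled bottom up

[Figure: a node with key k, left pointer lp, right pointer rp, and a modification box holding a time stamp t and a pointer; the original pointer is valid for all versions before time t, the pointer in the box for all versions after time t]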

26

DSST method

[Figure: version 0, a search tree with the keys 5, 1, 3, 7]

27

DSST method

[Figure: the tree after inserting key 2; a modification box holds the entry "1 lp" (time stamp 1, left pointer)]

28

DSST method

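Version 1: Insert(2)

Version 2: Insert(4)

[Figure: the tree after both insertions; node 3 has been copied, the copy points to the new key 4, and the box entry "1 lp" is kept in the old node]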

29

DSST method

The amortized costs (time and space) per update operation are O(1)

Version 1: Insert(2)

Version 2: Insert(4)

[Figure: the tree after both insertions with the modification box entries "1 lp" and "2 rp"]
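The modification-box idea can be sketched for insertions into an unbalanced search tree. This is a simplified illustration under stated assumptions, not the full DSST construction: keys are distinct, there is one box per node, only insertions are supported, and the names `Node`, `PersistentTree`, `child`, and `keys` are invented here. Version 0 is the empty tree in this sketch, so the version numbers differ from those on the slides.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.mod = None  # modification box: (version, field, child)

    def child(self, field, version):
        """Pointer stored in `field` as seen by the given version."""
        if self.mod and self.mod[1] == field and self.mod[0] <= version:
            return self.mod[2]
        return getattr(self, field)

class PersistentTree:
    def __init__(self):
        self.roots = [None]  # roots[v] is the root of version v

    def insert(self, key):
        version = len(self.roots)
        path, node = [], self.roots[-1]
        while node is not None:  # search in the newest version
            field = "left" if key < node.key else "right"
            path.append((node, field))
            node = node.child(field, version)
        pending, root = Node(key), self.roots[-1]
        while path and pending is not None:
            parent, field = path.pop()
            if parent.mod is None:  # box is free: record the change and halt
                parent.mod = (version, field, pending)
                pending = None
            else:  # box is full: copy the node with its newest pointers
                copy = Node(parent.key, parent.child("left", version),
                            parent.child("right", version))
                setattr(copy, field, pending)
                pending = copy  # the parent must now point to the copy
        if pending is not None:
            root = pending  # first key, or the root itself was copied
        self.roots.append(root)

    def keys(self, version):
        out = []
        def walk(node):
            if node is not None:
                walk(node.child("left", version))
                out.append(node.key)
                walk(node.child("right", version))
        walk(self.roots[version])
        return out

t = PersistentTree()
for key in [5, 1, 7, 3, 2, 4]:
    t.insert(key)
```

Each update either fills one free box and stops, or copies a node and continues one level up, which is the mechanism behind the amortized O(1) bound stated above.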

31

Application: Planar point location

Suppose that the Euclidean plane is subdivided into polygons by n line segments that intersect only at their endpoints.

Given such a polygonal subdivision and an on-line sequence of query points in the plane, the planar point location problem is to determine for each query point the polygon containing it.

Measure an algorithm by three parameters:

1) The preprocessing time.

2) The space required for the data structure.

3) The time per query.

32

Planar point location -- example

33

Planar point location -- example

34

Solving planar point location (Cont.)

Partition the plane into vertical slabs by drawing a vertical line through each endpoint.

Within each slab the lines are totally ordered.

Allocate a search tree per slab that contains the lines at its leaves, and associate with each line the polygon above it.

Allocate another search tree on the x-coordinates of the vertical lines.

35

Solving planar point location (Cont.)

To answer a query:

• first find the appropriate slab,

• then search the slab to find the polygon.
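The two-step query can be sketched with sorted lists in place of search trees. The sketch assumes non-vertical segments given as `((x1, y1), (x2, y2))` with `x1 < x2`, and it returns the segment directly below the query point rather than a polygon label; all function names are illustrative.

```python
from bisect import bisect_right

def y_at(seg, x):
    """y-coordinate of a non-vertical segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def build_slabs(segments):
    """One sorted list of crossing segments per slab (quadratic space in the worst case)."""
    xs = sorted({p[0] for seg in segments for p in seg})
    slabs = []
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2
        crossing = [s for s in segments if s[0][0] <= left and s[1][0] >= right]
        crossing.sort(key=lambda s: y_at(s, mid))
        slabs.append(crossing)
    return xs, slabs

def locate(xs, slabs, point):
    """Return the segment directly below point, or None."""
    x, y = point
    i = bisect_right(xs, x) - 1          # step 1: find the slab
    if i < 0 or i >= len(slabs):
        return None
    lo, hi = 0, len(slabs[i])            # step 2: binary search inside the slab
    while lo < hi:
        m = (lo + hi) // 2
        if y_at(slabs[i][m], x) <= y:
            lo = m + 1
        else:
            hi = m
    return slabs[i][lo - 1] if lo > 0 else None

A = ((0, 0), (4, 0))
B = ((0, 2), (2, 2))
C = ((2, 2), (4, 4))
xs, slabs = build_slabs([A, B, C])
```

Storing the polygon above each segment next to it would turn the returned segment into the polygon answer described on the slide.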

36

Planar point location -- example

37

Planar point location -- analysis

Query time is O(log n)

How about the space?

Θ(n²) in the worst case

And so could be the preprocessing time

38

Planar point location -- bad example

Total # lines O(n), and number of lines in each slab is O(n).

39

Planar point location & persistence

So how do we improve the space bound?

Key observation: The lists of the lines in adjacent slabs are very similar.

Create the search tree for the first slab.

Then obtain the next one by deleting the lines that end at the corresponding vertex and adding the lines that start at that vertex.

How many insertions/deletions are there altogether?

2n

40

Planar point location & persistence (cont)

Updates should be persistent since we need all search trees at the end.

Partial persistence is enough.

Well, we already have the path-copying method, so let's use it. What do we get?

O(n log n) space and O(n log n) preprocessing time.

We can improve the space bound to O(n) by using the DSST method.
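As a back-of-the-envelope check of these space bounds, assume that path copying creates O(log n) new nodes per update on a balanced tree and that the DSST method needs O(1) amortized space per update, as stated earlier in the lecture. With 2n updates in total this gives:

```latex
\[
S_{\text{path-copying}} = 2n \cdot O(\log n) = O(n \log n),
\qquad
S_{\text{DSST}} = 2n \cdot O(1) = O(n).
\]
```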

42

Time-evolving data: Presentation recording

Input media

• Whiteboard

• Touch screen

• Tablet PC

[Figure: author and audience; data sources feed lightweight content creation, which produces a recorded learning module (document)]

43

Cintiq Tablet (Wacom)

• Pen input, large display

• Eye contact with audience

44

Random access facility

Access of an ink object s_j corresponding to time t_j requires the immediate presentation of s_j and of all ink objects since t_0.

45

Whiteboard data

The whiteboard data stream requires

• Fast insertion and deletion of graphical objects (lines, circles, pen traces, …) in large quantities,

• Partially persistent storage which allows:

  • Fast access (display and "rendering") of all data for a given time stamp,

  • Synchronisability (as slave) with the audio stream (master).

Problem: Find a suitable method for storing the whiteboard-action stream!

46

Postprocessing

Whiteboard-stream is made persistent by the structure-copying method:

For each time stamp t a complete list of all objects visible on the board at time t is (pre-)computed and stored for random access.

Disadvantage: Highly redundant, very large data-volume

Advantage: Visible scrolling
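The structure-copying approach to the whiteboard stream can be sketched as follows. The event format `(time, op, obj)` with `op` being `"add"` or `"remove"`, and the function name `snapshots`, are assumptions made for this illustration and are not the format of any actual recording tool.

```python
def snapshots(events):
    """Precompute, for every time stamp, the full list of visible objects.

    events: list of (time, op, obj) tuples sorted by time,
            where op is "add" or "remove".
    """
    visible = []
    result = {}
    for time, op, obj in events:
        if op == "add":
            visible.append(obj)
        else:
            visible.remove(obj)
        result[time] = list(visible)  # full copy per time stamp: redundant but fast to access
    return result

events = [
    (0, "add", "line-1"),
    (1, "add", "trace-1"),
    (2, "add", "circle-1"),
    (3, "remove", "trace-1"),
]
board = snapshots(events)
```

Random access to any time stamp is a single dictionary lookup, while the total size grows with the number of time stamps times the number of visible objects, which is the redundancy noted above.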

Storage and representation of freehand ink traces: Find a suitable compromise between conflicting goals:

• Data volume

• Access cost (time) and dynamic replay (visible scrolling)

• Individual, personal style

• Scalability (vector-based vs. raster-based representation)

48

Methods for making structures oblivious

Unique representation of the structure:

• Set/size uniqueness: For each set of n keys there is exactly one structure which can store such a set.

• The storage is order-unique, i.e. the nodes of the structure are ordered and the keys are stored in ascending order in nodes with ascending numbers.

Randomise the structure:

Ensure that the expectation for the occurrence of a structure storing a set M of keys is independent of the way M was generated.

Observation: The address assignment of pointers must be subject to a randomised regime!

49

Example of a randomised structure

Z-stratified search tree

On each stratum, randomly choose the distribution of trees from Z.

Insertion? Deletion?

[Figure: the strata of a Z-stratified search tree]

50

Uniquely represented structures

(a) Generation history determines structure

[Figure: the insertion sequences 1, 3, 5, 7 and 5, 1, 3, 7 yield two different trees]

(b) Set-uniqueness: Set determines structure

[Figure: the set 1, 3, 5, 7 always yields the same tree]

51

Uniquely represented structures

(c) Size-uniqueness: Size determines structure

[Figure: the sets 1, 3, 5, 7 and 2, 4, 5, 8 are stored in a common structure]

Order-uniqueness: Fixed ordering of nodes determines where the keys are to be stored.

52

Set- and order-unique structures

Lower bounds?

Assumptions: A dictionary of size n is represented by a graph of n nodes.

• Node degree finite (fixed),

• Fixed order of the nodes,

• The i-th node stores the i-th largest key.

Operations allowed to change a graph:

• Creation or removal of a node

• Pointer change

• Exchange of keys

Theorem: For each set- and order-unique representation of a dictionary with n keys, at least one of the operations access, insertion, or deletion must require time Ω(n^(1/3)).

53

Uniquely represented dictionaries

Problem: Find set-unique or size-unique representations of the ADT "dictionary"

Known solutions:

(1) set-unique, order-unique

Aragon/Seidel, FOCS 1989: Randomized Search Trees

Store each key s ∈ X as the pair (s, h(s)), where h is a universal hash function and h(s) serves as the priority.

Update as for priority search trees!

Search, insert, delete can be carried out in O(log n) expected time.
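The idea of deriving priorities from the keys can be sketched with a small treap. SHA-256 is used here only as a convenient deterministic stand-in for the universal hash function h, which is an assumption of this sketch; nodes are nested tuples `(key, left, right)` and the function names are illustrative.

```python
import hashlib

def priority(key):
    """Deterministic stand-in for h(s): the same key always gets the same priority."""
    return hashlib.sha256(str(key).encode()).hexdigest()

def insert(node, key):
    """Insert into a treap: search-tree order on keys, max-heap order on priorities."""
    if node is None:
        return (key, None, None)
    k, left, right = node
    if key == k:
        return node
    if key < k:
        left = insert(left, key)
        if priority(left[0]) > priority(k):      # rotate right
            return (left[0], left[1], (k, left[2], right))
        return (k, left, right)
    right = insert(right, key)
    if priority(right[0]) > priority(k):         # rotate left
        return (right[0], (k, left, right[1]), right[2])
    return (k, left, right)

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

def inorder(node):
    return [] if node is None else inorder(node[1]) + [node[0]] + inorder(node[2])

t1 = build([1, 3, 5, 7])
t2 = build([5, 1, 3, 7])
```

Because the priorities depend only on the keys, and assuming they are pairwise distinct, the shape is determined by the key set alone, so both insertion orders produce the same tree.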

54

The Jelly Fish

(2) L. Snyder, 1976: set-unique, order-unique

Upper bound: Jelly Fish; search, insert, and delete in time O(√n).

Body: √n nodes

√n tentacles of length √n each

[Figure: the Jelly Fish structure, a body with tentacles attached]

55

Lower bound for tree-based structures

set-unique, order-unique

Lower bound: For "tree-based" structures the following holds:

Update time · Search time = Ω(n)

For a tree of height h with L leaves, the number of nodes satisfies n ≤ h·L + 1, hence L ≥ (n − 1)/h.

Update: Delete x_1, then insert x_{n+1} > x_n.

At least L − 1 keys must have moved from leaves to internal nodes. Therefore, the update requires time Ω(L).

[Figure: tree of height h with L leaves storing the keys x_1, ..., x_n]
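The product bound can be spelled out in one line. The step assumes, beyond what the slide states explicitly, that a worst-case search has to descend the full height h of the tree:

```latex
\[
T_{\text{update}} \cdot T_{\text{search}}
  = \Omega(L) \cdot \Omega(h)
  = \Omega\!\left(\frac{n-1}{h} \cdot h\right)
  = \Omega(n).
\]
```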

56

Cons-structures

(3) Sundar/Tarjan, STOC 1990, upper bound: (Nearly) full binary search trees

Only operation allowed for updates: Cons(L, x, R)

Search time O(log n)

Insertion and deletion possible in time O(√n)

[Figure: Cons combines a left subtree L, a key x, and a right subtree R into one tree with root x]

57

Jump-lists

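(Half-dynamic) 2-level jump-list

2-level jump-list of size n, where i² ≤ n < (i+1)², i.e. i = ⌊√n⌋

Example: n = 11, i = 3, since 3² ≤ 11 < 4²

Search: O(i) = O(√n) time

Insertion, deletion: O(√n) time

[Figure: 2-level jump-list storing the keys 2 3 5 7 8 10 11 12 14 17 19; the level-1 pointers connect the positions 0, i, 2i, ..., ⌊(n−1)/i⌋·i, followed by the tail]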
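Searching in a 2-level jump-list can be sketched on top of a sorted array, where a level-1 pointer is simulated by a jump of i positions and a level-0 pointer by a step of one position. The array representation and the function names are assumptions made for illustration; the slide's structure uses linked nodes.

```python
import math

def build(keys):
    """Return the sorted keys and the jump length i = floor(sqrt(n))."""
    keys = sorted(keys)
    return keys, math.isqrt(len(keys))

def search(keys, i, x):
    """Return the position of x, or -1 if x is not stored."""
    pos = 0
    # Level 1: follow jumps of length i while the target key is not overshot.
    while pos + i < len(keys) and keys[pos + i] <= x:
        pos += i
    # Level 0: scan at most i consecutive keys.
    for j in range(pos, min(pos + i, len(keys))):
        if keys[j] == x:
            return j
    return -1

keys, i = build([2, 3, 5, 7, 8, 10, 11, 12, 14, 17, 19])
```

For the 11 keys of the example, `i` is 3, so a search follows at most a few level-1 jumps and then inspects at most 3 keys, in line with the O(√n) bound.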

58

Jump-lists: Dynamization

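2-level jump-list of size n, where i² ≤ n < (i+1)² (example: 3² ≤ 11 < 4²)

Search: O(i) = O(√n) time

Insert, delete: O(√n) time

Can be made fully dynamic:

[Figure: number line with the thresholds (i−1)², i², (i+1)², (i+2)²; n lies between i² and (i+1)²]

[Figure: 2-level jump-list storing the keys 2 3 5 7 8 10 11 12 14 17 19; the level-1 pointers connect the positions 0, i, 2i, ..., ⌊(n−1)/i⌋·i, followed by the tail]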

59

3-level jump-lists

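i³ ≤ n < (i+1)³ (example: n = 30, i = 3, since 3³ ≤ 30 < 4³)

Search(x): locate x by following

• level-2 pointers, identifying i² keys among which x may occur,

• level-1 pointers, identifying i keys among which x may occur,

• level-0 pointers, identifying x.

Time: O(i) = O(n^(1/3))

[Figure: 3-level jump-list with pointers at the positions 0, i, 2i, ..., i², i²+i, ..., 2·i²; the top pointers are on level 2]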

60

3-level jump-lists

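i³ ≤ n < (i+1)³ (example: n = 30, i = 3, since 3³ ≤ 30 < 4³)

Update requires

• changing 2 pointers on level 0,

• changing i pointers on level 1,

• changing all i pointers on level 2.

Update time: O(i) = O(n^(1/3))

[Figure: 3-level jump-list with pointers at the positions 0, i, 2i, ..., i², i²+i, ..., 2·i²; the top pointers are on level 2]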

61

c-level jump-lists

Let i^c ≤ n < (i+1)^c.

Lower levels:

level 0: all pointers of length 1

...

level j: all pointers of length i^(j−1)

...

level c/2: ...

Upper levels:

level j: connect in a list all nodes 1, 1·i^(j−1)+1, 2·i^(j−1)+1, 3·i^(j−1)+1, ...

level c: ...

62

c-level jump-lists

Theorem:

For each c ≥ 3, the c-level jump-list is a size- and order-unique representation of dictionaries with the following characteristics:

Space requirement: O(c·n)

Access time: O(c·n^(1/c))

Update time: O(√n) if c is even, O(n^((c−1)/(2c))) if c is odd

63

Shared-search-trees

Reduction of search time

• 1 top-level tree with √n leaves

• All low-level trees for each sequence of √n consecutive keys

• The top-level tree directs the search to the root of the currently active low-level tree

Semi-dynamic structure: 2^(2k) ≤ n < 2^(2(k+1))

n = s·(2^k − 1) + r, r < 2^k − 1

Low-level tree size + 1 = top-level tree size = O(√n)

64

Shared-search-trees

Pointers at:

Level 0: (p − 2^0) ← p → (p + 2^0)

Level 1: (p − 2^1) ← p → (p + 2^1)

…

Level k−2: (p − 2^(k−2)) ← p → (p + 2^(k−2))

2(k−1) pointers per node p, k = O(log n)

Search time: O(log n)

Space: O(n log n)

[Figure: nodes 0 to 17 with their pointers on levels 0 and 1]

65

Insertion:

• Determine the insertion position;

• Change all pointers jumping over the insertion position;

• Add 2 new pointers per level;

• Completely rebuild the top-level tree.

2^(2k) ≤ n < 2^(2(k+1))

66

Number of pointer changes:

level 0: 2·2^0 + 2

level 1: 2·2^1 + 2

…

level k−2: 2·2^(k−2) + 2

Total: 2·(2^(k−1) − 1) + 2(k−1) = O(√n), for 2^(2k) ≤ n < 2^(2(k+1))
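The level-by-level counts can be summed mechanically to confirm the closed form 2·(2^(k−1) − 1) + 2(k−1). The function names below are illustrative only.

```python
def pointer_changes(k):
    """Sum of 2 * 2^j + 2 over the levels j = 0 .. k-2."""
    return sum(2 * 2**j + 2 for j in range(k - 1))

def closed_form(k):
    """Closed form of the same sum: 2 * (2^(k-1) - 1) + 2 * (k - 1)."""
    return 2 * (2**(k - 1) - 1) + 2 * (k - 1)

checks = [pointer_changes(k) == closed_form(k) for k in range(2, 12)]
```

Since 2^(2k) ≤ n implies 2^k ≤ √n, the dominant term 2^k is at most √n, which gives the O(√n) bound.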

67

Shared search trees: Summary

Theorem: Shared search trees are a size- and order-unique representation of dictionaries with the following characteristics:

Space requirement: O(n log n)

Search time: O(log n)

Update time: O(√n)

Open problem:

Is there a size- and order-unique representation of dictionaries by graphs with bounded node degree, search time O(log n), and update time o(n) (e.g. O(√n))?