Sum Selection in Arrays Allan Grønlund Jørgensen Kvalifikationseksamen.


Sum Selection in Arrays

Allan Grønlund Jørgensen

Qualification Exam (Kvalifikationseksamen)


Progress Report

Fault Tolerance:

Priority Queues Resilient to Memory Faults, with Moruz, Mølhave (WADS 07)

Optimal Resilient Dictionaries, with Brodal, Fagerberg, Finocchi, Grandoni, Italiano, Moruz, Mølhave (ESA 07)

Comparison Based Dictionaries: Fault Tolerance versus I/O Efficiency, with Brodal and Mølhave (Manuscript-ICALP08)

Sum Selection:

A Linear Time Algorithm for the k Maximal Sums Problem, with Brodal (MFCS 07)

Sum Selection, with Brodal (Manuscript-ICALP08)

[Figure: an example array of five numbers and the sums of all its contiguous subarrays]


Outline

Introduction

The k maximal sums problem

Length constrained k maximal sums problem

Sum selection problem

Summary and plans for the future


The Maximum Sum Problem

Given an array of numbers, find the contiguous subarray with the largest sum

-3 7 -12 1 6 -3 5 -2

(4,7,9): the largest sum is 9 = 1 + 6 - 3 + 5, from index 4 to index 7


Kadane's Algorithm ('77)

Scan array from left and in step i update:

Largest suffix sum (Largest sum ending at A[i])

Largest sum so far (Largest sum in A[1,…,i])

-3 7 -12 1 6 -3 5 -2
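The scan described above can be sketched in a few lines of Python. This is a minimal sketch, not the speaker's code: the function name `kadane` and the 1-based `(start, end, sum)` return convention are assumptions chosen to match the tuple notation on the slides, and the example array is read as -3 7 -12 1 6 -3 5 -2, the order consistent with the answer (4,7,9).

```python
def kadane(a):
    """Return (i, j, s): 1-based inclusive bounds and sum of a maximum-sum subarray.

    Assumes a is non-empty. Two values are maintained while scanning left to right:
    the largest sum ending at the current position (largest suffix sum) and the
    largest sum seen so far.
    """
    best = (1, 1, a[0])
    suffix_start, suffix_sum = 1, a[0]
    for pos in range(2, len(a) + 1):
        x = a[pos - 1]
        if suffix_sum < 0:
            # A negative suffix can only hurt, so restart at the current position.
            suffix_start, suffix_sum = pos, x
        else:
            suffix_sum += x
        if suffix_sum > best[2]:
            best = (suffix_start, pos, suffix_sum)
    return best


A = [-3, 7, -12, 1, 6, -3, 5, -2]
print(kadane(A))  # (4, 7, 9)
```

Only a constant number of variables is kept besides the input, which is the O(1) additional space the talk later refers to.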


Outline

Introduction

The k maximal sums problem

Length constrained k maximal sums problem

Sum selection problem

Summary and plans for the future


The k Maximal Sums Problem

Given an array of numbers, find the k largest sums (they may overlap)

Example with k=2

-3 7 -12 1 6 -3 5 -2

9 = 1 + 6 - 3 + 5

8 = 6 - 3 + 5


Goal

Optimal O(n+k) time algorithm outputting the k maximal sums


Main Idea (Intuition)

Build all sums and insert them into a heap-ordered binary tree

Find the k largest sums using Frederickson's heap selection algorithm ('93) in O(k) time
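As a baseline for the intuition above, here is a naive sketch that really does build all n(n+1)/2 sums and then picks the k largest. It is not the algorithm from the talk: Python's `heapq.nlargest` stands in for Frederickson's heap selection, and the function name `k_maximal_sums_naive` is an assumption made for illustration.

```python
import heapq
from itertools import accumulate


def k_maximal_sums_naive(a, k):
    """Return the k largest subarray sums of a, largest first.

    Enumerates every sum A[i..j] through prefix sums, so the running time is
    quadratic in len(a). Useful only as a reference for checking faster code.
    """
    prefix = [0] + list(accumulate(a))  # prefix[j] = A[1] + ... + A[j]
    sums = (prefix[j] - prefix[i - 1]
            for j in range(1, len(a) + 1)
            for i in range(1, j + 1))
    return heapq.nlargest(k, sums)


print(k_maximal_sums_naive([-3, 7, -12, 1, 6, -3, 5, -2], 2))  # [9, 8]
```

The rest of the talk is about avoiding exactly this quadratic enumeration.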


Example (k=4)

[Figure: heap-ordered binary tree holding all 15 sums of the array -12 1 6 -3 5; the four largest sums 9, 8, 7 and 6 are marked in red]

Frederickson's algorithm finds the red nodes in O(k) time (in no particular order)


The Iheap

It is a heap ordered binary tree

Supports insertions in amortized constant time


Inserting 7 in an Iheap

[Figure: an Iheap with keys 9, 5, 4, 3 and subtrees T1 to T4, shown before and after 7 is inserted]


Main Issue

There are n(n+1)/2 = Θ(n²) sums

Constructing and inserting Θ(n²) sums into a heap ordered binary tree takes Θ(n²) time


Grouping Sums

The sums are grouped by their endpoint in the array

$Q_j = \{(i, j, sum) \mid 1 \le i \le j \,\wedge\, sum = \sum_{s=i}^{j} A[s]\}$

-3 7 -12 1 6 -3 5 -2

Q4:

(1,4,-7)

(2,4,-4)

(3,4,-11)

(4,4,1)


Constructing Q5 from Q4

$Q_j = \{(j, j, A[j])\} \cup \{(i, j, s + A[j]) \mid (i, j-1, s) \in Q_{j-1}\}$

-3 7 -12 1 6 -3 5 -2

Q4:

(1,4,-7)

(2,4,-4)

(3,4,-11)

(4,4,1)

Q5:

(1,5,-1)

(2,5,2)

(3,5,-5)

(4,5,7)

(5,5,6)
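The step from Q4 to Q5 can be written out directly. The sketch below is illustrative only: it stores each set as a plain Python list of `(i, j, sum)` tuples, and the helper names `next_q` and `build_q` are assumptions, not names from the talk. The array is read as -3 7 -12 1 6 -3 5 -2, which is the order consistent with the Q4 and Q5 tuples shown.

```python
def next_q(q_prev, a, j):
    """Build Q_j from Q_{j-1}: extend every sum ending at j-1 by A[j],
    then add the new one-element sum (j, j, A[j]). Indices are 1-based."""
    x = a[j - 1]
    return [(j, j, x)] + [(i, j, s + x) for (i, _, s) in q_prev]


def build_q(a, j):
    """Return Q_j by applying the recurrence j times, starting from the empty set."""
    q = []
    for step in range(1, j + 1):
        q = next_q(q, a, step)
    return q


A = [-3, 7, -12, 1, 6, -3, 5, -2]
print(sorted(build_q(A, 4)))  # [(1, 4, -7), (2, 4, -4), (3, 4, -11), (4, 4, 1)]
print(sorted(build_q(A, 5)))  # [(1, 5, -1), (2, 5, 2), (3, 5, -5), (4, 5, 7), (5, 5, 6)]
```

Written this way each step touches every element of the previous set, which is what the offset representation on the following slides avoids.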


Main Idea Continued

Represent each Q set as a heap ordered binary tree H

Combine all heaps by assembling them into one big heap using dummy infinity keys


The Assembled Heap

[Figure: the heaps H1 to H5 assembled into one heap]


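Representing Q Sets:

Each set $Q_j$ is represented by a tuple $\langle \delta_j, H_j \rangle$

$H_j$ is an Iheap containing all j sums from $Q_j$

$\delta_j$ is a number that must be added to all elements

We get the following construction equation

$\langle \delta_0, H_0 \rangle = \langle 0, \emptyset \rangle$

$\langle \delta_{j+1}, H_{j+1} \rangle = \langle \delta_j + A[j+1],\; H_j \cup \{-\delta_j\} \rangle$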


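Example

-3 7 -12

Set $\delta_1 = \delta_0 + (-3) = -3$, Insert($-\delta_0 = 0$): $H_1 = \{0\}$, representing the sums {-3}

Set $\delta_2 = \delta_1 + 7 = 4$, Insert($-\delta_1 = 3$): $H_2 = \{0, 3\}$, representing the sums {4,7}

Set $\delta_3 = \delta_2 + (-12) = -8$, Insert($-\delta_2 = -4$): $H_3 = \{0, 3, -4\}$, representing the sums {-8,-5,-12}

$\langle \delta_0, H_0 \rangle = \langle 0, \emptyset \rangle$

$\langle \delta_{j+1}, H_{j+1} \rangle = \langle \delta_j + A[j+1],\; H_j \cup \{-\delta_j\} \rangle$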
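The offset trick in this example can be checked with a short sketch. It assumes the pair is an offset (called `delta` here) together with a collection of keys, and it copies a plain list at every step instead of using a partially persistent Iheap, so it shows the bookkeeping only and not the constant-time bound. The names `build_pairs` and `sums_ending_at` are hypothetical.

```python
def build_pairs(a):
    """Return [(delta_j, H_j)] for j = 0..n.

    H_j is a plain list of keys. The real sum of a key is key + delta_j, so
    moving from j to j+1 needs one insertion (-delta_j) and one offset update.
    """
    delta, keys = 0, []
    pairs = [(delta, keys)]
    for x in a:
        keys = keys + [-delta]  # copy: stands in for a persistent insert
        delta = delta + x       # delta_{j+1} = delta_j + A[j+1]
        pairs.append((delta, keys))
    return pairs


def sums_ending_at(pairs, j):
    """Decode the sums in Q_j by adding the offset to every stored key."""
    delta, keys = pairs[j]
    return [key + delta for key in keys]


pairs = build_pairs([-3, 7, -12])
print(sums_ending_at(pairs, 1))  # [-3]
print(sums_ending_at(pairs, 2))  # [4, 7]
print(sums_ending_at(pairs, 3))  # [-8, -5, -12]
```

Because all keys in one set share the same offset, their relative order never changes, which is why the heap itself does not have to be rebuilt when the offset moves.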


Analysis of Pair Construction

Building each pair takes amortized constant time (one insertion into an Iheap)

But the old version disappears

Solution: Partial Persistence (Driscoll et al. '89)

[Figure: Version i and Version i+1 of an Iheap, before and after inserting 7]


Recap

Build all pairs in O(n) time

Join them into a single heap in O(n) time

Use Frederickson's algorithm to get the k+n-1 largest and discard the dummies in O(n+k) time

O(n+k) time algorithm


Space Reduction

Current algorithm uses O(n+k) time and additional space

The input array is considered read only

Kadane's algorithm uses O(1) additional space

Reduce the additional space usage to O(k)


Higher Dimensions

Can be reduced to 1D case.

For an m x n matrix, we get

$O(m^2 \cdot n + k)$ time, $O(n + k)$ space

In general we get

$O\left(n_1 \cdot \prod_{i=2}^{d} n_i^2 + k\right)$ time, $O\left(\prod_{i=1}^{d-1} n_i + k\right)$ space
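One common way to reduce the two-dimensional problem to the one-dimensional one is to fix a pair of rows, collapse the rows between them into a single array of column sums, and solve the 1D problem on that array. The sketch below does this for the maximum sum only (k=1); it is an illustration of the reduction idea under that assumption, not the k maximal sums algorithm from the talk, and the name `max_sum_submatrix` is hypothetical.

```python
def max_sum_submatrix(m):
    """Largest sum of a contiguous submatrix of the non-empty matrix m.

    For every pair (top, bottom) of rows the columns are collapsed into one
    array, on which a Kadane scan runs. With r rows and c columns this takes
    O(r^2 * c) time, so the smaller dimension should index the rows.
    """
    rows, cols = len(m), len(m[0])
    best = m[0][0]
    for top in range(rows):
        col_sums = [0] * cols
        for bottom in range(top, rows):
            for c in range(cols):
                col_sums[c] += m[bottom][c]
            # 1D maximum sum problem on the collapsed rows top..bottom.
            current = col_sums[0]
            best_here = current
            for v in col_sums[1:]:
                current = max(v, current + v)
                best_here = max(best_here, current)
            best = max(best, best_here)
    return best


print(max_sum_submatrix([[1, -5, 2], [3, 4, -9]]))  # 7
```

The quadratic factor in the number of rows matches the m² term in the bound above.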


Outline

Introduction

The k maximal sums problem

Length constrained k maximal sums problem

Sum selection problem

Summary and plans for the future


Length Constrained k Maximal Sums Problem

Each sum must be an aggregate of at least l numbers and at most u numbers

Example with l=3 and u=5

[Figure: example array; the best sum overall is 19, the best sum with length between 3 and 5 is 13]

Best: 19

Best Valid: 13


Goal

Optimal O(n+k) time algorithm outputting the k maximal sums with length between l and u


First Approach

Use the same idea as before but redefine Q to match the length criteria

The construction equation is almost identical but requires a deletion


Constructing Q Sets Using Deletions (l=3, u=6)

-5 17 42 -10 0 12 -10 666

Q6:

(1,6,56)

(2,6,61)

(3,6,44)

(4,6,2)

Q7:

(2,7,51)

(3,7,34)

(4,7,-8)

(5,7,2)

(1,7,46) is deleted: its length 7 exceeds u
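The redefined Q sets can be enumerated directly to check the tuples on this slide. This is a brute-force sketch, not the heap construction with deletions: the function name `q_set` is an assumption, and the array is taken to be -5 17 42 -10 0 12 -10 666, an ordering inferred here because it is consistent with every tuple shown.

```python
def q_set(a, j, l, u):
    """All (i, j, sum) with l <= j - i + 1 <= u, using 1-based indices.

    Walks leftwards from j, accumulating the sum, and stops once the
    length would exceed u.
    """
    out = []
    s = 0
    for i in range(j, 0, -1):
        s += a[i - 1]
        length = j - i + 1
        if length > u:
            break
        if length >= l:
            out.append((i, j, s))
    return out


A = [-5, 17, 42, -10, 0, 12, -10, 666]
print(q_set(A, 6, 3, 6))  # [(4, 6, 2), (3, 6, 44), (2, 6, 61), (1, 6, 56)]
print(q_set(A, 7, 3, 6))  # [(5, 7, 2), (4, 7, -8), (3, 7, 34), (2, 7, 51)]
```

Going from index 6 to index 7, the sum starting at 5 enters the set and the sum starting at 1 leaves it, which is the insertion plus deletion the slide refers to.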


Result

Same algorithm as before using the new way of constructing the next heap

Deleting an element from a heap of size n with constant time insertion takes O(log n) time

O(n log(u-l) + k) time algorithm


A Better Way of Constructing the Q Sets (u=8, l=4)

Divide into slabs of size u-l+1

For each slab build two sets of heaps: one from the left (L) and one from the right (R)

For each index j group all sums of length between l and u ending at j+l-1 using the sets from above and two constants

Example: j=3 in slab 2

0+693=693

-10+693=683

1+680=681

11+680=691

13+680=693

[Figure: the example array divided into Slab 1 and Slab 2, with heap sets labelled L3, R3 and R2 and the offsets l-1 and j+l-1 marked]


Result

Same algorithm using the new way to group sums.

Building the L and R sets takes O(u-l) time for each slab.

O(n+k) time algorithm


Outline

Introduction

The k maximal sums problem

Length constrained k maximal sums problem

Sum selection problem

Summary and plans for the future


Sum Selection

Given an array of numbers, find the k'th largest sum

Example with k=5

The 15 sums in sorted order:

-56 -52 -50 -43 -14 -13 -6 -4 2 7 9 29 36 38 42

[Figure: an example array of five numbers and the sums of all its contiguous subarrays]
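To make the problem statement concrete, the sketch below selects the k'th largest sum by sorting all sums, which is a quadratic baseline and not the algorithm of the talk. The function names are hypothetical. The array 42 -13 7 2 -52 is an assumption: it is used because its 15 sums are exactly the sorted list above, while the array itself is not recoverable from the slide.

```python
from itertools import accumulate


def all_sums_sorted(a):
    """Every subarray sum of a, in increasing order (quadratically many)."""
    prefix = [0] + list(accumulate(a))
    return sorted(prefix[j] - prefix[i]
                  for i in range(len(a))
                  for j in range(i + 1, len(a) + 1))


def kth_largest_sum(a, k):
    """The k'th largest subarray sum, with k = 1 meaning the maximum."""
    return all_sums_sorted(a)[-k]


A = [42, -13, 7, 2, -52]  # assumed array, see the note above
print(all_sums_sorted(A))
print(kth_largest_sum(A, 5))  # 9
```

For this array the five largest sums are 42, 38, 36, 29 and 9, so the answer for k=5 is 9.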


First Solution

Use the algorithm finding the k maximal sums to find the k largest and output the smallest of these

Algorithm uses O(n+k) time.

What if k is large?


Lower Bound

Reduction from the Cartesian Sum Problem (X+Y)

A lower bound of Ω(|Y| + |Y|log(k/|Y|)) (Frederickson and Johnson '82)

X: 7 -5 9 13

Y: 2 12 1 -3 8


Reduction

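X: 7 -5 9 13

Y: 2 12 1 -3 8

Array: 12 -14 -4 117+15 10 -11 -4 11

$x_1 - x_2,\;\; x_2 - x_3,\;\; x_3 - x_4,\;\; c + (x_4 + y_1),\;\; y_2 - y_1,\;\; y_3 - y_2,\;\; y_4 - y_3,\;\; y_5 - y_4$

$c = \max\{|z| : z \in X \cup Y\} \cdot (|X| + |Y|) = 13 \cdot 9 = 117$

Example: the sum from $x_2 - x_3$ through $y_3 - y_2$ is $c + x_2 + y_3$, that is 113 = 117 - 4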
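The construction can be checked mechanically. The sketch below builds the array from X and Y as read off the slide (differences of neighbouring elements on both sides, and the large constant added to the middle element) and evaluates a sum that crosses the middle. The function names `reduction_array` and `crossing_sum` are assumptions made for illustration.

```python
def reduction_array(xs, ys):
    """Array whose sums crossing the middle element equal x_i + y_j + c."""
    c = max(abs(z) for z in xs + ys) * (len(xs) + len(ys))
    left = [xs[i] - xs[i + 1] for i in range(len(xs) - 1)]
    middle = [xs[-1] + ys[0] + c]
    right = [ys[j + 1] - ys[j] for j in range(len(ys) - 1)]
    return left + middle + right, c


def crossing_sum(arr, n_x, i, j):
    """Sum of the subarray that starts at x_i and ends at y_j (0-based i and j).

    The left part telescopes to x_i - x_last and the right part to y_j - y_first,
    so together with the middle element the total is x_i + y_j + c.
    """
    return sum(arr[i:n_x + j])


X = [7, -5, 9, 13]
Y = [2, 12, 1, -3, 8]
arr, c = reduction_array(X, Y)
print(arr)                             # [12, -14, -4, 132, 10, -11, -4, 11]
print(crossing_sum(arr, len(X), 1, 2))  # 113
```

In this example every sum that crosses the middle element is larger than every sum that does not, so for k up to |X|·|Y| the k'th largest sum of the array, minus c, is the k'th largest element of X+Y.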


Result

An Ω(n+nlog(k/n)) lower bound for the sum selection problem


Goal

Optimal O(n+nlog(k/n)) time algorithm for selecting the k'th largest sum


Algorithm

Reduction to selection in sorted arrays and weight balanced search trees

Frederickson and Johnson(’82) already solved selection in n arrays in optimal O(n + nlog(k/n)) time

Adapt this algorithm such that it also works on weight balanced trees


Block Heap

[Figure: block heap with B=3 and the blocks (54,49,42), (39,31,25), (24,12,7), (23,22,21), (17,13,11), (10,5,1), (9,6,3)]

Heap ordered binary tree

Each node stores B sorted elements

Inserting a block of B elements takes O(B) time.


Reducing Sum Selection to Selection in Arrays and Trees

Divide into slabs of size k/n

Each index j is associated with two data structures that together cover all sums ending at index j

The first data structure holds all sums starting in the current slab and is named WBj

The second holds the rest and is named BHj

Example

Extending within a slab

Extending to a new slab: a block of k/n elements is inserted into BH

[Figure: an example array ending in 666 with one slab marked, showing the contents of WB and BH before and after each of the two kinds of extension]


Reducing Problem

One insert in tree per step and one insert in Block heap every k/n steps.

n trees of size at most k/n and n Block heaps.

Join all Block heaps together and use Frederickson's algorithm to find the 4n blocks with the largest minimum

n trees and O(n) sorted arrays left


Result

Selection in O(n) trees and sorted arrays storing O(k) elements can be done in O(n+nlog(k/n)) time

Result is an O(n+nlog(k/n)) time algorithm.


Outline

Introduction

The k maximal sums problem

Length constrained k maximal sums problem

Sum selection problem

Summary and plans for the future


Summary of Results

Sum Selection:

Problem                              Time Complexity
k Maximal Sums                       O(n+k)
Length Constrained k Maximal Sums    O(n+k)
Sum Selection                        Θ(n+nlog(k/n))


Summary of Results

Fault Tolerant Data Structures:

Data Structure      Query                          Update
Priority Queue      Θ(log(n)+δ)                    O(log(n)+δ) am.
Sorted Array        O(log(n)+δ)                    -
Rand. Dictionary    Θ(log(n)+δ) exp.               O(log(n)+δ) exp.
Dictionary          Θ(log(n)+δ)                    O(log(n)+δ) am.
I/O Dictionary      [bound not recoverable] I/O    [bound not recoverable] I/O


Progress and Future

[Figure: timeline from PhD Start to Qualification Exam and beyond, with the topics below placed along it]

Priority Queue

Searching

I/O Eff. Search

k Max Sums

Sum Selection

Fault Tolerance

Sums in Arrays

I/O Eff. Sorting

Cache Oblivious

MIT

Selection in arb. Trees

(l,u) k Max Sums

Dictionary