Data Structure – Final Review (SUNY Buffalo, 27-Apr-2009)

Page 1

Data Structure – Final Review

Page 2

About this review

I’ve been asked to review several data structures covered in class.

It may not be complete: it is unrealistic to cover all the material in ~40 minutes!

Exam may ask questions that weren’t covered in this review but were covered in class.

If you have questions, ask your instructor ASAP. I've used a different book than the one in this class; my material is mostly from "Data structure with C".

I have years of hands-on experience with data structures/algorithms. If you wonder how data structures are used in the "real world", ask me.

Page 3

Review Topics

Tree ADT: Heap, AVL Tree, Red-Black Tree, and 2-3 Tree (B-tree).

Dictionary (map) ADT: Hash tables and hash functions.

Graph ADT: Breadth-First Search (BFS), and Depth-First Search (DFS).

For each topic, you should be prepared to answer:

What is it? How is it represented? What operations does it support? How does each operation work? Practice your drawing; do as many examples as you can!

How long does each operation take? Best-case, average-case, and worst-case.

Page 4

Review: Trees

Terminology: Size, height, depth (level), link (edge), path. Root, parent, children, sibling, leaves, ancestor, descendant, etc.

Representation: Node structure. Storage: Array, Linked list.

Types: Binary tree: Binary Heap. Binary search tree (BST): AVL and R-B. B-tree: 2-3 tree.

Operations: insert(), delete(), search(), sort(), etc. Binary tree-walks: Pre-order (Root, L, R), In-order (L, Root, R), Post-order (L, R, Root), Level-order.

Time complexity (for balanced trees): Insertion O(log n), Searching O(log n), Deletion O(log n), Sorting O(n log n).
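The four tree-walks can be sketched as follows; a minimal illustrative sketch (the Node class and function names are my own, not from the class materials):

```python
from collections import deque

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def pre_order(n):    # Root, L, R
    return [n.key] + pre_order(n.left) + pre_order(n.right) if n else []

def in_order(n):     # L, Root, R
    return in_order(n.left) + [n.key] + in_order(n.right) if n else []

def post_order(n):   # L, R, Root
    return post_order(n.left) + post_order(n.right) + [n.key] if n else []

def level_order(root):
    out, q = [], deque([root] if root else [])
    while q:                      # FIFO queue yields level-by-level order
        n = q.popleft()
        out.append(n.key)
        if n.left:
            q.append(n.left)
        if n.right:
            q.append(n.right)
    return out

# A small BST: 4 at the root, 2 (with children 1 and 3) on the left, 6 on the right.
root = Node(4, Node(2, Node(1), Node(3)), Node(6))
```

For a BST, the in-order walk visits the keys in sorted order: [1, 2, 3, 4, 6] here.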

Page 5

Binary Tree: Importance of Balance

A binary tree, in general, is useful for implementing many operations: for example, search(), successor(), predecessor(), minimum(), maximum(), insert(), and delete() can all be achieved in O(h) time, where h is the height of the tree.

That is, the average running time of the above operations on a balanced tree is h = O(lg n).

But insert() and delete() alter the shape of the tree and can result in an unbalanced tree. In the worst case, h = O(n): no better than a linked list! So we want to correct the imbalance in at most O(lg n) time, adding no complexity overhead.

Page 6

Review: Balanced Trees

To make sure a binary tree is balanced, add a requirement, called the heap property, to the binary tree. A binary heap is commonly used for implementing the Priority Queue ADT. (Aside: "heap" can also mean the memory space used for dynamic allocation.)

To make sure a BST is balanced, add a constraint on the height of BST trees. The most popular such data structures are AVL and Red-Black trees.

Page 7

Review: Binary Heap

A binary heap extends the binary tree data structure and has the following properties:

Each node has a key greater (max-heap) or less (min-heap) than or equal to the keys of its children.

The tree is a complete binary tree: every level, except possibly the last, is completely filled, and all nodes are as far left as possible. The longest path is ceiling(lg n) for n nodes.

Page 8

Heap: Maintaining the Heap Property

heapifyUp() and heapifyDown() are the key operations for maintaining the heap property in O(lg n) time.

How does heapifyDown() work? Given a node i in the heap (max-heap): if A[i] < A[left(i)] or A[i] < A[right(i)], swap A[i] with the larger of A[left(i)] and A[right(i)]. Recurse on that sub-tree.

How does heapifyUp() work? Given a node i in the heap (max-heap): if A[i] > A[parent[i]], swap A[i] with A[parent[i]]. Recurse on parent[i].

What about the other operations and their running times? delete(), insert(), buildHeap(), heapSort().
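A minimal array-based max-heap sketch of these operations (0-indexed, so left(i) = 2i+1 and parent(i) = (i-1)//2; some books use 1-indexed arrays instead):

```python
def heapify_up(a, i):
    """Max-heap: float a[i] up while it is larger than its parent."""
    while i > 0 and a[i] > a[(i - 1) // 2]:
        a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
        i = (i - 1) // 2

def heapify_down(a, i, n=None):
    """Max-heap: sink a[i] while it is smaller than one of its children."""
    n = len(a) if n is None else n
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:            # heap property restored
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest                 # continue in that sub-tree

def build_heap(a):
    """Heapify-down every internal node, bottom-up: O(n) overall."""
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify_down(a, i)

def heap_sort(a):
    """Repeatedly move the current max to the end: O(n log n)."""
    build_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        heapify_down(a, 0, end)     # heap shrinks, excluding the sorted tail
```

insert() is then append-plus-heapify_up, and deleting the max is swap-with-last, pop, then heapify_down.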

Page 9

Review: AVL Tree

An AVL tree extends the BST data structure and includes the following property:

For any node in the tree, the height difference between its left and right sub-trees is at most one.

Observe that: The smallest AVL tree of depth 1 has 1 node. The smallest AVL tree of depth 2 has 2 nodes.

Size of the smallest AVL tree of height h: Th = Th-1 + Th-2 + 1.

(AVL: Adelson-Velsky and Landis, 1962)

Page 10

AVL: Maintaining the AVL Property

Tree rotation is the key operation for maintaining the AVL property in O(lg n) time.

If a node is not balanced, the difference between its children's heights is 2. There are 4 possible cases with a height difference of 2.

(Figure: the four imbalance cases (1)-(4), each drawn as nodes x and y with subtrees A, B, and C.)

Page 11

AVL: Maintaining the AVL Property (2)

Case 1: rightRotate(y):
x = y.getLeftChild(); y.setLeftChild(x.getRightChild()); x.setRightChild(y); y = x;

Case 2: leftRotate(x):
y = x.getRightChild(); x.setRightChild(y.getLeftChild()); y.setLeftChild(x); x = y;

Case 3: leftRotate(y); rightRotate(x);

Case 4: rightRotate(x); leftRotate(y);

(Figure: rightRotate(y) and leftRotate(x) exchanging x and y, with subtrees A, B, and C re-attached to preserve order.)
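The rotations are plain pointer surgery; here is an illustrative Python version of the pseudocode above (the node class and function names are my own), omitting the height bookkeeping a full AVL implementation also needs:

```python
class AVLNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def right_rotate(y):
    """Case 1 (left-left): y's left child x becomes the subtree root."""
    x = y.left
    y.left = x.right      # x's right subtree (B) is re-parented under y
    x.right = y
    return x              # the caller re-attaches the returned root

def left_rotate(x):
    """Case 2 (right-right): the mirror image of right_rotate."""
    y = x.right
    x.right = y.left
    y.left = x
    return y

def left_right(z):
    """Case 3 (zig-zag): rotate the child first, then the node."""
    z.left = left_rotate(z.left)
    return right_rotate(z)

def right_left(z):
    """Case 4 (zig-zag): the mirror image of left_right."""
    z.right = right_rotate(z.right)
    return left_rotate(z)
```

Note that each rotation is O(1): it only swaps a constant number of pointers and preserves the in-order key ordering.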

Page 12

AVL: Insert/Delete

Insertion is similar to a regular BST insert:

Search for the position: keep going left (or right) in the tree until a null child is reached. Insert a new node in this position. An inserted node is always a leaf.

Rebalance the tree: search from the inserted node up to the root, looking for any node that violates the AVL property, and use rotation to fix it. Only the first unbalanced node needs to be fixed.

Deletion is similar to a regular BST delete:

Search for the node. Remove it: 0 children: replace it with null. 1 child: replace it with the only child. 2 children: replace it with the right-most node in the left subtree.

Rebalance the tree: search from the deleted node up to the root for all nodes that violate the AVL property, and use rotation to fix them. This may require working all the way back up to the root.

Page 13

Review: Red-Black Trees

A red-black tree extends the BST data structure and includes the following properties:

The root is always black. Every node is either red or black. Every leaf (NULL pointer) is black (so every "real" node has 2 children). Both children of every red node are black (there can't be 2 consecutive reds on a path). Every simple path from a node to a descendant leaf contains the same number of black nodes.

An RB tree has height h <= 2 lg(n+1). So every operation is guaranteed to run in the height, h = O(lg n).

Page 14

RB Trees: Maintaining RB Tree Property

Tree rotation is the key operation for maintaining the RB tree properties in O(lg n) time: rotation preserves in-order key ordering, and rotation takes O(1) time (it just swaps pointers).

(Figure: rightRotate(y) and leftRotate(x) exchanging x and y, with subtrees A, B, and C re-attached to preserve order.)

Page 15

RB Trees: Insert/Delete

Insertion is similar to BST's insert: do a BST insert and color the new node red. Then rebalance the tree:

If the parent is black, done. Otherwise there are three cases: 1. The parent's sibling is red. 2. The parent's sibling is black and the new node is a right child. 3. The parent's sibling is black and the new node is a left child.

Repeat, moving up the tree, until there are no violations.

Deletion is similar to BST's delete: do a BST delete. Then rebalance the tree:

If the node is red, color it black; done. Otherwise there are four cases: 1. The sibling is red. 2. The sibling is black with two black children. 3. The sibling is black, its left child is red, and its right child is black. 4. The sibling is black and its right child is red.

Repeat, moving up the tree, until there are no violations.

Page 16

Review: 2-3 B-Trees

A B-tree extends the tree data structure and has the following properties:

The root is either a leaf or has between 2 and m children. Each internal node has between ceiling(m/2) and m children, i.e., between ceiling(m/2)-1 and m-1 keys. A leaf node has between 1 and m-1 keys. The tree is perfectly balanced.

So a 2-3 B-tree is a B-tree of order 3: a node can have 2 or 3 children, which means a node can have 1 or 2 keys. (An R-B tree corresponds to a B-tree of order 4, a 2-3-4 tree.)

(Figure: a 3-node <x, y> whose three children hold keys <= x, > x and <= y, and > y.)

Page 17

2-3: Insert/Delete

Insertion is similar to insert in a BST:

Search for the item. If found, done. Otherwise: Stopped at a 2-node? Upgrade the 2-node to a 3-node. Stopped at a 3-node? Replace the 3-node by two 2-nodes and push the middle value up to the parent node. Repeat recursively until you upgrade a 2-node or create a new root.

When is a new root created?

Deletion is similar to delete in a BST:

Start deletion at a leaf: swap the value to be deleted with its immediate successor in the tree, then delete the value from the leaf node. If the node still has a value, done: we've changed a 3-node into a 2-node. Otherwise, borrow a value from a sibling or parent.

Page 18

Review: Hash Tables

Given n elements, each with a key and satellite data, we need to support insert(T, x), delete(T, x), and search(T, x), but we don't care about sorting the elements.

Suppose no two elements have the same key and the range of keys is 0…m-1, where m is not too large. Set up an array T[0…m-1] in which T[i] = x if x ∈ T and i = h(key(x)), and T[i] = NULL otherwise.

h() is called the hash function (or hashing) and T is called a direct-address table. Hash tables support insert, delete, and search in O(1) expected time.
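A direct-address table is simple enough to sketch directly. In this illustration h() is just the identity function, and elements are hypothetical dicts carrying a "key" field plus satellite data (names are my own):

```python
M = 16  # assumed small key universe: keys are integers in 0..M-1

def make_table():
    return [None] * M            # T[0..M-1], all slots initially empty

def insert(T, x):                # x is an element with a key and satellite data
    T[x["key"]] = x

def search(T, k):
    return T[k]                  # O(1): the key indexes the array directly

def delete(T, x):
    T[x["key"]] = None
```

A general hash table replaces the direct index with h(key) to handle large or non-integer key universes, which is what introduces collisions.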

Page 19

Hash: Resolving Collisions

A collision happens when two keys hash to the same memory location. There are two ways to resolve collisions:

Open addressing: To insert, if the slot is full, try another slot, and another, until an open slot is found (probing). To search, follow the same sequence of probes as would be used when inserting the element.

Chaining: Keep a linked list of elements in each slot. To insert, upon collision just add the new element to the list. To search, search the linked list.
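A minimal sketch of chaining (class and method names are my own; Python's built-in hash with the division method stands in for a real hash function, and Python lists stand in for linked lists):

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        return hash(key) % self.m             # division-method hash

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key exists: replace its value
                chain[i] = (key, value)
                return
        chain.append((key, value))            # collisions just extend the chain

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        i = self._h(key)
        self.slots[i] = [(k, v) for (k, v) in self.slots[i] if k != key]
```

With a good hash function and a load factor kept constant, the expected chain length is O(1), which gives the O(1) expected time claimed above.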

Page 20

Review: Graphs

A graph G = (V, E), where V = set of vertices, E = set of edges. Dense graph: |E| ≈ |V|². Sparse graph: |E| ≈ |V|.

Undirected graph: edge (u,v) = edge (v,u); no self-loops.

Directed graph: edge (u,v) goes from vertex u to vertex v, notated u→v.

A weighted graph associates weights with either the edges or the vertices.

Page 21

Graphs: Adjacency Matrix

Assume V = {1, 2, …, n}. An adjacency matrix represents the graph as an n x n matrix A: A[i, j] = 1 if edge (i, j) ∈ E (or the weight of the edge), and A[i, j] = 0 if edge (i, j) ∉ E.

Example: the directed graph on vertices {1, 2, 3, 4} with edges 1→2, 1→3, 2→3, 4→3 gives:

A  1 2 3 4
1  0 1 1 0
2  0 0 1 0
3  0 0 0 0
4  0 0 1 0

Page 22

Graphs: Adjacency List

An adjacency list represents the graph as an array of linked lists: for each vertex v ∈ V, store a list of the vertices adjacent to v.

Example: Adj[1] = {2,3}, Adj[2] = {3}, Adj[3] = {}, Adj[4] = {3}.

Variation: one can also keep a list of the edges coming into each vertex.
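The example above can be built with plain lists; a sketch (Python lists stand in for linked lists, and the function name is my own):

```python
def adjacency_list(n, edges):
    """Directed graph on vertices 1..n as an array of adjacency lists."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)         # store v in u's list of neighbours
    return adj

# The slide's example: edges 1->2, 1->3, 2->3, 4->3
adj = adjacency_list(4, [(1, 2), (1, 3), (2, 3), (4, 3)])
```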

Page 23

Graphs: Storage

An adjacency matrix takes O(V²) storage: usually too much for large graphs, but it can be very efficient for small graphs.

An adjacency list takes O(V+E) storage. The degree of a vertex v = # of incident edges. For directed graphs, the number of items in the adjacency lists is Σ out-degree(v) = |E|, so it takes Θ(V + E) storage. For undirected graphs, the number of items is Σ degree(v) = 2|E| (handshaking lemma), which also takes Θ(V + E) storage.

Most large interesting graphs are sparse. E.g., planar graphs, in which no edges cross, have |E| = O(|V|) by Euler's formula. So the adjacency list is often the more appropriate representation.

Page 24

Review: Graph Searching

Given: a graph G = (V, E), directed or undirected. Goal: systematically explore every vertex and every edge.

General idea: build a tree on the graph. Pick a vertex as the root, and choose certain edges to produce a tree. Note: this might build a forest if the graph is not connected.

Page 25

Breadth-First Search

General idea: expand the frontier of explored vertices across the breadth of the frontier. Pick a source vertex to be the root. Find ("discover") its children, then their children, etc.

Associate "colours" with the vertices: White vertices have not been discovered; all vertices start out white. Grey vertices are discovered but not fully explored; they may be adjacent to white vertices. Black vertices are discovered and fully explored; they are adjacent only to black and grey vertices.

Explore vertices by scanning the FIFO queue of grey vertices.
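The colouring scheme can be sketched as follows (an illustrative version; the dist map records shortest-path edge counts, and the example graph is assumed to be the one from the adjacency-list slide):

```python
from collections import deque

def bfs(adj, s):
    """BFS from source s; returns shortest-path edge counts to reached vertices."""
    colour = {v: "white" for v in adj}   # white: not yet discovered
    dist = {s: 0}
    colour[s] = "grey"                   # grey: discovered, not fully explored
    q = deque([s])                       # the FIFO queue of grey vertices
    while q:
        u = q.popleft()
        for v in adj[u]:
            if colour[v] == "white":     # discover v, one level deeper than u
                colour[v] = "grey"
                dist[v] = dist[u] + 1
                q.append(v)
        colour[u] = "black"              # black: fully explored
    return dist

# Example directed graph with edges 1->2, 1->3, 2->3, 4->3
adj = {1: [2, 3], 2: [3], 3: [], 4: [3]}
```

Unreachable vertices never leave white and are absent from dist, which matches the "distance ∞" convention on the next topic.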

Page 26

BFS and Shortest-path

BFS can be thought of as Dijkstra's shortest-path algorithm for the special case where every edge has the same weight.

BFS calculates the shortest-path distance from the source node: the shortest-path distance δ(s,v) = minimum number of edges from s to v, or ∞ if v is not reachable from s. (The proof should be in the book.)

BFS builds a breadth-first tree, in which paths to the root represent shortest paths in G. Thus BFS can be used to calculate the shortest path from one vertex to another in O(V+E) time.

Page 27

Depth-First Search

General idea: explore "deeper" into the graph whenever possible. Edges are explored out of the most recently discovered vertex v that still has unexplored edges. When all of v's edges have been explored, backtrack to the vertex from which v was discovered.

Like BFS, associate "colours" with the vertices: vertices are initially white, then coloured grey when discovered, then coloured black when finished. Explore vertices by scanning the stack of grey vertices.
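A recursive sketch with the same colour scheme (the call stack plays the role of the explicit stack of grey vertices; the example graph is again assumed from the earlier slides):

```python
def dfs(adj):
    """DFS over the whole graph; returns the vertices in discovery order."""
    colour = {v: "white" for v in adj}
    order = []

    def visit(u):
        colour[u] = "grey"               # discovered
        order.append(u)
        for v in adj[u]:
            if colour[v] == "white":
                visit(v)                 # go deeper before exploring siblings
        colour[u] = "black"              # finished: backtrack

    for u in adj:                        # restart to cover disconnected parts
        if colour[u] == "white":
            visit(u)
    return order

# Example directed graph with edges 1->2, 1->3, 2->3, 4->3
adj = {1: [2, 3], 2: [3], 3: [], 4: [3]}
```

The outer loop is what builds a depth-first forest rather than a single tree when the graph is not connected.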

Page 28

DFS And Cycles

An undirected graph is acyclic iff a DFS yields no back edges:

If acyclic, there are no back edges (because a back edge implies a cycle). If there are no back edges, the graph is acyclic: no back edges implies only tree edges (why?), and only tree edges implies we have a tree or a forest, which by definition is acyclic.

Thus, we can run DFS to find whether a graph has a cycle. We can actually determine whether a cycle exists in O(V) time: in an undirected acyclic forest, |E| ≤ |V| - 1. So count the edges: if we ever see |V| distinct edges, we must have seen a back edge along the way.
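The back-edge test can be sketched as follows for an undirected graph given as a symmetric adjacency list (a sketch with my own names; in undirected DFS, any edge to an already-discovered vertex other than the parent is a back edge):

```python
def has_cycle(adj):
    """Undirected cycle test via DFS back edges.

    adj must be symmetric (each undirected edge listed in both directions).
    The parent check does not handle parallel edges.
    """
    colour = {v: "white" for v in adj}

    def visit(u, parent):
        colour[u] = "grey"
        for v in adj[u]:
            if colour[v] == "white":
                if visit(v, u):
                    return True
            elif v != parent:            # back edge found: a cycle exists
                return True
        colour[u] = "black"
        return False

    # Restart from every undiscovered vertex to cover all components.
    return any(colour[u] == "white" and visit(u, None) for u in adj)
```

Adding an edge counter and returning True as soon as |V| distinct edges have been traversed gives the O(V) bound mentioned above.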

Page 29

Remarks (1)

Clearly, data structures and algorithms are closely related: selecting the most efficient data structure and algorithm will almost always be the best way to proceed.

However, many factors must be considered to produce a good implementation: the obvious solution isn't always the best, and sometimes it makes sense to have multiple data structures, each with different properties, representing a single object.

Factors to be considered: the memory footprint implied by a given representation; the cost of operations in that representation; the cost of converting to another representation; the amount of computation expected using a given representation.

Page 30

Remarks (2)

When it comes to the implementation of an algorithm, the main point is that constant factors matter. Mapping algorithms and data structures in a way that matches the architecture's characteristics is VERY important!

Often you must restructure a program, not functionally but behaviourally, to get better performance. However, restructuring code can be a bit more involved than just performing optimisations.

So the bottom line is to think about the trade-offs that could change the quality of an implementation. Direct, obvious algorithm translations don't always mean good performance; the best performance comes from considering the many aspects of execution, e.g., memory access, processor characteristics, and language overheads.

Page 31

Good Luck!