Data Structure – Final Review (SUNY Buffalo, 27-Apr-2009)

Page 1

Data Structure – Final Review

Page 2

About this review

I’ve been asked to review several data structures covered in class.

It may not be complete: it is unrealistic to cover all the material in ~40 minutes!

Exam may ask questions that weren’t covered in this review but were covered in class.

If you have questions, ask your instructor ASAP. I've used a different book than the one in this class; my material is mostly from "Data structure with C".

I have years of hands-on experience with data structures/algorithms. If you wonder how data structures are used in the "real world", ask me.

Page 3

Review Topics

Tree ADT: Heap, AVL Tree, Red-Black Tree, and 2-3 Tree (B-tree).

Dictionary (map) ADT: Hash tables and hash functions.

Graph ADT: Breadth-First Search (BFS), and Depth-First Search (DFS).

For each topic, you should be prepared to answer:

What is it? How is it represented? What operations does it support? How does each operation work? Practice your drawing; do as many examples as you can!

How long does each operation take? Best-case, average-case, and worst-case.

Page 4

Review: Trees

Terminology: Size, height, depth (level), link (edge), path. Root, parent, children, sibling, leaves, ancestor, descendant, etc.

Representation: Node structure. Storage: Array, Linked list.

Types: Binary tree: Binary Heap. Binary search tree (BST): AVL and R-B. B-tree: 2-3 tree.

Operations: insert(), delete(), search(), sort(), etc. Binary tree-walks: Pre-order (Root, L, R), In-order (L, Root, R), Post-order (L, R, Root), Level-order.

Time complexity (for balanced trees): Insertion O(log n), Searching O(log n), Deletion O(log n), Sorting O(n log n).
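The four tree-walks can be sketched as follows; a minimal illustrative sketch (the Node class and function names are my own, not from the class materials):

```python
from collections import deque

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def pre_order(n):    # Root, L, R
    return [n.key] + pre_order(n.left) + pre_order(n.right) if n else []

def in_order(n):     # L, Root, R
    return in_order(n.left) + [n.key] + in_order(n.right) if n else []

def post_order(n):   # L, R, Root
    return post_order(n.left) + post_order(n.right) + [n.key] if n else []

def level_order(root):
    out, q = [], deque([root] if root else [])
    while q:                      # FIFO queue yields level-by-level order
        n = q.popleft()
        out.append(n.key)
        if n.left:
            q.append(n.left)
        if n.right:
            q.append(n.right)
    return out

# A small BST: 4 at the root, 2 (with children 1 and 3) on the left, 6 on the right.
root = Node(4, Node(2, Node(1), Node(3)), Node(6))
```

For a BST, the in-order walk visits the keys in sorted order: [1, 2, 3, 4, 6] here.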

Page 5

Binary Tree: Importance of Balance

A binary tree, in general, is useful for implementing many operations: for example, search(), successor(), predecessor(), minimum(), maximum(), insert(), and delete() can all be achieved in O(h) time, where h is the height of the tree.

That is, the average running time of the above operations on a balanced tree is h = O(lg n).

But insert() and delete() alter the shape of the tree and can result in an unbalanced tree. In the worst case, h = O(n): no better than a linked list! So we want to correct the imbalance in at most O(lg n) time, adding no complexity overhead.

Page 6

Review: Balanced Trees

To make sure a binary tree is balanced, add a requirement, called the heap property, to the binary tree. A binary heap is commonly used for implementing the Priority Queue ADT. (Aside: "heap" can also mean the memory space used for dynamic allocation.)

To make sure a BST is balanced, add a constraint on the height of BST trees. The most popular such data structures are AVL and Red-Black trees.

Page 7

Review: Binary Heap

A binary heap extends the binary tree data structure and has the following properties:

Each node has a key greater (max-heap) or less (min-heap) than or equal to the keys of its children.

The tree is a complete binary tree: every level, except possibly the last, is completely filled, and all nodes are as far left as possible. The longest path is ceiling(lg n) for n nodes.

Page 8

Heap: Maintaining the Heap Property

heapifyUp() and heapifyDown() are the key operations for maintaining the heap property in O(lg n) time.

How does heapifyDown() work? Given a node i in the heap (max-heap): if A[i] < A[left(i)] or A[i] < A[right(i)], swap A[i] with the larger of A[left(i)] and A[right(i)]. Recurse on that sub-tree.

How does heapifyUp() work? Given a node i in the heap (max-heap): if A[i] > A[parent[i]], swap A[i] with A[parent[i]]. Recurse on parent[i].

What about the other operations and their running times? delete(), insert(), buildHeap(), heapSort().
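A minimal array-based max-heap sketch of these operations (0-indexed, so left(i) = 2i+1 and parent(i) = (i-1)//2; some books use 1-indexed arrays instead):

```python
def heapify_up(a, i):
    """Max-heap: float a[i] up while it is larger than its parent."""
    while i > 0 and a[i] > a[(i - 1) // 2]:
        a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
        i = (i - 1) // 2

def heapify_down(a, i, n=None):
    """Max-heap: sink a[i] while it is smaller than one of its children."""
    n = len(a) if n is None else n
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:            # heap property restored
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest                 # continue in that sub-tree

def build_heap(a):
    """Heapify-down every internal node, bottom-up: O(n) overall."""
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify_down(a, i)

def heap_sort(a):
    """Repeatedly move the current max to the end: O(n log n)."""
    build_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        heapify_down(a, 0, end)     # heap shrinks, excluding the sorted tail
```

insert() is then append-plus-heapify_up, and deleting the max is swap-with-last, pop, then heapify_down.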

Page 9

Review: AVL Tree

An AVL tree extends the BST data structure and includes the following property:

For any node in the tree, the height difference between its left and right sub-trees is at most one.

Observe that: The smallest AVL tree of depth 1 has 1 node. The smallest AVL tree of depth 2 has 2 nodes.

Size of the smallest AVL tree of height h: Th = Th-1 + Th-2 + 1.

(AVL: Adelson-Velsky and Landis, 1962)

Page 10

AVL: Maintaining the AVL Property

Tree rotation is the key operation for maintaining the AVL property in O(lg n) time.

If a node is not balanced, the difference between its children's heights is 2. There are 4 possible cases with a height difference of 2.

(Figure: the four imbalance cases (1)-(4), each drawn as nodes x and y with subtrees A, B, and C.)

Page 11

AVL: Maintaining the AVL Property (2)

Case 1: rightRotate(y):
x = y.getLeftChild(); y.setLeftChild(x.getRightChild()); x.setRightChild(y); y = x;

Case 2: leftRotate(x):
y = x.getRightChild(); x.setRightChild(y.getLeftChild()); y.setLeftChild(x); x = y;

Case 3: leftRotate(y); rightRotate(x);

Case 4: rightRotate(x); leftRotate(y);

(Figure: rightRotate(y) and leftRotate(x) exchanging x and y, with subtrees A, B, and C re-attached to preserve order.)
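The rotations are plain pointer surgery; here is an illustrative Python version of the pseudocode above (the node class and function names are my own), omitting the height bookkeeping a full AVL implementation also needs:

```python
class AVLNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def right_rotate(y):
    """Case 1 (left-left): y's left child x becomes the subtree root."""
    x = y.left
    y.left = x.right      # x's right subtree (B) is re-parented under y
    x.right = y
    return x              # the caller re-attaches the returned root

def left_rotate(x):
    """Case 2 (right-right): the mirror image of right_rotate."""
    y = x.right
    x.right = y.left
    y.left = x
    return y

def left_right(z):
    """Case 3 (zig-zag): rotate the child first, then the node."""
    z.left = left_rotate(z.left)
    return right_rotate(z)

def right_left(z):
    """Case 4 (zig-zag): the mirror image of left_right."""
    z.right = right_rotate(z.right)
    return left_rotate(z)
```

Note that each rotation is O(1): it only swaps a constant number of pointers and preserves the in-order key ordering.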

Page 12

AVL: Insert/Delete

Insertion is similar to a regular BST insert:

Search for the position: keep going left (or right) in the tree until a null child is reached. Insert a new node in this position. An inserted node is always a leaf.

Rebalance the tree: search from the inserted node up to the root, looking for any node that violates the AVL property, and use rotation to fix it. Only the first unbalanced node needs to be fixed.

Deletion is similar to a regular BST delete:

Search for the node. Remove it: 0 children: replace it with null. 1 child: replace it with the only child. 2 children: replace it with the right-most node in the left subtree.

Rebalance the tree: search from the deleted node up to the root for all nodes that violate the AVL property, and use rotation to fix them. This may require working all the way back up to the root.

Page 13

Review: Red-Black Trees

A red-black tree extends the BST data structure and includes the following properties:

The root is always black. Every node is either red or black. Every leaf (NULL pointer) is black (so every "real" node has 2 children). Both children of every red node are black (there can't be 2 consecutive reds on a path). Every simple path from a node to a descendant leaf contains the same number of black nodes.

An RB tree has height h <= 2 lg(n+1). So every operation is guaranteed to run in the height, h = O(lg n).

Page 14

RB Trees: Maintaining RB Tree Property

Tree rotation is the key operation for maintaining the RB tree properties in O(lg n) time: rotation preserves in-order key ordering, and rotation takes O(1) time (it just swaps pointers).

(Figure: rightRotate(y) and leftRotate(x) exchanging x and y, with subtrees A, B, and C re-attached to preserve order.)

Page 15

RB Trees: Insert/Delete

Insertion is similar to BST's insert: do a BST insert and color the new node red. Then rebalance the tree:

If the parent is black, done. Otherwise there are three cases: 1. The parent's sibling is red. 2. The parent's sibling is black and the new node is a right child. 3. The parent's sibling is black and the new node is a left child.

Repeat, moving up the tree, until there are no violations.

Deletion is similar to BST's delete: do a BST delete. Then rebalance the tree:

If the node is red, color it black; done. Otherwise there are four cases: 1. The sibling is red. 2. The sibling is black with two black children. 3. The sibling is black, its left child is red, and its right child is black. 4. The sibling is black and its right child is red.

Repeat, moving up the tree, until there are no violations.

Page 16

Review: 2-3 B-Trees

A B-tree extends the tree data structure and has the following properties:

The root is either a leaf or has between 2 and m children. Each internal node has between ceiling(m/2) and m children, i.e., between ceiling(m/2)-1 and m-1 keys. A leaf node has between 1 and m-1 keys. The tree is perfectly balanced.

So a 2-3 B-tree is a B-tree of order 3: a node can have 2 or 3 children, which means a node can have 1 or 2 keys. (An R-B tree corresponds to a B-tree of order 4, a 2-3-4 tree.)

(Figure: a 3-node <x, y> whose three children hold keys <= x, > x and <= y, and > y.)

Page 17

2-3: Insert/Delete

Insertion is similar to insert in a BST:

Search for the item. If found, done. Otherwise: Stopped at a 2-node? Upgrade the 2-node to a 3-node. Stopped at a 3-node? Replace the 3-node by two 2-nodes and push the middle value up to the parent node. Repeat recursively until you upgrade a 2-node or create a new root.

When is a new root created?

Deletion is similar to delete in a BST:

Start deletion at a leaf: swap the value to be deleted with its immediate successor in the tree, then delete the value from the leaf node. If the node still has a value, done: we've changed a 3-node into a 2-node. Otherwise, borrow a value from a sibling or parent.

Page 18

Review: Hash Tables

Given n elements, each with a key and satellite data, we need to support insert(T, x), delete(T, x), and search(T, x), but we don't care about sorting the elements.

Suppose no two elements have the same key and the range of keys is 0…m-1, where m is not too large. Set up an array T[0…m-1] in which T[i] = x if x ∈ T and i = h(key(x)), and T[i] = NULL otherwise.

h() is called the hash function (or hashing) and T is called a direct-address table. Hash tables support insert, delete, and search in O(1) expected time.
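A direct-address table is simple enough to sketch directly. In this illustration h() is just the identity function, and elements are hypothetical dicts carrying a "key" field plus satellite data (names are my own):

```python
M = 16  # assumed small key universe: keys are integers in 0..M-1

def make_table():
    return [None] * M            # T[0..M-1], all slots initially empty

def insert(T, x):                # x is an element with a key and satellite data
    T[x["key"]] = x

def search(T, k):
    return T[k]                  # O(1): the key indexes the array directly

def delete(T, x):
    T[x["key"]] = None
```

A general hash table replaces the direct index with h(key) to handle large or non-integer key universes, which is what introduces collisions.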

Page 19

Hash: Resolving Collisions

A collision happens when two keys hash to the same memory location. There are two ways to resolve collisions:

Open addressing: To insert, if the slot is full, try another slot, and another, until an open slot is found (probing). To search, follow the same sequence of probes as would be used when inserting the element.

Chaining: Keep a linked list of elements in each slot. To insert, upon collision just add the new element to the list. To search, search the linked list.
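A minimal sketch of chaining (class and method names are my own; Python's built-in hash with the division method stands in for a real hash function, and Python lists stand in for linked lists):

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        return hash(key) % self.m             # division-method hash

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key exists: replace its value
                chain[i] = (key, value)
                return
        chain.append((key, value))            # collisions just extend the chain

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        i = self._h(key)
        self.slots[i] = [(k, v) for (k, v) in self.slots[i] if k != key]
```

With a good hash function and a load factor kept constant, the expected chain length is O(1), which gives the O(1) expected time claimed above.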

Page 20

Review: Graphs

A graph G = (V, E), where V = set of vertices, E = set of edges. Dense graph: |E| ≈ |V|². Sparse graph: |E| ≈ |V|.

Undirected graph: edge (u,v) = edge (v,u); no self-loops.

Directed graph: edge (u,v) goes from vertex u to vertex v, notated u→v.

A weighted graph associates weights with either the edges or the vertices.

Page 21

Graphs: Adjacency Matrix

Assume V = {1, 2, …, n}. An adjacency matrix represents the graph as an n x n matrix A: A[i, j] = 1 if edge (i, j) ∈ E (or the weight of the edge), and A[i, j] = 0 if edge (i, j) ∉ E.

Example: the directed graph on vertices {1, 2, 3, 4} with edges 1→2, 1→3, 2→3, 4→3 gives:

A  1 2 3 4
1  0 1 1 0
2  0 0 1 0
3  0 0 0 0
4  0 0 1 0

Page 22

Graphs: Adjacency List

An adjacency list represents the graph as an array of linked lists: for each vertex v ∈ V, store a list of the vertices adjacent to v.

Example: Adj[1] = {2,3}, Adj[2] = {3}, Adj[3] = {}, Adj[4] = {3}.

Variation: one can also keep a list of the edges coming into each vertex.
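The example above can be built with plain lists; a sketch (Python lists stand in for linked lists, and the function name is my own):

```python
def adjacency_list(n, edges):
    """Directed graph on vertices 1..n as an array of adjacency lists."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)         # store v in u's list of neighbours
    return adj

# The slide's example: edges 1->2, 1->3, 2->3, 4->3
adj = adjacency_list(4, [(1, 2), (1, 3), (2, 3), (4, 3)])
```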

Page 23

Graphs: Storage

An adjacency matrix takes O(V²) storage: usually too much for large graphs, but it can be very efficient for small graphs.

An adjacency list takes O(V+E) storage. The degree of a vertex v = # of incident edges. For directed graphs, the number of items in the adjacency lists is Σ out-degree(v) = |E|, so it takes Θ(V + E) storage. For undirected graphs, the number of items is Σ degree(v) = 2|E| (handshaking lemma), which also takes Θ(V + E) storage.

Most large interesting graphs are sparse. E.g., planar graphs, in which no edges cross, have |E| = O(|V|) by Euler's formula. So the adjacency list is often the more appropriate representation.

Page 24

Review: Graph Searching

Given: a graph G = (V, E), directed or undirected. Goal: systematically explore every vertex and every edge.

General idea: build a tree on the graph. Pick a vertex as the root, and choose certain edges to produce a tree. Note: this might build a forest if the graph is not connected.

Page 25

Breadth-First Search

General idea: expand the frontier of explored vertices across the breadth of the frontier. Pick a source vertex to be the root. Find ("discover") its children, then their children, etc.

Associate "colours" with the vertices: White vertices have not been discovered; all vertices start out white. Grey vertices are discovered but not fully explored; they may be adjacent to white vertices. Black vertices are discovered and fully explored; they are adjacent only to black and grey vertices.

Explore vertices by scanning the FIFO queue of grey vertices.
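The colouring scheme can be sketched as follows (an illustrative version; the dist map records shortest-path edge counts, and the example graph is assumed to be the one from the adjacency-list slide):

```python
from collections import deque

def bfs(adj, s):
    """BFS from source s; returns shortest-path edge counts to reached vertices."""
    colour = {v: "white" for v in adj}   # white: not yet discovered
    dist = {s: 0}
    colour[s] = "grey"                   # grey: discovered, not fully explored
    q = deque([s])                       # the FIFO queue of grey vertices
    while q:
        u = q.popleft()
        for v in adj[u]:
            if colour[v] == "white":     # discover v, one level deeper than u
                colour[v] = "grey"
                dist[v] = dist[u] + 1
                q.append(v)
        colour[u] = "black"              # black: fully explored
    return dist

# Example directed graph with edges 1->2, 1->3, 2->3, 4->3
adj = {1: [2, 3], 2: [3], 3: [], 4: [3]}
```

Unreachable vertices never leave white and are absent from dist, which matches the "distance ∞" convention on the next topic.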

Page 26

BFS and Shortest-path

BFS can be thought of as Dijkstra's shortest-path algorithm for the special case where every edge has the same weight.

BFS calculates the shortest-path distance from the source node: the shortest-path distance δ(s,v) = minimum number of edges from s to v, or ∞ if v is not reachable from s. (The proof should be in the book.)

BFS builds a breadth-first tree, in which paths to the root represent shortest paths in G. Thus BFS can be used to calculate the shortest path from one vertex to another in O(V+E) time.

Page 27

Depth-First Search

General idea: explore "deeper" into the graph whenever possible. Edges are explored out of the most recently discovered vertex v that still has unexplored edges. When all of v's edges have been explored, backtrack to the vertex from which v was discovered.

Like BFS, associate "colours" with the vertices: vertices are initially white, then coloured grey when discovered, then coloured black when finished. Explore vertices by scanning the stack of grey vertices.
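A recursive sketch with the same colour scheme (the call stack plays the role of the explicit stack of grey vertices; the example graph is again assumed from the earlier slides):

```python
def dfs(adj):
    """DFS over the whole graph; returns the vertices in discovery order."""
    colour = {v: "white" for v in adj}
    order = []

    def visit(u):
        colour[u] = "grey"               # discovered
        order.append(u)
        for v in adj[u]:
            if colour[v] == "white":
                visit(v)                 # go deeper before exploring siblings
        colour[u] = "black"              # finished: backtrack

    for u in adj:                        # restart to cover disconnected parts
        if colour[u] == "white":
            visit(u)
    return order

# Example directed graph with edges 1->2, 1->3, 2->3, 4->3
adj = {1: [2, 3], 2: [3], 3: [], 4: [3]}
```

The outer loop is what builds a depth-first forest rather than a single tree when the graph is not connected.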

Page 28

DFS And Cycles

An undirected graph is acyclic iff a DFS yields no back edges:

If acyclic, there are no back edges (because a back edge implies a cycle). If there are no back edges, the graph is acyclic: no back edges implies only tree edges (why?), and only tree edges implies we have a tree or a forest, which by definition is acyclic.

Thus, we can run DFS to find whether a graph has a cycle. We can actually determine whether a cycle exists in O(V) time: in an undirected acyclic forest, |E| ≤ |V| - 1. So count the edges: if we ever see |V| distinct edges, we must have seen a back edge along the way.
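The back-edge test can be sketched as follows for an undirected graph given as a symmetric adjacency list (a sketch with my own names; in undirected DFS, any edge to an already-discovered vertex other than the parent is a back edge):

```python
def has_cycle(adj):
    """Undirected cycle test via DFS back edges.

    adj must be symmetric (each undirected edge listed in both directions).
    The parent check does not handle parallel edges.
    """
    colour = {v: "white" for v in adj}

    def visit(u, parent):
        colour[u] = "grey"
        for v in adj[u]:
            if colour[v] == "white":
                if visit(v, u):
                    return True
            elif v != parent:            # back edge found: a cycle exists
                return True
        colour[u] = "black"
        return False

    # Restart from every undiscovered vertex to cover all components.
    return any(colour[u] == "white" and visit(u, None) for u in adj)
```

Adding an edge counter and returning True as soon as |V| distinct edges have been traversed gives the O(V) bound mentioned above.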

Page 29

Remarks (1)

Clearly, data structures and algorithms are closely related: selecting the most efficient data structure and algorithm will almost always be the best way to proceed.

However, many factors must be considered to produce a good implementation: the obvious solution isn't always the best, and sometimes it makes sense to have multiple data structures, each with different properties, representing a single object.

Factors to be considered: the memory footprint implied by a given representation; the cost of operations in that representation; the cost of converting to another representation; the amount of computation expected using a given representation.

Page 30

Remarks (2)

When it comes to the implementation of an algorithm, the main point is that constant factors matter. Mapping algorithms and data structures in a way that matches the architecture's characteristics is VERY important!

Often you must restructure a program, not functionally but behaviourally, to get better performance. However, restructuring code can be a bit more involved than just performing optimisations.

So the bottom line is to think about the trade-offs that could change the quality of an implementation. Direct, obvious algorithm translations don't always mean good performance; the best performance comes from considering the many aspects of execution, e.g., memory access, processor characteristics, and language overheads.

Page 31

Good Luck!