Hierarchies & Trees in SQLHierarchies & Trees in SQL
by
Joe Celko
copyright 2008
Trees in SQLTrees in SQL
Trees are graph structures used to represent – Hierarchies
– Parts explosions
– Organizational charts
Three methods in SQL– Adjacency list model
– Nested set model
– Path enumeration
Trees in SQL -2Trees in SQL -2
Trees are not hierarchies– Hierarchies have subordination– Kill your captain, you still have to
take orders from your general– Break an edge in a tree, and you
have two or more disjoint trees. This means an adjacency list model is
a tree, but not a hierarchy
Tree TerminologyTree Terminology
The Tree - term inology
grandchild leaf node
C hild S ib ling
C hild
Parent
R oot
Tree as GraphTree as Graph
A1 A2
A0 B0
R oot
Trees as Nested SetsTrees as Nested Sets
root
A0
A1 A2
B0
Graphs as TablesGraphs as Tables
Nodes and edges are not the same kind of things– Organizational chart & Personnel file
You should use separate tables for the structure and the elements – The structure table will be small (two
integers and a key) – You can put more than one structure
table on the same elements
Adjacency List ModelAdjacency List Model
node parent cost ================== Root NULL 2.50 A0 Root 1.75 A1 A0 2.00 A2 A0 3.50 B0 Root 4.00
Cost really should not be in the table, but most adjacency list tables mix nodes and edges (see Oracle’s Scott/Tiger sample database)
Most common method in use.
Adjacency List ModelAdjacency List Model
Programmers do not add constraints:
CHECK((SELECT COUNT(*) FROM Tree) -1 = SELECT COUNT(*) FROM ((SELECT child FROM Tree) UNION (SELECT parent FROM Tree)))
Path Enumeration ModelPath Enumeration Model
Tree node cost path ============== Root 2.50 ‘Root’ A0 1.75 ‘Root,A0’ A1 2.00 ‘Root,A0,A1’ A2 3.50 ‘Root,A0,A2’ B0 4.00 ‘Root,B0’
Cost really should not be in the table, but most path enumeration tables mix nodes and edges.
Paths are search with path LIKE ‘Root,%’predicates
Graph with TraversalGraph with Traversal
A 1le ft = 3
rig h t = 4
A 2le ft = 5
rig h t = 6
A 0le ft = 2
rig h t = 7
B 0le ft = 8
rig h t = 9
R ootle ft = 1
rig h t = 1 0
Nested Sets with NumbersNested Sets with Numbers
1 2 3 4 5 6 7 8 9 10
A0
A1 A2
B0Root
Nested Sets as NumbersNested Sets as Numbers
Split nodes and edges into two tables. You can join them back together later This could be personnel and Org chart
– Tree.node would be job titles
– Nodes would need job titles and the person holding it
Tree Nodes
node lft rgt node cost
============== ========
Root 1 10 Root 2.50
A0 2 7 A0 1.75
A1 3 4 A1 2.00
A2 5 6 A2 3.50
B0 8 9 B0 4.00
Problems with Adjacency listProblems with Adjacency list
You have to use cursors or self-joins to traverse the tree
Cursors are not a table -- their order has meaning -- Closure violation!
Cursors take MUCH longer than queries
Ten level self-joins are worse than cursors
Problems with Path EnumerationProblems with Path Enumeration
Path can get long in a deep tree Great for searching down the tree, but not
up the tree– SELECT * FROM Tree WHERE path LIKE ‘Root,%’; – SELECT * FROM Tree WHERE path LIKE ‘%,B0’;
Inserting and deleting nodes is complicated – Requires string manipulation to change all
the paths beneath the insertion or deletion point
Tree AggregatesTree Aggregates
Give me the total cost for all subtrees
– (root, 13.75) -- sum of every node in tree
– (A0, 7.25) -- sum of “A0” subtree – (A1, 2.00)– (A2, 3.50)
Dropping A2 would reduce all superior rows by 3.50,but would not change A1
Find Root of TreeFind Root of Tree
SELECT * FROM Tree WHERE lft = 1; It helps to have an index the lft
column The rgt value will be twice the
number of nodes in the tree. General rule: The number of nodes in
any subtree ((rgt -lft) + 1 )/ 2
Find All Leaf NodesFind All Leaf Nodes
SELECT * FROM Tree WHERE lft = rgt -1;
An index on lft will help A covering index on (lft, rgt) is even
better
Find Superiors of XFind Superiors of X
SELECT Super.*
FROM Tree AS T1, Tree AS Sup
WHERE T1.node = ‘X’
AND T1.lft BETWEEN Sup.lft
AND Sup.rgt; This is the most important trick in this
method The BETWEEN predicates preserve
subordination in the hierarchy One query for any depth tree
Find Subordinates of XFind Subordinates of X
SELECT Sub.*
FROM Tree AS T1, Tree AS Sub
WHERE T1.node = ‘X’
AND Sub.lft BETWEEN T1.lft
AND T1.rgt; This is the same pattern as finding
superiors
Find Depth of TreeFind Depth of Tree
SELECT T1.node, COUNT(T2.node) AS level
FROM Tree AS T1, Tree AS T2
WHERE T1.lft BETWEEN T2.lft
AND T2.rgt
GROUP BY T1.node; Count the containing nested sets for
levels The closer to the root a node is, the
greater the value of (rgt - lft)
Totals by Level in TreeTotals by Level in Tree
SELECT T1.node,
SUM(T2.cost) AS tot_level_cost
FROM Tree AS T1, Tree AS T2
WHERE T2.lft BETWEEN T1.lft
AND T1.rgt
GROUP BY T1.node; Uses any aggregate function the
same way
Delete a SubtreeDelete a Subtree
Remove subtree rooted at :my_nodeDELETE FROM Tree
WHERE lft BETWEEN
(SELECT lft
FROM Tree
WHERE node = :my_node)
AND (SELECT rgt
FROM Tree
WHERE node = :my_node);
Delete a Single NodeDelete a Single Node
Method one - promote a child to the parent’s prior position in the tree. Oldest son inherits family business
Method two- subordinate the entire subtree to the grandparent. Orphans go live with grandmother.
Delete & Promote Oldest - 1Delete & Promote Oldest - 1
Delete A0 node
A1 A2
A0 B0
R oot
Delete & Promote Oldest - 2Delete & Promote Oldest - 2
A2
A1 B0
root
Delete & Promote Subtree - 1Delete & Promote Subtree - 1
Delete A0 node
A1 A2
A0 B0
R oot
Delete & Promote Subtree - 2Delete & Promote Subtree - 2
A1 A2 B0
root
Closing gaps in nested set model -1Closing gaps in nested set model -1
Deleted nodes leave gaps in numbering of lft and rgt nodes.
Fill in gaps by sliding everyone over to the lft until there are no gaps.
UPDATE Tree
SET lft = lft - gap_size,
rgt = rgt - gap_size
WHERE rgt >= gap_start
OR lft >= gap_start;
Closing gaps in nested set model -2Closing gaps in nested set model -2
CREATE VIEW LftRgt(i)AS SELECT lft FROM Tree UNION ALL SELECT rgt FROM Tree;
UPDATE Tree SET lft = (SELECT COUNT(*) FROM LftRgt WHERE i <= lft; rgt = (SELECT COUNT(*) FROM LftRgt WHERE i <= rgt;
Inserting into a TreeInserting into a Tree
The real trick is numbering the subtree correctly before inserting it.
Basic idea is to spread the nested set numbers apart to make a gap, the size of the subtree then you add the subtree.
The position of the subtree within the siblings of the new parent in the tree is another decision.
Inserting into a TreeInserting into a Tree
The real trick is numbering the subtree correctly before inserting it.
Basic idea is to spread the nested set numbers apart to make a gap, the size of the subtree then you add the subtree.
The position of the subtree within the siblings of the new parent in the tree is another decision.
Inserting into a Tree -2Inserting into a Tree -2
If you are worried about having to update the tree structure too often, then use a bigger spread in the numbering.
At higher levels, use steps of 100,000, then 10,000 and so forth.
Most SQL products can handle DECIMAL(s,p) of 30 or more digits.
Since insertion are done on the right side of the siblings, you can re-organize the tree by sliding everyone to the left and closing the gaps.
Inserting into a Tree -4Inserting into a Tree -4
BA1 A2
A0Root
Slide everyone to the left
Creating a Tree -1Creating a Tree -1
If you want to have all the constraints for a proper hierarchy, then it is complicated.
CREATE TABLE Tree
(node_id INTEGER NOT NULL REFERENCES Nodes(node_id),
lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
UNIQUE (lft, rgt), – redundant, but useful
CHECK (lft < rgt)
);
You can also declare node_id to be the PRIMARY KEY, but then one person cannot hold two jobs.
Creating a Tree -2Creating a Tree -2
Other needed constraints – no overlaps in the nodes
SELECT *
FROM Tree AS T1
WHERE EXISTS
(SELECT *
FROM Tree AS T2
WHERE T1.lft BETWEEN T2.lft AND T2.rgt
AND T1.rgt
NOT BETWEEN T2.lft AND T2.rgt;
Creating a Tree -3Creating a Tree -3
Other needed constraints – no disjoint nodes
SELECT * FROM Tree AS T1 WHERE EXISTS (SELECT * FROM Tree AS
T2 WHERE T1.lft < (SELECT rgt FROM Tree WHERE lft = 1));
Creating a Tree -4Creating a Tree -4
If you do not have triggers or CREATE ASSERTION, you can use an updatable view
CREATE VIEW GoodTree (node, i, j)ASSELECT T1.node, T1.i, T1.j FROM Tree AS T1WHERE NOT EXISTS (<overlaps>) AND NOT EXISTS (<disjoint>)WITH CHECK OPTION;
Converting an Adjacency Model into a Converting an Adjacency Model into a Nested Set ModelNested Set Model
Current best method is to load nodes into a tree in a host language, then do a recursive pre-order tree traversal to get the lft and rgt traversal numbers.
Adjacency list method does not order siblings; nested set model does automatically
Classic push down stack algorithm works You can keep both models in one table
with a column for the immediate superior
Converting a Nested Set Model into an Converting a Nested Set Model into an Adjacency ModelAdjacency Model
This actually pretty straight forward; you can put it into a single view
SELECT B.emp AS boss, P.emp
FROM OrgChart AS P
LEFT OUTER JOIN
OrgChart AS B
ON B.lft
= (SELECT MAX(lft)
FROM OrgChart AS S
WHERE P.lft > S.lft
AND P.lft < S.rgt);
Structure versus ContentsStructure versus Contents
Nested set model allows the structure of trees to be compared.
For each tree find the lft value of the root node of each tree
Make a canonical form and UNION ALL them EXISTS ( SELECT *
FROM ( SELECT (lft - lftmost), (rgt - lftmost)
FROM Tree1
UNION ALL
SELECT (lft - lftmost), (rgt - lftmost)
FROM Tree2) AS Both (lft, rgt)
GROUP BY Both.lft, Both.rgt
HAVING COUNT (*) =1 )
Questions & AnswersQuestions & Answers
?
Top Related