In my last post, I used CONNECT BY to query a table holding the bones of the skeleton in the “Dem Dry Bones” song: since the skeleton in the song has a head with no parent bone (making it a root node), and each bone has only one parent bone, it’s hierarchical data.
In this post, I’m going to add an arm and some ribs to the skeleton to make the data more interesting to query.
INSERT INTO skeleton VALUES ('a rib', 'back'); INSERT INTO skeleton VALUES ('b rib', 'back'); INSERT INTO skeleton VALUES ('c rib', 'back'); INSERT INTO skeleton VALUES ('arm', 'shoulder'); INSERT INTO skeleton VALUES ('wrist', 'arm'); INSERT INTO skeleton VALUES ('finger','wrist'); commit;
Now let’s see what the skeleton looks like.
SELECT bone, connected_to_the, level, lpad(' ',2*level, '-') || bone AS bone_tree --,ltrim(sys_connect_by_path(bone,' -> '),' -> ') as path FROM skeleton START WITH connected_to_the IS NULL CONNECT BY prior bone=connected_to_the
BONE CONNECTED LEVEL BONE_TREE --------- --------- ----- --------------------------------------------- head 1 - head neck head 2 --- neck shoulder neck 3 ----- shoulder arm shoulder 4 ------- arm wrist arm 5 --------- wrist finger wrist 6 ----------- finger back shoulder 4 ------- back a rib back 5 --------- a rib b rib back 5 --------- b rib c rib back 5 --------- c rib hip back 5 --------- hip thigh hip 6 ----------- thigh knee thigh 7 ------------- knee leg knee 8 --------------- leg ankle leg 9 ----------------- ankle heel ankle 10 ------------------- heel foot heel 11 --------------------- foot toe foot 12 ----------------------- toe 18 rows selected.
Now we’ve got some branches!
But this has potential to get a little ugly, because the various bones hanging off the backbone could come out in any order. It could show the hips above the ribs, or the ribs out of order, or the back above the arm. There’s an ORDER BY clause we can use to guarantee they come out in the order we want: ORDER SIBLINGS. Siblings are child nodes on the same level. For example, ‘arm’ and ‘back’ are both children of the ‘shoulder’ node on the same level. Likewise, all the ribs and the hip are siblings. As it happens, ordering alphabetically by bone will guarantee the order we want.
SELECT bone, connected_to_the, level, lpad(' ',2*level, '-') || bone AS bone_tree FROM skeleton START WITH connected_to_the IS NULL CONNECT BY prior bone=connected_to_the ORDER siblings BY bone ;
There are a couple more pseudocolumns that are fun. Going back to the tree metaphor, we have the root node, the node with no parent. And then we have leaf nodes, which are nodes with no children. In this case the root is the head and the leaves are toes and fingers and ribs. Oracle SQL has a pseudocolumn for root and leaf.
The connect_by_root psuedocolumn gives the root node:
SELECT bone, ltrim(sys_connect_by_path(bone,' -> '),' -> ') AS path, connect_by_root bone AS root FROM skeleton START WITH connected_to_the IS NULL CONNECT BY prior bone=connected_to_the ORDER siblings BY 1
BONE PATH ROOT --------- ---------------------------------------------------------------------------------------------------- --------- head head head neck head -> neck head shoulder head -> neck -> shoulder head arm head -> neck -> shoulder -> arm head wrist head -> neck -> shoulder -> arm -> wrist head finger head -> neck -> shoulder -> arm -> wrist -> finger head back head -> neck -> shoulder -> back head a rib head -> neck -> shoulder -> back -> a rib head b rib head -> neck -> shoulder -> back -> b rib head c rib head -> neck -> shoulder -> back -> c rib head hip head -> neck -> shoulder -> back -> hip head thigh head -> neck -> shoulder -> back -> hip -> thigh head knee head -> neck -> shoulder -> back -> hip -> thigh -> knee head leg head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg head ankle head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle head heel head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel head foot head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot head toe head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe head
This is boring because we have specified a START WITH condition which occurs once in the table, so the root will always be the same. Let’s omit the start with and try again,
SELECT bone, ltrim(sys_connect_by_path(bone,' -> '),' -> ') AS path, connect_by_root bone AS root FROM skeleton --start with connected_to_the is null CONNECT BY prior bone=connected_to_the ORDER siblings BY 1
BONE PATH ROOT --------- ---------------------------------------------------------------------------------------------------- --------- heel heel heel foot heel -> foot heel toe heel -> foot -> toe heel wrist wrist wrist finger wrist -> finger wrist a rib a rib a rib b rib b rib b rib c rib c rib c rib hip hip hip thigh hip -> thigh hip knee hip -> thigh -> knee hip leg hip -> thigh -> knee -> leg hip ankle hip -> thigh -> knee -> leg -> ankle hip heel hip -> thigh -> knee -> leg -> ankle -> heel hip foot hip -> thigh -> knee -> leg -> ankle -> heel -> foot hip toe hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe hip toe toe toe ...
Now we can really see what this query is doing. Without the “start with” telling us which row is the root, any row could be the root. So we have all the paths we would get if the hip was the root, all the paths we would get if shoulder was root, and so on. Even the leaf nodes are represented in one-node paths.
We can cut out the leaf nodes by using another pseudocolumn, connect_by_isleaf. This takes a value of 1 if the node is a leaf node, 0 if it’s not. So to show only leaf nodes that have children:
SELECT bone, level, ltrim(sys_connect_by_path(bone,' -> '),' -> ') AS path, connect_by_root bone AS root FROM skeleton WHERE connect_by_isleaf=1 START WITH connected_to_the IS NULL CONNECT BY prior bone=connected_to_the ORDER siblings BY 1
BONE LEVEL PATH ROOT --------- ----- ----------------------------------------------------------------------------------------------- --------- finger 6 head -> neck -> shoulder -> arm -> wrist -> finger head a rib 5 head -> neck -> shoulder -> back -> a rib head b rib 5 head -> neck -> shoulder -> back -> b rib head c rib 5 head -> neck -> shoulder -> back -> c rib head toe 12 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe head
Now! I want to make our skeleton do some yoga. Let’s connect the head bone to the toe bone.
UPDATE skeleton SET connected_to_the='toe' WHERE bone='head';
This curls the skeleton up in a circle, and the path from head to toe and back to head again becomes a loop. We call this loop a “cycle”. And now if we run a hierarchical query we get an error about the loop:
SELECT bone, connected_to_the, level, ltrim(sys_connect_by_path(bone,' -> '),' -> ') AS path FROM skeleton START WITH bone='head' CONNECT BY prior bone=connected_to_the ORDER siblings BY 1
BONE CONNECTED LEVEL PATH --------- --------- ----- ----------------------------------------------------------------------------------------------- head toe 1 head neck head 2 head -> neck shoulder neck 3 head -> neck -> shoulder arm shoulder 4 head -> neck -> shoulder -> arm ... toe foot 12 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe head toe 13 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe -> head neck head 14 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe -> head -> neck ... thigh hip 18 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe -> head -> neck -> shoulder -> back -> hip -> thigh ERROR: ORA-01436: CONNECT BY loop in user data
If we know there’s a loop in the data and we still want to get results, we can use the NOCYCLE keyword. The syntax is “connect by nocycle”
SELECT bone, connected_to_the, level, ltrim(sys_connect_by_path(bone,' -> '),' -> ') AS path FROM skeleton START WITH bone='head' CONNECT BY NOCYCLE prior bone=connected_to_the ORDER siblings BY 1
BONE CONNECTED LEVEL PATH --------- --------- ----- ----------------------------------------------------------------------------------------------- head toe 1 head neck head 2 head -> neck shoulder neck 3 head -> neck -> shoulder arm shoulder 4 head -> neck -> shoulder -> arm wrist arm 5 head -> neck -> shoulder -> arm -> wrist finger wrist 6 head -> neck -> shoulder -> arm -> wrist -> finger back shoulder 4 head -> neck -> shoulder -> back a rib back 5 head -> neck -> shoulder -> back -> a rib b rib back 5 head -> neck -> shoulder -> back -> b rib c rib back 5 head -> neck -> shoulder -> back -> c rib hip back 5 head -> neck -> shoulder -> back -> hip thigh hip 6 head -> neck -> shoulder -> back -> hip -> thigh knee thigh 7 head -> neck -> shoulder -> back -> hip -> thigh -> knee leg knee 8 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg ankle leg 9 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle heel ankle 10 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel foot heel 11 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot toe foot 12 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe
Oracle also provides a pseudocolumn to tell us when we’re in a cycle: connect_by_iscycle. 0 when not a cycle, 1 when it is a cycle.
SELECT bone, connected_to_the, ltrim(sys_connect_by_path(bone,' -> '),' -> ') AS path, connect_by_iscycle AS iscycle FROM skeleton START WITH bone='head' CONNECT BY NOCYCLE prior bone=connected_to_the ORDER siblings BY 1
BONE CONNECTED PATH ISCYCLE --------- --------- ----------------------------------------------------------------------------------------------- ---------- head toe head 0 neck head head -> neck 0 shoulder neck head -> neck -> shoulder 0 arm shoulder head -> neck -> shoulder -> arm 0 wrist arm head -> neck -> shoulder -> arm -> wrist 0 finger wrist head -> neck -> shoulder -> arm -> wrist -> finger 0 back shoulder head -> neck -> shoulder -> back 0 a rib back head -> neck -> shoulder -> back -> a rib 0 b rib back head -> neck -> shoulder -> back -> b rib 0 c rib back head -> neck -> shoulder -> back -> c rib 0 hip back head -> neck -> shoulder -> back -> hip 0 thigh hip head -> neck -> shoulder -> back -> hip -> thigh 0 knee thigh head -> neck -> shoulder -> back -> hip -> thigh -> knee 0 leg knee head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg 0 ankle leg head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle 0 heel ankle head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel 0 foot heel head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot 0 toe foot head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe 1
The value of connect_by_iscycle is only 1 when we’re about to complete the cycle. The head node is connected to the toe node, but the head node is also an ancestor of the toe. This is useful for figuring out where the cycles are in the data.
In the last two posts, we’ve looked at querying hierarchical data using CONNECT BY. In the next post, we’ll look at newer syntax introduced in 11g: recursive WITH.
Illustration by Barbie de Plastico, used under cc license.