CONNECT BY, Part I

In my last post, I briefly covered what hierarchical data is and how it’s represented in an RDBMS using the adjacency list model. In this post, we’ll look at querying hierarchical data with Oracle SQL’s CONNECT BY construct, using a really fun example.

Let’s get into the example. We could use the EMP table, but I want to do something a little more fun. If you’ve ever heard the song “Dem Bones”, you know the toe bone’s connected to the foot bone and the neck bone’s connected to the head bone.

Toe bone connected to the foot bone
Foot bone connected to the heel bone
Heel bone connected to the ankle bone
Ankle bone connected to the shin bone
Shin bone connected to the knee bone
Knee bone connected to the thigh bone
Thigh bone connected to the hip bone
Hip bone connected to the back bone
Back bone connected to the shoulder bone
Shoulder bone connected to the neck bone
Neck bone connected to the head bone

Since every bone has only one ancestor, and there is a root bone with no ancestor, this is hierarchical data and we can stick it in a table and query it.

CREATE TABLE skeleton (bone varchar2(20), connected_to_the varchar2(20));
INSERT INTO skeleton VALUES ('shoulder','neck');
INSERT INTO skeleton VALUES ('back','shoulder');
INSERT INTO skeleton VALUES ('hip','back');
INSERT INTO skeleton VALUES ('thigh','hip');
INSERT INTO skeleton VALUES ('knee','thigh');
INSERT INTO skeleton VALUES ('leg','knee');
INSERT INTO skeleton VALUES ('foot','heel');
INSERT INTO skeleton VALUES ('head','');
INSERT INTO skeleton VALUES ('neck','head');
INSERT INTO skeleton VALUES ('toe','foot');
INSERT INTO skeleton VALUES ('ankle','leg');
INSERT INTO skeleton VALUES ('heel','ankle');
commit;
 
SELECT * FROM skeleton;
 
BONE                                     CONNECTED_TO_THE
---------------------------------------- ----------------------------------------
shoulder                                 neck
back                                     shoulder
hip                                      back
thigh                                    hip
knee                                     thigh
leg                                      knee
foot                                     heel
head
neck                                     head
toe                                      foot
ankle                                    leg
heel                                     ankle

To keep it simple, there’s just two columns. There’s the bone, and there’s the connected_to_the , which is the parent.

You can see that for each row, we have the relationship where the connected_to value matches the parent bone value. For the neck bone, the parent bone is the head. The value of connected_to_the matches the parent, or prior, row’s value of bone. We express this in SQL using CONNECT BY PRIOR.

SELECT bone, connected_to_the  FROM skeleton
START WITH connected_to_the IS NULL
CONNECT BY prior bone=connected_to_the ;
 
BONE		     CONNECTED_TO_THE
-------------------- --------------------
head
neck		     head
shoulder	     neck
arm		     shoulder
wrist		     arm
finger		     wrist
back		     shoulder
a rib		     back
b rib		     back
c rib		     back
hip		     back
thigh		     hip
knee		     thigh
leg		     knee
ankle		     leg
heel		     ankle
foot		     heel
toe		     foot

I slipped an additional clause in there: start with. We’ll get to that in a moment. First I want to cover the level pseudocolumn.

SELECT bone, connected_to_the, level  FROM skeleton
START WITH connected_to_the IS NULL
CONNECT BY prior bone=connected_to_the ;

The LEVEL shows how many steps down the tree we have gone from the root. The root node is the first level. So the head bone gets level 1, and the toe bone gets level 12.

BONE                 CONNECTED_TO_THE          LEVEL
-------------------- -------------------- ----------
head                                               1
neck                 head                          2
shoulder             neck                          3
back                 shoulder                      4
hip                  back                          5
thigh                hip                           6
knee                 thigh                         7
leg                  knee                          8
ankle                leg                           9
heel                 ankle                        10
foot                 heel                         11
toe                  foot                         12

Now let us get some formatting in here. One easy way is to use the lpad function.

The lpad function takes 3 arguments. The string to be padded, the pad length, and the character to pad with, which defaults to a space. We want the nodes that are farther down the tree to be indented farther. The level pseudocolumn tells us how far down the tree we are. So we can use the level to tell lpad how many spaces to put in front of the node name.

SELECT bone, connected_to_the, level, lpad(' ',2*level, '-') || bone AS bone_tree 
FROM skeleton
START WITH connected_to_the IS NULL
CONNECT BY prior bone=connected_to_the ;
BONE                 CONNECTED_TO_THE          LEVEL BONE_TREE
-------------------- -------------------- ---------- ---------------------------------------------
head                                               1 - head
neck                 head                          2 --- neck
shoulder             neck                          3 ----- shoulder
back                 shoulder                      4 ------- back
hip                  back                          5 --------- hip
thigh                hip                           6 ----------- thigh
knee                 thigh                         7 ------------- knee
leg                  knee                          8 --------------- leg
ankle                leg                           9 ----------------- ankle
heel                 ankle                        10 ------------------- heel
foot                 heel                         11 --------------------- foot
toe                  foot                         12 ----------------------- toe
 
12 rows selected.

Now let’s come back to the START WITH clause. What happens if I comment that out?

SELECT bone, connected_to_the, level, lpad(' ',2*level, '-') || bone AS bone_tree 
FROM skeleton
--start with connected_to_the is null
CONNECT BY prior bone=connected_to_the ;
BONE                 CONNECTED_TO_THE          LEVEL BONE_TREE
-------------------- -------------------- ---------- ---------------------------------------------
heel                 ankle                         1 - heel
foot                 heel                          2 --- foot
toe                  foot                          3 ----- toe
hip                  back                          1 - hip
thigh                hip                           2 --- thigh
knee                 thigh                         3 ----- knee
leg                  knee                          4 ------- leg
ankle                leg                           5 --------- ankle
heel                 ankle                         6 ----------- heel                             
foot                 heel                          7 ------------- foot
toe                  foot                          8 --------------- toe
toe                  foot                          1 - toe         
neck                 head                          1 - neck      
...
foot                 heel                          1 - foot
toe                  foot                          2 --- toe                          
...
head                                               1 - head      
neck                 head                          2 --- neck     
shoulder             neck                          3 ----- shoulder
back                 shoulder                      4 ------- back
hip                  back                          5 --------- hip
thigh                hip                           6 ----------- thigh
knee                 thigh                         7 ------------- knee
leg                  knee                          8 --------------- leg
ankle                leg                           9 ----------------- ankle
heel                 ankle                        10 ------------------- heel
foot                 heel                         11 --------------------- foot
toe                  foot                         12 ----------------------- toe
 
78 rows selected.

The resultset blows up! START WITH is used to specify the root node. When we don’t specify START WITH, any node could be the root node, so the database returns all possibilities. The two-node tree with the foot as the root node, the tree with the hip as the root, and so on. When we specify start with, we only see the results with the specified node at the root.

Oracle also provides a built-in function, sys_connect_by_path, that lists all the column values from root to the node. It takes two arguments, the column name and a varchar2 to print between the column values.

SELECT bone, connected_to_the, 
sys_connect_by_path(bone,' -> ') AS path FROM skeleton
START WITH connected_to_the IS NULL
CONNECT BY prior bone=connected_to_the ;

To neaten it up we can trim the first occurrence of the varchar2 off the head node:

SELECT bone, connected_to_the, 
ltrim(sys_connect_by_path(bone,' -> '),' -> ') AS path FROM skeleton
START WITH connected_to_the IS NULL
CONNECT BY prior bone=connected_to_the ;
BONE      CONNECTED PATH
--------- --------- ----------------------------------------------------------------------------------------------------
head                head
neck      head      head -> neck
shoulder  neck      head -> neck -> shoulder
back      shoulder  head -> neck -> shoulder -> back
hip       back      head -> neck -> shoulder -> back -> hip
thigh     hip       head -> neck -> shoulder -> back -> hip -> thigh
knee      thigh     head -> neck -> shoulder -> back -> hip -> thigh -> knee
leg       knee      head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg
ankle     leg       head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle
heel      ankle     head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel
foot      heel      head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot
toe       foot      head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe

So far so good. But, this skeletal tree is lacking branches! The skeleton is lacking arms and any kind of torso. “Dem Bones” is a great song, but it comes up short in the arms and torso department. In my next post, I’ll give the skeleton an arm and some ribs.

Today’s illustration is The Disarticulated Skeleton, from the book “Magic; Stage Illusions and Scientific Diversions, Including Trick Photography” (1897) written by Albert Allis Hopkins and Henry Ridgely Evans. From plaisanter‘s photostream, shared under cc license.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.