In my last post, I briefly covered what hierarchical data is and how it’s represented in an RDBMS using the adjacency list model. In this post, we’ll look at querying hierarchical data with Oracle SQL’s CONNECT BY construct, using a really fun example.
Let’s get into the example. We could use the EMP table, but I want to do something a little more fun. If you’ve ever heard the song “Dem Bones”, you know the toe bone’s connected to the foot bone and the neck bone’s connected to the head bone.
Toe bone connected to the foot bone
Foot bone connected to the heel bone
Heel bone connected to the ankle bone
Ankle bone connected to the shin bone
Shin bone connected to the knee bone
Knee bone connected to the thigh bone
Thigh bone connected to the hip bone
Hip bone connected to the back bone
Back bone connected to the shoulder bone
Shoulder bone connected to the neck bone
Neck bone connected to the head bone
Since every bone has only one ancestor, and there is a root bone with no ancestor, this is hierarchical data and we can stick it in a table and query it.
CREATE TABLE skeleton (bone varchar2(20), connected_to_the varchar2(20)); INSERT INTO skeleton VALUES ('shoulder','neck'); INSERT INTO skeleton VALUES ('back','shoulder'); INSERT INTO skeleton VALUES ('hip','back'); INSERT INTO skeleton VALUES ('thigh','hip'); INSERT INTO skeleton VALUES ('knee','thigh'); INSERT INTO skeleton VALUES ('leg','knee'); INSERT INTO skeleton VALUES ('foot','heel'); INSERT INTO skeleton VALUES ('head',''); INSERT INTO skeleton VALUES ('neck','head'); INSERT INTO skeleton VALUES ('toe','foot'); INSERT INTO skeleton VALUES ('ankle','leg'); INSERT INTO skeleton VALUES ('heel','ankle'); commit; SELECT * FROM skeleton; BONE CONNECTED_TO_THE ---------------------------------------- ---------------------------------------- shoulder neck back shoulder hip back thigh hip knee thigh leg knee foot heel head neck head toe foot ankle leg heel ankle
To keep it simple, there’s just two columns. There’s the bone, and there’s the connected_to_the , which is the parent.
You can see that for each row, we have the relationship where the connected_to value matches the parent bone value. For the neck bone, the parent bone is the head. The value of connected_to_the matches the parent, or prior, row’s value of bone. We express this in SQL using CONNECT BY PRIOR.
SELECT bone, connected_to_the FROM skeleton START WITH connected_to_the IS NULL CONNECT BY prior bone=connected_to_the ; BONE CONNECTED_TO_THE -------------------- -------------------- head neck head shoulder neck arm shoulder wrist arm finger wrist back shoulder a rib back b rib back c rib back hip back thigh hip knee thigh leg knee ankle leg heel ankle foot heel toe foot
I slipped an additional clause in there: start with. We’ll get to that in a moment. First I want to cover the level pseudocolumn.
SELECT bone, connected_to_the, level FROM skeleton START WITH connected_to_the IS NULL CONNECT BY prior bone=connected_to_the ;
The LEVEL shows how many steps down the tree we have gone from the root. The root node is the first level. So the head bone gets level 1, and the toe bone gets level 12.
BONE CONNECTED_TO_THE LEVEL -------------------- -------------------- ---------- head 1 neck head 2 shoulder neck 3 back shoulder 4 hip back 5 thigh hip 6 knee thigh 7 leg knee 8 ankle leg 9 heel ankle 10 foot heel 11 toe foot 12
Now let us get some formatting in here. One easy way is to use the lpad function.
The lpad function takes 3 arguments. The string to be padded, the pad length, and the character to pad with, which defaults to a space. We want the nodes that are farther down the tree to be indented farther. The level pseudocolumn tells us how far down the tree we are. So we can use the level to tell lpad how many spaces to put in front of the node name.
SELECT bone, connected_to_the, level, lpad(' ',2*level, '-') || bone AS bone_tree FROM skeleton START WITH connected_to_the IS NULL CONNECT BY prior bone=connected_to_the ;
BONE CONNECTED_TO_THE LEVEL BONE_TREE -------------------- -------------------- ---------- --------------------------------------------- head 1 - head neck head 2 --- neck shoulder neck 3 ----- shoulder back shoulder 4 ------- back hip back 5 --------- hip thigh hip 6 ----------- thigh knee thigh 7 ------------- knee leg knee 8 --------------- leg ankle leg 9 ----------------- ankle heel ankle 10 ------------------- heel foot heel 11 --------------------- foot toe foot 12 ----------------------- toe 12 rows selected.
Now let’s come back to the START WITH clause. What happens if I comment that out?
SELECT bone, connected_to_the, level, lpad(' ',2*level, '-') || bone AS bone_tree FROM skeleton --start with connected_to_the is null CONNECT BY prior bone=connected_to_the ;
BONE CONNECTED_TO_THE LEVEL BONE_TREE -------------------- -------------------- ---------- --------------------------------------------- heel ankle 1 - heel foot heel 2 --- foot toe foot 3 ----- toe hip back 1 - hip thigh hip 2 --- thigh knee thigh 3 ----- knee leg knee 4 ------- leg ankle leg 5 --------- ankle heel ankle 6 ----------- heel foot heel 7 ------------- foot toe foot 8 --------------- toe toe foot 1 - toe neck head 1 - neck ... foot heel 1 - foot toe foot 2 --- toe ... head 1 - head neck head 2 --- neck shoulder neck 3 ----- shoulder back shoulder 4 ------- back hip back 5 --------- hip thigh hip 6 ----------- thigh knee thigh 7 ------------- knee leg knee 8 --------------- leg ankle leg 9 ----------------- ankle heel ankle 10 ------------------- heel foot heel 11 --------------------- foot toe foot 12 ----------------------- toe 78 rows selected.
The resultset blows up! START WITH is used to specify the root node. When we don’t specify START WITH, any node could be the root node, so the database returns all possibilities. The two-node tree with the foot as the root node, the tree with the hip as the root, and so on. When we specify start with, we only see the results with the specified node at the root.
Oracle also provides a built-in function, sys_connect_by_path, that lists all the column values from root to the node. It takes two arguments, the column name and a varchar2 to print between the column values.
SELECT bone, connected_to_the, sys_connect_by_path(bone,' -> ') AS path FROM skeleton START WITH connected_to_the IS NULL CONNECT BY prior bone=connected_to_the ;
To neaten it up we can trim the first occurrence of the varchar2 off the head node:
SELECT bone, connected_to_the, ltrim(sys_connect_by_path(bone,' -> '),' -> ') AS path FROM skeleton START WITH connected_to_the IS NULL CONNECT BY prior bone=connected_to_the ;
BONE CONNECTED PATH --------- --------- ---------------------------------------------------------------------------------------------------- head head neck head head -> neck shoulder neck head -> neck -> shoulder back shoulder head -> neck -> shoulder -> back hip back head -> neck -> shoulder -> back -> hip thigh hip head -> neck -> shoulder -> back -> hip -> thigh knee thigh head -> neck -> shoulder -> back -> hip -> thigh -> knee leg knee head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg ankle leg head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle heel ankle head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel foot heel head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot toe foot head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe
So far so good. But, this skeletal tree is lacking branches! The skeleton is lacking arms and any kind of torso. “Dem Bones” is a great song, but it comes up short in the arms and torso department. In my next post, I’ll give the skeleton an arm and some ribs.
Today’s illustration is The Disarticulated Skeleton, from the book “Magic; Stage Illusions and Scientific Diversions, Including Trick Photography” (1897) written by Albert Allis Hopkins and Henry Ridgely Evans. From plaisanter‘s photostream, shared under cc license.