One of the many useful things that analytic SQL makes it easy to do is a running total – or running average, or other calculation.
ENAME SAL RUNNING_TOTAL ---------- ---------- ------------- ADAMS 1100 1100 ALLEN 1600 2700 BLAKE 2850 5550 CLARK 2450 8000 FORD 3000 11000 JAMES 950 11950 JONES 2975 14925 KING 5000 19925 MARTIN 1250 21175 MILLER 1300 22475 SCOTT 3000 25475 SMITH 800 26275 TURNER 1500 27775 WARD 1250 29025
You might expect that the syntax to get this running total would be a bit of special, analytic-sounding syntax like COMPUTE AT EACH ROW. Somewhat counterintuitively, though, the difference between the SQL that delivers a running total and the SQL that prints a cumulative total for each row is just an extra ORDER BY clause at the end of the partition clause.
eg.
SELECT ename, deptno, avg(sal) OVER (partition BY deptno) dept_avg FROM scott.emp ORDER BY deptno, ename;
ENAME DEPTNO DEPT_AVG ---------- ---------- ---------- CLARK 10 2916.66667 KING 10 2916.66667 MILLER 10 2916.66667 ADAMS 20 2175 FORD 20 2175 JONES 20 2175 SCOTT 20 2175 SMITH 20 2175 ALLEN 30 1566.66667 BLAKE 30 1566.66667 JAMES 30 1566.66667 MARTIN 30 1566.66667 TURNER 30 1566.66667 WARD 30 1566.66667
vs
SELECT ename, deptno, avg(sal) OVER (partition BY deptno ORDER BY deptno, ename) running_avg FROM scott.emp ORDER BY deptno, ename
ENAME DEPTNO RUNNING_AVG ---------- ---------- ----------- CLARK 10 2450 KING 10 3725 MILLER 10 2916.66667 ADAMS 20 1100 FORD 20 2050 JONES 20 2358.33333 SCOTT 20 2518.75 SMITH 20 2175 ALLEN 30 1600 BLAKE 30 2225 JAMES 30 1800 MARTIN 30 1662.5 TURNER 30 1630 WARD 30 1566.66667
Why does the second one print a running average? let’s break the analytic SQL down…
avg(sal) OVER (partition BY deptno ORDER BY deptno, ename)
1. the partition is by deptno
2. there’s no window specified, so the default window is “RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW”
3. “order by deptno, ename” results in a running average ordered by deptno, ename.
Without the order by, the average is taken over the entire set of the partition. With the order by, it’s taken over the range. The range is the default window (“between unbounded preceeding and current row”) ie. all previous rows in the partition… aka running average.
Today’s image is from Nancy Wilson’s blog, where she credits it to AnatolyM.