Running totals in SQL

One of the many useful things that analytic SQL makes it easy to do is a running total – or running average, or other calculation.

ENAME		  SAL RUNNING_TOTAL
---------- ---------- -------------
ADAMS		 1100	       1100
ALLEN		 1600	       2700
BLAKE		 2850	       5550
CLARK		 2450	       8000
FORD		 3000	      11000
JAMES		  950	      11950
JONES		 2975	      14925
KING		 5000	      19925
MARTIN		 1250	      21175
MILLER		 1300	      22475
SCOTT		 3000	      25475
SMITH		  800	      26275
TURNER		 1500	      27775
WARD		 1250	      29025

You might expect that the syntax to get this running total would be a bit of special, analytic-sounding syntax like COMPUTE AT EACH ROW. Somewhat counterintuitively, though, the difference between the SQL that delivers a running total and the SQL that prints a cumulative total for each row is just an extra ORDER BY clause at the end of the partition clause.

eg.

SELECT ename, deptno, 
avg(sal) OVER (partition BY deptno) dept_avg 
FROM scott.emp 
ORDER BY deptno, ename;
ENAME	       DEPTNO	DEPT_AVG
---------- ---------- ----------
CLARK		   10 2916.66667
KING		   10 2916.66667
MILLER		   10 2916.66667
ADAMS		   20	    2175
FORD		   20	    2175
JONES		   20	    2175
SCOTT		   20	    2175
SMITH		   20	    2175
ALLEN		   30 1566.66667
BLAKE		   30 1566.66667
JAMES		   30 1566.66667
MARTIN		   30 1566.66667
TURNER		   30 1566.66667
WARD		   30 1566.66667

vs

SELECT ename, deptno, 
avg(sal) OVER (partition BY deptno ORDER BY deptno, ename) running_avg
FROM scott.emp 
ORDER BY deptno, ename
ENAME	       DEPTNO RUNNING_AVG
---------- ---------- -----------
CLARK		   10	     2450
KING		   10	     3725
MILLER		   10  2916.66667
ADAMS		   20	     1100
FORD		   20	     2050
JONES		   20  2358.33333
SCOTT		   20	  2518.75
SMITH		   20	     2175
ALLEN		   30	     1600
BLAKE		   30	     2225
JAMES		   30	     1800
MARTIN		   30	   1662.5
TURNER		   30	     1630
WARD		   30  1566.66667

Why does the second one print a running average? let’s break the analytic SQL down…

avg(sal) OVER (partition BY deptno ORDER BY deptno, ename)

1. the partition is by deptno
2. there’s no window specified, so the default window is “RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW”
3. “order by deptno, ename” results in a running average ordered by deptno, ename.

Without the order by, the average is taken over the entire set of the partition. With the order by, it’s taken over the range. The range is the default window (“between unbounded preceeding and current row”) ie. all previous rows in the partition… aka running average.

Today’s image is from Nancy Wilson’s blog, where she credits it to AnatolyM.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.