SQL vs. Excel – Subgroup medians

Recently I ran across this post on how to do subgroup medians in Excel 2010. First you need to create a pivot table, then “do some copying and pasting and use a formula to make it happen”. In SQL you can do this with one command.

Suppose that you have the same table as the Excel article, something like this:

CREATE TABLE sampletab
(arrest_day_of_week varchar2(10), 
arrest_ts TIMESTAMP, 
fingerprint_ts TIMESTAMP, 
days_between NUMBER);

and you want to get the median value of days_between for each day of the week.

The steps in Excel apparently go like this:

  1. Create pivot table to get the means
  2. Copy and paste the column values from the pivot table
  3. For Sunday, create an IF formula to include a cell’s days_between in the median calculation only if the arrest_day_of_week for that row is Sunday
  4. Repeat for other six days

Ouch!

In SQL, it’s one simple statement:

SELECT arrest_day_of_week, median(days_between) AS median_days_between FROM sampletab
GROUP BY arrest_day_of_week;

Conclusion – if you’re into data analysis, SQL can be a big time-saver!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.