VALUE IMPROVEMENT LEADERS
TOPIC #16
660 words + 1 activity  | 30 minutes (3 to read, 27 to plot a box and whisker)
BOX AND WHISKER PLOT
PRINCIPLE
A box and whisker plot is great for visualizing variation within and between groups.

TOOLS
• The box and whisker plot
• An online box and whisker plot generator 

APPLICATION
Build a box and whisker plot to discover which process variables are influencing variation in your important output variables.
This article borrows heavily from the two-part Steve’s Dojo post on the same topic.
Adding to the Visual Analysis Toolkit

Hopefully we’ve convinced you of the communicative power of the histogram and its partner, the run chart. Our introductory statement about them was, “To understand the variation of a continuous variable, at a minimum you need a histogram and a run chart.” 

Today we add the box and whisker plot (sometimes shortened to “box plot”) to that visual toolkit. If you want to visualize variation within and between groups, box and whisker plots are a great option because they allow side-by-side comparison of segmented data on a single graph while still providing a notion of the distribution.
A Simplified Visual Distribution

A box and whisker plot is a distribution divided into quartiles rather than bins. Each quartile holds 25% of the data points, so the length of a quartile on the plot shows the range of values that quarter of the data spans. The lower quartile runs from the minimum data point to the 25th percentile and contains the lowest 25% of your data. The 25th to 50th percentile is the lower interquartile, the 50th to 75th percentile is the upper interquartile, and the 75th percentile to the maximum is the upper quartile. The 25th to 75th percentile is known as the interquartile range (IQR) and contains the middle 50% of your data.
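To make the quartile definitions concrete, here is a minimal sketch that computes the cut points of a box and whisker plot with Python's standard library. The cost values are hypothetical and chosen only for illustration, and the "inclusive" interpolation method is one of several conventions for locating percentiles, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical case costs in dollars (illustrative values, not from the article).
costs = [27_000, 29_500, 31_000, 33_000, 36_000, 38_500, 40_000,
         42_000, 45_500, 51_000, 58_000, 74_000]

# With n=4, statistics.quantiles returns the three cut points:
# the 25th percentile (Q1), the median, and the 75th percentile (Q3).
q1, median, q3 = statistics.quantiles(costs, n=4, method="inclusive")

# The interquartile range is the height of the box: the middle 50% of cases.
iqr = q3 - q1

# Lengths of the outer quartiles (the whiskers when no outliers are trimmed).
lower_quartile_length = q1 - min(costs)
upper_quartile_length = max(costs) - q3

print(f"min={min(costs)}  Q1={q1}  median={median}  Q3={q3}  max={max(costs)}")
print(f"IQR={iqr}  lower quartile length={lower_quartile_length}  "
      f"upper quartile length={upper_quartile_length}")
```

In this made-up sample the upper quartile is much longer than the lower one, which is the same visual cue for right skew that the article describes.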

Here’s a box and whisker plot lined up with a histogram to demonstrate that it is also a distribution. The upper quartile is much taller than the three lower quartiles in this dataset, suggesting a right-skewed distribution. Note: There are many options for setting whisker lengths. In this image, high-end outliers are left out of the box and whisker plot but appear on the histogram.
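The article does not say which whisker rule its image uses. One widely used convention, shown here purely as an assumption, is Tukey's rule: whiskers extend to the most extreme data points within 1.5 x IQR of the box, and anything beyond that is treated as an outlier. The function name `tukey_whiskers` and the sample data are hypothetical.

```python
import statistics

def tukey_whiskers(values, k=1.5):
    """Return (lower_whisker, upper_whisker, outliers) under the k x IQR convention."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences mark how far a whisker is allowed to reach from the box.
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    # Whiskers stop at the last real data point inside each fence.
    return inside[0], inside[-1], outliers

# Hypothetical costs with one very expensive case (illustrative only).
costs = [27_000, 29_500, 31_000, 33_000, 36_000, 38_500, 40_000,
         42_000, 45_500, 51_000, 58_000, 120_000]

low, high, outliers = tukey_whiskers(costs)
print(f"whiskers run from {low} to {high}; outliers: {outliers}")
```

Under this rule the single extreme case would be dropped from the whisker, much like the high-end outliers that appear on the histogram but not on the box and whisker plot.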
Visualizing Variation Within and Between Groups

Histograms are superior to box and whisker plots if you’re looking at a single variable because they contain more information about the data. But let’s say you’re investigating cost variation across three variables: discharge day of week, performing provider, and DRGs. How would you do it? You could build a histogram for each group (16 total in this example), but that would be unwieldy for the reader to compare. You could perform a multivariate regression; more power to you if you have the chops for that. Or you could produce this graph, which will be good enough for most projects. These three plots, side by side and set to the same scale, help visualize variation within and between the three variables.
There are many observations to be made about the fictional dataset with this one graph: 

  • The variable with the greatest variation is DRG; lowest is provider.
  • Cases discharged on Mon and Tue are the only groups with medians below $40K.
  • Sat discharges may be less costly than Sun discharges. A two-sample t-test could confirm.
  • Top quartile cases tend to get more expensive as the week progresses to Friday. 
  • The upper quartiles tend to be much taller than the lower quartiles, suggesting a right-skewed distribution across the entire dataset.
  • The upper quartiles are the most variable with spans ranging from ~$14K (DRG 222) to ~$37K (DRG 666). We don’t see this much variability in any other quartiles. 
  • The medians all hover around $40K with a range of around $6K. 
  • DRG 222 has the tightest dispersion and is similar to 444; 666 has the widest. 
  • 333, 555, and 777 share similar distributions. 
  • There seems to be a natural cost floor around $27K. 
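The list above suggests a two-sample t-test to check the Sat versus Sun difference. As a rough sketch, the function below computes the test statistic for Welch's version of that test, a variant that does not assume equal variances and is swapped in here as an assumption. The function name `welch_t` and the two samples are hypothetical, not the article's data.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variance over sample size: the squared standard error of each mean.
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical per-case costs for two discharge days (illustrative only).
sat = [36_000, 38_000, 39_500, 41_000, 43_000, 47_000]
sun = [39_000, 41_500, 43_000, 46_000, 49_500, 56_000]

t_stat, dof = welch_t(sat, sun)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
```

A negative t means the first sample's mean is lower. Turning t into a p-value needs a t-distribution table or a statistics library; if SciPy is available, `scipy.stats.ttest_ind(sat, sun, equal_var=False)` runs the same Welch test and reports the p-value.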

If this were real data, our rather simple exploration exercise would certainly influence the direction of a project.
How to Build These Cool Charts
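The tools list for this topic points to an online box and whisker plot generator. For readers who prefer code, here is one possible sketch using Python's matplotlib library; it is an assumption about tooling, not the method used to produce the article's graph. The data is randomly generated, the provider labels A, B, and C are hypothetical, and the helper `fake_costs` exists only for this illustration.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render to a file; no display window needed
import matplotlib.pyplot as plt

random.seed(16)

def fake_costs(n, floor=27_000, scale=12_000):
    """Generate right-skewed fictional costs above a floor (illustrative only)."""
    return [floor + random.expovariate(1 / scale) for _ in range(n)]

# Three grouping variables, 16 groups in total, all filled with made-up data.
groups = {
    "Discharge day": {d: fake_costs(40)
                      for d in ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]},
    "Provider": {p: fake_costs(90) for p in ["A", "B", "C"]},
    "DRG": {g: fake_costs(45) for g in ["222", "333", "444", "555", "666", "777"]},
}

# One panel per variable; sharey=True puts every panel on the same cost scale.
fig, axes = plt.subplots(1, len(groups), figsize=(12, 4), sharey=True)
for ax, (title, series) in zip(axes, groups.items()):
    ax.boxplot(list(series.values()))  # boxes are drawn at positions 1..N
    ax.set_xticks(range(1, len(series) + 1))
    ax.set_xticklabels(list(series.keys()))
    ax.set_title(title)
axes[0].set_ylabel("Cost per case ($)")
fig.tight_layout()
fig.savefig("box_and_whisker.png")
```

Sharing the y-axis is the design choice that matters most here: it is what lets a reader compare medians and quartile lengths between panels as well as within them.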

LINKS

Quickly locate all course videos, slides, and previous emails here.
Accelerate | University of Utah | healthsciences.utah.edu/accelerate
Questions? Email:  kim.mahoney@hsc.utah.edu