Truthier Bars in Excel

By Bob Rudis (@hrbrmstr)
Thu 14 May 2015 | tags: blog, excel, datavis

I saw some chatter about a post on spam and new gTLDs on Kaspersky’s SecureList and initially got excited that there might be actual data to look at. Our work-team started looking at this very topic last year but got distracted by the 2015 DBIR work (we’re hoping to pick it up again as things settle down a bit). Needless to say, my elation waned quickly, but the purpose of this post is not to comment on the overall report. After scrolling through the content, I felt compelled to point out something our readers should never, ever do. That would be this:

I have no issue with the use of European decimal separators (which are commas). I do have an issue with the y-axis not starting at 0%, as it makes it look like there is a vast difference between the values. I’m not casting (much) blame at Kaspersky since this is what Excel will do by default. Yes, Excel helps you mislead with data by default (I validated this with the most recent beta of the new Excel for Mac). Since Excel was no doubt the culprit, I used Excel to fix the problem and create a more authentic chart:

I also got rid of some chart junk (one could go even more minimal, too).
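To put a rough number on how much a truncated axis exaggerates things, here is a minimal Python sketch. The two percentages and the 35% baseline are made-up placeholders, not the figures from the SecureList chart, and `drawn_height_ratio` is a hypothetical helper name. It only compares how tall two bars are drawn relative to each other for a given axis start.

```python
def drawn_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for a and b when the y-axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical values (as fractions), NOT taken from the original chart.
tall, short = 0.50, 0.40

honest = drawn_height_ratio(tall, short)                    # y-axis starts at 0%
truncated = drawn_height_ratio(tall, short, baseline=0.35)  # y-axis starts at 35%

print(f"zero baseline:      taller bar drawn {honest:.2f}x as high")
print(f"truncated baseline: taller bar drawn {truncated:.2f}x as high")
```

With a zero baseline the drawn ratio equals the real ratio of the values. Once the axis starts anywhere above zero, the same two numbers are drawn with a larger ratio, which is exactly the effect to avoid.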

The visual differences are not nearly as stark as the original chart would indicate, and both the variance (0.001) and standard deviation (0.038) are really small, meaning there’s also not much difference statistically.
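If you would rather not compute those two summary numbers in a spreadsheet, Python’s standard `statistics` module can do it. This is only a sketch: the `shares` list holds placeholder values (the real ones are in the workbook), and I am assuming the sample (n - 1) formulas here; swap in `pvariance` and `pstdev` if you want the population versions.

```python
import statistics

# Placeholder shares (as fractions), NOT the values from the workbook.
shares = [0.40, 0.45, 0.50, 0.42, 0.48]

variance = statistics.variance(shares)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(shares)      # sample standard deviation

print(f"variance: {variance:.4f}")
print(f"std dev:  {std_dev:.4f}")
```

Keep in mind that the standard deviation is just the square root of the variance, so the two numbers describe the same spread: 0.038 squared is roughly 0.0014, which rounds to the 0.001 quoted above.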

You can grab the Excel workbook and have a look at the data and result. Note that I had to add the y-axis, change the range, then delete the y-axis to correct the (bad) Excel defaults. Alas, there is no nice script to post since you have to do all the time-consuming mouse-clicks, deletes and box-value-fills on your own to reproduce it from scratch.

Remember that your eyes and your mind are smarter than your tools. Don’t rely on your tools to tell the story for you. Don’t assume they are smarter than you. Ensure they’re helping you convey the messages that are in the data, and that they do so truthfully and as clearly as possible.
