Storytelling with Data: a book review and my takeaways

As a child, I loved telling stories. I’d take my favorite book and TV characters and create a world where they would oh-so-conveniently meet. Say, a magical anime girl wanders Narnia until she encounters the now-villainous Power Rangers.

As an adult in the corporate world, I still want to tell stories. But now I find that people are more critical of which stories I tell them.

It must be in the form of numbers, they said.

It’s a data-driven world, they said.

In Cole Nussbaumer Knaflic’s book Storytelling with Data, she argues we can do just that: tell stories with numbers.

language + math = data storytelling

She takes traditional storytelling concepts then re-interprets them for “adult-appropriate” tables and charts. She teaches us to edit our charts, the same way authors do their stories, by borrowing principles of visual design.

My key takeaways from the book can be found in my notes below, but they can be summarized as follows:

  1. Context is king. The form your data will take depends on your audience and what you want them to do with the data.
  2. Choose the right graph to best express the key message (I’ve made a flowchart in my notes to help with that).
  3. Following on #1, design around this message.
  4. Present your data as you would a story, with a beginning, middle, and end.
“Storytelling with Data” notes, by dannaisadork

P.S. Sorry about the terrible handwriting. My normal penmanship’s already pretty bad, but writing on a tablet made it worse!

 


.xlsx files are secretly compressed!

There. I spilled the not-so-big secret. Excel files from Excel 2007 and above (.xlsx) are automatically compressed. A feature which, in all my years of using Excel, I never knew about.

I once received a large Excel file from finance for analysis. Normally I would convert said file to CSV (comma-separated values) as the latter:

  1. …is just data, no formatting. Exactly what I need for a data extract and nothing more.
  2. …tends to be more malleable across multiple applications.
  3. …and because of #s 1 and 2, tends to have a smaller file size.

So imagine my surprise when, upon converting to CSV, my 29 MB file ballooned to 115 MB.

Whuuuutttt???

Usually it’s the other way around. With all the formatting and formulas removed, the file size usually shrinks.

But apparently this is no longer the case when you have a lot of data. Once you go over a certain point, the amount of data you use matters more than the formatting.
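The size jump makes sense once you know that an .xlsx file is a ZIP archive of XML parts, while a CSV is plain, uncompressed text. Here’s a minimal sketch in Python of that effect. The rows are made up (not my actual finance extract), and real ratios will depend on how repetitive your data is:

```python
import io
import zipfile

# Hypothetical data: 50,000 rows of repetitive, finance-style records.
rows = ["date,cost_center,account,amount"]
for i in range(50_000):
    rows.append(f"2016-09-{i % 30 + 1:02d},CC{i % 20:03d},Travel,{(i * 37) % 1000}.00")
csv_bytes = "\n".join(rows).encode("utf-8")

# Compress the CSV in memory with DEFLATE, the method ZIP archives
# (and therefore .xlsx files) typically use.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data.csv", csv_bytes)
zipped_size = len(buffer.getvalue())

print(f"raw CSV: {len(csv_bytes):,} bytes")
print(f"zipped:  {zipped_size:,} bytes")
print(f"ratio:   {len(csv_bytes) / zipped_size:.1f}x")
```

You can check the ZIP claim on a real workbook too: `zipfile.is_zipfile("report.xlsx")` (the file name is just a placeholder) should return `True`.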

Fortunately .xlsx is compatible with Power BI, which is where I was going to plug the data into anyway. I let the file type stay as is.

Makes for a convincing argument for utilizing the Microsoft suite, eh?

(And in case your answer is no, let me argue that even technology research group Gartner agrees with me by crowning Microsoft king in business intelligence and analytics platforms.)

How to properly use a pie chart

It might seem odd to talk about pie charts out of the blue, but let me guarantee that I want to talk about them precisely because of the timing.

It all began with a post from the Office for National Statistics discussing guidelines on the use of the dessert chart. Then over the weekend Cole Knaflic, author of Storytelling with Data, updated her stance on pies (hint: she still dislikes them).

Whether you love them or hate them, the humble pie chart is here to stay. But if it is to stay, we should at least make sure it stays in the right context. That it’s used the proper way.

 

What’s wrong with the pie?

So you might be wondering: Is there an improper way to use the pie chart?

Yes. Lots. Pie charts are one of the most difficult charts to use because they don’t have a common baseline.

When comparing values, we’re used to comparing off the same baseline. See the example below, where it’s easy to say which bar is tallest and shortest because all the bars are along the same axis.

Without having to think, I know #3 is the tallest and #5 is the shortest. BBC KS3 Maths: Bar Charts.

 

But with pie charts, there is no single axis to baseline off. Instead we’re comparing areas of sectors or arc lengths.

Arcs and sectors

What comes more natural to you? Getting the:

  • Length of a bar?
  • Or the arc length of a circle?

If you answered the latter, then go ahead and use pie charts.

But assuming you’re like most people, the second one takes some extra brain power to process. In data visualization, that extra cognitive load is a sign you’re doing something wrong. The chart is meant to make people’s lives easier by visualizing the data for them, not make it even harder.

If the values can be compared in a bar chart, go with the bar chart. Don’t make your audience do the extra math that comes with using a pie chart.
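To make that “extra math” concrete, here’s a rough Python sketch of what a pie chart does to your numbers: each value becomes a share of the total, then an angle and an arc length. Your audience has to mentally run this in reverse. The values and the `pie_geometry` helper are made up for illustration:

```python
import math

def pie_geometry(values, radius=1.0):
    """Return (share, angle in degrees, arc length) for each value."""
    total = sum(values)
    result = []
    for value in values:
        share = value / total
        angle_deg = share * 360
        arc_length = share * 2 * math.pi * radius
        result.append((share, angle_deg, arc_length))
    return result

# Hypothetical values, not taken from any chart in this post.
values = [30, 25, 20, 15, 10]
for value, (share, angle, arc) in zip(values, pie_geometry(values)):
    print(f"value={value:>3}  share={share:.0%}  angle={angle:5.1f} deg  arc={arc:.2f}")
```

In a bar chart the bar’s length is simply proportional to the value, so none of this conversion is needed.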

This problem is further compounded when the chart is in 3D:

The third dimension skews the physical appearance of the chart. This is called forced perspective:

Forced perspective is a technique which employs optical illusion to make an object appear farther away, closer, larger or smaller than it actually is. It manipulates human visual perception through the use of scaled objects and the correlation between them and the vantage point of the spectator or camera.

When in 3D, the sectors that are closer to the eye seem larger due to forced perspective, even though in reality they may be physically smaller. Look at that Firefox sector… doesn’t it seem larger than I.E.?

This defeats the purpose of the chart, which is meant to represent the relative sizes of data.
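Here’s a simplified sketch of the distortion in Python. I’m assuming the 3D effect just tilts a flat pie away from the viewer, which squashes the circle vertically into an ellipse (real 3D pies also add a visible side wall, which this ignores). The `apparent_angle` helper and the squash factor are my own illustration, not how any charting tool actually renders:

```python
import math

def apparent_angle(start_deg, end_deg, squash):
    """Angle a sector appears to span after the circle is squashed vertically.

    squash is the ratio of the ellipse's height to its width (1.0 = no tilt).
    The sector runs counterclockwise from start_deg to end_deg.
    """
    def project(deg):
        rad = math.radians(deg)
        return math.degrees(math.atan2(squash * math.sin(rad), math.cos(rad)))
    return (project(end_deg) - project(start_deg)) % 360

# Two sectors that are both truly 60 degrees wide, on a pie squashed to half height.
front = apparent_angle(240, 300, squash=0.5)  # centered at the bottom, nearest the viewer
side = apparent_angle(-30, 30, squash=0.5)    # centered on the right-hand side

print(f"front sector looks like {front:.1f} degrees")
print(f"side sector looks like {side:.1f} degrees")
```

Under this model, the sector at the front appears to span a much wider angle than the identical sector at the side, which is the kind of skew described above.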

 

When should I use the pie?

So, we now know the pie chart isn’t very good at comparing values. But there is one thing the pie is good at, even better than any other chart I know of.

It’s very good at representing that something is part of a whole.

When we think of pie charts, what comes to mind is the slice of pie, not necessarily the whole pie itself.

Pie charts are excellent at showing the composition of a whole (i.e., that the sum of parts is 100%).

But does this mean we should always be using pie charts to show composition?

Again, no. As always it depends on context.

If we’re comparing the relative magnitudes of the parts that comprise the whole, again the bar chart wins. It’s comparing values, after all.

But if we just care to emphasize that yes, these parts comprise the whole, regardless of by how much, then the pie chart wins.

It all depends on where you’re putting the emphasis. Again, it depends on context.

Pie charts are especially effective for single values that are relatively small compared to the whole. See the example 10% pie chart above.

Think: Into how many slices do you usually cut your pizza?

Chances are, your average slice looks like the pie chart above. It works because it matches our mental image of what a slice of pie (or pizza) is supposed to look like.

So, part of a whole, and even better if that part is small. When you need to represent your data in such a way, pie charts are the way to go.

Otherwise, stick to other charts that better suit your purpose.

 

Note that this is barring any aesthetic considerations, such as having lots of roundish shapes on the same page. In that case you’re going to have to consider which is more important: context or design. I’d say context, but this always leads me to arguments with my art-inclined friends 🙂

 

UPDATE: I found this great post which talks through use cases of when using a pie chart is okay.

I measured my productivity for a month. Here’s what I learned.

I’m not a morning person.

Given the choice, I’d rather sleep in and work after lunch. Like most people though, I don’t have that choice.

But am I under-performing because my brain isn’t fully awake yet?

I have a typical 9 to 5 job. I often come in earlier to accommodate my global team’s timezone differences (but offset by leaving early as well). I’d come in all bleary-eyed, head floating in the clouds, rushing to get my first shot of caffeine.

At one point I questioned,

Am I under-performing because my brain isn’t fully awake yet? Am I selling myself short just because I’m forcing myself to work against what’s natural?

I asked a few colleagues and even my manager, and they assured me I wasn’t under-performing at all.

Paranoid as I am though, I decided to put numbers to my feelings so I could make some logical analyses.

For 3 to 4 weeks I measured my productivity, hour by hour, by rating my energy, focus, and motivation on a scale from 1 to 10.

I limited myself to weekdays because I knew my weekends were too spontaneous to measure. I also wrote little notes to give context to my scores, such as “drank coffee” or “back-to-back calls”.
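If you want to try this yourself, here’s a minimal Python sketch of how the hourly scores could be rolled up into averages per hour. The log entries below are invented for illustration (they are not my real scores), and `hourly_averages` is just a helper name I made up:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log entries: (hour of day, energy, focus, motivation, note).
log = [
    (8, 5, 4, 5, "drank coffee"),
    (9, 8, 8, 7, ""),
    (10, 8, 6, 5, "back-to-back calls"),
    (8, 4, 5, 6, ""),
    (9, 9, 7, 8, "drank coffee"),
    (10, 7, 5, 6, ""),
]

def hourly_averages(entries):
    """Average each trait per hour across all logged days."""
    by_hour = defaultdict(list)
    for hour, energy, focus, motivation, _note in entries:
        by_hour[hour].append((energy, focus, motivation))
    return {
        hour: tuple(mean(trait) for trait in zip(*scores))
        for hour, scores in sorted(by_hour.items())
    }

for hour, (energy, focus, motivation) in hourly_averages(log).items():
    print(f"{hour:02d}:00  energy={energy:.1f}  focus={focus:.1f}  motivation={motivation:.1f}")
```

Keeping the notes alongside the scores means you can go back and explain an odd hour instead of guessing.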

The data by the month’s end was revealing.

 

Mornings

Morning productivity measured

Since I have to have a morning cup of joe, all three traits start to climb after breakfast until they peak at around 9 AM. From there, the three diverge.

Energy is consistently high while motivation and focus start to dip. I suspect focus relates to a caffeine crash, while the other two to my morning schedule.

Why? Because my mornings are usually reserved for meetings, whether it’s physical meetings where I hop from room to room and floor to floor, or virtual meetings where I talk to people over the phone or video.

This is my most physical part of the day, hence the high energy.

Yet it’s also in those meetings that issues come up. Hearing bad news is never a good way to start the day. It’s a possible source of demotivation.

 


Microsoft DAT206x: Analyzing and Visualizing Data with Excel Review

You never actually analyze and visualize data, but this course is worth taking as it’s a good introduction to using Power Pivot and Power Query–both of which are useful for managing large amounts of data in Excel. Just make sure you manage your expectations.

Update: To follow my progress in this program, check the Microsoft Professional Program tag.

 

Context

For those who are following this blog for my data science updates, it might be of interest to you that I am still working on Microsoft’s Professional Program for Data Science (in beta). I have recently completed my second course, Analyzing and Visualizing Data with Excel.

This was my gateway course to the program. Excel enthusiasts at work had recommended it as a good introduction to PowerPivot, and it was only later that I found out the course was part of a larger data science program.

My primary purpose for taking the course was increasing my proficiency in Excel. I currently manage a large-scale project with an equally large-scale tracking spreadsheet. The spreadsheet easily gets out of hand due to the sheer number of assets involved and because it pulls data regularly from multiple data sources. I was hoping the course would help me clean up the data and make it sustainable to maintain in the long run.

Because of this, I’m reviewing the course from a more practical “Can I use this at work?” perspective rather than its relation (or lack thereof) to data science.

It took me about a month to complete, starting September 2016. You can follow my progress in the MS Data Science Program by using my tag Microsoft Professional Program.
