Microsoft DAT206x: Analyzing and Visualizing Data with Excel Review

You never actually analyze and visualize data, but this course is worth taking as it’s a good introduction to using Power Pivot and Power Query–both of which are useful for managing large amounts of data in Excel. Just make sure you manage your expectations.

Update: To follow my progress in this program, check the Microsoft Professional Program tag.

 

Context

For those who are following this blog for my data science updates, it might be of interest to you that I am still working on Microsoft’s Professional Program for Data Science  (on beta). I have recently completed my second course, Analyzing and Visualizing Data with Excel.

This was my gateway course to the program. Excel enthusiasts at work had recommended it as a good introduction to PowerPivot, and it was only later that I found out the course was part of a larger data science program.

My primary purpose for taking the course was increasing my proficiency in Excel. I currently manage a large-scale project with an equally large-scale tracking spreadsheet. The spreadsheet easily gets out of hand due to the sheer number of assets involved and because it pulls data regularly from multiple data sources. I was hoping the course would help me clean up the data and make it sustainable to maintain in the long run.

Because of this, I’m reviewing the course from a more practical Can I use this at work? perspective rather than its relation (or lack of) to data science.

It took me about a month to complete, starting September 2016. You can follow my progress in the MS Data Science Program by using my tag Microsoft Professional Program.

Prerequisites

The course builds heavily on the use of Pivot Tables in Excel, so if you aren’t familiar with that I would recommend AGAINST proceeding with the rest of the course. There are a number of pivot tutorials available so just pick one you’re comfortable with before moving forward.

The course also makes references to other Excel functions such as INDEX, MATCH, and VLOOKUP. These aren’t necessary to the course but would be good to know.

For some modules, a background in querying in SQL will come in handy. This is probably why the recommended MS Data Science curriculum places Querying with Transact-SQL prior to the Excel course. Personally I’m not familiar with SQL, but I managed to get by with using similar logic from other querying languages such as JQL*.

In terms of tools, you will need Excel 2010 at the minimum, though I’d recommend 2013 or higher.

Introduction

We are introduced to Dany Hoter who will be our lecturer for the course. Immediately I know Dany and I will become good friends, because his introduction video already addresses one of my greatest Excel pain points, vlookups:

“So the challenges here are the following.
First of all the need to combine all the data together with Vlookups.
It has its flaws, its problems.
First of all, Vlookups can get very slow.

If you notice in the bottom, in the status line it says calculating and it has a progress bar.
Going, going, going.
Still going.
And only now all the calculations were actually completed.
The reason that it’s a simple operation of changing one cell took so long is because this range is used in thousands and thousands of Vlookup expressions, and all of them needed to be calculated.
So, vlookups can be slow.
And this is slow, it can be actually a lot slower.
And now vlookups can also be difficult to maintain and can be error prone.
Another problem is that the data that we use is not always as simple as this one.”

–Dany Hoter, Data Analysis in Excel Lecture

THIS. This is the reason why I ended up in the course in the first place.

My spreadsheet at work only has about 4000+ rows, and yet whenever I make an update my laptop freezes while the vlookup formulas re-calculate. That “calculating… going, going, going…” part? I’m all too familiar with it. A few years back I managed what was to be a precursor to an online database–the “calculating” part happened so often that I set up a second laptop just for Excel.

So did the course make my sheet more manageable? Not quite, but it did point me in the right direction.

Positives

The course is structured in such a way that it teaches you how to import different forms of data into Excel:

  • How to import from a CSV file
  • How to import from databases
  • How to import data from multiple files
  • How to import from a pre-formatted report

etc. In the process it teaches you what are the best tools to use for the above imports (mostly PowerPivot and PowerQuery), how to perform some data clean-up prior and upon import, and how to mash up data imported from multiple sources.

My key takeaway from the course is the Excel Data Model. Using data models would allow me to use spreadsheets as I would database tables. Each table would contain different sets of data, and I could relate specific data to one another…  If done right, it would make vlookups irrelevant.

These data models are accessed from PowerPivot: a feature available by default in Excel 2016, needs to be enabled as an add-in for Excel 2013, and is available for download for Excel 2010 here. I use 2013 and the add-in works just fine.

Like pivots, you can use the data model to create pivot charts, but this time much more powerful ones due to the number of data and relationships available. For fancier charts you can plug your sheet into Power BI, a Microsoft Business Intelligence application with more visualization features compared to Excel.

The course also introduces the use of DAX (Data Analysis Expression). The closest analogy I can think of would be the language used to create Excel formulas (think back to index, match, and vlookup), but used for data models. It allows you to create calculations from the imported data, then treat those calculations as new data to manipulate.

All these combined made my brain whir with possibilities: The YoY (year-on-year) comparison charts used in the examples? I could see those being used for tech alerts… then used to make tech strategies… and then those would be the driver of the projects I’ll eventually manage. This is definitely something I can use at work.

Negatives

But in spite of all that, my spreadsheet, my cause for taking this course, still remains unmanageable.

The problem is the course gives so much focus to importing data, but not what to do after. For example, I’ve managed to load the tables I need into my data model, but I get errors whenever I try to relate them. The data also doesn’t mash up neatly whenever they come from multiple sources.

The course also takes the naive assumption that the data is relatively clean. In a few examples, Dany walks through how to clean “messy” data. He even shows that the data model is already recording his clean-up steps, much like a macro, so he only needs to do the clean-up once then Excel will automagically clean-up in the future.

In reality, data is often much messier. Excel is also a poor tool to use for data extraction (Springboard’s book actually recommends Python instead, and just using Excel for analysis). The course never actually shows how to use the recorded clean-up steps, just that it exists.

The course will actually point to Microsoft support pages many times, recommending the user to learn more about the features mentioned.

Because of this, I’m still looking for a proper PowerPivot tutorial to use for my spreadsheet. This course is simply not enough to be of use on it’s own. If you’ve got any recommendations please drop me a line.

Issues

Module 4 of the course is exclusively on the use of the Power Query add-in. Cool in theory but problematic in practice, especially when performing the lab section.

My first problem was that it required that I be outside of a firewall. This was simple enough. I just needed to work from a different network, and any coffee shop Wi-Fi would do nicely .

The second problem was that in spite of following the instructions to a T, I would get the below DataSource permission error:

excel

which apparently isn’t a unique problem as it’s mentioned specifically in Office’s April 2016 update. I’ve called it out to the community TA, and upon checking there’s a new announcement on the lab page that says,

excel2.png

Annoying I admit, but quirks like these are expected for a course that’s on beta. I don’t have the patience to try the workarounds so I’ve skipped the whole of Module 4 and thankfully, I still managed to pass the course.

I plan to give PowerQuery another try later, but for personal use, because I’m intrigued  by the fact it can pull data from Facebook.

Conclusion

The course title is misleading. There’s very little actual data analysis and visualization performed in this course. Instead, it teaches you multiple ways to import data into Excel, most notably through PowerPivot and PowerQuery.

Once imported, then you can begin to make analyses and visualizations, but these are no longer covered by the course. To make this course usable it would need to be supplemented by more thorough lessons on PowerPivot, PowerQuery, and DAX.

In spite of all these cons, I would still recommend the course. I use Excel on a regular basis and yet I had no idea these features exist, so to be introduced to all of it in one go is quite convenient. I can cherry pick which ones I’m curious about (PowerPivot definitely) and study about that.

If I ever get to rename the course I’d call it, “Overview on How to Import Large Amounts of Data into Excel.” It doesn’t have the same ring to it but it’s much more accurate to the content.

The last module shows off a few features of Power BI and how Excel integrates into it. The visualizations are exactly the kind of thing that excites me about Excel (I love charts, in case you didn’t know). There’s a separate Power BI course in edX though in terms of the MS Data Science program, you only need to take either this or the Excel course to qualify, not both.

Be mindful though that the Power BI course requires you to install Power BI Desktop. This is the primary reason why I haven’t gone through the course yet–the application hasn’t been approved for use in my office.

*JQL = JIRA Query Language.

 

Image: Solar system and comets in relation to other solar systems, about 1780. Ref F2776.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s