Dear Microsoft, I’m confused.

Last year I heard about your Professional Program for Data Science.

I’ve been following along, albeit slowly, as I’ve been supplementing your content with other MOOCs. But the point is I’ve been following along and still intend to.

Your content is good. Not the best, but good.

Here’s the thing though: Why suddenly announce Microsoft Advanced Analytics?

On the surface, it looks all shiny and new, with its focus on Cortana Intelligence and Machine Learning.

Looking under the hood, though, I see that the course catalog and certifications mirror those of the original program.

What gives?

Are these two different, or the same? Are they meant to be complementary? Where does one stop and the other begin?

SO. MANY. QUESTIONS.


Microsoft DAT208x: Introduction to Python for Data Science, a review

In my quest to complete the Microsoft Professional Program for Data Science, I took their course Introduction to Python for Data Science earlier this month, with disappointing results.

It could be that I had very different expectations, or that I already have too much background in Python for another introductory course, but I wasn’t impressed and I’m loath to pay for the verified certificate.

This felt more like an overview than a proper introduction. If this were a university course, it would have been the first day, when the instructor hands out the syllabus and walks through the course expectations.

Would I discourage you from taking the course? Yes, actually.

(To follow my progress on the program, check out the Microsoft Professional Program tag)

 

The Structure

DAT208x claims to “cover Python basics and prepare you to undertake data analysis using Python”. Like the Microsoft courses that come before it, it is a self-paced course consisting of video lectures and lab exercises.

The modules are as follows:

  1. Python Basics
  2. Lists
  3. Functions and Packages
  4. Numpy
  5. Plotting with Matplotlib
  6. Control Flow and Pandas

This course is brought to you by a partnership between Microsoft and DataCamp, the latter an online data science school similar to DataQuest. In an old post I mentioned my apprehension about DataCamp, as I’d heard they favor R over Python, but I decided to give them the benefit of the doubt and try their Python course.

It’s because of this partnership that most of the lab activities happen outside edX; that is, we’re redirected to DataCamp’s interface for the lab exercises.

These exercises are the meat of the course. If you’ve tried DataQuest before, the DataCamp interface should be familiar:

Instructions are on the left, an interactive Python shell on the right. After you submit your answer, DataCamp checks whether your code is correct.

Unlike other Microsoft courses I’ve tried, this one has a final exam: you are given 4 hours to answer 50 questions, a mixture of knowledge checks, pseudocode, and actual coding.

Across the quizzes, exercises, and final exam, you need an overall score of at least 70% to pass the course. Pretty easy, considering 40% of the grade is just course surveys.
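To put a number on “pretty easy”, here’s a back-of-the-envelope sketch. It assumes the 40% survey weight is earned in full just by completing the surveys, and that the quizzes, exercises, and exam are pooled into one graded portion; the function name is my own, not anything from edX.

```python
def required_graded_percent(pass_mark: float, free_points: float, total: float = 100.0) -> float:
    """Percent of the graded (non-survey) portion needed to pass.

    Assumes the survey points (free_points) are earned in full and that
    everything else in the grade is pooled into one graded portion.
    """
    remaining = total - free_points
    if remaining <= 0:
        raise ValueError("free_points must be less than total")
    needed = (pass_mark - free_points) / remaining * 100
    # If the free points alone clear the bar, nothing more is needed.
    return max(0.0, needed)


print(required_graded_percent(70, 40))  # 50.0
```

In other words, under those assumptions you only need half of the remaining graded points to pass.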

 


Storytelling with Data: a book review and my takeaways

As a child, I loved telling stories. I’d take my favorite book and TV characters and create a world where they would oh-so-conveniently meet. Say, a magical anime girl wanders Narnia until she encounters the now-villainous Power Rangers.

As an adult in the corporate world, I still want to tell stories. But now I find that people are more critical of which stories I tell them.

It must be in the form of numbers, they said.

It’s a data-driven world, they said.

In her book Storytelling with Data, Cole Nussbaumer Knaflic argues we can do just that: tell stories with numbers.

language + math = data storytelling

She takes traditional storytelling concepts and reinterprets them for “adult-appropriate” tables and charts. She teaches us to edit our charts the same way authors edit their stories, by borrowing principles of visual design.

My key takeaways from the book are in my notes below, but they can be summarized as follows:

  1. Context is king. The form your data will take depends on your audience and what you want them to do with the data.
  2. Choose the right graph to best express the key message (I’ve made a flowchart in my notes to help with that).
  3. Following on #1, design around this message.
  4. Present your data as you would a story, with a beginning, middle, and end.
“Storytelling with Data” notes, by dannaisadork

P.S. Sorry about the terrible handwriting. My normal penmanship’s already pretty bad, but writing on a tablet made it worse!

 


A data journalism peg: NY Times on Uber’s psychological mind games.

The New York Times is right up there with the Guardian’s Datablog in my data journalism aspirations.

One of my favorite pieces of theirs is Snow Fall, their coverage of the 2012 Tunnel Creek avalanche. It’s a wonderful mixture of storytelling, visualizations, and traditional journalistic interviews.

Go check it out first, I promise you won’t regret it. Just don’t forget to come back.

Unlike the Datablog, however, the Times doesn’t collate its data viz content into a single page (IKR? Not even a tag!), so I often miss out on great content unless it goes viral.

(Before you suggest I subscribe to the Times, did you know they publish about 230 pieces of content daily? I’m not willing to sift through that!)

So I’m glad I didn’t miss out on this latest one: their coverage of How Uber Uses Psychological Tricks to Push Its Drivers’ Buttons.

This is a serious journalism piece. Not a game. I think.

What’s to like:

  • Interactive simulations!
  • The feature viz is a throwback to the 8-bit games of the ’80s, which is kind of meta, given the piece talks about how Uber experimented with video game techniques to maximize profit.
  • Charts. Charts. Charts. And interactive ones at that.
  • A union of social science with data science. How exciting! I like how they incorporated psychological vocabulary into the piece (e.g., loss aversion, ludic loop, binge-watching).
  • “Uber exists in a kind of legal and ethical purgatory.” Please excuse me while I writer-geek out over this analogy.

It’s a pretty lengthy piece that will take about half an hour to get through, but I’d argue it’s worth it.

.xlsx files are secretly compressed!

There. I spilled the not-so-big secret. Excel files from Excel 2007 onward (.xlsx) are automatically compressed: under the hood, an .xlsx file is a ZIP archive of XML files. It’s a feature I never knew about in all my years of using Excel.
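If you want to see this for yourself, Python’s standard library can open an .xlsx directly with the zipfile module. Here’s a minimal sketch; the function name describe_xlsx is hypothetical, and it only reports each internal part’s uncompressed and compressed size rather than parsing the spreadsheet.

```python
import zipfile
from typing import BinaryIO, List, Tuple, Union


def describe_xlsx(source: Union[str, BinaryIO]) -> List[Tuple[str, int, int]]:
    """List each part inside an .xlsx package with its raw and compressed size.

    An .xlsx (Office Open XML) file is a ZIP archive of XML parts, so
    zipfile can open it from a path or a binary file object.
    """
    if not zipfile.is_zipfile(source):
        raise ValueError("not a ZIP-based file")
    with zipfile.ZipFile(source) as archive:
        # file_size = uncompressed bytes, compress_size = bytes on disk
        return [(info.filename, info.file_size, info.compress_size)
                for info in archive.infolist()]
```

Pointing it at any workbook (say, a hypothetical finance_extract.xlsx) should list parts such as the worksheet XML files, with the compressed size of data-heavy sheets typically well below their raw size.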

I once received a large Excel file from finance for analysis. Normally I would convert such a file to CSV (comma-separated values), as the latter:

  1. …is just data, no formatting. Exactly what I need for a data extract and nothing more.
  2. …tends to be more malleable across multiple applications.
  3. …and because of #s 1 and 2, tends to have a smaller file size.

So imagine my surprise when, upon converting to CSV, my 29 MB file ballooned to 115 MB.

Whuuuutttt???

Usually it’s the other way around: with all the formatting and formulas removed, the file shrinks.

But apparently this no longer holds when you have a lot of data. Past a certain point, the sheer volume of data matters more than the formatting, and CSV, being plain uncompressed text, loses out to the compressed .xlsx.
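To see the effect without a real finance file, here’s a rough sketch using only Python’s standard library. It writes the same synthetic rows as plain CSV, then zips them with DEFLATE, the compression method ZIP archives (and therefore .xlsx packages) typically use. The rows are made up, and since an actual workbook stores verbose XML rather than CSV, treat the ratio as illustrative only.

```python
import csv
import io
import zipfile


def csv_vs_zipped_size(rows):
    """Return (raw CSV bytes, ZIP-deflated bytes) for the same rows.

    This only mimics the compression step of .xlsx; a real workbook stores
    XML plus styles and metadata, so real-world ratios will differ.
    """
    text = io.StringIO()
    csv.writer(text).writerows(rows)
    raw = text.getvalue().encode("utf-8")

    packed = io.BytesIO()
    with zipfile.ZipFile(packed, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.csv", raw)
    return len(raw), len(packed.getvalue())


# Synthetic, repetitive "finance-like" rows, made up for illustration.
rows = [["2017-01-%02d" % (i % 28 + 1), "Cost Center %d" % (i % 50), i % 1000 * 1.25]
        for i in range(100_000)]
raw_size, zipped_size = csv_vs_zipped_size(rows)
print(f"CSV: {raw_size:,} bytes, zipped: {zipped_size:,} bytes")
```

Repetitive tabular data (dates, category labels, recurring amounts) compresses well, which is why the uncompressed CSV can end up several times larger than the zipped original.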

Fortunately, .xlsx is compatible with Power BI, which is where I was going to load the data anyway, so I left the file type as is.

Makes for a convincing argument for utilizing the Microsoft suite, eh?

(And in case your answer is no, let me argue that even technology research group Gartner agrees with me by crowning Microsoft king in business intelligence and analytics platforms.)