A data journalism peg: NY Times on Uber’s psychological mind games.

The New York Times is right up there with the Guardian’s Datablog in my data journalism aspirations.

One of my favorite pieces of theirs is Snow Fall, their coverage of the 2012 Tunnel Creek avalanche. It’s a wonderful mixture of storytelling, visualizations, and traditional journalistic interviews.

Go check it out first, I promise you won’t regret it. Just don’t forget to come back.

Unlike the Datablog, however, the Times doesn’t collect its data viz content on a single page (IKR? Not even a tag!), so I often miss out on great content unless it goes viral.

(Before you suggest I subscribe to the Times, did you know they publish about 230 pieces of content daily? I’m not willing to sift through that!)

So I’m glad I didn’t miss out on this latest one: their coverage of How Uber Uses Psychological Tricks to Push Its Drivers’ Buttons.

This is a serious journalism piece. Not a game. I think.

What’s to like:

  • Interactive simulations!
  • The feature viz is a throwback to the 8-bit games of the ’80s, which is kind of meta, given that the piece talks about how Uber experimented with video game techniques to maximize profit.
  • Charts. Charts. Charts. And interactive ones at that.
  • A union of social science with data science. How exciting! I like how they incorporated psychological vocabulary into the piece (e.g., loss aversion, the ludic loop, binge-watching).
  • “Uber exists in a kind of legal and ethical purgatory.” Please excuse me while I writer-geek out over this metaphor.

It’s a pretty lengthy piece that will take about half an hour to get through, but I’d argue it’s worth it.

.xlsx files are secretly compressed!

There. I spilled the not-so-big secret. Files saved in Excel 2007 and later in the default .xlsx format are automatically compressed, a feature I never knew about in all my years of using Excel.

I once received a large Excel file from finance for analysis. Normally I would convert it to CSV (comma-separated values), because CSV:

  1. …is just data, no formatting. Exactly what I need for a data extract and nothing more.
  2. …tends to be more malleable across multiple applications.
  3. …and because of #s 1 and 2, tends to have a smaller file size.

So imagine my surprise when, upon converting to CSV, my 29 MB file ballooned to 115 MB.

Whuuuutttt???

Usually it’s the other way around. With all the formatting and formulas removed, the file size usually shrinks.

But apparently this stops being true when you have a lot of data. Because .xlsx is compressed and CSV is plain text, once you pass a certain size the sheer volume of data matters more than the formatting you strip out.
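If you want to see this for yourself: an .xlsx file is really a ZIP archive of XML parts, and ZIP typically shrinks data with DEFLATE compression, so Python’s standard zipfile module can open one directly. Below is a minimal sketch, assuming Python 3.9+. inspect_archive lists each part of a workbook with its uncompressed and compressed sizes; the rest builds a made-up, repetitive finance-style CSV (the columns and values are hypothetical, not my actual finance file) and measures how much DEFLATE shrinks it.

```python
import csv
import io
import zipfile


def inspect_archive(path) -> list[tuple[str, int, int]]:
    """Return (member name, uncompressed bytes, compressed bytes) for each part
    of a ZIP-based file. An .xlsx workbook is one, so a path such as
    "report.xlsx" (hypothetical) works here, as does a file-like object."""
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size, info.compress_size) for info in zf.infolist()]


def build_csv(rows: int) -> bytes:
    """Build a made-up, repetitive finance-style CSV (hypothetical columns)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "cost_center", "account", "amount"])
    for i in range(rows):
        writer.writerow([
            f"2016-{i % 12 + 1:02d}-01",  # 12 repeating month values
            f"CC-{i % 50:03d}",           # 50 repeating cost centers
            "Operating Expenses",         # the same text on every row
            f"{(i * 37) % 10000}.00",     # amounts cycling through 10,000 values
        ])
    return buf.getvalue().encode("utf-8")


def zipped_size(payload: bytes, name: str = "data.csv") -> int:
    """Size in bytes of an in-memory ZIP archive holding `payload`, DEFLATE-compressed."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return len(out.getvalue())


raw = build_csv(50_000)
packed = zipped_size(raw)
print(f"plain CSV: {len(raw):,} bytes; zipped: {packed:,} bytes "
      f"({len(raw) / packed:.1f}x smaller)")
```

On a real workbook, calling inspect_archive on the file shows the same gap between the two size columns. That gap is the compression you give up when you save the data as CSV.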

Fortunately, Power BI, which is where I was going to load the data anyway, can import .xlsx files directly. So I left the file type as is.

Makes for a convincing argument for using the Microsoft suite, eh?

(And in case your answer is no, let me argue that even technology research group Gartner agrees with me by crowning Microsoft king in business intelligence and analytics platforms.)

The best path to data science starts with the problem.

In the third grade, my science teacher sent shockwaves through our class when she failed more than half of our final projects (thankfully, I was in the minority that passed).

This is it??? This is all you have?!

You can do better than this. These are too easy.

Give me something that’s actually worth… something!

Let me remind you: WE WERE THIRD-GRADERS. We were little brats who had never been told we sucked, much less failed.

Stricken by this failure, one classmate approached me after class to ask for advice. He had always been in the top 10 of the class. This must have devastated him.

Too bad I was never good at consoling, even as a kid. So instead I told him a story.

Of how I was playing outdoors the day before and was bothered by mosquitoes. Of how, try as I might, I couldn’t find where my mom hid the insect spray.

So I just used the first thing I found in the kitchen: Maggi Savor.

(For those outside the Philippines, Maggi Savor is a liquid seasoning blend, something like soy sauce but with garlic and lime.)

And to my surprise it worked. Not as effectively as insect spray, but the mosquitoes no longer buzzed as actively as before.

You can guess what happened next: my classmate won the title of “Best Project” for his study, “The Feasibility of Soy Sauce as a Mosquito Repellent Alternative.” I was… well, I passed, so all was well.

Why am I sharing this story?

Because to me, my experiment had been nothing more than a curious workaround so I could keep playing outdoors.

But to my friend, and to my science teacher, it was a problem worth solving.

And as it turns out, that’s how to become a data scientist.

One of the most popular posts I’ve written on this blog is Getting started with Data Science, for the complete beginner. It’s also one of my first.

Since then, many articles on the same topic have come out. One worth noting is this piece published in Forbes (originally from Quora). It answers the question, “What’s the best path to becoming a data scientist?”

  1. Pick a topic you’re passionate or curious about.
  2. Write the tweet first.
  3. Do the work.
  4. Communicate.

Where I said to have a personal project, the writer takes it a step further and recommends a public portfolio:

I recommend building up a public portfolio of simple but interesting projects. You will learn everything you need in the process, perhaps even using all the resources above.

Makes sense, right? More and more, we’re judged by what we can do rather than by the credentials we hold. Artists, architects, and now programmers and developers… more and more jobs require a portfolio.

What I hadn’t considered was writing the tweet first.

Is the project even worth pursuing?

It sounds obvious, but people are eager to jump into a random tutorial or class to feel productive and soon sink months into a project that is going nowhere.

Ouch. I think she’s talking about me.

She’s got a good point, though.

So. I now know I have to revisit my projects and write their tweets… but how do I talk about that portfolio?

If you’re like me and data science isn’t your day job, how do you talk about what are, essentially, your side projects?

It’s unfortunate that side projects are often overlooked by the people who aren’t actively working on them. Side projects can be immensely rewarding to talk about. They demonstrate a lot about how you work.

Thankfully, LinkedIn lets you add projects to your profile. It’s the perfect avenue for showcasing your portfolio.

In person though, you may want to try this approach:

  1. Start with the problem
  2. Define your approach
  3. Share the challenges you faced
  4. End with the results
  5. Follow up with what you would do differently

Again, it starts with the problem.

Like most things, the start is the most difficult step.

Finding the right problem is hard. But it might not need to be. It might already be there, right under your nose… and you just haven’t recognized it as a problem yet. Just like Maggi Savor.

To get my path to data science back on course, the first thing I’m doing is taking a second look, this time with a fresh set of eyes.

GameSpace: Visualizing videogame likeness.

When I first started getting into data science, one of the projects I had set out to do was to build a visual and interactive database/recommendation engine for games.

The idea was that the system would build you a car based on your preferences (games you already love), then drop you at a random point in a data-landscape visualization of thousands of games. You drive around and explore this landscape: mountains are games closer to your preferences, while valleys are games you’re likely to hate.

Well, researchers from UC Santa Cruz beat me to it. GameSpace now exists:

GameSpace is a visualization of the videogame medium as an explorable 3D space. Each of the nearly 16,000 stars in its galaxy represents an actual game that exists in the real world, and stars are placed in the space such that more similar games are nearer to one another.

–What is this? GameSpace FAQ.
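The FAQ excerpt doesn’t say how GameSpace measures “similar,” so here is a toy sketch of the general idea, not their method: describe each game by a set of tags, treat the Jaccard distance between two tag sets as how far apart the games are, and look up a game’s nearest neighbors (its closest “stars”). The catalogue, titles, and tags are all hypothetical, and the sketch assumes Python 3.9+.

```python
# Hypothetical catalogue: each made-up game is described by a set of tags.
GAMES: dict[str, set[str]] = {
    "Starfarer": {"space", "exploration", "procedural", "survival"},
    "Orbit Trader": {"space", "trading", "exploration", "procedural"},
    "Harvest Valley": {"farming", "life-sim", "relaxing"},
    "Farm Days": {"farming", "life-sim", "crafting"},
    "Cave Crawler": {"procedural", "survival", "crafting"},
}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |a & b| / |a | b|: 0.0 for identical tag sets, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0  # two empty tag sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def nearest(title: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k games closest to `title`, ties broken alphabetically."""
    target = GAMES[title]
    scored = [
        (other, jaccard_distance(target, tags))
        for other, tags in GAMES.items()
        if other != title
    ]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))[:k]


print(nearest("Starfarer"))
```

Actually placing the stars in 3D would mean feeding the full matrix of pairwise distances into a dimensionality-reduction step, such as multidimensional scaling or t-SNE. That step is left out of this sketch.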

They used outer space where I had imagined roads, but the basic idea is the same.

Still, it’s a lovely thing to look at. It reminds me a lot of the loading screen for No Man’s Sky, itself a procedurally generated space exploration game.

What’s the code under the hood? Find out with Gomix

There’s a new kid on the block: Gomix.

The premise is simple:

  1. Find a piece of code you’d like to tweak (or are just curious about).
  2. Tweak it.
  3. Save it so others can tweak it too.

That’s it. Easy.

It’s perfect for all those times you’ve come across a program and wondered, “How did they do that?”

Gomix lets you not only view the code underneath, but play around with it to create something new.

It’s like a less structured version of GitHub, which has its own pros and cons… but I think it’s more fun?

Gomix is by the creators of Trello, so we can expect similar levels of collaboration and intuitiveness.

More information available here.

P.S. I’ve had this post in my drafts for a while now and apparently forgot about it. Oops. Gomix is no longer as “new” as it was on the first draft.