Dear Microsoft, I’m confused.

Last year I heard about your Professional Program for Data Science.

I’ve been following along, albeit slowly, as I’ve been supplementing your content with other MOOCs. But the point is I’ve been following along and still intend to.

Your content is good. Not the best, but good.

Here’s the thing though: Why suddenly announce Microsoft Advanced Analytics?

On the surface it looks all shiny and new with the focus on Cortana Intelligence and Machine Learning.

Looking under the hood though, I see the course catalog and certifications mirror those of the original program.

What gives?

Are these two different, or the same? Are they meant to be complementary? Where does one stop and the other begin?

SO. MANY. QUESTIONS.

Microsoft DAT208x: Introduction to Python for Data Science, a review

In my quest to complete the Microsoft Professional Program for Data Science, I took their course Introduction to Python for Data Science earlier this month, with disappointing results.

It could be that I had very different expectations, or that I already have too much background in Python for another introductory course, but I wasn’t impressed and I’m loath to pay for the verified certificate.

This felt more like an overview than a proper introduction. If this were a university course, it would have been the first day, when the instructor hands out the syllabus and walks through the course expectations.

Would I discourage you from taking the course? Yes, actually.

(To follow my progress on the program, check out the Microsoft Professional Program tag)

 

The Structure

DAT208x claims to “cover Python basics and prepare you to undertake data analysis using Python”. Similar to the Microsoft courses that come before it, it is a self-paced course made up of video lectures and lab exercises.

The modules are as follows:

  1. Python Basics
  2. Lists
  3. Functions and Packages
  4. Numpy
  5. Plotting with Matplotlib
  6. Control Flow and Pandas

This course is brought to you by a partnership between Microsoft and DataCamp, the latter an online Data Science school similar to DataQuest. In an old post I mentioned my apprehension about DataCamp, as I’ve heard they favor R over Python, but I decided to give them the benefit of the doubt and give their Python course a try.

It’s due to this partnership that most of the lab activities are outside of edX, i.e., we’re redirected to DataCamp’s interface for the lab exercises.

These exercises are the meat of the course. If you’ve tried DataQuest before then the DataCamp interface should be familiar:

Instructions are to the left, interactive Python shell to the right. After you submit your answer, DataCamp verifies whether your code is correct.

Unlike other Microsoft courses I’ve tried, this one has a final exam. In this exam you are given 4 hours to answer 50 questions: a mixture of knowledge checks, pseudo coding, and actual coding.

Across the quizzes, exercises, and final exam, you need to score at least 70% to pass the course. Pretty easy, considering 40% of the grade is just course surveys.
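To put a number on “pretty easy”, here is a rough sketch of the arithmetic. It assumes the surveys are worth exactly 40% of the grade and that you get full credit just for filling them out; the function name and its parameters are my own, not anything from edX or DataCamp.

```python
def required_score_on_rest(pass_mark=0.70, free_weight=0.40):
    """Fraction of the remaining graded work needed to reach the pass mark.

    Assumption: the 'free' portion (here, the course surveys) is awarded in full.
    """
    remaining_weight = 1.0 - free_weight             # quizzes, exercises, final exam
    still_needed = max(pass_mark - free_weight, 0.0)  # points not covered by surveys
    return still_needed / remaining_weight


# Share of the non-survey work you would need to get right
print(f"{required_score_on_rest():.0%}")
```

Under those assumptions, you only need about half of the actual graded work to pass.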

 


Udacity CS101: Intro to Computer Science, a review

I’ve been trying to learn how to code in Python for a while now. Of all the beginner resources I’ve tried, Udacity’s Intro to Computer Science (UD CS101) has been my favorite.

To clarify: I’m not learning Python with the intention of becoming a software developer. Rather, I like analyzing data, and I hear Python can help with that. R too, but Python 1) is recommended for beginners, and 2) has more applications outside of big data.

I do have some programming experience, though never anything formal, never to this depth, and never in Python.

 

THE STRUCTURE

UD CS101’s premise is that you will create “The Next Google”: the course teaches you how to build your own search engine.

The self-paced course is broken down into 7 modules*. Each module introduces a new concept to help improve your search engine.

Each module contains:

  • Videos. Here the instructor explains the theory behind the concepts and demonstrates how to use them on the search engine.
  • Q&As. These help nail down the concepts. These aren’t too difficult and are usually similar to the demonstrations.
  • Problem sets. These are machine problems that build on the concepts you’ve learned so far and are more challenging than the Q&As.

By the end of the course you will have built a search engine with an algorithm similar to that of AltaVista, once the #1 search engine of the 90s before Google took over.

For your class project you then build a mini social network based on the concepts you learned from the course.

*As of writing, Udacity has revamped its classrooms, so this modular approach may no longer apply.

 


The best path to data science starts with the problem.

In the third grade, my science teacher sent shockwaves when she failed the final projects of more than half the class (thankfully I was in the minority).

This is it??? This is all you have?!

You can do better than this. These are too easy.

Give me something that’s actually worth… something!

Let me remind you: WE WERE THIRD-GRADERS. We were little brats who had never been told we sucked, much less failed.

Stricken by this failure, one classmate approached me after class to ask for advice. He had always been in the top 10 of the class. This must have devastated him.

Too bad I was never good at consoling, even as a kid. So instead I told him a story.

Of how I was playing outdoors the day before and was bothered by mosquitoes. Of how, try as I might, I couldn’t find where my mom hid the insect spray.

So I just used the first thing I found in the kitchen: Maggi Savor.

(For those outside the Philippines, Maggi Savor is a liquid seasoning, something like soy sauce but with garlic and lime.)

And to my surprise it worked. Not as effectively as insect spray, but the mosquitoes no longer buzzed as actively as before.

You can guess what happened next: Classmate wins title of “Best Project” for his study on The feasibility of soy sauce as a mosquito repellent alternative. I was… well, I passed so all was well.

 

Why am I sharing this story?

Because to me, my experiment had been nothing more than a curious workaround so I could play outdoors.

But to my friend, and to my science teacher, it was a problem worth solving.

And as it turns out, that’s how to become a data scientist.

 

 

One of the most popular posts I’ve written on this blog is Getting started with Data Science, for the complete beginner. It’s also one of my first posts.

Since then, many articles on the same topic have come up. But of note is this one published in Forbes (originally from Quora). It answers the question, “What’s the best path to becoming a data scientist?”

  1. Pick a topic you’re passionate or curious about.
  2. Write the tweet first.
  3. Do the work.
  4. Communicate.

 

Where I said to have a personal project, the writer took it to the next level by recommending a public portfolio:

I recommend building up a public portfolio of simple but interesting projects. You will learn everything you need in the process, perhaps even using all the resources above.

Makes sense, right? More and more we’re judged by what we can do, no longer by the credentials we have. Artists, architects, and now programmers and developers… more and more jobs require having a portfolio.

 

What I hadn’t considered was writing the tweet first.

Is the project even worth pursuing?

It sounds obvious, but people are eager to jump into a random tutorial or class to feel productive and soon sink months into a project that is going nowhere.

Ouch. I think she’s talking about me.

She’s got a good point though.

 

So. I now know I have to revisit my projects and write their tweets… but how do I talk about that portfolio?

If you’re like me and data science isn’t your day job, how do you talk about what are, essentially, your side projects?

It’s unfortunate that side projects are often overlooked by the people who aren’t actively working on them. Side projects can be immensely rewarding to talk about. They demonstrate a lot about how you work.

 

Thankfully, LinkedIn has the ability to showcase projects. It’s the perfect avenue to showcase your portfolio.

In person though, you may want to try this approach:

  1. Start with the problem
  2. Define your approach
  3. Share the challenges you faced
  4. End with the results
  5. Follow up with what you would do differently

Again, it starts with the problem.

 

Like most things, the start is the most difficult step.

Finding the right problem is hard. But it might not need to be. It might already be there, right in front of you, just under your nose… and you just haven’t recognized it as a problem yet. Just like Maggi Savor.

To correct course on my path to data science, the first thing I’m doing is taking a second look. But this time with a fresh set of eyes.

Danna on Data

It’s been a while since I’ve talked about my data analysis self-study.

I’ve been trying this and that, but haven’t felt anything was worth writing about. I mean, who would want to know that I tried something and failed, right?

Oh wait. Me. I would want to know.

When I’m about to try something new, like skincare or a restaurant, I look up blogs for reviews. I try to see if I can relate to the blogger and put myself in their shoes: would I have failed as well?

It saves me a lot of effort because someone else has already gone through the experience for me.

That’s why I’m writing about all my data science-related updates so far, incomplete and disorganized as they are. Maybe it’ll help.
