Microsoft DAT208x: Introduction to Python for Data Science, a review

In my quest to complete the Microsoft Professional Program for Data Science, I took their course Introduction to Python for Data Science earlier this month, with disappointing results.

It could be that I had very different expectations, or that I already have too much background in Python for another introductory course, but I wasn’t impressed and I’m loath to pay for the verified certificate.

This felt more like an overview than a proper introduction. If this were a university, this would have been the first day, when the instructor hands out the syllabus and walks through the course expectations.

Would I discourage you from taking the course? Yes, actually.

(To follow my progress on the program, check out the Microsoft Professional Program tag)

 

The Structure

DAT208x claims to “cover Python basics and prepare you to undertake data analysis using Python”. Like the Microsoft courses that come before it, it is a self-paced course composed of video lectures and lab exercises.

The modules are as follows:

  1. Python Basics
  2. Lists
  3. Functions and Packages
  4. Numpy
  5. Plotting with Matplotlib
  6. Control Flow and Pandas

This course is brought to you by a partnership between Microsoft and DataCamp, the latter an online data science school similar to DataQuest. In an old post I mentioned my apprehension about DataCamp, as I’ve heard they favor R over Python, but I decided to give them the benefit of the doubt and give their Python course a try.

It’s due to this partnership that most of the lab activities take place outside of edX; that is, we’re redirected to DataCamp’s interface for the lab exercises.

These exercises are the meat of the course. If you’ve tried DataQuest before then the DataCamp interface should be familiar:

Instructions are on the left, with an interactive Python shell on the right. After you submit your answer, DataCamp checks whether your code is correct.

Unlike other Microsoft courses I’ve tried, this one has a final exam. In this exam you are given 4 hours to answer 50 questions: a mixture of knowledge checks, pseudocode, and actual coding.

Across the quizzes, exercises, and final exam, you need an overall score of at least 70% to pass the course. That’s pretty easy, considering 40% of the grade is just course surveys.
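
To put a rough number on that, here is a quick back-of-the-envelope calculation in Python. It assumes the grade is a simple weighted sum, that the surveys are worth 40% and are credited in full just for completing them, and that everything else (quizzes, exercises, and the final exam) shares the remaining 60%. The names below are my own, for illustration only, and not anything from the course.

```python
# Assumptions (not confirmed by the course): the final grade is a simple
# weighted sum, surveys are worth 40%, and the rest of the grade comes
# from quizzes, exercises, and the final exam combined.
PASS_MARK = 0.70
SURVEY_WEIGHT = 0.40
GRADED_WEIGHT = 1.0 - SURVEY_WEIGHT


def required_graded_score(survey_score: float = 1.0) -> float:
    """Return the fraction of the graded work needed to reach the pass mark.

    survey_score is the fraction of survey credit earned (1.0 = all of it).
    """
    points_still_needed = PASS_MARK - SURVEY_WEIGHT * survey_score
    # Clamp at zero in case the surveys alone would cover the pass mark.
    return max(0.0, points_still_needed / GRADED_WEIGHT)


needed = required_graded_score()
print(f"Needed on quizzes, exercises, and exam: {needed:.0%}")
```

Under those assumptions, you would only need about half of the marks on the parts that actually test your Python.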

 


Udacity CS101: Intro to Computer Science, a review

I’ve been trying to learn how to code in Python for a while now. Of all the beginner resources I’ve tried, Udacity’s Intro to Computer Science (UD CS101) has been my favorite.

To clarify: I’m not learning Python with the intention of becoming a software developer. Rather, I like analyzing data, and I hear Python can help with that. R too, but Python 1: is recommended for beginners, and 2: has more applications outside of big data.

I do have some programming experience, though never anything formal, never to this depth, and never in Python.

 

THE STRUCTURE

UD CS101’s premise is that you’ll create “The Next Google”: the course teaches you how to build your own search engine.

The self-paced course is broken down into 7 modules*. Each module introduces a new concept to help improve your search engine.

Each module contains:

  • Videos. Here the instructor explains the theory behind the concepts and demonstrates how to use them on the search engine.
  • Q&As. These help nail down the concepts. These aren’t too difficult and are usually similar to the demonstrations.
  • Problem sets. These are machine problems that build on the concepts you’ve learned so far and are more challenging than the Q&As.

By the end of the course you will have built a search engine with an algorithm similar to AltaVista’s, which was once the #1 search engine in the ’90s before Google took over.

For your class project you then build a mini social network based on the concepts you learned from the course.

*As of this writing, Udacity has revamped their classrooms, so this modular approach may no longer apply.

 


The best path to data science starts with the problem.

In the third grade, my science teacher sent shockwaves when she failed the final projects of more than half the class (thankfully I was in the minority).

This is it??? This is all you have?!

You can do better than this. These are too easy.

Give me something that’s actually worth… something!

Let me remind you: WE WERE THIRD-GRADERS. We were little brats who had never been told we sucked, much less failed.

Stricken by this failure, one classmate approached me after class to ask for advice. He had always been in the top 10 of the class. This must have devastated him.

Too bad I was never good at consoling, even as a kid. So instead I told him a story.

Of how I was playing outdoors the day before and was bothered by mosquitoes. Of how, try as I might, I couldn’t find where my mom hid the insect spray.

So I just used the first thing I found in the kitchen: Maggi savor.

(For those outside the Philippines, Maggi savor is a liquid seasoning blend, something like soy sauce but with garlic and lime.)

And to my surprise it worked. Not as effectively as insect spray, but the mosquitoes no longer buzzed as actively as before.

You can guess what happened next: Classmate wins title of “Best Project” for his study on The feasibility of soy sauce as a mosquito repellent alternative. I was… well, I passed so all was well.

 

Why am I sharing this story?

Because to me, my experiment had been nothing more than a curious solution that let me play outdoors.

But to my friend, and to my science teacher, it was a problem worth solving.

And as it turns out, that’s how to become a data scientist.

 

 

One of the most popular posts I’ve written on this blog is Getting started with Data Science, for the complete beginner. It’s also one of my first posts.

Since then, many articles on the same topic have come up. But of note is this one published in Forbes (originally from Quora). It answers the question, “What’s the best path to becoming a data scientist?”

  1. Pick a topic you’re passionate or curious about.
  2. Write the tweet first.
  3. Do the work.
  4. Communicate.

 

Where I said to have a personal project, the writer took it to the next level by recommending a public portfolio:

I recommend building up a public portfolio of simple but interesting projects. You will learn everything you need in the process, perhaps even using all the resources above.

Makes sense, right? More and more we’re judged by what we can do, no longer by the credentials we have. Artists, architects, and now programmers and developers… more and more jobs require having a portfolio.

 

What I hadn’t considered was writing the tweet first.

Is the project even worth pursuing?

It sounds obvious, but people are eager to jump into a random tutorial or class to feel productive and soon sink months into a project that is going nowhere.

Ouch. I think she’s talking about me.

She’s got a good point though.

 

So. I now know I have to revisit my projects and write their tweets… but how do I talk about that portfolio?

If you’re like me and data science isn’t your day job, how do you talk about what are, essentially, your side projects?

It’s unfortunate that side projects are often overlooked by the people who aren’t actively working on them. Side projects can be immensely rewarding to talk about. They demonstrate a lot about how you work.

 

Thankfully, LinkedIn has the ability to showcase projects. It’s the perfect avenue to showcase your portfolio.

In person though, you may want to try this approach:

  1. Start with the problem
  2. Define your approach
  3. Share the challenges you faced
  4. End with the results
  5. Follow up with what you would do differently

Again, it starts with the problem.

 

Like most things, the start is the most difficult step.

Finding the right problem is hard. But it might not need to be. It might already be there, right in front of you, just under your nose… and you just haven’t recognized it as a problem yet. Just like Maggi savor.

To correct my course toward data science, the first thing I’m doing is taking a second look. But this time with a fresh set of eyes.

Project Focus (formerly Project 2017): Update #2

A quick update on the Project Focus series, aka my resolution to increase awesomeness by harnessing the power of focus. Specifically, by applying Agile project management methods to my life.


Sprint 1: 2016 Clean-up.

The first sprint, which lasted for the first few weeks of January, was on cleaning up leftover 2016 tasks. That went extremely well.

I was forced to file for a lost passport. I finally stopped fooling myself that I’d simply misplaced it, and that it would eventually show up. No. Time to take action.

I also showed up for a medical appointment two years too late. There weren’t any adverse findings, but I wouldn’t risk it next time.

Among other trivial things. I cleaned them all up, and I’m rather proud of myself for doing so.


Sprint 2: Choose and complete a course.

Next sprint was to focus on my data analysis studies.

In my last update I talked about jumping from one MOOC to another trying to find the best fit for me.

Well, I told myself to stop jumping. I should at least finish one course first. Right now I’m almost done with Udacity’s CS 101 class.

I also started clocking my study time with Toggl to gauge whether I was on track with their estimated completion dates.

Turns out I am, but more importantly tracking the time made me realize that:

  1. I’m only actively studying 40-60% of the time.
  2. I need more than an hour to get into state #1.

I should really learn to focus.

As part of that effort, I’ve *gulp* restricted my book budget.

Normally I’d allow myself to purchase one book a month. Now it’s one book per course completed. Every day I look at my To Be Read pile and my heart aches a bit.


Sprint 3: Study next course? Or focus on focus?

With sprint 2 coming to a close I’m already considering what’s next.

I’m choosing between:

  1. To proceed with the next course on Statistics, or
  2. To work on actively improving my focus.

For the first option I plan to follow Udacity’s Data Analyst path and thus take Intro to Statistics next.

Alternatively, I could segue into option 2, a long-term investment. I plan to either enroll in a focus course, or maybe just read some books on the subject (such as Cal Newport’s Deep Work or Daniel Levitin’s The Organized Mind). Maybe I could do both.

Dear reader, which of the two sprint options should I go with?


Overall:

Thinking of my life as a series of sprints with constant deadlines has forced me to realize how limited and valuable time really is. I have to do what I can do today, because tomorrow will be another sprint.

That isn’t to say I don’t slack. I have to confess, I spent the better part of last weekend just completing the heck out of FFXV sidequests (P.S. I have gaming OCD and must complete all possible sidequests before moving forward with the main story).

BUT, in my defense, in order to be able to do that I invested extra hours studying earlier in the week to make up for it so… I guess it’s not too bad?

What I’d like to improve on is…

1: My focus, so I can make better use of the time I allot to studying. And

2: Keeping shorter sprints. Sprints are normally around 2-4 weeks, but right now I’m averaging 4-6. Not good.

I also have to wonder if this sprint style is costing me my health.

I’ve been feeling exhausted more often since the start of the year, but I can’t objectively say whether the cause is the stress induced by the constantly looming sprint deadline.

On the upside, while my physical health may have deteriorated, my brain is now performing better than ever. I find I’m able to give more valuable insights and opinions now, thanks to my well-curated books and all that self-studying.

This brings me back to a conversation I once had:

Friend: Let’s hit the gym!
Me: No thanks. I get enough exercise.
Friend: Really? How?
Me: My brain. It already has a six-pack.

Danna on Data

It’s been a while since I’ve talked about my data analysis self-study.

I’ve been trying this and that, but haven’t felt anything was worth writing about. I mean, who would want to know that I tried something and failed, right?

Oh wait. Me. I would want to know.

When I’m about to try something new, like skincare or a restaurant, I look up blogs for reviews. I try to see if I can relate to the blogger and put myself in their shoes: would I have failed as well?

It saves me a lot of effort because someone else has already gone through the experience for me.

That’s why I’m writing about all my data science-related updates so far, incomplete and disorganized as they are. Maybe it’ll help.
