Getting started with Machine Learning: the U-Washington ML Specialization on Coursera
Hi, I’m planning to write a five- or six-part series reflecting on my experience with the University of Washington Machine Learning Specialization on Coursera as I take the five courses (Foundations, Regression, Classification, Clustering, Deep Learning) and finish the capstone project. This is the first article in the series.
In this article I’ll cover the first course, Machine Learning Foundations: A Case Study Approach. I’ll describe the philosophy behind the ‘case study approach’, give a brief overview of the tools used, and reflect on what I’ve learnt.
I hope it will help people who want to take the same specialization. I’m also taking courses on Udacity and edX and using other resources, but my experience with those will be described in separate articles. I’m also planning to write a separate series on the Udacity Machine Learning Nanodegree in the near future.
My background :
I’m an undergrad student, majoring in CS. I got interested in data science and later machine learning around 2015.
The UWashington ML specialization is not my first encounter with machine learning. I had been dabbling in data analysis and social network analysis for quite a while before I understood that machine learning might actually be more my thing.
I started with network science, taking courses such as “Networked Life” from UPenn on Coursera, “Networks, Crowds and Markets” from Cornell on edX and “Social Network Analysis” from UMichigan on Coursera.
Networked Life is a good introduction to the field of network science. The Cornell course was created by Jon Kleinberg, who wrote Algorithm Design, one of the best books on algorithms, and I just loved his explanation style. Social Network Analysis used Gephi for analyzing networks.
But I’m from a CS background, and I felt that I’d be much more productive if I could just get better at programming. I used R while taking The Analytics Edge, and that was the moment I got hooked on predictive modelling.
I ended up doing just the minimum work needed to pass the course and focusing all my attention on finishing the R assignments using RStudio and the caret and tm packages. I did my first Kaggle competition as part of this course. It was about predicting which NYTimes articles would be popular, but I was incredibly bad at it. I knew I needed more help, but didn’t know where to start.
I felt the R ecosystem was not a good starting point for me because I come from a programming background, not a statistical one. I needed a language that would help me ‘look under the hood’: I wanted to see the implementation details and preferably implement algorithms from scratch myself. That language ended up being Python. I learnt Python basics from Codecademy first, and later took Udacity’s Intro to Computer Science and Programming Foundations with Python (object-oriented programming) to learn Python properly.
The end result was that I knew Python at a good-enough level. I started taking Udacity’s data science courses too, but initially they seemed really hard. So I invested a month or so in Dataquest to gain coding proficiency in the scientific Python stack, mostly numpy, scipy and scikit-learn. Then I was ready to start Udacity’s courses again, along with the UWashington Machine Learning specialization.
I don’t think I wasted any time by taking courses on complex networks instead of going directly to machine learning. They were theoretical, intriguing and entertaining, all the things that are ‘my style’. I’m glad that I took The Analytics Edge too; that course is a blessing to mankind. But I’ll stick to the Python ecosystem for a while now, because I think the documentation is exceptional and it will be more helpful for me as a programmer.
Case study approach philosophy and approximate timeline :
The UWashington specialization starts off by teaching the basics of GraphLab through 5 case studies:
- Regression : Predicting housing prices
- Classification : Predicting Sentiment from Amazon product reviews
- Clustering : Clustering similar Wikipedia documents together
- Recommendation : Recommending songs from music data set
- Deep Learning : Image classification and retrieval with an already trained model
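To give a feel for what the first case study is about, here is a minimal sketch of predicting a house price from its size with simple linear regression. This is not the course’s code and it does not use GraphLab; it is plain Python, the function names are my own, and the numbers are made up purely for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y ~ intercept + slope * x for one feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted y for a single x under the fitted line."""
    return intercept + slope * x


# Made-up toy data (not from the course): living area vs. sale price.
sqft = [1000.0, 1500.0, 2000.0, 2500.0]
price = [200000.0, 300000.0, 400000.0, 500000.0]

intercept, slope = fit_simple_linear_regression(sqft, price)
estimate = predict(intercept, slope, 1800.0)
print("slope:", slope, "intercept:", intercept, "estimate:", estimate)
```

The toy data lies exactly on a line, so the fit recovers it perfectly; real housing data is noisy and needs more features, and as far as I understand that is where the later Regression course picks up.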
The first course took me around 10 days to finish, so it is really easy. It’s definitely much easier than Udacity’s “Intro to Machine Learning”, which is significantly harder, with 10 or 11 projects plus a capstone for the nanodegree. For a person without my background, starting from scratch, I’d assume the six-week guideline should be enough, provided they know programming in Python.
The idea is to go deeper into the same case studies in the later courses, adding algorithm implementation and rigorous theory. Since I’m taking both the Udacity courses and these ones, I believe I’ll be able to provide a good comparison after I finish.
Tools used :
The specialization uses GraphLab with SFrame from Dato for the first course; however, the usual tools such as scikit-learn and pandas can also be used for the later courses.
I use Anaconda with GraphLab to finish the assignments. The syntax of SFrame is similar to pandas, and since I’ve already used pandas for other, easier projects, I think it will be easy to transition from SFrame to pandas for the other courses. GraphLab is free for academic use.
While I dislike the idea of not using scikit-learn, given that scikit-learn is open source, I really loved using SFrames because I could deal with bigger data sets without switching from my laptop to AWS. And well, I’m using Jupyter Notebooks in both cases, so it’s almost the same.
I have the code up on my GitHub for now, but I don’t know the rules about code sharing yet, so I’m not linking it here. I’ve taken many courses without getting any certificate or grade so far, mainly because I didn’t realize that other people like to see certificates.
I believe the power of MOOCs lies in the fact that people are learning for the sake of learning, i.e. using intrinsic motivation instead of extrinsic motivation like certificates, but this is still not how society in general thinks. Also, without certification it’s easier for people to claim knowledge they don’t have, so there’s that argument.
I was genuinely surprised to get likes on my Facebook posts that featured certificates. Logically, the knowledge I’ve gained doesn’t change as long as I finish things, and what I really want is just to be evaluated by other people, but oh well, here it goes:
I’m taking the second course, on Regression, right now, at least until I get started with the Udacity Machine Learning Nanodegree (hopefully). There will be 6 parts in this series by the time I reach the capstone project. I plan to make the next few articles of this series as theoretical as possible.