Week 3 – Galvanize Data Science Bootcamp

This week was all about basic modeling. We began by building skills in linear algebra and basic exploratory data analysis. I already had a lot of experience with linear algebra from quantum mechanics courses and finite element modeling, so I blew through those assignments quickly. We also moved into exploratory data analysis (EDA) – essentially plotting to drive the design of a model. I think Galvanize could benefit from more focus on this area; fortunately, my EDA experience from Intel goes well beyond what we have covered here so far, so I expect to rely heavily on it for my project and future assignments.
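To give a rough flavor of the kind of first-pass EDA I mean (the dataset name here is hypothetical – any CSV with a numeric target would do), a few pandas calls go a long way toward deciding what to model and how:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset: swap in whatever CSV you are actually modeling.
df = pd.read_csv("housing.csv")

print(df.describe())                               # ranges, missing values, obvious outliers
df.hist(figsize=(10, 8))                           # skewed features may call for a log transform
pd.plotting.scatter_matrix(df, figsize=(10, 8))    # pairwise relationships hint at useful predictors
plt.show()
```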

Anyone can run a machine learning model, but a good data scientist runs it on clean data and turns the output into actionable insights.

We then moved into linear and logistic regression, which Galvanize teaches in a very orderly way: we first built our own models through object-oriented design, then compared the results against scikit-learn's. This approach ensures an under-the-hood understanding while introducing scikit-learn, which will be the workhorse of our future jobs.
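To give a flavor of that exercise (this is my own minimal sketch, not the actual assignment code, and the fake data is purely illustrative), here is an ordinary least squares class built from the normal equations, checked against scikit-learn's LinearRegression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class OLSRegression:
    """Ordinary least squares via the normal equations: solve (X'X) beta = X'y."""

    def fit(self, X, y):
        # Prepend a column of ones so the intercept is estimated along with the slopes.
        X = np.column_stack([np.ones(len(X)), X])
        self.coef_ = np.linalg.solve(X.T @ X, X.T @ y)
        return self

    def predict(self, X):
        X = np.column_stack([np.ones(len(X)), X])
        return X @ self.coef_

# Fake data: y = 3 + 2x plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 + 2 * X[:, 0] + rng.normal(scale=1.0, size=200)

ours = OLSRegression().fit(X, y)
skl = LinearRegression().fit(X, y)
print(ours.coef_)                   # [intercept, slope] from our class
print(skl.intercept_, skl.coef_)    # should match to numerical precision
```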

Next we covered adding penalization terms – 'regularization' – to models to prevent overfitting. Our assignments showed clear improvements in test-set prediction for cross-validated models once regularization was included. Success!
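The effect is easy to reproduce on synthetic data. The sketch below is illustrative rather than the assignment itself – the feature count, noise level, and alpha grid are assumptions – but in this many-features/few-rows regime ridge regression, with its penalty strength chosen by cross-validation, typically beats plain OLS on held-out data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Illustrative setup: many noisy, mostly irrelevant features and relatively few rows,
# the regime where plain OLS tends to overfit.
rng = np.random.default_rng(1)
n, p = 120, 60
X = rng.normal(size=(n, p))
true_coef = np.zeros(p)
true_coef[:5] = [3, -2, 1.5, 0.5, 4]        # only a handful of features actually matter
y = X @ true_coef + rng.normal(scale=5.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ols = LinearRegression().fit(X_train, y_train)
# RidgeCV picks the penalty strength alpha by cross-validation on the training data.
ridge = RidgeCV(alphas=np.logspace(-2, 3, 30)).fit(X_train, y_train)

print("OLS test MSE:  ", mean_squared_error(y_test, ols.predict(X_test)))
print("Ridge test MSE:", mean_squared_error(y_test, ridge.predict(X_test)))
print("chosen alpha:  ", ridge.alpha_)
```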

Before we ended the week, we learned gradient descent – a simple iterative method for finding the minimum (or maximum) of a function. When applied to the residual sum of squares cost function used in regression, the cost is convex, so with a suitable step size you are guaranteed to converge to the global minimum – i.e. a solution to your optimization problem. I found Andrew Ng's videos covering gradient descent particularly useful.
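For reference, a bare-bones version of gradient descent on the RSS cost for simple linear regression looks roughly like this (the learning rate and the toy data are assumptions on my part):

```python
import numpy as np

# Gradient descent on the residual-sum-of-squares cost for linear regression.
# Cost: J(beta) = (1/2n) * ||X beta - y||^2, gradient: (1/n) * X'(X beta - y).
# Because J is convex, a small enough learning rate converges to the global minimum.

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(100), rng.uniform(0, 5, 100)])   # intercept column + one feature
y = X @ np.array([1.0, 4.0]) + rng.normal(scale=0.5, size=100)

beta = np.zeros(2)
learning_rate = 0.05          # illustrative; too large a step and the iterates diverge
for _ in range(5000):
    gradient = X.T @ (X @ beta - y) / len(y)
    beta -= learning_rate * gradient

print(beta)                   # approaches the true coefficients [1.0, 4.0]
```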

We closed the week with another two-hour assessment. These are getting tougher and cover more material for the allotted time. Will need to review over the weekend and get pumped for next week's material on non-parametric models!