Galvanize

Post Galvanize

After completing the Galvanize Data Science Immersive (DSI) program, I moved to Chicago to live with my girlfriend. I had two goals: 1) launch PinkSlipper, my capstone project, as a full-fledged business and 2) find a full-time job. In this post, I'll describe my job search and explicitly state the offers I received.

Before I reveal the results, I must point out that I had a very strong background coming into Galvanize:

  • Strong academic record with PhD and 15+ authored papers
  • 2.5 years of work experience inventing and implementing billion-dollar patterning techniques for manufacturing computer chips at Intel
  • Strong math, stats, analytics, and programming background prior to Galvanize

I went about my job search on four vectors:

  1. Going to data science meetups and finding out about opportunities through other data scientists
  2. Responding to recruiters who messaged me on LinkedIn
  3. Actively connecting to people at organizations I was interested in
  4. Following up on leads from Galvanize

Of the four listed above, most of my success came from proactively connecting with companies I wanted to work for, as well as from leads from Galvanize. To connect with the right person, I would ask strangers at networking events if they knew anyone at Company X. I would describe my background, mention I was interested in that company, and ask if they could put me in touch. This was far more successful than I ever expected.

Once I made contact, I initiated the interview process which consisted of:

  • Telephone contact / initial screen
  • Technical Screen
  • Takehome exam
  • On-site interview

The initial telephone screen was to gauge a candidate's fit. I did not mold my pitch to the company I was chatting with. I would simply state that I have a strong background in stats, math, and programming, with recent experience in machine learning, and that my core strengths were experimentation, A/B testing, and simply making decisions with data to grow a business or product. If a recruiter mentioned they were looking for a data engineer, I would say, "I have limited experience in that area and cannot supply the full value that a trained data engineer could. However, I can suggest you contact X, who would be a good data engineer."

During technical screens, I was asked to talk about a personal project, give a suggestion about how to complete a certain task, define various terms, or complete online programming challenges. All of these were quite easy given both my work background as well as the preparation from the Galvanize program.

I completed 5 takehome exams during my interview process, and every one of them led to a final round. There is zero chance I could have completed these takehomes without the Galvanize program, and I was surprised to hear from the companies how well I did on them. The takehomes typically included stats questions, dataframe manipulation tasks, SQL, and of course a modeling problem (most often linear or logistic regression worked quite well, though I would often show off with more complex models to demonstrate my abilities).
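
To make that concrete, here is a hedged, generic sketch of the kind of takehome task I mean: a bit of dataframe manipulation followed by a simple logistic regression. The file and column names are hypothetical, not from any actual company's exam.

  import pandas as pd
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split
  from sklearn.metrics import roc_auc_score

  # Hypothetical takehome dataset: one row per user, with a binary 'converted' label
  df = pd.read_csv('takehome_data.csv')
  df['signup_month'] = pd.to_datetime(df['signup_date']).dt.month
  X = pd.get_dummies(df[['plan', 'signup_month', 'visits']], drop_first=True)
  y = df['converted']

  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
  model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
  print(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))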

Finally, in my on-site interviews I was mainly asked about prior experiences and whether I was a good culture fit. A number of companies asked basic analytics problems that were much easier than what I had prepared for coming out of Galvanize. I aced every coding problem I was given and found these interviews much easier than expected. On a side note, I was interviewing only in Chicago, which I believe is a much less rigorous process than in SF.

My job search came in two phases. The first phase consisted of two companies I was interested in; I completed most of the interview process while still in the Galvanize DSI program. I did final rounds at both when I got back to Chicago and obtained an offer from one. With no other offers in hand, and the company unwilling to negotiate, I rejected the offer as it was well below what I could accept.

I then initiated a second phase, which consisted of building a pipeline of as many interesting job opportunities as I could find. I used interview tips and a tracker from prior Galvanize alum Greg Kamradt (now a data scientist at Salesforce), whose Lessons Learned Data Science Interviews has a fantastic lecture here:

Before I run through the details of the process and offers, here are some stats regarding my interview funnel:

  • 18 companies contacted
  • 13 phone screens
  • 5 takehomes (not all companies required this though)
  • 7 on-sites
  • 5 offers
  • Total time / effort – 2 months

Below are the offers I received. I have masked the company names but supplied the details many of you will be interested in. Note that the 190k base was the highest offer ever obtained by a Galvanize DSI fellow; the first three offers were more in line with other members of my cohort who had strong backgrounds and work experience.

Company      Title                      Base    Bonus   Estimated Total Value of Offer
Company A    Data Scientist             110k    None    110k
Company B    Data Scientist             125k    None    135k
Company C    Principal Data Scientist   120k    10k     150k
Company D    Senior Data Scientist      155k    20k     180k
Company E    Principal Data Scientist   190k    40k     230k

Making the decision actually became one of the most challenging parts of this process for me. I chatted with Katie Kent (Director of Outcomes at Galvanize) for advice. By the way, she is awesome and one of the best reasons to attend Galvanize over other data science programs. She suggested 5 vectors that are most critical for success in a role:

  1. Opportunity for growth
  2. People that I work with
  3. The person I report to
  4. Day to day work
  5. Exit Opportunity

For each company, I rated each vector from 1 to 5 and included an additional vector for compensation. I had a compensation threshold below which I would reject any offer outright, and my final decision was based on the sum of the ratings. In the end, I decided to work at Trunk Club. I saw a huge opportunity to contribute to the data science effort, and because the group is small I could own full implementations of improved analytic algorithms. The culture and fit were amazing, and I'm looking forward to beginning!

Week 8 - Galvanize Data Science Bootcamp

Week 8 is the last week of the formalized curriculum at Galvanize. It covers visualization, web app development via Flask and Bootstrap, and a two-day challenge to build a full end-to-end data pipeline for predicting fraud on EventBrite's dataset.

We started the week by extending our visualization skills with Bokeh. Bokeh renders interactive, d3.js-style graphics in modern web browsers. The graphics are sick: they are simple to build and remove the messy JavaScript needed to work directly with d3.js. The afternoon was flexible: explore a dataset and extract insight through plotting. I downloaded and plotted crime data for the Seattle area. The result, below, is a heat map of crime rates across various districts of the city. Working with geographic data is difficult: Shapely was required to work with the shapefiles for districting, while descartes was required to patch the polygons. Furthermore, these crime rates need to be normalized by the population within each district, which may or may not share boundaries with the census data.

Seattle_crimes.png
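
For reference, a minimal sketch of the plotting approach, assuming a local shapefile of police districts and a CSV of crime counts (both file names and the field layout are hypothetical):

  import matplotlib.pyplot as plt
  import pandas as pd
  import shapefile                      # pyshp, to read the .shp file
  from shapely.geometry import shape
  from descartes import PolygonPatch

  crimes = pd.read_csv('seattle_crimes.csv')           # hypothetical columns: district, count
  counts = crimes.groupby('district')['count'].sum()

  fig, ax = plt.subplots(figsize=(8, 10))
  sf = shapefile.Reader('seattle_districts.shp')
  for rec, shp in zip(sf.records(), sf.shapes()):
      district = rec[0]                                 # assumes the district id is the first field
      rate = counts.get(district, 0) / counts.max()     # naive normalization to 0..1
      poly = shape(shp.__geo_interface__)
      ax.add_patch(PolygonPatch(poly, fc=plt.cm.Reds(rate), ec='black'))

  ax.axis('scaled')
  plt.title('Seattle crimes by district')
  plt.show()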

 

Next we learned Flask and Bootstrap, tools for creating nice-looking web apps and dashboards with limited web development / JavaScript knowledge. My guess is that this is in the curriculum specifically so we can make flashy apps for our projects. Either way, it was quick to learn and simple to use.

On to the big-ticket item for the week: a two-day challenge to build a full end-to-end pipeline for predicting fraud on EventBrite's dataset. The deliverable was a web-app dashboard to track and flag fraudulent events in real time (for the real-time analysis, an instructor set up a server that pinged out new events every few seconds). We worked in teams of four and began executing. The first day was mainly feature engineering and modeling. In building an application like this, however, we had to make sure our models, vectorizers, and pipeline would scale and stay aligned, which is a difficult task on a team of four. After careful planning and a lot of whiteboarding, we had a working vectorizer and model that bucketed events into low, medium, and high likelihood of fraud based on thresholds applied to the model's probability output.
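
As a rough illustration of the bucketing step (the threshold values and the fitted vectorizer/model here are hypothetical stand-ins, not our team's actual choices):

  def risk_level(event_text, vectorizer, model, low=0.2, high=0.6):
      """Map a fraud probability to a low / medium / high label."""
      X = vectorizer.transform([event_text])          # a fitted text vectorizer (e.g. TF-IDF)
      p_fraud = model.predict_proba(X)[0, 1]          # a fitted sklearn classifier
      if p_fraud < low:
          return 'low', p_fraud
      elif p_fraud < high:
          return 'medium', p_fraud
      return 'high', p_fraud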

The second day focused on building the web app and dashboard, along with a backend Postgres database to store risk predictions. We connected to the streaming events from the server using the requests module, then vectorized and predicted the risk for each event. Risk calculations and event details were stored in the Postgres database and accessed via a Flask dashboard that displayed the events most likely to be fraudulent.
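
A hedged sketch of that second-day loop, reusing the risk_level helper sketched above; the endpoint URL, polling interval, and table schema are illustrative stand-ins for the instructor's actual setup:

  import time
  import requests
  import psycopg2

  conn = psycopg2.connect(dbname='fraud', user='postgres')   # placeholder credentials

  while True:
      event = requests.get('http://example-server:5000/next_event').json()
      label, p_fraud = risk_level(event['description'], vectorizer, model)
      with conn, conn.cursor() as cur:
          cur.execute(
              "INSERT INTO predictions (event_id, risk, probability) VALUES (%s, %s, %s)",
              (event['id'], label, float(p_fraud)),
          )
      time.sleep(2)                                           # a new event arrives every few seconds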

Building this full end-to-end product was a great experience on several fronts:

  1. Working with a team to build a scalable end to end product
  2. Building a working dashboard that could be used by executives to make decisions
  3. Attacking all parts of the data pipeline (exploratory analysis, feature engineering, model building, webapp development, backend database engineering) to create a single data product

Next week begins three weeks of our personal capstone projects. I will simply write a single post for my project once it's complete. Can't wait!

Week 7 - Galvanize Data Science Bootcamp

After a one-week break, we are now back in full swing. Unfortunately, we lost two cohort members during the break: they interviewed at companies and accepted offers. I'm really proud of them and glad to see the tangible results the Galvanize Data Science program can lead to.

The focus of Week 7 was machine learning at scale. The days this week focused on:

  • Threading and multiprocessing
  • MapReduce
  • Apache Spark
  • Apache Spark on AWS

I was very excited to get my hands on MapReduce and Apache Spark. It is such a hot skillset to have, and most big data employers want data scientists who have this experience.

The MapReduce framework came out of Google in 2004 and is based on the map and reduce functions common in functional programming. The following example converts a string into an acronym by:

  1. Mapping a lambda function that extracts the first letter of each word and uppercases it
  2. Reducing the resulting list of uppercase letters into a single string with a concatenating lambda

  from functools import reduce  # needed in Python 3; built in under Python 2
  def acronym(s):
      # Map: first letter of each word, uppercased
      chars_list = map(lambda x: x[0].upper(), s.split())
      # Reduce: concatenate the letters, e.g. "as soon as possible" -> "ASAP"
      return reduce(lambda x, y: x + y, chars_list)

Now imagine a string with billions of words. It would take too long to process on a single processor, so the work needs to be parallelized: chunks of words are shipped to different processors within a managed cluster for the map step, and a reduce step assembles the capitalized first letters back into a single string. This is a great way to process very large datasets, but it limits the kinds of functions you can apply.

Enter Apache Spark, a framework that extends the MapReduce paradigm with support for machine learning algorithms, graph processing, and SQL-like queries.

Initially we built clusters on our individual quad-core MacBooks, using tmux to multiplex the master and worker nodes. We developed Spark SQL queries and joined tables / pulled data much as we would in PostgreSQL or SQLite. We then fired up 20 cores on AWS and ran similar queries. Apache Spark on AWS is a game changer for machine learning on large datasets, which is why it is such a highly sought skill.
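
For flavor, a minimal PySpark sketch of the kind of Spark SQL work I mean; the file names and columns are made up for illustration:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName('week7-demo').getOrCreate()

  users = spark.read.csv('users.csv', header=True, inferSchema=True)    # hypothetical files
  orders = spark.read.csv('orders.csv', header=True, inferSchema=True)
  users.createOrReplaceTempView('users')
  orders.createOrReplaceTempView('orders')

  # Join and aggregate with plain SQL, executed across the cluster
  top_spenders = spark.sql("""
      SELECT u.name, SUM(o.amount) AS total_spent
      FROM users u
      JOIN orders o ON u.id = o.user_id
      GROUP BY u.name
      ORDER BY total_spent DESC
      LIMIT 10
  """)
  top_spenders.show()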

My takeaway for the week was the impressive benefit of parallelization. The costs, however, are considerable. First, it takes a lot of technical knowledge to set up AWS and get all the modules and environments working properly (I was actually surprised how difficult this was, given that AWS and Apache Spark have supposedly made big data analytics accessible to all). The other expense is the sheer abstraction when it comes to understanding errors: running split packets of data (RDDs) on multiple cores across multiple executors, all controlled by a driver you interact with, makes debugging nearly impossible. Simple advice: only use Apache Spark on AWS if your data is too big to analyze on your MacBook.

Week 6 - Galvanize Data Science Bootcamp

Week 6 was our last week of the very intense core curriculum. As I write this post, we are entering our break week (an entire week off to review previous material, flesh out ideas for capstone projects, and prepare for recruiting). We then have two more weeks of specialized topics and a couple of weeks for capstone projects.

This week's material was fun and lighter than Week 5 -- thank the lord. The focus was on various techniques for recommendation systems using the graphlab module for Python, which I thought was nicely written and a very fast numerical engine for calculating recommendations. We also did some more work on NLP using non-negative matrix factorization (NMF), a variant of U-V decomposition that allows a more intuitive understanding of the latent features in unsupervised learning. In our sprints, we used the same NYT data as in Week 5 and discovered much better groupings of articles with NMF.
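
A hedged sketch of the NMF topic-grouping idea in scikit-learn; the 20 Newsgroups corpus here is just a stand-in for the scraped NYT articles:

  from sklearn.datasets import fetch_20newsgroups
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.decomposition import NMF

  docs = fetch_20newsgroups(remove=('headers', 'footers', 'quotes')).data

  tfidf = TfidfVectorizer(stop_words='english', max_features=5000)
  X = tfidf.fit_transform(docs)

  nmf = NMF(n_components=10, random_state=42)
  W = nmf.fit_transform(X)          # document-by-topic weights
  H = nmf.components_               # topic-by-term weights

  # Print the top words for each latent topic
  terms = tfidf.get_feature_names_out()
  for k, topic in enumerate(H):
      print(k, [terms[i] for i in topic.argsort()[-8:][::-1]])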

We also completed two group challenges this week. In my opinion, these are the most fun days in the program. The idea is to split into teams of 4 and get the best scores for regression / classification problems. Our challenges this week were:

  1. Churn prediction on ride-share data (super exciting dataset! Spoiler alert: it wasn't from Flywheel :) ). The idea was to predict whether or not a user would churn.
  2. A recommender model to give the best suggestions to new users on new movies that were not in the training data. We were given user info (age, sex, location, etc.) and movie info (title, year, genre, etc.) to allow learning based on user profiles and movie types.

Churn prediction challenge:

Whoa, this was fun. Our dataset was much cleaner than in the previous Kaggle Blue Book challenge, so we got to spend more time modeling and less time cleaning -- yay! My group focused on modifying the metric used to rate our model. In the problem statement, it was OK to have false positives (people who have not churned but we predict they have), but very important to limit false negatives (people who have churned but we don't predict them to). By optimizing our models to these constraints, we found the best classifiers to be random forest and AdaBoost ensemble models. Unfortunately, it was hard to compare our results against other groups because we optimized to a different metric, though I expect our approach is in line with what a business would actually want.
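
A small sketch of that metric tweak, penalizing false negatives more heavily than false positives; the weights and parameter grid are illustrative, not our team's actual values:

  from sklearn.ensemble import RandomForestClassifier
  from sklearn.metrics import confusion_matrix, make_scorer
  from sklearn.model_selection import GridSearchCV

  def churn_cost(y_true, y_pred, fn_weight=5.0, fp_weight=1.0):
      # Negative weighted error count, so that "greater is better" for GridSearchCV
      tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
      return -(fn_weight * fn + fp_weight * fp)

  grid = GridSearchCV(
      RandomForestClassifier(n_estimators=200),
      param_grid={'max_depth': [5, 10, None]},
      scoring=make_scorer(churn_cost),
      cv=5,
  )
  # grid.fit(X_train, y_train)   # X_train / y_train: the ride-share features and churn labels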

Movie to user recommendation challenge:

This challenge was pretty easy using graphlab, but it became very frustrating to debug because graphlab does not notify you when datatypes are incorrect for modeling -> two hours later, we had something working :). We were able to put up competitive accuracy scores for rating new movies and new users based on their associated feature matrices. One key to our good ratings was that we built the test split of our cross-validated training data to consist mostly of unknown movies and unknown users, which is what we were trying to model in the real test data. This made our model iterations more accurately reflect real improvements in scores on the real test data.

Finally, the week ended with a resume workshop and branding exercises. Building our own brands and selling our skills is key to going out and getting the job you want. Now it's time to relax and review during break week!

Week 5 – Galvanize Data Science Bootcamp

This week's focus was Natural Language Processing and Neural Networks. These are big topics, so the hours were the toughest we've seen to date.

Beginning with neural networks, we built models to differentiate cats from dogs and forests from beaches. Image featurization was critical: color mattered much more for beaches vs. forests, while edge detection worked much better for cats vs. dogs. The instructors put together benchmarks of their prediction accuracy with non-neural-net models and asked us to beat their scores. Cool challenge, and I think we were all able to do so, which really emphasized the power of these models for image analysis.

We then moved into natural language processing, but getting the full learning required some serious improvements to our data pipelines and infrastructure knowledge. We began by teaching ourselves MongoDB as a schema-less database to hold whatever data we scraped from the web. We then learned web scraping, and by the end of the day had pulled more than 1,000 NYTimes articles into our own database.
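
A hedged sketch of the scrape-and-store pattern; the parsing is deliberately simplified and assumes a running local MongoDB (the database and collection names are arbitrary):

  import requests
  from bs4 import BeautifulSoup
  from pymongo import MongoClient

  client = MongoClient('localhost', 27017)
  articles = client['nyt']['articles']          # schema-less collection, created on first insert

  def scrape_article(url):
      html = requests.get(url).text
      soup = BeautifulSoup(html, 'html.parser')
      body = ' '.join(p.get_text() for p in soup.find_all('p'))
      articles.insert_one({'url': url, 'body': body})

  # scrape_article('https://www.nytimes.com/...')   # one call per article URL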

This accomplishment was a holy-shit moment for me. I can honestly say it was the most exciting day of the course thus far, and I most certainly want to get better at scraping through my capstone.

Now that we had a solid text dataset, we learned unsupervised clustering techniques like k-means, along with the featurization techniques required for text, which are difficult and complicated. We therefore needed ways to simplify these absurdly high-dimensional feature matrices.

Enter dimensionality reduction via Principal Component Analysis (PCA). This is an eigenvector problem that rotates your data matrix onto new axes with a reduced number of features that are linearly independent yet often describe >90% of the variance. This can be very important for modeling when memory is a concern, though interpretability is often limited because meaning is extracted from the rotated matrix, not the original features.
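
A minimal PCA example, keeping enough components to explain roughly 90% of the variance (the digits dataset stands in for a real high-dimensional text matrix):

  from sklearn.datasets import load_digits
  from sklearn.decomposition import PCA

  X = load_digits().data                     # 1797 samples x 64 features
  pca = PCA(n_components=0.90)               # a fraction keeps enough components for 90% variance
  X_reduced = pca.fit_transform(X)
  print(X_reduced.shape, pca.explained_variance_ratio_.sum())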

We ended the week by discussing project ideas. We walked through a few examples from previous cohorts, and people are pretty amazing. Check out these incredible and inspiring ideas from previous Galvanize fellows:

Week 4 – Galvanize Data Science Bootcamp

This week, we covered non-parametric supervised machine learning with the following methods:

  • SVMs
  • K Nearest Neighbors
  • Decision Trees
  • Bagging
  • Random Forests
  • Boosting (Adaboost and Gradient boosting)

Mathematically, SVMs are rather beautiful. They follow the same rigor as linear / logistic regression, numerically minimizing a cost function, yet they allow separation of non-linearly-separable data by using the kernel trick to implicitly map the data into a higher-dimensional space.

Though decision trees are not as elegant mathematically, in practice they simply work and are conceptually easy to understand. Bootstrap-sampling methods such as bagging and random forests are extensions of the tree concept: you build many overfit trees and average them, resulting in a single model with good accuracy and low variance. Boosting achieves a similar result with the opposite approach: start with weak trees that have poor accuracy but low variance, then slowly add more trees based on the weighted misclassifications or residuals of the previous trees' decisions. This can take time to build but, over many iterations, yields very good accuracy while keeping variance low from the start.
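
A quick, hedged comparison of the bagging-style and boosting-style ensembles discussed above on a toy dataset (parameters are near-defaults, not tuned):

  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
  from sklearn.model_selection import cross_val_score

  X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

  for model in (RandomForestClassifier(n_estimators=200, random_state=0),
                GradientBoostingClassifier(n_estimators=200, random_state=0)):
      scores = cross_val_score(model, X, y, cv=5)          # 5-fold accuracy
      print(type(model).__name__, scores.mean().round(3))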

This week's exercises were tough. They ran late, and additional reading at home kept the days quite busy. Aside from the new material above, we also began learning new methods for optimizing our models as we develop them, using scikit-learn's Pipeline and GridSearchCV classes. Galvanize does a very good job of focusing on the core concepts of a new algorithm while introducing a new best practice for building models in each of the exercises.
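
A minimal sketch of that Pipeline + GridSearchCV pattern; the steps and parameter grid are illustrative rather than the actual exercise:

  from sklearn.pipeline import Pipeline
  from sklearn.preprocessing import StandardScaler
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import GridSearchCV
  from sklearn.datasets import load_breast_cancer

  X, y = load_breast_cancer(return_X_y=True)

  pipe = Pipeline([('scale', StandardScaler()),
                   ('clf', LogisticRegression(max_iter=5000))])
  grid = GridSearchCV(pipe, {'clf__C': [0.01, 0.1, 1, 10]}, cv=5)   # scaling happens inside each fold
  grid.fit(X, y)
  print(grid.best_params_, grid.best_score_.round(3))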

The week ended by splitting into teams of four and competing in Kaggle's Blue Book for Bulldozers challenge. Galvanize set up a scoring system so teams could submit models on test data and automatically post scores to our cohort's Slack channel -- the competition was on! With only four hours, our team attacked the competition by splitting into two pairs: one focused on building the pipeline for testing, while the other focused on cleaning the data. After three hours, our model pipeline was built and we had clean data to start feeding in. That left an hour to try linear regression, random forests, and gradient boosting. We optimized model parameters using GridSearchCV and won the competition within our cohort using random forests. Afterward, we were surprised to find we were within the top 100 models submitted to Kaggle's competition. Reflecting on this experience, it seemed clear that our strategy of splitting the work yielded the best efficiency, allowing us to try different models and leave time for optimization.

Week 3 – Galvanize Data Science Bootcamp

This week was all about basic modeling. We began by building skills in linear algebra and basic exploratory data analysis. In my past, I had a lot of experience with linear algebra through quantum mechanics courses and finite element modeling, so I blew through these assignments quickly. We also moved into exploratory data analysis, which is basically plotting to drive the design of the modeling. I think Galvanize could benefit from more focus on this area; my EDA experience from Intel is far superior to what I have learned here so far, so I expect to rely heavily on that experience for my project and future assignments.

Anyone can run machine learning models, but a good data scientist can run one on clean data, yielding actionable insights.

As we moved into linear and logistic regression, Galvanize followed a very orderly structure for teaching the material: we began by building our own models through object-oriented design and then compared the results against scikit-learn's models. This approach ensures under-the-hood understanding while introducing the scikit-learn tools that will be the workhorse of our future jobs.
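
A hedged sketch of that "build it yourself, then check against scikit-learn" pattern, using the closed-form least-squares solution (the class here is my own illustration, not the exact sprint code):

  import numpy as np
  from sklearn.linear_model import LinearRegression

  class MyLinearRegression:
      def fit(self, X, y):
          Xb = np.c_[np.ones(len(X)), X]                       # prepend a bias column
          self.coef_ = np.linalg.lstsq(Xb, y, rcond=None)[0]   # solve ordinary least squares
          return self

      def predict(self, X):
          return np.c_[np.ones(len(X)), X] @ self.coef_

  rng = np.random.default_rng(0)
  X = rng.normal(size=(200, 3))
  y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, 200)

  mine = MyLinearRegression().fit(X, y)
  sk = LinearRegression().fit(X, y)
  print(mine.coef_[1:], sk.coef_)      # the slopes should agree closely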

We then moved into adding penalization terms, or 'regularization', to our models to prevent overfitting. Our assignments demonstrated clear improvements in prediction on test data when regularization was included in cross-validated models. Success!

Before we ended the week, we learned gradient descent, a fairly simple method for finding the minima/maxima of a function. When applied to a residual-sum-of-squares cost function, as in regression models, you are guaranteed to find the global minimum because the cost function is convex, i.e. a converged solution to your optimization problem. I thought the videos from Andrew Ng's coverage of gradient descent were particularly useful:
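
A tiny gradient-descent sketch for least-squares linear regression, illustrating the update rule (the learning rate and data are arbitrary):

  import numpy as np

  rng = np.random.default_rng(0)
  X = np.c_[np.ones(100), rng.uniform(-1, 1, 100)]     # bias column plus one feature
  true_beta = np.array([2.0, -3.0])
  y = X @ true_beta + rng.normal(0, 0.1, 100)

  beta = np.zeros(2)
  lr = 0.1
  for _ in range(1000):
      grad = X.T @ (X @ beta - y) / len(y)             # gradient of the mean squared error
      beta -= lr * grad
  print(beta)                                          # should approach [2, -3]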

We closed the week with another two hour assessment. These are getting tougher and cover more material for the allotted time. Will need to review over the weekend and get pumped for next week's material on non-parametric models!

Week 2 - Galvanize Data Science Bootcamp

Week 2 just ended. Intensity is up. We are learning new concepts full throttle every day and pushing the limits of what the human brain can handle.

We began with probability and frequentist statistics. I would consider myself highly trained in this area after my Ph.D. and significant training at Intel, so conceptually I had no issues. However, the assignments were long with heavy coding requirements, so I was still learning a lot.

We moved on to Bayesian statistics, and I felt like my entire world flipped upside down. This is not a topic often taught outside of statistics or math departments at a university, and it clearly should be. We began our Bayesian lessons by analyzing variations of webpages and their associated click rates. I really enjoyed the lessons on these methods and was grateful that we now had enough plotting skills in Pandas and Matplotlib to visualize the convergence of our statistical results.
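
A hedged sketch of that Bayesian A/B comparison on click rates, using Beta posteriors; the click counts are invented for illustration:

  import numpy as np

  rng = np.random.default_rng(42)
  clicks_a, views_a = 45, 1000
  clicks_b, views_b = 60, 1000

  # Beta(1, 1) prior -> Beta(clicks + 1, misses + 1) posterior for each variant
  post_a = rng.beta(clicks_a + 1, views_a - clicks_a + 1, size=100_000)
  post_b = rng.beta(clicks_b + 1, views_b - clicks_b + 1, size=100_000)

  print('P(B beats A):', (post_b > post_a).mean())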

Next we tackled the well-documented 'multi-arm bandit' problem. Say you have 5 versions of a website and one of them is clearly best. Should you run an experiment until you are confident in the difference and then select the best version as your new baseline? Or, if you notice certain versions are doing much worse, can you update which versions your audience sees so traffic is weighted toward the most effective ones? This concept effectively stems from methods used in Monte Carlo simulations in the 1970s, where a Boltzmann approach was used to avoid getting stuck in a local minimum during optimization.
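
And a minimal epsilon-greedy sketch of the bandit idea, with invented click rates for five hypothetical site versions:

  import numpy as np

  rng = np.random.default_rng(1)
  true_rates = [0.02, 0.03, 0.05, 0.04, 0.01]     # hidden click rates, invented for the simulation
  clicks = np.zeros(5)
  views = np.zeros(5)
  epsilon = 0.1

  for _ in range(10_000):
      if rng.random() < epsilon:
          arm = rng.integers(5)                                 # explore a random version
      else:
          arm = np.argmax(clicks / np.maximum(views, 1))        # exploit the current best
      views[arm] += 1
      clicks[arm] += rng.random() < true_rates[arm]

  print(views)    # traffic should concentrate on the best-performing version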

As much as I enjoyed learning these new Bayesian methods, I still need time to reflect on this very different approach to statistics. It seems clear that the instructors, and many data scientists, favor Bayesian statistics because of the simplicity with which you can interpret the results. However, I felt their dismissal of frequentist approaches to data science was unwarranted and hyperbolic. Intel was extremely successful in chip manufacturing because of its frequentist approach to experimentation. Many other companies also use these methods to extend their competitive advantage. To simply say industry cannot interpret what a p-value means seems a bit ignorant to me, but I will move on.

I now feel amazing after our second week. I learned some very neat statistical methods, and the assignments hammered home the key learnings. Can't wait for the next week!

Week 1 – Galvanize Data Science Bootcamp

Finally! I have been preparing for this formal transition into data science for about a year, and I am very excited to begin this journey. On the first day, we met the members of cohort 9 and learned more about Galvanize. Our cohort is comprised of people from all walks of life, joined together by a passion for analytics: an epidemiologist, a database engineer, an options trader, a social worker, an entrepreneur, an economist, a software developer, a postdoc, and many others round out the breadth of our group.

Furthermore, we explored the space Galvanize has built. Incredible! Galvanize bought an entire 5-story building, including a rooftop terrace. Several bootcamps are taught by Galvanize, while the rest of the space houses a number of the 180 startups affiliated with Galvanize. Galvanize provides this space so that students in its programs are surrounded by smart people solving interesting problems in today's world. The location is in the heart of SOMA, surrounded by billion-dollar companies including LinkedIn, Twitter, Trulia, Uber, Salesforce, and others. Having just moved to SF, I find this atmosphere killer; it keeps your finger on the pulse of making an impact in this world.

Rooftop.jpg

The week began with a two-hour test of Python, NumPy, Pandas, SQL, and probability skills. It was tough, especially given the timeframe, but I got through it all and liked the intensity with which the program started.

The material for the first week covered Python, object-oriented programming, Pandas, and SQL. Most of the material I was already familiar with, but we did learn some very cool methods for connecting to PostgreSQL databases using the psycopg2 package. I wish I had known about this stuff when I was working at Intel.
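
A small, hedged example of the psycopg2 pattern; the database, table, and query are placeholders rather than the actual exercise:

  import psycopg2

  conn = psycopg2.connect(dbname='example_db', user='postgres', host='localhost')
  cur = conn.cursor()
  # Parameterized query: psycopg2 handles quoting/escaping of the %s placeholder
  cur.execute("SELECT userid, dt FROM users WHERE campaign_id = %s LIMIT 5;", ('FB',))
  for row in cur.fetchall():
      print(row)
  cur.close()
  conn.close()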

My first impression of the program was great. I was truly surprised by the quality of the atmosphere Galvanize has built in SOMA and the great people they have brought in to teach the material. Naively, I thought I would come to SF and simply learn the data science material. However, Galvanize is much more than the material: you are learning from the experiences of others in your cohort, as well as from the many entrepreneurs around the coworking space. What an awesome setup!

Week 0: Prework for Galvanize

Two weeks before classes began, an instructor reached out to distribute a prework assignment on Galvanize's GitHub account. I was eager to jump in! I wanted to work through the problem sets quickly so I could better prepare and properly set my expectations for the course.

I had used GitHub before, but my experience was limited. Luckily, Galvanize supplied some logistical information about GitHub and how they'd like the documents pushed back. So I logged in, forked the repo, and cloned it to my hard drive. Pretty easy.

First impressions: 8 chapters of prework focused on Python, NumPy, Pandas, linear algebra, SQL, probability, statistics, and HTML/CSS, with problem sets in each. I began with the setup documentation, where I learned the development environment Galvanize abides by, which allows a tight loop for agile coding. It turns out I had not used either iTerm 2 or Sublime Text 2 in my prior development work, but they do have some amazing advantages:

  • iTerm 2:
    • Ability to search in the terminal
    • Easy copy / paste features
    • Ability to split windows
  • Sublime Text 2:
    • Multiple Selections
    • Batch Edits

Once set up, I began the Python module. I spent two days completing the assignment and learned many Pythonic methods of programming. It was tough but eye-opening, and I now expected that the 12-week course would significantly improve my programming skills. Extrapolating from the time it took to complete one module, I needed to increase my pace through the next few before moving to SF.

NumPy and Pandas were new to me. In the past, I had used SAS's JMP program for any sort of exploratory analysis or data rollup, so I initially struggled with the assignments. However, I powered through these modules, each in a day, and moved on. I was back on pace.

Statistics, probability, SQL, and linear algebra are strengths of mine, so I began doing two modules a day. I was already well versed in this material, but I did learn a few new techniques to add to my toolbox. All the while, I was improving my development loop and debugging more efficiently.

After the final modules, I had finished the prework in 8 days, leaving time to work on some weaknesses before heading to SF. I also wanted to set up this blog. I figured it would not take more than a couple of days to build a site in WordPress. Long story short, I think WordPress is quite nice, but I absolutely despised the theme I downloaded: Suarez by CMS Superheroes. It was so buggy and slow that I questioned the authors' ability to write good, clean code. I eventually tried some other WordPress themes and they magically worked out of the box. In the end, I lost 5 days of effort building the site.

Overall, the prework was great. Assignments were well documented and Zipfian had code to verify our functions ran as expected. I learned a lot in a short amount of time, and was eager to get to SF, meet my cohort, and start this experience.