Kaggle’s famous competition, Titanic: Machine Learning from Disaster, had been on my radar for quite some time. A couple of weeks ago, we were introduced to Decision Tree classification in our Supervised Learning course in the Master of Data Science program at UBC. I couldn’t wait to make my first submission to the Kaggle competition.
I spent a significant amount of time exploring, wrangling, organizing, and filling missing data in the train and test CSV files. I chose to do this part in R, making use of the Tidyverse library.
Then, I took the analysis over to Python, where I used scikit-learn to evaluate the performance of Decision Tree and Random Forest classifiers. I also incorporated 5-fold cross-validation to find ideal hyperparameters. How accurately did I predict? Not as well as I had hoped, but I sure learned a ton in the process.
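For anyone curious what that modeling step looks like, here is a minimal sketch of tuning both classifiers with 5-fold cross-validation in scikit-learn. The feature matrix, labels, and hyperparameter grids below are stand-ins for illustration, not the ones from my actual analysis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; in the real analysis this would be the wrangled
# Titanic training set exported from R.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# 5-fold cross-validation over a small (hypothetical) grid per model.
searches = {
    "decision_tree": GridSearchCV(
        DecisionTreeClassifier(random_state=42),
        {"max_depth": [3, 5, 10]},
        cv=5,
    ),
    "random_forest": GridSearchCV(
        RandomForestClassifier(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [5, 10]},
        cv=5,
    ),
}

for name, search in searches.items():
    search.fit(X, y)  # runs the full cross-validated grid search
    print(name, search.best_params_, round(search.best_score_, 3))
```

`best_params_` then gives the hyperparameters to refit on the full training set before predicting on the test CSV.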
Check out the full analysis here on GitHub.