For the past several weeks, I had the amazing opportunity to develop Python and R packages for comparing scikit-learn and caret machine learning models, respectively. I learnt a great deal from my collaborators, Jes Simkin and Birinder Singh, who provided excellent brainstorming sessions and code reviews.

The packages are intended to:

Facilitate beautifully efficient comparisons of machine learning classifiers and regression models.

The R package, caretcompaR, harnesses the power of caret, combining it with R dataframes for easy, breezy, and beautiful exploration of machine learning regressors and classifiers.

The package comprises the following three functions:

split() splits the input samples X and target values y (class labels in classification, real numbers in regression) into train, validation and test sets according to specified proportions. It outputs four X dataframes and four y lists: one each for training, validation, test, and combined training and validation.

train_test_acc_time() compares different caret regressors or classifiers in terms of training and test accuracies, and the time it takes to fit and predict. The function inputs are a list of models, the training input samples Xtrain (input features), the test input samples Xtest, the training target values ytrain, and the test target values ytest (continuous or categorical). The function outputs a beautiful dataframe with training and test scores, model variance, and the time each model takes to fit and predict.

comparison_viz() visualizes the output of train_test_acc_time() for easy communication and interpretation. The user can choose to visualize a comparison of either accuracies or times. It takes in a dataframe with seven attributes: model name, training and test scores, model variance, fit time, predict time, and total time. It outputs a beautiful ggplot bar chart comparing the different models' training and test scores, or the time they take to fit and predict.

Check out the GitHub page to explore the wonderful functions and follow the easy steps to start comparing your caret models.

The Python package, SklearncomPYre, harnesses the power of scikit-learn, combining it with pandas dataframes and matplotlib plots for easy, breezy, and beautiful machine learning exploration.

The package comprises the following three functions:

split() splits the input samples X and target values y (class labels in classification, real numbers in regression) into train, validation and test sets according to specified proportions. It outputs four array-like X sets (training, validation, test, and combined training and validation) and the four corresponding y arrays.
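To give a feel for what a three-way split like this involves, here is a minimal sketch built directly on scikit-learn's train_test_split. It is not the package's own code: the function name split_sketch, its parameter names, the default proportions, and the dictionary return format are all my assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_sketch(X, y, prop_train=0.6, prop_valid=0.2, prop_test=0.2, seed=0):
    """Hypothetical three-way split; not the package's own implementation."""
    if abs(prop_train + prop_valid + prop_test - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n = len(X)
    # Work in whole sample counts to avoid floating-point rounding surprises.
    n_test = round(prop_test * n)
    n_valid = round(prop_valid * n)
    # Step 1: hold out the test set; the rest is combined training + validation.
    X_tv, X_test, y_tv, y_test = train_test_split(
        X, y, test_size=n_test, random_state=seed
    )
    # Step 2: carve the validation set out of the remainder.
    X_train, X_valid, y_train, y_valid = train_test_split(
        X_tv, y_tv, test_size=n_valid, random_state=seed
    )
    X_sets = {"train": X_train, "valid": X_valid, "train_valid": X_tv, "test": X_test}
    y_sets = {"train": y_train, "valid": y_valid, "train_valid": y_tv, "test": y_test}
    return X_sets, y_sets


# Toy data: 100 samples, 2 features, binary labels.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2
X_sets, y_sets = split_sketch(X, y)
```

Splitting off the test set first means the combined training and validation set falls out for free, which is handy when you want to refit on everything except the test data after tuning.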

train_test_acc_time() compares different sklearn regressors or classifiers in terms of training and test accuracies, and the time it takes to fit and predict. The function inputs are a dictionary of models, the training input samples Xtrain (input features), the test input samples Xtest, the training target values ytrain, and the test target values ytest (continuous or categorical). The function outputs a beautiful dataframe with training and test scores, model variance, and the time each model takes to fit and predict.
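The comparison loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the package's implementation: the function name compare_models_sketch and the column names are made up, and since the post does not define "model variance", I assume here that it means the gap between training and test scores.

```python
import time

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_models_sketch(models, X_train, y_train, X_test, y_test):
    """Hypothetical comparison loop: fit, predict, score and time each model."""
    rows = []
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        fit_time = time.perf_counter() - start

        start = time.perf_counter()
        model.predict(X_test)
        predict_time = time.perf_counter() - start

        train_score = model.score(X_train, y_train)
        test_score = model.score(X_test, y_test)
        rows.append({
            "model": name,
            "train_score": train_score,
            "test_score": test_score,
            # Assumption: "variance" taken as the train/test score gap.
            "variance": train_score - test_score,
            "fit_time": fit_time,
            "predict_time": predict_time,
            "total_time": fit_time + predict_time,
        })
    return pd.DataFrame(rows)


# Synthetic data so the sketch runs on its own.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
results = compare_models_sketch(models, X_train, y_train, X_test, y_test)
```

Because score() is available on both scikit-learn classifiers (accuracy) and regressors (R squared), the same loop works for either kind of model.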

comparison_viz() visualizes the output of train_test_acc_time() for easy communication and interpretation. The user can choose to visualize a comparison of either accuracies or times. It takes in a dataframe with seven attributes: model name, training and test scores, model variance, fit time, predict time, and total time. It outputs a beautiful matplotlib bar chart comparing the different models' training and test scores, or the time they take to fit and predict.
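A grouped bar chart of this kind can be sketched with plain matplotlib as below. Again, this is only an illustration: the function name comparison_viz_sketch, the choice argument, and the column names are my assumptions, and the numbers in the example dataframe are made-up placeholders rather than real results.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def comparison_viz_sketch(results, choice="accuracy"):
    """Hypothetical grouped bar chart of scores or times, one group per model."""
    if choice == "accuracy":
        columns, ylabel = ["train_score", "test_score"], "Score"
    elif choice == "time":
        columns, ylabel = ["fit_time", "predict_time"], "Time (seconds)"
    else:
        raise ValueError("choice must be 'accuracy' or 'time'")

    positions = np.arange(len(results))
    width = 0.4
    fig, ax = plt.subplots()
    # Draw one set of bars per column, shifted sideways so they sit in pairs.
    for i, column in enumerate(columns):
        ax.bar(positions + i * width, results[column], width, label=column)
    ax.set_xticks(positions + width / 2)
    ax.set_xticklabels(results["model"])
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig, ax


# Placeholder values for illustration only; not measured results.
results = pd.DataFrame({
    "model": ["model_a", "model_b"],
    "train_score": [0.95, 0.90],
    "test_score": [0.85, 0.88],
    "variance": [0.10, 0.02],
    "fit_time": [0.020, 0.005],
    "predict_time": [0.001, 0.002],
    "total_time": [0.021, 0.007],
})
fig, ax = comparison_viz_sketch(results, choice="accuracy")
```

Calling fig.savefig("comparison.png") would write the chart to disk; passing choice="time" plots fit and predict times instead.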

Check out the GitHub page to explore the wonderful functions and follow the easy steps to start comparing your scikit-learn models.