Automated Machine Learning with Hyperopt and Scikitlearn without Writing Python Code


The most challenging part of building a supervised machine learning model is the optimization of algorithm selection, feature selection and algorithm specific hyper parameter values to yield the best performing model. Undertaking such a task manually is not feasible, unless the model is very simple.

The purpose of Automated Machine Learning (AutoML) tools is to democratize Machine Learning by automating this optimization process. In this post we will use one such AutoML tool called Hyperopt along with Scikitlearn, and show how to choose the optimum Scikitlearn classification algorithm, feature subset and associated hyper parameters for the algorithm. The solution is available in my open source project avenir on github.

Automated Machine Learning

Like any other optimization problem, in automated Machine Learning the goal is to select a set of parameter values that will minimize cost or error. All the parameters constitute the search space for the optimization problem.

With Machine Learning, unlike many other optimization problems, the cost function cannot be expressed and computed in closed form. Given a set of parameter values, you have to train a model and then estimate the generalization error with k fold cross validation or some other technique.
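To make the idea of such a cost function concrete, here is a minimal sketch, assuming scikit-learn is installed. The function name cv_error, the parameter values and the synthetic data set from make_classification are illustrative assumptions and are not taken from avenir.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def cv_error(params, X, y, folds=5):
    """Train an SVM with the given parameters and return the mean k fold error."""
    model = SVC(C=params["C"], kernel=params["kernel"])
    accuracy = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    return 1.0 - accuracy.mean()


# Synthetic data, used only so that the sketch is self contained
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
error = cv_error({"C": 1.0, "kernel": "rbf"}, X, y)
print(f"cross validated error: {error:.3f}")
```

Every evaluation of this function trains the model several times, which is why the cost is considered expensive and why the number of evaluations matters.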

There are 3 areas in a Machine Learning pipeline that can be optimized with AutoML tools.

  • Feature selection
  • Algorithm selection
  • Hyper parameter selection for a given algorithm

Optimizing all 3 areas manually is time consuming and incurs prohibitively high computation cost. With hyper parameter tuning, even for a moderately complex model there is a combinatorial explosion problem when all possible combinations of parameter values need to be considered.

For example, if there are 10 parameters and each parameter has 3 different values, there will be 3^10 = 59049 possible combinations of values to explore. With complex models, you may have hundreds of thousands of parameter value combinations to search.

The two common and popular tuning or optimization techniques are grid search and random search. As the name suggests, grid search is a brute force approach that searches the whole search space exhaustively. In random search, the parameter values to explore are selected randomly.
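The difference between the two techniques can be sketched with the standard library alone. The parameter names and values below are hypothetical. The point is only that the size of the grid grows multiplicatively, while random search works with a fixed evaluation budget.

```python
import itertools
import random

# Hypothetical search space: 10 parameters with 3 candidate values each
space = {f"param{i}": [0, 1, 2] for i in range(10)}

# Grid search visits every combination, so its size is the product of the value counts
grid_size = 1
for values in space.values():
    grid_size *= len(values)
print(grid_size)  # 3 ** 10 = 59049

# Lazily enumerate the grid; only the first combination is materialized here
grid = itertools.product(*space.values())
first_point = dict(zip(space, next(grid)))

# Random search draws a fixed budget of combinations, independent of the grid size
rng = random.Random(42)
budget = 50
random_points = [
    {name: rng.choice(values) for name, values in space.items()}
    for _ in range(budget)
]
print(len(random_points))
```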

Bayesian Optimization and Hyperopt

Neither of the two optimization techniques mentioned above is particularly intelligent. What we need is a more intelligent optimization algorithm that will focus more on the promising areas of the parameter search space.

Bayesian optimization is one such algorithm. In Bayesian Optimization, a probability distribution model for the cost is built as a function of all the parameters. The distribution model is refined each time a new point in the search space is explored and its cost obtained. The next point to explore is chosen with the help of the probability distribution model. Hyperopt is based on Bayesian Optimization.

Here are some salient characteristics of Bayesian Optimization. They describe where Bayesian Optimization is most appropriate and how it works.

  • Input space dimension should not be too large. Preferably, it should be less than 20.
  • The objective or cost function should be continuous.
  • The cost is expensive to evaluate.
  • The cost function has no known structure such as concavity, and derivatives are not available, so traditional optimization methods are not applicable. In other words, the cost function is a black box.
  • Typically uses Gaussian process regression for statistical inference. Hyperopt's default algorithm, the Tree-structured Parzen Estimator (TPE), uses density estimates instead.
  • Uses an acquisition function to decide where to sample next.

The search space in Hyperopt is defined by the range of values for each parameter. The range definition depends on the data type of the parameter, as follows.

  • Categorical : Set of discrete values
  • Integer : A range of values. Hyperopt will sample within the range
  • Floating point : Uniform distribution with two boundary values provided

For floating point, Hyperopt also supports other distributions, e.g. normal and log normal. Unless you have good intuition about how an algorithm behaves with respect to a certain parameter, you may want to avoid these and stay with the simple uniform distribution.

For categorical parameters, you can associate a probability with each value and it will be sampled accordingly by Hyperopt. The classifier name is one such categorical variable.

If you have the intuition that a certain classifier will work better for a given problem, you can assign a separate probability to each classifier, skewing the probability distribution towards the preferred classifier. If your intuition happens to be incorrect, your result is likely to be worse than with no probabilities assigned.

Code Free Hyperopt Optimization

Earlier I had built wrapper classes around some of the Scikitlearn classification algorithms. One of the goals for the abstraction was to enable someone to build classification models without writing any Python code. This is accomplished by defining the meta data about the data and all the algorithm specific parameters in a properties configuration file.

Using the properties configuration file, you can train and validate classification models without writing any Python code. The following classification algorithms are currently supported in this framework. There is a configuration file for each learning algorithm.

  • Support Vector Machine
  • Random Forest
  • Gradient Boosted Trees

Although these abstractions help, Machine Learning expertise is still needed, because you still have to manually set the appropriate parameter values in the configuration file. To bring Machine Learning and model training within the reach of domain experts who are not necessarily Machine Learning experts, I have added support for Hyperopt parameter search space definition in the configuration file.

There are two steps in this automation process. First, through a particular configuration parameter, you provide a list of names of the parameters to be included in the Hyperopt search space. Next, for each of the parameters in the list, you provide a list of values, which is used to define the range of that parameter. Each parameter is one of the three types listed earlier. Here is one example:

train.search.params=train.search.num.estimatorsgb:int,train.search.max.depth:int
train.search.num.estimatorsgb=140,180
train.search.max.depth=3,5

Here, I am searching over 2 parameters. The next 2 lines provide the ranges for the two parameters, both of which happen to be integers. You could expand the search space by adding more parameter names to the list in the first line and then adding a line that provides the range of values for each new parameter.
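How such a definition might be turned into a search space description can be sketched as follows. This is not the avenir implementation. The function name parse_search_space is hypothetical, and the type names float and string are assumptions, since only int appears in the example above.

```python
def parse_search_space(config_text):
    """Parse search parameter definitions from properties style text.

    Returns a dict mapping each parameter name to a (type, values) tuple.
    """
    props = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()

    # Only "int" appears in the post; "float" and "string" are assumed type names
    converters = {"int": int, "float": float, "string": str}
    space = {}
    for item in props["train.search.params"].split(","):
        name, _, type_name = item.partition(":")
        convert = converters[type_name]
        space[name] = (type_name, [convert(v) for v in props[name].split(",")])
    return space


config = """
train.search.params=train.search.num.estimatorsgb:int,train.search.max.depth:int
train.search.num.estimatorsgb=140,180
train.search.max.depth=3,5
"""
space = parse_search_space(config)
print(space)
```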

Service Ticket Escalation

The use case is an example of service desk automation with Machine Learning. It has to do with the escalation of customer service tickets. Consider a customer service ticketing system, where a support person will periodically escalate different tickets, based on various ticket parameters and statuses.

This escalation process could be automated by using manual escalation data to train a supervised Machine Learning model. The model could then be used to predict and recommend the tickets that need to be escalated. Here are the different fields in the hypothetical customer service tickets. These are the features for the model to be trained.

  • Number of days open
  • Number of times reopened
  • Number of messages exchanged
  • Number of past tickets on the same issue
  • Number of hours before the first response message
  • Average number of hours before response messages
  • Number of reassignments
  • Customer type

There could be other features, e.g. keywords in the text messages or emails. I am using message as a generic term, which includes text message, email and phone call.

Here are the 3 configuration files for the 3 classification algorithms. For the most part they can be reused. For a different data set, some changes are necessary, which will be discussed later.

Running Hyperopt

When running the Python script for Hyperopt, you have to provide the number of iterations as a command line argument. You also need to have the 3 configuration files ready. The script has a callback function, which Hyperopt calls each time a combination of parameter values is sampled. It will be called as many times as the number of iterations you provide. Hyperopt keeps track of the combination of parameter values that results in the least cost or error.

The algorithms to be used, along with the corresponding configuration files, are specified as command line arguments. You can choose all 3 classification algorithms or a subset of the 3. In future, as I implement abstractions for other classification algorithms, more algorithms will be added to the repertoire. Here is how to run it with 50 iterations, along with the tail end of the output.

./autosupv.py 50 svm:esc_svm.properties rf:esc_rf.properties gbt:esc_gbt.properties
...............
next evaluation
...building svm model
...training and kfold cross validating model
average error with k fold cross validation 0.025
100% 50/50 [02:41<00:00,  3.24s/it, best loss: 0.0253984195984]
{'train.penalty': 1.0453502981033078, 'train.kernel.function': 0, 'classifier': 0}

After Hyperopt has run, the script will output the best set of parameters found. For categorical and integer parameters it will output an index into the corresponding array of values. For floating point parameters, it will output the actual value.

Here we find that SVM (with index 0 in the classifier list) is the optimum model. For train.penalty we get the actual value. For train.kernel.function, index 0 is selected, which corresponds to rbf. I have used a restricted search space with only a few parameters. In reality you would want to include more search parameters and use a larger number of iterations.
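Translating the indexes in the output back to actual values can be sketched as below. The classifier order and rbf at index 0 come from this post. The other kernel names in the list are hypothetical placeholders, and the function name decode is an assumption. Hyperopt also provides a space_eval function that performs this mapping when given the original search space.

```python
# Candidate value lists in the order they were given to the search.
# "svm" and "rbf" at index 0 come from the post; the remaining kernel
# names are hypothetical placeholders.
choices = {
    "classifier": ["svm", "rf", "gbt"],
    "train.kernel.function": ["rbf", "linear", "poly"],
}

best = {"train.penalty": 1.0453502981033078, "train.kernel.function": 0, "classifier": 0}


def decode(best, choices):
    """Replace indexes with actual values; pass floating point values through."""
    return {
        name: choices[name][value] if name in choices else value
        for name, value in best.items()
    }


decoded = decode(best, choices)
print(decoded)
```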

You can also run Hyperopt with a biased classifier choice, assigning a separate probability to each classifier as below. In this example, the classifier choice is biased towards SVM.

./autosupv.py 50 svm:esc_svm.properties:0.4 rf:esc_rf.properties:0.3 gbt:esc_gbt.properties:0.3
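A sketch of how arguments in this form could be parsed is shown below. The function name parse_classifier_args and the validation rule are assumptions made for illustration, not the logic in autosupv.py.

```python
def parse_classifier_args(args):
    """Split name:config[:probability] arguments into (name, config, probability)."""
    specs = []
    for arg in args:
        parts = arg.split(":")
        name, config_file = parts[0], parts[1]
        prob = float(parts[2]) if len(parts) > 2 else None
        specs.append((name, config_file, prob))

    # Assumed rule: probabilities are given for all classifiers or for none
    probs = [p for _, _, p in specs if p is not None]
    if probs and (len(probs) != len(specs) or abs(sum(probs) - 1.0) > 1e-6):
        raise ValueError("give a probability for every classifier, summing to 1")
    return specs


biased = parse_classifier_args(
    ["svm:esc_svm.properties:0.4", "rf:esc_rf.properties:0.3", "gbt:esc_gbt.properties:0.3"]
)
unbiased = parse_classifier_args(["svm:esc_svm.properties", "rf:esc_rf.properties"])
print(biased)
print(unbiased)
```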

Training and Deploying the Optimal Model

The last step is to train and save the model with the parameters chosen by Hyperopt. Take the optimal parameter values found by Hyperopt, set the corresponding parameters in the configuration file, and then train the model with the configuration parameters for saving the trained model set appropriately.
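Setting the chosen values in a properties file can be done by hand, or with a small helper like the one sketched here. The function name apply_best_params and the configuration excerpt are hypothetical, and train.num.folds is an invented key used only to show that unrelated lines are left alone.

```python
def apply_best_params(config_text, updates):
    """Return config text with the values of matching keys replaced."""
    lines = []
    for line in config_text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key.strip() in updates:
            line = f"{key.strip()}={updates[key.strip()]}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical excerpt of an SVM configuration file
config = "train.kernel.function=linear\ntrain.penalty=1.0\ntrain.num.folds=5"
best = {"train.kernel.function": "rbf", "train.penalty": 1.0453502981033078}
updated = apply_best_params(config, best)
print(updated)
```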

You can deploy the trained model as a REST service. REST service implementations for the 3 classifiers are also available in avenir. More details can be found in the tutorial document.

Using a Different Data Set

For your own use case and data set, some changes need to be made in the example configuration files. The first set of changes relates to the meta data of your data set, as below.

parameter                   comment
train.data.fields           Comma separated indexes of the fields used to extract data from the training data file
train.data.feature.fields   Comma separated indexes of the feature fields in the extracted data
train.data.class.field      Index of the class variable field in the extracted data

The next set of changes relate to the parameter search space. Here is an example for SVM.

parameter             comment
train.search.params   List of search parameters. Update as needed, adding and removing parameters
xxx                   Add parameter with range of values, one for each parameter in the list
xxx                   Add parameter with range of values, one for each parameter in the list

The value range of any search parameter can be modified to expand or shrink the search space. If you expand the search space, you should also increase the number of iterations to allow sufficient sampling of the search space by Hyperopt.

Optimum Feature Set

Although I have not done it in the example, feature subset selection can be optimized with Hyperopt. This is accomplished by adding train.search.data.feature.fields to the list of values for train.search.params.

Then train.search.data.feature.fields could be defined with a list of values. Each value in the list is a colon separated list of feature column indexes in the data set. Each list of indexes will be a subset of all the feature column indexes.
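The value format described above can be parsed as sketched below. The example value and the function name parse_feature_subsets are hypothetical, and the assumption that the 8 features of the use case occupy column indexes 0 to 7 is made only for illustration.

```python
def parse_feature_subsets(value):
    """Turn '0:1:2,0:2:5' into [[0, 1, 2], [0, 2, 5]]."""
    return [[int(i) for i in subset.split(":")] for subset in value.split(",")]


# Hypothetical value for train.search.data.feature.fields
subsets = parse_feature_subsets("0:1:2:3,0:2:4:6,1:3:5:7")

# Assumed layout: the 8 features of the use case sit in columns 0 to 7
all_features = set(range(8))

# Every candidate must be a subset of the full set of feature column indexes
assert all(set(s) <= all_features for s in subsets)
print(subsets)
```

Each parsed subset then acts as one value of a categorical parameter, so Hyperopt picks among the candidate subsets the same way it picks among classifiers.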

Running Hyperopt in Parallel

Although Bayesian Optimization is essentially a sequential algorithm, Hyperopt can be run in parallel with MongoDB. We can also run Hyperopt in parallel with PySpark. One option is to split the parameter search space and use PySpark to process each parameter search subspace in parallel.

For example, we could split along different classifiers and feature subsets to create separate parameter search subspaces. Hyperopt could process each subspace in a separate Spark task. The results for each search subspace could then be combined to select the best parameter values.
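The split and combine idea can be sketched without Spark. In the sketch below, threads from the standard library stand in for Spark tasks, and a random search over a synthetic cost stands in for a Hyperopt run on each subspace. All names, feature subsets and cost values are hypothetical.

```python
import itertools
import random
from concurrent.futures import ThreadPoolExecutor

classifiers = ["svm", "rf", "gbt"]
feature_subsets = [(0, 1, 2, 3), (0, 2, 4, 6)]  # hypothetical subsets

# One subspace per (classifier, feature subset) pair
subspaces = list(itertools.product(classifiers, feature_subsets))


def search_subspace(subspace):
    """Search one subspace and return (best_cost, subspace, best_penalty).

    Placeholder: random search over a synthetic cost stands in for a
    Hyperopt run that trains and cross validates real models.
    """
    classifier, features = subspace
    rng = random.Random(f"{classifier}-{features}")
    best_cost, best_penalty = float("inf"), None
    for _ in range(20):
        penalty = rng.uniform(0.5, 2.0)
        cost = (penalty - 1.0) ** 2 + 0.01 * classifiers.index(classifier)
        if cost < best_cost:
            best_cost, best_penalty = cost, penalty
    return best_cost, subspace, best_penalty


# Threads stand in for Spark tasks; each one handles a single subspace
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(search_subspace, subspaces))

# Combine by keeping the subspace with the lowest cost
best_cost, best_subspace, best_penalty = min(results, key=lambda r: r[0])
print(best_subspace, round(best_cost, 4))
```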

Summing Up

As shown in this post, you can build your own AutoML pipeline with Hyperopt and Scikitlearn without writing any Python code. Hyperopt, which is based on Bayesian optimization, generally finds good parameter values in fewer evaluations than the commonly used grid search or random search for hyper parameter tuning.

Please follow the tutorial if you are interested in executing the AutoML use case in this post.

About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with a myriad of technologies and platforms in various business domains for early stage startups, large corporations and anything in between. I am an active blogger and open source project owner. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NOSQL databases, Machine Learning and Programming languages. I am fascinated by problems that don't have a neat closed form solution.