Updated: Feb 11
I am currently working with a client that is growing its data science group, like, a lot. There are so many opportunities, so that makes sense. Data scientists tend to focus heavily on algorithms, but there are many fascinating (and rather essential) problems to solve on the infrastructure and production-deployment side of machine learning as well. This client has set up an ML Engineering group that is responsible for deploying models to production.
I did a bit of research on how to decide between deployment approaches and came across an article by Håkon Hapnes Strand, which gives us a good start. He writes on Quora quite a bit, and I have read many of his answers: excellent insights, a smart guy.
So here is how he looks at the problem.
We first need to define the type of learning. There are two modes of training machine learning models:
Offline learning: The model is trained once on historical data. After we deploy it to production, it stays as is until we retrain it, which we should expect to do regularly as the data drifts and the model's performance degrades.
Online learning: Here, the model continuously updates itself as new data arrives. This is particularly useful for sensor data, because online learning algorithms can pick up effects that change over time.
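To make the distinction concrete, here is a minimal sketch in plain Python (no ML library assumed) of a one-feature linear model trained with stochastic gradient descent. The class name `LinearSGD` and its `fit`/`partial_fit` methods are hypothetical, loosely echoing the incremental-learning naming some libraries use. Offline learning runs several passes over a fixed history and then freezes the weights; online learning keeps applying one small update per incoming record.

```python
from dataclasses import dataclass


@dataclass
class LinearSGD:
    """Hypothetical one-feature linear model y ~ w*x + b, trained with SGD."""
    lr: float = 0.01
    w: float = 0.0
    b: float = 0.0

    def partial_fit(self, x: float, y: float) -> None:
        # One SGD step on the squared error of a single record.
        err = (self.w * x + self.b) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err

    def fit(self, xs, ys, epochs: int = 500) -> "LinearSGD":
        # Offline learning: repeated passes over a fixed historical dataset.
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                self.partial_fit(x, y)
        return self

    def predict(self, x: float) -> float:
        return self.w * x + self.b


# Offline: train once on history (y = 2x + 1), then leave it alone until a retrain.
history_x = [0.0, 1.0, 2.0, 3.0, 4.0]
history_y = [1.0, 3.0, 5.0, 7.0, 9.0]
offline = LinearSGD().fit(history_x, history_y)

# Online: start from the offline weights and update on every new record.
# These records follow a drifted relationship, y = 2x + 3.
online = LinearSGD(w=offline.w, b=offline.b)
for x, y in [(5.0, 13.0), (6.0, 15.0), (7.0, 17.0)]:
    online.partial_fit(x, y)
```

After the three drifted records, the online model's prediction at x = 7 has moved toward the new value of 17, while the offline model still predicts about 15 until someone retrains it.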
We then need to recognize how the algorithm makes predictions:
Batch predictions: The algorithm generates predictions in bulk; for example, it may fill a table of forecasts from its input data. This approach is typical for data that is not time-dependent, like the likelihood of a client leaving, or when predictions do not need to be up-to-the-second fresh.
On-demand or live predictions: Predictions are calculated in real time as requests come in, using the input data available at the time of each request.
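A minimal sketch of the two prediction modes, using Python's built-in `sqlite3` as a stand-in for a data warehouse. `churn_score` is a hypothetical hand-written formula standing in for a trained churn model, and the table and column names are illustrative. The batch path scores every client in one pass and stores the results; the on-demand path computes a score from whatever inputs arrive with the request.

```python
import sqlite3


def churn_score(tenure_months: float, support_calls: int) -> float:
    """Hypothetical stand-in for a trained churn model; returns a score in [0, 1]."""
    raw = 0.1 * support_calls - 0.02 * tenure_months + 0.5
    return max(0.0, min(1.0, raw))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (client_id TEXT PRIMARY KEY, tenure_months REAL, support_calls INTEGER)")
conn.execute("CREATE TABLE churn_scores (client_id TEXT PRIMARY KEY, score REAL)")
conn.executemany("INSERT INTO clients VALUES (?, ?, ?)", [("a", 24, 1), ("b", 3, 4), ("c", 60, 0)])


def run_batch_scoring(conn: sqlite3.Connection) -> None:
    # Batch: score every client in one pass and store the results for later access.
    rows = conn.execute("SELECT client_id, tenure_months, support_calls FROM clients").fetchall()
    conn.executemany(
        "INSERT OR REPLACE INTO churn_scores VALUES (?, ?)",
        [(cid, churn_score(tenure, calls)) for cid, tenure, calls in rows],
    )
    conn.commit()


def lookup_score(conn: sqlite3.Connection, client_id: str):
    # Serving a batch prediction: a cheap read of a precomputed, possibly stale, score.
    row = conn.execute("SELECT score FROM churn_scores WHERE client_id = ?", (client_id,)).fetchone()
    return None if row is None else row[0]


def predict_on_demand(tenure_months: float, support_calls: int) -> float:
    # On-demand: compute the score from the inputs available at request time.
    return churn_score(tenure_months, support_calls)


run_batch_scoring(conn)  # in practice, triggered on a schedule
```

The trade-off shows up in `lookup_score`: reads are cheap, but a score is only as fresh as the last batch run, whereas `predict_on_demand` always reflects the latest inputs at the cost of computing per request.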
He classifies the "productionalization" of machine learning models using a 2-by-2 matrix, depicted on the left, that crosses the two learning modes with the two prediction modes:
Forecasting (offline learning, batch predictions) - This entails training a model, running it on a data set, and storing the results somewhere. Arguably, this is similar to an ETL job or scoring engine that periodically refreshes scores or predictions for later access, for example in a data warehouse. We also see this approach used to feed mobile applications.
Web service (offline learning, on-demand predictions) - Here, we wish to embed analytics in applications. We typically implement this as a web service, such as a REST endpoint, called much like a function: send in the feature values (the input fields the model was trained on), and it swiftly returns a prediction.
Automated Machine Learning (online learning, batch predictions) - The least used of the four methods, this entails automating the entire training and model selection process, which takes significant engineering. Picture repeated batch runs that both retrain the model and generate the predictions. Because training takes time, this approach cannot support real-time predictions.
Online training (online learning, on-demand predictions) - The most dynamic and technically challenging to implement: the model is updated continuously and is immediately accessible as a web service. Here, the learning algorithm is plugged into a data stream (or big data stream).
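For the automated machine learning quadrant, here is a minimal sketch of a scheduled retrain-and-select job, assuming a single numeric feature and just two hypothetical candidate models (a mean baseline and an ordinary least-squares line). Real AutoML tooling searches much larger model and hyperparameter spaces; this only shows the shape of the loop: split the latest history, fit each candidate, keep the one with the lowest validation error, refit it on all data, and emit a batch of predictions. It assumes the training split contains at least two distinct x values.

```python
from statistics import mean
from typing import Callable, Dict, List

Model = Callable[[float], float]


def fit_mean(xs: List[float], ys: List[float]) -> Model:
    # Baseline candidate: always predict the training mean.
    m = mean(ys)
    return lambda x: m


def fit_linear(xs: List[float], ys: List[float]) -> Model:
    # Ordinary least squares for y = a*x + b (closed form, one feature).
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda x: a * x + b


# Hypothetical candidate registry; a real system would search many more.
CANDIDATES: Dict[str, Callable[[List[float], List[float]], Model]] = {
    "mean_baseline": fit_mean,
    "linear": fit_linear,
}


def mse(model: Model, xs: List[float], ys: List[float]) -> float:
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def retrain_and_score(xs, ys, to_score, holdout=0.25):
    # 1. Time-ordered split: hold out the most recent chunk for validation.
    cut = int(len(xs) * (1 - holdout))
    train_x, train_y, val_x, val_y = xs[:cut], ys[:cut], xs[cut:], ys[cut:]
    # 2. Fit every candidate and pick the lowest validation error.
    fitted = {name: fit(train_x, train_y) for name, fit in CANDIDATES.items()}
    best_name = min(fitted, key=lambda n: mse(fitted[n], val_x, val_y))
    # 3. Refit the winner on all data and produce the batch of predictions.
    best = CANDIDATES[best_name](xs, ys)
    return best_name, [best(x) for x in to_score]


xs = [float(t) for t in range(8)]
ys = [3.0 * x + 2.0 for x in xs]
best, predictions = retrain_and_score(xs, ys, to_score=[10.0, 11.0])
```

Because the whole function runs end to end on every invocation, it belongs on a scheduler rather than behind a request, which is why this quadrant cannot serve real-time predictions.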
Self-learning algorithms are sexy but will probably require far more ML engineering skill. Forecasting is likely the most traditional and simplest deployment method, like putting a scoring algorithm in a production SAS environment that is scheduled to run at regular intervals to update some score for clients.
The web service approach is the most popular because, most of the time, we want to either automate decisions or bring the prediction as close as possible to where it can have an impact, like the front line of customer service or maintenance.
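A minimal sketch of the web service pattern using only Python's standard-library `wsgiref` (no web framework assumed). The `predict` function and its feature names are hypothetical placeholders for a trained model; the endpoint accepts a JSON object of features via POST, returns the prediction as JSON, and rejects malformed input with a 400.

```python
import json
from wsgiref.simple_server import make_server


def predict(features: dict) -> float:
    """Hypothetical stand-in for a trained model; expects the training feature names."""
    return 0.1 * features["support_calls"] - 0.02 * features["tenure_months"] + 0.5


def app(environ, start_response):
    # Minimal WSGI endpoint: POST a JSON object of features, get a JSON prediction back.
    headers = [("Content-Type", "application/json")]
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", headers)
        return [b'{"error": "use POST"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        features = json.loads(environ["wsgi.input"].read(length))
        body, status = {"prediction": predict(features)}, "200 OK"
    except (ValueError, KeyError, TypeError) as exc:
        # Bad JSON, missing features, or a non-object payload.
        body, status = {"error": str(exc)}, "400 Bad Request"
    start_response(status, headers)
    return [json.dumps(body).encode("utf-8")]


def serve(port: int = 8000) -> None:
    # Development server only; blocks until interrupted.
    make_server("localhost", port, app).serve_forever()
```

Calling `serve()` and then running `curl -X POST -d '{"tenure_months": 3, "support_calls": 4}' localhost:8000` should return a JSON body containing the prediction. In production this would typically run behind a proper WSGI server rather than `wsgiref`'s development server.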
But even with a web service, things get complicated quickly. For instance, new data may arrive in a stream while the algorithm needs not only that new data but also some history to make its prediction. Running an algorithm on one record is typically fast, but running it on a year's worth of data for a specific client (which requires a query) with many features (which have to be aggregated) cannot be expected to return a result in under a second unless we engineer for it.
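One common way to engineer around this is to precompute aggregates as events arrive, so the request path does a lookup instead of a query over a client's history. Below is a minimal in-memory sketch of that idea. `FeatureStore`, `ClientAggregates`, and the order-based features are hypothetical; the aggregates cover all history rather than a rolling one-year window (which would also require expiring old events), and order amounts are assumed to be non-negative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClientAggregates:
    """Hypothetical running per-client features, updated event by event."""
    n_orders: int = 0
    total_spend: float = 0.0
    max_order: float = 0.0  # assumes non-negative amounts

    @property
    def avg_order(self) -> float:
        return self.total_spend / self.n_orders if self.n_orders else 0.0


class FeatureStore:
    """In-memory stand-in for an online feature store: O(1) update, O(1) read."""

    def __init__(self) -> None:
        self._aggs = defaultdict(ClientAggregates)

    def ingest(self, client_id: str, amount: float) -> None:
        # Called by the stream consumer: fold the new event into the aggregates.
        agg = self._aggs[client_id]
        agg.n_orders += 1
        agg.total_spend += amount
        agg.max_order = max(agg.max_order, amount)

    def features(self, client_id: str) -> dict:
        # Called at request time: a lookup, not a query over the client's history.
        # dict.get does not trigger the defaultdict factory, so unknown ids add nothing.
        agg = self._aggs.get(client_id, ClientAggregates())
        return {"n_orders": agg.n_orders, "avg_order": agg.avg_order, "max_order": agg.max_order}


store = FeatureStore()
for cid, amount in [("a", 10.0), ("a", 30.0), ("b", 5.0)]:
    store.ingest(cid, amount)
```

In practice the store would be a shared database or cache fed by the stream consumer, but the split between `ingest` (write path, per event) and `features` (read path, per request) is the part that keeps request latency low.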
GETTING GOING QUICKLY
There is no single standard way of deploying ML algorithms in production. That is a serious issue for organizations like my client, which are betting on data science to drive the optimization of their operational excellence pillar. For now, the truth is that it depends on the use case. As seen above, there are patterns, so the solution will be a mix of deployment methods (patterns, technologies and infrastructure choices) and people who focus on the IT move-to-production part. Hire data engineers or AI engineers.
But the process also includes a lot of data discovery, which requires speed and agility even before anything goes into production. Data engineers will help there too. I will write more about this. Promised.