Machine Learning is talked about everywhere but is still far from realizing its potential in production systems. While deployment challenges such as integration with existing systems, testing, and computational cost persist, a major obstacle to adoption is skepticism about model predictions. Since one does not entirely understand why and how a model predicts what it does, it becomes difficult to trust its predictions when the model receives data different from what it was trained on.

For example, a speech recognition model that works very well in training will fail when multiple people talk simultaneously or when it encounters a dialect it was not trained on. While there is no robust way to eliminate these failures, a mitigation strategy is to monitor the model's inputs and predictions and check for anomalies, or better, predict anomalies and take proactive action.

Unlike traditional software systems, Machine Learning systems are not one-time deployments but a continuous cycle of data gathering, model training, evaluation, and deployment. This cycle is necessary to counter model decay caused by changes in the input data distribution. An important question is how often, and when, deployed models should be retrained. Monitoring the input data and model predictions helps answer these questions.

By monitoring the data and predictions, we constantly gauge the range, distribution, and type of incoming data against the training data. The operations team is alerted when data drift or anomalies are detected. Importantly, a good monitoring system can predict anomalous behavior before it happens and alert the respective teams to take corrective action.

Consider a sales prediction model that predicts sales based on the performance of advertisements seen on YouTube. The model receives data from YouTube and is trained on various attributes such as the number of clicks, ad slots, total watch time of the advertisement, number of likes, number of unique views, user demographics, etc. Now assume Google introduces a "skip ad" feature. This changes how long a user watches the ad, which in turn affects the sales prediction model. An ML model's predictions are consumed by end users or serve as input to other models, so any anomalous predictions will have ripple effects if not detected early.
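Monitoring the model's outputs can catch such ripple effects before they propagate downstream. A hypothetical sketch, assuming a regression model: flag any live prediction that falls outside control limits derived from the predictions observed during validation (the names `control_limits` and `anomalous`, and the 3-sigma rule, are illustrative assumptions):

```python
import numpy as np

def control_limits(baseline_preds, k=3.0):
    """Compute lower/upper limits as mean ± k standard deviations
    of the predictions seen on validation data."""
    mu, sigma = np.mean(baseline_preds), np.std(baseline_preds)
    return mu - k * sigma, mu + k * sigma

def anomalous(preds, limits):
    """Return the live predictions that fall outside the limits."""
    lo, hi = limits
    return [p for p in preds if p < lo or p > hi]

# Validation-time sales predictions (arbitrary illustrative units).
baseline = np.array([100.0, 104.0, 98.0, 101.0, 103.0, 97.0, 99.0, 102.0])
limits = control_limits(baseline)

print(anomalous([101.0, 140.0, 99.5], limits))  # → [140.0]
```

A spike like the 140.0 above would trigger an alert for the operations team to investigate before the prediction feeds end users or downstream models.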

Image: performance of the model without retraining (left) and with periodic retraining (right)

The most important aspect of statistical analysis is not what you do with the data, it’s what data you use

Andrew Gelman

We have set the context for why monitoring is important for ML models. In our next post, we will cover the concepts of data drift, concept drift, data bias, and model bias.
