
Deploying AI to production: 12 tips from the trenches

I’m Max, and I work on applied AI here at SC5. As a consultancy, SC5 is expected to provide our clients with services that are not only well-designed and functional, but also capable of scaling and withstanding production load. An application isn’t much good unless it works in the real world.

Machine learning is, in many ways, a completely different beast from “traditional” software engineering, but in other ways it isn’t. Machine learning solutions also need to be deployed to production to be of any use, and with that comes a special set of considerations. Many of these aren’t properly taught in ML/AI courses and programmes, which is both a shame and an oversight.

In this post, I’ll try to give some tips on how to avoid common mistakes, based on my own experiences.

1. Democratize access to data within your company

Machine learning lives and dies by the amount and quality of data you provide it. Unfortunately, data is often hard to come by, even within one’s own company: it sits in multiple databases, is spread across several business units, or both. Building a modern, well-architected data platform and storage solution that gives everyone access to the data they need is a huge boon not only for machine learning, but for other data-intensive activities as well. ML practitioners will often tell you that the bulk of their time goes into accessing and cleaning data; easing that pain makes iteration much faster.

2. Think about your use cases before training your models

Let’s say you want to build an AI that can read sign language from video in close to real time, and deploy that into your livestreaming service. Before you start testing different algorithms, take some time to think about the constraints and requirements imposed by the use case. Typically, machine learning models are evaluated using a single metric like accuracy or F1-score on a validation dataset, but other metrics usually apply, too. Using our example above, say you’ve trained two neural networks:

  • Neural network A has an accuracy of 94.5% and can predict the sign in a key frame of video in about 70ms
  • Neural network B has an accuracy of 92.1% and can do inference in around 20ms

Here, B is probably the better choice: although A’s accuracy is higher, an inference time of 70ms won’t keep up with a livestreaming application.
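To make this concrete, here’s a minimal sketch of how you might encode such a constraint when picking a model to deploy. The candidate figures mirror the hypothetical networks above, and the 40ms budget is just an assumed per-frame deadline:

```python
# Pick the most accurate model that still meets the latency budget.
# The candidates below are hypothetical, mirroring the example above.

LATENCY_BUDGET_MS = 40  # assumed deadline, e.g. ~25 fps leaves ~40 ms/frame

candidates = [
    {"name": "network_a", "accuracy": 0.945, "latency_ms": 70},
    {"name": "network_b", "accuracy": 0.921, "latency_ms": 20},
]

viable = [c for c in candidates if c["latency_ms"] <= LATENCY_BUDGET_MS]
if not viable:
    raise RuntimeError("No candidate meets the latency budget")

best = max(viable, key=lambda c: c["accuracy"])
print(f"Deploying {best['name']}: {best['accuracy']:.1%} accuracy "
      f"at {best['latency_ms']} ms per frame")
```

The point isn’t the code itself, but the habit: write your non-accuracy constraints down before you start comparing models.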

3. Think about scalability from the start

Supervised, unsupervised and reinforcement learning all have different requirements when it comes to deploying at scale, especially for access via an API. The easiest situation is when you can deploy a pre-trained model: you can run several instances with little to no need for synchronisation. Other scenarios require more engineering, sometimes more than it takes to train a viable model to begin with.
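To illustrate the easy case, here’s a minimal stateless serving sketch using Flask and scikit-learn. It assumes a model serialised to model.pkl at training time; because each replica only ever reads the model, you can run as many instances as you like behind a load balancer without synchronisation:

```python
# Minimal stateless prediction service: every replica loads the same
# pre-trained model at startup, so instances need no synchronisation.
# Assumes a scikit-learn model serialised to model.pkl at training time.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.pkl")  # read-only after startup

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]  # e.g. [[5.1, 3.5, 1.4, 0.2]]
    predictions = model.predict(features).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```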

4. Use the simplest learning algorithm that’s good enough

This ties in with tip #2: choosing the simplest algorithm that does the job will usually also a) make inference faster and b) require less computational power. Simpler algorithms are also easier to explain to stakeholders, which is something you should absolutely do.
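One practical habit is to benchmark a simple baseline against a heavier model on both accuracy and speed before committing to the complex option. A rough sketch with scikit-learn, where the dataset and models are just stand-ins:

```python
# Compare a simple baseline against a heavier model on both accuracy
# and inference speed before reaching for the complex option.
import time

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_digits(return_X_y=True), random_state=0)

for model in (LogisticRegression(max_iter=5000),
              RandomForestClassifier(n_estimators=500)):
    model.fit(X_train, y_train)
    start = time.perf_counter()
    accuracy = model.score(X_test, y_test)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{type(model).__name__}: {accuracy:.3f} accuracy, "
          f"{elapsed_ms:.1f} ms to score the test set")
```

If the baseline lands within a hair of the heavier model, the baseline usually wins once you factor in serving cost and explainability.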

5. Don’t retrain models automatically unless you are absolutely sure you know what you are doing

While it’s technically possible to retrain or continue training models as new batches or single examples of data come in, it’s usually not advisable. Unless you are sure about the distribution of incoming data, particularly for classification, you’ll end up with a biased estimator that leans too heavily towards some classes and produces incorrect predictions. If you do know the distribution, you can use cost-sensitive learning or over/undersampling, but even then it may not be advisable, particularly in an online setting: just look at what can happen to a chatbot under Godwin’s law. In general, machine learning deployment shouldn’t be fully automatic, at least not for the time being. It’s a good idea to have a human in charge of deciding what to deploy and when.
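One way to make retraining safer is to gate it behind a simple distribution check and defer to a human whenever the incoming data looks off. A minimal sketch, with a made-up threshold and toy labels:

```python
# Guard against blind automatic retraining: compare the label distribution
# of an incoming batch to the one the model was trained on, and defer to
# a human when they diverge. The 10% threshold is a made-up example.
from collections import Counter

def label_distribution(labels):
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def safe_to_retrain(training_labels, incoming_labels, max_shift=0.10):
    """True only if no class frequency has shifted by more than max_shift."""
    old = label_distribution(training_labels)
    new = label_distribution(incoming_labels)
    return all(abs(old.get(c, 0.0) - new.get(c, 0.0)) <= max_shift
               for c in set(old) | set(new))

training = ["spam"] * 30 + ["ham"] * 70
incoming = ["spam"] * 80 + ["ham"] * 20   # suspiciously spam-heavy batch

if safe_to_retrain(training, incoming):
    print("Distributions match; retraining is probably safe")
else:
    print("Distribution shift detected; flag for human review instead")
```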

6. Double-check your hyperparameters regularly

You may test dozens of algorithms and hundreds of trained models, each with their own set of hyperparameters, before deploying an algorithm/model combo that does the job well. The hyperparameters that worked for your original training data might not work as well once that data grows or changes. It’s therefore worth doing a sanity check and hyperparameter sweep at regular intervals to ensure your model is performing as well as it can.
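Here’s a sketch of what such a periodic check might look like using scikit-learn’s GridSearchCV. The dataset, parameter grid and deployed parameters are all placeholders:

```python
# Periodic sanity check: rerun a small hyperparameter sweep on current
# data and compare the result against what's running in production.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)      # stand-in for current data
deployed_params = {"C": 1.0, "gamma": "scale"}  # what's in production now

search = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.01, 0.001]},
    cv=5,
)
search.fit(X, y)

if search.best_params_ != deployed_params:
    print(f"Sweep now prefers {search.best_params_} "
          f"(CV score {search.best_score_:.3f}); consider redeploying")
```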

7. Use real-world data wherever possible

Let’s say you want to build the world’s best cat recognizer mobile app. You use high-resolution pictures of cats from the internet as training data, train an accurate model and deploy your app, only to find users complaining that it isn’t very good at detecting cats.

This is an example of a data mismatch problem: the mobile snaps used for inference come from a different distribution from your training data. Mobile snaps might be of poor quality, shot in low lighting, and taken from challenging angles. Some users may be using older feature phones with bad cameras to begin with. A model trained solely on good-quality source material isn’t going to generalise well to user-provided images.

If it’s feasible to collect real-world data, it’s usually worth doing. Add that data to your training set and you’ll have a much better chance of getting things right.
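While you’re collecting that data, one stopgap (not a replacement) is to degrade your clean training images so they better resemble mobile snaps. A rough sketch with Pillow, where the filename and degradation parameters are purely illustrative:

```python
# Stopgap while real-world data is collected: degrade clean training
# images so they resemble low-quality mobile snaps. Real user data is
# still preferable; these parameters are illustrative only.
from PIL import Image, ImageEnhance, ImageFilter

def degrade(image: Image.Image) -> Image.Image:
    small = image.resize((image.width // 4, image.height // 4))
    rough = small.resize(image.size)                       # lose fine detail
    dim = ImageEnhance.Brightness(rough).enhance(0.6)      # low lighting
    return dim.filter(ImageFilter.GaussianBlur(radius=2))  # camera shake

clean = Image.open("cat_0001.jpg")  # hypothetical training image
degrade(clean).save("cat_0001_degraded.jpg")
```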

8. Log everything

CPU, GPU, RAM and I/O numbers, as well as the predictions and probabilities output during inference: log all of it to some centralised location. It helps debugging immensely, even more so for reinforcement learning applications. It’s also useful for building visualisations and dashboards that tell you how your models are doing in production.
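As an example, here’s a minimal sketch of structured per-prediction logging in Python; a log shipper of your choice would forward these JSON lines to the centralised store. The model is assumed to expose a scikit-learn-style predict_proba:

```python
# Emit one structured log record per prediction so a log shipper can
# forward it to a centralised store (ELK, CloudWatch, and so on).
# Assumes a model with a scikit-learn-style predict_proba method.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("inference")

def predict_and_log(model, features):
    start = time.perf_counter()
    probabilities = model.predict_proba([features])[0]
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "features": features,
        "prediction": int(probabilities.argmax()),
        "probabilities": probabilities.tolist(),
        "latency_ms": (time.perf_counter() - start) * 1000,
    }
    logger.info(json.dumps(record))
    return record["prediction"]
```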

9. Design a feedback loop to get more, better quality data

Before your machine learning system is deployed, it’s a good idea to design a feedback loop so you can a) collect more labelled data and b) correct mistakes your system is making ((a) applies mainly to supervised learning). On a UI level, this could manifest itself as a feedback dialogue or a simple rating system, or it could be more involved.
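For instance, a bare-bones feedback endpoint might record the user’s correction alongside the original prediction, turning complaints into fresh labelled data. A sketch, with a flat file standing in for a real datastore:

```python
# Minimal feedback endpoint: store the user's verdict alongside the
# original prediction so corrections become new labelled training data.
import json
import time

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/feedback", methods=["POST"])
def feedback():
    payload = request.get_json()
    record = {
        "request_id": payload["request_id"],        # links back to a prediction
        "predicted_label": payload["predicted_label"],
        "correct_label": payload["correct_label"],  # the user's correction
        "timestamp": time.time(),
    }
    with open("feedback.jsonl", "a") as f:          # stand-in for a datastore
        f.write(json.dumps(record) + "\n")
    return jsonify({"status": "recorded"})

if __name__ == "__main__":
    app.run(port=8081)
```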

Some companies even have the foresight to collect labelled data for some task as part of an entirely unrelated service: Google collects labelled data for text recognition and other machine learning tasks as part of its reCAPTCHA anti-spam system, which is pretty genius if you ask me.

10. Embrace the fact that machine learning is function approximation

The core idea of machine learning is to learn some function that maps input X to output Y as reliably as possible. By definition, the learned function is an approximation of some hypothetical ground-truth function: a machine learning system may approach a theoretical limit of accuracy (the Bayes error), but it will never be 100% spot on. By and large, I think users are smart enough to understand this aspect of AI, which is why you should consider communicating the use of AI within any application.
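A quick simulation makes the point concrete: if 10% of labels are inherently noisy, even an oracle that knows the true underlying function tops out at around 90% accuracy. That ceiling is the Bayes error:

```python
# Illustration of Bayes error: flip 10% of labels at random, and even an
# oracle that knows the true decision rule tops out near 90% accuracy.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.uniform(-1, 1, n)
true_labels = (x > 0).astype(int)     # the "ground truth" function
noise = rng.random(n) < 0.10          # 10% irreducible label noise
observed = np.where(noise, 1 - true_labels, true_labels)

oracle_accuracy = (true_labels == observed).mean()
print(f"Oracle accuracy: {oracle_accuracy:.3f}")  # ~0.900, the Bayes limit
```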

11. Leave quality assurance of predictions/inference to the machine learning experts

Let’s say you’ve built a recommendation model for movies and hand it over to QA for testing before deployment. One QA engineer doesn’t agree with the recommendations they’re getting, even though, on balance, you know your system is highly accurate. Since it’s impossible to emulate every single type of user your recommender might have, it’s better to let machine learning experts validate models numerically rather than rely on anecdotal evidence from a handful of people. Numbers don’t lie.
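As a sketch of what validating numerically can mean in practice: evaluate on a held-out set and report a confidence interval, so one dissenting spot check can be weighed against thousands of data points. The figures below are made up:

```python
# Report held-out accuracy with a 95% confidence interval instead of
# relying on a handful of manual spot checks.
import math

def accuracy_with_ci(n_correct, n_total, z=1.96):
    p = n_correct / n_total
    margin = z * math.sqrt(p * (1 - p) / n_total)  # normal approximation
    return p, p - margin, p + margin

# Hypothetical held-out results: 4,610 of 5,000 recommendations accepted.
p, low, high = accuracy_with_ci(4610, 5000)
print(f"Accuracy {p:.1%} (95% CI {low:.1%} to {high:.1%})")
```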

12. Allow machine learning experts to deploy new models to production themselves

Machine learning people are tinkerers. They might—and usually do—come up with improvements to an existing model, and would like to deploy that model ASAP. In supervised learning at least, models are just a bunch of weights and metadata; they don’t change over time unless retrained. Provided that a proper serving system is in place, deploying a validated model shouldn’t be more difficult than uploading a file, and there is no reason you shouldn’t be able to trust machine learning experts to do that themselves.
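Under those assumptions, deploying really can boil down to uploading a versioned artifact. A sketch using joblib and S3, where the bucket and naming scheme are hypothetical and the serving layer is assumed to poll for and hot-load new versions:

```python
# If models are just weights plus metadata, deployment can be as simple
# as uploading a versioned artifact. Bucket and key names are
# hypothetical; the serving layer is assumed to hot-load new versions.
import time

import boto3
import joblib

def deploy(model, bucket="my-company-models", name="recommender"):
    version = time.strftime("%Y%m%d-%H%M%S")
    local_path = f"{name}-{version}.pkl"
    joblib.dump(model, local_path)  # weights + metadata in one file
    boto3.client("s3").upload_file(local_path, bucket,
                                   f"{name}/{version}/model.pkl")
    print(f"Uploaded {name} version {version}")
```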

There are many more considerations when it comes to making AI shine in the real world. It requires a coordinated effort between machine learning experts, developers, architects, designers and stakeholders. Do it wrong, and you aren’t getting the best of what AI has to offer. Do it right, and you’ll be able to make something spectacular that puts you significantly ahead of the competition.

If you have any questions, want to chat about the possibilities of AI, or need help on your project, feel free to shoot me an email (max.pagels@sc5.io) or LinkedIn message!