- Posted by Daitan Innovation Team
- On May 14, 2021
- AI, Machine Learning, MLOps
A Set of Essential Practices for Scaling ML-powered Applications
Data Science and Machine Learning (ML) are becoming priority strategies for solving complex real-world problems. McKinsey & Company reports suggest that organizations across all sectors are using AI in at least one business function to generate value, mainly in the form of increased revenue or reduced costs.
In fact, in our day-to-day lives we are surrounded by artificial intelligence systems, from recommendation algorithms on e-commerce and streaming platforms to photo filters in popular apps such as Instagram. To develop such algorithms, a data scientist spends a lot of time on tasks ranging from data acquisition and cleaning to running training experiments and validating results against a pre-defined metric.
But while developing and training ML models is not a trivial task, from an industry perspective the real challenge is not building a model but building an ML-powered system that can be developed, integrated and operated continuously in production with ease. To put this in perspective, according to Algorithmia’s “2020 State of Enterprise Machine Learning” report, only 22% of companies have fully deployed an ML model to production.
Further, Sculley et al. point out that only a small fraction of a real-world ML system is made up of ML code, while the surrounding components it requires are vast, diverse and complex. Designing, building and operating systems like these therefore calls for a set of effective practices and processes. This is where MLOps comes in.
How To Define MLOps
According to the MLOps SIG, MLOps is defined as:
“Extension of the DevOps methodology to include Machine Learning and Data Science assets as first-class citizens within the DevOps ecology”.
However, because MLOps is an emerging field, the term has no strict definition, especially when compared to Machine Learning Engineering (MLE). Andriy Burkov’s definition of MLE is therefore also accepted:
“MLE is the use of scientific principles, tools, and techniques of machine learning and traditional software engineering to design and build complex computing systems. MLE encompasses all stages from data collection, to model building, to make the model available for use by the product or the consumers.”
Regardless of the term used (MLOps or MLE), what matters is the goal: an end-to-end machine learning development process to design, build and manage reproducible, testable and scalable ML-powered software.
Benefits of MLOps
As stated, MLOps seeks to provide a process for developing ML systems that are robust and maintainable. It does so by ensuring fundamental capabilities and qualities for both the system and the project itself. Some examples include:
- Reduction of technical debt across ML projects.
- Application of Agile principles to ML projects.
- Reproducibility guarantees.
- Versioning of data, pipelines and ML models.
- Automated testing of ML artifacts.
- Performance monitoring of models in production.
- CI/CD support for ML assets(*) and data.
- CT (Continuous Training) support for ML models and pipelines.
- Unification of the delivery cycle of both models and application.
- Scalability, high availability, fault tolerance, fairness and security in the context of ML.
These capabilities bring further benefits, such as faster delivery of models into production, lower enterprise-level development and operating costs, and mitigation of the risks associated with ML projects.
Fundamental Practices of MLOps
New MLOps trends and tools emerge all the time. However, among the many existing practices, three stand out as major practices in the MLOps arsenal: versioning, experiment tracking and automation. They are “must-have” requisites for providing a powerful development process and ML system. Moreover, each of these practices can be extended and refined, and each contains sub-practices that can be extended and refined in turn.
Unlike conventional software development, ML-powered applications have three artifacts that must be worked on: Data, Model and Code(**). Versioning the data, model and code artifacts of an ML project is one of the most fundamental practices in MLOps, because versioning improves reproducibility and supports maintenance, error prevention and disaster recovery for the entire project.
For example, updating either the data or the model may degrade the application’s performance in production. In such situations, it is important to be able to roll back to previous versions automatically. Close tracking of changes is also needed when the data is collected and updated frequently from several different sources and the model is retrained frequently and automatically. By versioning the artifacts of an ML project, it is possible to:
- Keep track of modifications to both the data and the model, making it possible to identify bugs or changes that caused a drop in the model’s performance.
- Revert the version of the data or model to previous versions in the case of broken releases (or ones that may break in production).
- Automate the entire ML system using CI/CD (and CT).
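To make the idea concrete, here is a minimal sketch of artifact versioning in Python, assuming that each version of a data file or serialized model is identified by a SHA-256 hash of its contents. `ArtifactRegistry` and its `commit`, `rollback` and `load` methods are hypothetical names used only for illustration; in practice, tools such as DVC handle this for real projects. The registry keeps an ordered history per artifact plus an "active" pointer, so rolling back after a broken release only moves the pointer and never deletes a version.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def content_hash(payload: bytes) -> str:
    """Identify an artifact version by a short SHA-256 digest of its contents."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class ArtifactRegistry:
    """Hypothetical in-memory registry for data and model artifact versions."""

    blobs: Dict[str, bytes] = field(default_factory=dict)        # version id -> content
    history: Dict[str, List[str]] = field(default_factory=dict)  # artifact -> ordered version ids
    active: Dict[str, int] = field(default_factory=dict)         # artifact -> index of active version

    def commit(self, name: str, payload: bytes) -> str:
        """Store a new version of `name` and make it the active one."""
        version = content_hash(payload)
        self.blobs[version] = payload
        versions = self.history.setdefault(name, [])
        versions.append(version)
        self.active[name] = len(versions) - 1
        return version

    def active_version(self, name: str) -> str:
        return self.history[name][self.active[name]]

    def rollback(self, name: str) -> str:
        """Re-activate the version committed before the active one (history is kept)."""
        index = self.active[name]
        if index == 0:
            raise ValueError(f"no earlier version of {name!r} to roll back to")
        self.active[name] = index - 1
        return self.active_version(name)

    def load(self, name: str) -> bytes:
        return self.blobs[self.active_version(name)]


# Usage: version a training set, then roll back a bad update.
registry = ArtifactRegistry()
registry.commit("train.csv", b"x,y\n1,2\n")
registry.commit("train.csv", b"x,y\n1,2\n3,-999\n")  # suppose this update hurts the model
registry.rollback("train.csv")
assert registry.load("train.csv") == b"x,y\n1,2\n"
```

The same registry could hold model files alongside data files; recording which data version each model was trained on would be a natural next step.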
Due to the experimental and iterative nature of ML model development, strict tracking of all experiment-related information is essential. Experiment tracking is the practice of saving all important information about the data, model and code of each experiment you run, so that you have full knowledge of what each experiment produced and control over every change.
For example, when developing a model, we may want to track (and version) the following each time an experiment is run:
- Scripts used.
- Configuration files.
- Data (and metadata) for training, validation and testing.
- Model (hyper)parameters.
- Results of evaluation metrics.
- System performance metrics.
With this information, we can compare the results of different models and approaches and identify computational performance problems, while knowing and controlling each artifact and its impact on the experiment. Experiment tracking is therefore fundamental to model development and to reproducibility, being the main way to achieve it.
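The sketch below shows the kind of record an experiment tracker might keep for each run, covering the items listed above. It is a minimal, file-based illustration using only the Python standard library; `log_experiment` and `best_run` are hypothetical helpers, and the one-JSON-file-per-run layout is an assumption. Dedicated tools such as MLflow offer the same idea with a richer interface (for example, `mlflow.log_param` and `mlflow.log_metric`).

```python
import hashlib
import json
import time
import uuid
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Fingerprint a data file so each run records exactly which data it used."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def log_experiment(log_dir: Path, script: str, config: Dict, data_files: List[Path],
                   params: Dict, metrics: Dict, system: Dict) -> Dict:
    """Save one run as a JSON file (hypothetical layout: one file per run id)."""
    record = {
        "run_id": uuid.uuid4().hex,
        "timestamp": time.time(),
        "script": script,                                            # script used
        "config": config,                                            # configuration
        "data": {str(p): file_digest(Path(p)) for p in data_files},  # data versions
        "params": params,                                            # (hyper)parameters
        "metrics": metrics,                                          # evaluation results
        "system": system,                                            # system performance
    }
    log_dir.mkdir(parents=True, exist_ok=True)
    (log_dir / f"{record['run_id']}.json").write_text(json.dumps(record, indent=2))
    return record


def best_run(log_dir: Path, metric: str, higher_is_better: bool = True) -> Dict:
    """Compare all logged runs on one evaluation metric."""
    runs = [json.loads(p.read_text()) for p in log_dir.glob("*.json")]
    pick = max if higher_is_better else min
    return pick(runs, key=lambda run: run["metrics"][metric])
```

Because every run stores a data fingerprint next to its parameters and metrics, a change in results can be traced back to whether the data, the configuration or the code changed.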
Automated Machine Learning Pipelines
Automation is another fundamental MLOps practice. In the context of ML, it means automating every pipeline in the ML workflow, including the data pipeline, model building and code integration, so that the entire flow runs without manual intervention. As a result:
- Experimentation occurs more quickly and there is greater readiness to move the entire pipeline from development to production.
- The production model is automatically retrained using updated data, where the (re)training is automatically activated through triggers.
- The pipeline implementation used in the development or experiment environment is also used in the (pre)production environments.
- An ML pipeline in production continuously delivers new models, trained on new data, as prediction services. The model deployment step, which serves the trained and validated model as a prediction service for online inference, is automated.
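As a rough illustration of a pipeline that runs end to end without manual hand-offs, the sketch below chains hypothetical steps (`extract`, `train`, `evaluate`, `deploy`) that share a context dictionary. The toy data, the one-parameter linear model and the `max_mae` quality gate are assumptions chosen only to keep the example small; real pipelines are usually built with orchestrators such as TFX or Kubeflow Pipelines.

```python
from typing import Callable, Dict, List, Tuple

Step = Callable[[Dict], Dict]


def extract(ctx: Dict) -> Dict:
    # Stand-in for reading fresh data from a source system.
    ctx["rows"] = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
    return ctx


def train(ctx: Dict) -> Dict:
    # Least-squares fit of y = slope * x (a deliberately tiny "model").
    rows = ctx["rows"]
    ctx["slope"] = sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)
    return ctx


def evaluate(ctx: Dict) -> Dict:
    # Mean absolute error on the same rows, for brevity; real pipelines use held-out data.
    errors = [abs(ctx["slope"] * x - y) for x, y in ctx["rows"]]
    ctx["mae"] = sum(errors) / len(errors)
    return ctx


def deploy(ctx: Dict) -> Dict:
    # Promote the model only if it passes the (hypothetical) quality gate.
    ctx["deployed"] = ctx["mae"] <= ctx["max_mae"]
    return ctx


PIPELINE: List[Tuple[str, Step]] = [
    ("extract", extract), ("train", train), ("evaluate", evaluate), ("deploy", deploy),
]


def run_pipeline(steps: List[Tuple[str, Step]], ctx: Dict) -> Dict:
    """Run every step in order; each step's output feeds the next one."""
    for name, step in steps:
        ctx = step(ctx)
        ctx.setdefault("completed", []).append(name)
    return ctx


result = run_pipeline(PIPELINE, {"max_mae": 0.5})
print(result["slope"], result["mae"], result["deployed"])
```

The same `PIPELINE` definition can be run in the experiment environment and in (pre)production, which is the point made above about reusing one pipeline implementation across environments.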
For a typical ML pipeline, running from data extraction to model serving, an ML project generally has one of three levels of automation: Manual Process, Automated ML Pipeline and CI/CD Pipeline.
- Manual Process. Typical Data Science process, where each step of the pipeline, from data preparation to model building, is performed manually using Rapid Application Development (RAD) tools such as Jupyter Notebooks. This level of automation is characterized by its experimental and iterative nature.
- Automated ML Pipeline. Stage where the whole process of building and validating the model runs automatically, either as new data becomes available or when retraining triggers fire based on a scheduling policy or a performance threshold. Here, the goal is to perform continuous training (CT) of the model by automating the ML pipeline. This level of automation is characterized by orchestrated experiments(***) and production models that are updated continuously and automatically. Test and deployment steps are performed manually.
- CI/CD Pipeline. Stage where the whole ML workflow, from data acquisition to model deployment, is performed automatically by CI/CD systems. Unlike the previous level, here we also automatically build, test and deploy the pipelines for each of the artifacts: data, model and code. This level of automation is characterized by orchestrated experiments and full automation of the whole ML workflow.
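The retraining triggers described for the Automated ML Pipeline level can be expressed as a small policy object. This is a sketch under stated assumptions: `RetrainPolicy` is a hypothetical name, and the seven-day schedule, the 0.80 metric threshold and the 10,000-row new-data threshold are illustrative values, not recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RetrainPolicy:
    """Hypothetical continuous-training (CT) trigger policy; thresholds are illustrative."""

    max_model_age: timedelta = timedelta(days=7)  # scheduling policy
    min_live_metric: float = 0.80                  # performance threshold
    min_new_rows: int = 10_000                     # enough new data has arrived

    def triggers(self, now: datetime, last_trained: datetime,
                 live_metric: float, new_rows: int) -> List[str]:
        """Return the reasons (if any) to start a new training run."""
        reasons = []
        if now - last_trained >= self.max_model_age:
            reasons.append("schedule")
        if live_metric < self.min_live_metric:
            reasons.append("performance")
        if new_rows >= self.min_new_rows:
            reasons.append("new_data")
        return reasons


policy = RetrainPolicy()
trained_at = datetime(2021, 5, 1)
if policy.triggers(datetime(2021, 5, 3), trained_at, live_metric=0.72, new_rows=500):
    print("start training pipeline")  # here, triggered by the performance threshold
```

Returning the reasons rather than a bare boolean makes it easy to log why each training run was started, which fits the experiment tracking practice described earlier.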
As the automation of the ML system becomes more sophisticated, testing must keep pace and run automatically as part of the pipeline. In addition to unit and integration tests (which must be present for all artifacts), we also include tests specific to data and models. Examples include data validation (checking whether the data follows the defined schema) and model validation (checking whether the model is obsolete, skewed or biased).
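The two kinds of checks mentioned above can be sketched as plain functions that a pipeline runs before training and before deployment. The schema format (column to type and allowed range), the per-slice metric dictionary and the 0.05 tolerance are assumptions for illustration; the slice-gap test is only one simple proxy for bias, and libraries such as TensorFlow Data Validation offer far more complete data checks.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: column -> (expected type, minimum, maximum).
SCHEMA: Dict[str, Tuple[type, float, float]] = {
    "age": (int, 0, 120),
    "income": (float, 0.0, float("inf")),
}


def validate_rows(rows: List[Dict], schema: Dict[str, Tuple[type, float, float]]) -> List[str]:
    """Data validation: report missing columns, wrong types and out-of-range values."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing {sorted(missing)}")
        for column, (expected, low, high) in schema.items():
            if column not in row:
                continue
            value = row[column]
            if not isinstance(value, expected):
                errors.append(f"row {i}: {column} is not {expected.__name__}")
            elif not low <= value <= high:
                errors.append(f"row {i}: {column}={value} out of range")
    return errors


def validate_model(candidate: Dict[str, float], baseline: float,
                   max_slice_gap: float = 0.05) -> List[str]:
    """Model validation: `candidate` maps 'overall' and each data slice to a metric."""
    problems = []
    if candidate["overall"] < baseline:
        problems.append("candidate underperforms the model currently in production")
    slices = [score for name, score in candidate.items() if name != "overall"]
    if slices and max(slices) - min(slices) > max_slice_gap:
        problems.append("metric gap between data slices exceeds tolerance")
    return problems
```

A pipeline would typically stop, and alert the team, whenever either function returns a non-empty list.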
After a model goes into production, it needs to be monitored to ensure that it works as expected. In the context of automated ML pipelines, monitoring is a prerequisite for proper automation: by monitoring the model in production, it is possible to track its performance and automatically retrain it when it becomes obsolete.
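A minimal monitoring sketch, assuming that ground-truth labels eventually arrive for predictions made in production: the hypothetical `RollingAccuracyMonitor` keeps the outcomes of the last `window` predictions and flags the model for retraining when accuracy over a full window falls below a threshold. The window size and threshold are illustrative assumptions.

```python
from collections import deque
from typing import Deque, Hashable


class RollingAccuracyMonitor:
    """Hypothetical monitor: accuracy over the most recent `window` labelled predictions."""

    def __init__(self, window: int = 500, threshold: float = 0.8) -> None:
        self.outcomes: Deque[bool] = deque(maxlen=window)  # oldest outcomes drop off automatically
        self.threshold = threshold

    def record(self, prediction: Hashable, label: Hashable) -> None:
        self.outcomes.append(prediction == label)

    def accuracy(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else float("nan")

    def needs_retraining(self) -> bool:
        # Decide only on a full window, so a handful of early errors does not trigger retraining.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.threshold


monitor = RollingAccuracyMonitor(window=4, threshold=0.75)
for prediction, label in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(prediction, label)
print(monitor.accuracy(), monitor.needs_retraining())  # prints: 0.5 True
```

In an automated pipeline, a positive `needs_retraining()` result would be one of the triggers that starts a new training run.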
A feature store is an optional but strongly recommended component, even outside automated ML pipelines. Essentially, a feature store is a centralized service for storing and computing features, allowing them to be defined, stored and used both for model training and by models in production. A feature store should also be able to store large volumes of data and provide low-latency access to features for online applications.
Among the many benefits of a feature store are:
- Reusing available features by sharing them across teams and projects, instead of recreating the same or similar ones.
- Avoiding similar features with different definitions, by maintaining the feature pipelines and their related metadata.
- Serving features at scale with low latency for training routines and online applications.
- Ensuring consistency of features between training and serving, thereby avoiding training-serving skew.
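To show why a single feature definition prevents training-serving skew, here is a minimal in-memory sketch. `FeatureStore`, `register`, `training_set`, `materialize` and `online_features` are hypothetical names, not the API of any particular product; real feature stores add persistent offline and online storage, point-in-time correctness and much more.

```python
from typing import Callable, Dict, List


class FeatureStore:
    """Hypothetical in-memory feature store: one definition per feature, shared by
    the offline (training) path and the online (serving) path."""

    def __init__(self) -> None:
        self._definitions: Dict[str, Callable[[dict], float]] = {}
        self._online: Dict[str, Dict[str, float]] = {}  # entity id -> materialized features

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        # Refuse duplicates so the same name cannot end up with two definitions.
        if name in self._definitions:
            raise ValueError(f"feature {name!r} already defined")
        self._definitions[name] = fn

    def compute(self, raw: dict, names: List[str]) -> Dict[str, float]:
        return {name: self._definitions[name](raw) for name in names}

    def training_set(self, raw_rows: List[dict], names: List[str]) -> List[Dict[str, float]]:
        """Offline path: compute features for a batch of raw records."""
        return [self.compute(row, names) for row in raw_rows]

    def materialize(self, entity_id: str, raw: dict) -> None:
        """Precompute all features for one entity so serving is a fast lookup."""
        self._online[entity_id] = self.compute(raw, list(self._definitions))

    def online_features(self, entity_id: str, names: List[str]) -> Dict[str, float]:
        """Online path: low-latency lookup of precomputed features."""
        row = self._online[entity_id]
        return {name: row[name] for name in names}


store = FeatureStore()
store.register("order_count", lambda r: float(len(r["orders"])))
store.register("avg_order_value", lambda r: sum(r["orders"]) / len(r["orders"]))

raw = {"orders": [10.0, 30.0]}
offline = store.training_set([raw], ["order_count", "avg_order_value"])[0]
store.materialize("customer-1", raw)
online = store.online_features("customer-1", ["order_count", "avg_order_value"])
assert offline == online  # same definitions on both paths, so no skew
```

Because both paths call the same registered function, a change to a feature’s definition reaches training and serving together.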
With the increasing use of ML across industry sectors and the need for maintainable and scalable ML-powered applications, MLOps culture should become standard for everyone working with AI over the next few years. After all, MLOps has proven essential for large-scale projects, and its adoption brings the benefits mentioned above.
In addition, since Big Data and Machine Learning only generate value when used properly, MLOps is likely to become what distinguishes companies that use data well from those that do not.
Currently, several MLOps tools help data teams manage their projects, from open-source tools such as MLflow, DVC, TFX and Kubeflow to cloud-based platforms such as AWS SageMaker, Valohai, Algorithmia, DataRobot and Neptune.ai. And because MLOps is a hot topic, new trends and tools are introduced almost every day.
(*) ML assets (or artifacts) are all the parameters, hyperparameters, training scripts, and training and testing data used to build the ML model.
(**) Data comprises both the data pipeline and the training, validation and test data. Model comprises the ML assets, and Code refers to the source code of the application into which the model is integrated.
(***) Here, orchestrated experiments are experiments whose transitions between steps happen automatically and, preferably, with rigorous tracking.
References and Further Reading
- Sculley, David, et al. “Hidden technical debt in machine learning systems.” Advances in Neural Information Processing Systems 28 (2015): 2503–2511.
- “ML-Ops.org.” MLOps, ml-ops.org/.
- Burkov, Andriy. Machine learning engineering. True Positive Incorporated, 2020.
- “MLOps: Continuous Delivery and Automation Pipelines in Machine Learning.” Google Cloud, cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning.
- Breck, Eric, et al. “The ml test score: A rubric for ml production readiness and technical debt reduction.” 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017.
Top banner image source: https://ml-ops.org/