Machine Learning Life Cycle

The machine learning process maps out the steps a data science team takes to build and deploy a model, while also guiding the team’s collaboration toward an effective predictive model.


What is a Machine Learning Life Cycle?

A machine learning life cycle describes the steps a team (or person) should use to create a predictive machine learning model.

Hence, an ML life cycle is a key part of most data science projects. In fact, for many people, it’s not clear what the difference is between a machine learning life cycle and a data science life cycle.

So, in this post, I’ll explore the machine learning life cycle and discuss how it relates to the data science life cycle.

What is a Life Cycle?

A life cycle is used to explain the steps (or phases) of a project. In short, a team that uses a life cycle will have a consistent vocabulary to describe the work that needs to be done.

While machine learning engineers and data scientists can typically describe the steps within a project, they might not use the same words, or even define the same number of phases. A consistent vocabulary helps the team ensure they do not “miss a step”. You might think that experienced team members would know the steps and never skip one, but teams skip steps surprisingly easily. For example, I have often seen a team under deadline pressure finish one model and then go directly to building a different model, without really exploring how well the first model performs. This can stem from very tight schedules, or from the team’s desire to explore many models and “play with the data”.

There is another benefit to using an ML life cycle, beyond ensuring the team does not miss a step and having a consistent vocabulary: non-technical people, such as a product owner or a senior manager, can better understand the work required and how far along the project is toward completion.

In summary, a life cycle framework will:

  • Standardize the process and vocabulary
  • Help guide the team’s work
  • Allow others to understand how a problem is being approached
  • Encourage the team to be more thorough, increasing the value of the work

A High-Level Machine Learning Process

This workflow includes problem exploration, data engineering, model engineering and ML Ops.

![High-level machine learning process: problem exploration, data engineering, model engineering, and ML Ops](https://www.datascience-pm.com/wp-content/uploads/2022/05/Picture1-1.png)

The Benefit of a More Detailed Machine Learning Process

While this high-level workflow (which some people refer to as a life cycle) is helpful for providing an overall summary of the phases in a machine learning project, it does not provide an intuitive explanation of the work required to actually create a predictive model.

In other words, a more detailed machine learning process can provide a better non-technical view of the work required to build a machine learning model. This enables the entire team to have an intuitive understanding of the steps required to build a model and, hence, how to prioritize the work to be done and how much time each step might take.

A More Detailed Machine Learning Process

This more detailed process keeps the same high-level phases (problem exploration, data engineering, model engineering and ML Ops), but defines the key steps within each phase of the ML process. Each of these steps is discussed below.

![A more detailed machine learning process, showing the key steps within each phase](https://www.datascience-pm.com/wp-content/uploads/2022/05/Picture1-4-1024x638.png)

Problem Exploration

First, focus on how the model will be used. In the process, assess the desired model accuracy and explore other details, such as whether false positives are worse than false negatives. This phase also includes understanding what data might be available.

  • Define Success: Define the problem to be solved, for example, what should be predicted. This helps define what data will be needed. Also, make sure it’s clear how success will be measured (the metric sketch after this list shows one way to do so).
  • Evaluate Data: Determine the relevant data sources. In other words, evaluate what data the team will need, how that data is collected, and where the data is stored.
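
To make “define success” concrete, here is a minimal Python sketch (using scikit-learn) that turns the false-positive / false-negative question into a measurable, cost-weighted success metric. The labels, predictions, and cost values are hypothetical placeholders; the real costs come from the business problem.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical ground-truth labels and model predictions for a binary
# classifier (1 = positive class, e.g., "fraud").
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 0, 0])
y_pred = np.array([0, 1, 1, 0, 0, 1, 0, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# If a missed positive (false negative) hurts more than a false alarm
# (false positive), encode that asymmetry directly in the success metric.
COST_FP, COST_FN = 1, 10  # assumed business costs, not real figures
total_cost = COST_FP * fp + COST_FN * fn

print(f"precision={precision_score(y_true, y_pred):.2f}")
print(f"recall={recall_score(y_true, y_pred):.2f}")
print(f"estimated business cost={total_cost}")
```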

Data Engineering

Design and build data pipelines. These pipelines get, clean, and transform data into a format that is more easily used to build a predictive model. Note that this data might come from multiple data sources, so merging the data is also a key aspect of data engineering. This is often where the most time is spent in an ML project.

  • Obtain Data: Assemble the data. This includes connecting to remote data stores and databases, which might hold data in different formats. For example, some data might be in CSV format, and other data could be available as JSON via web services (the pipeline sketch after this list shows this pattern).
  • Scrub Data: Re-format particular attributes and correct errors in the data, such as by imputing missing values. Datasets are often missing values, or they may contain values of the wrong type or range. Cleaning can include removing duplicates, correcting errors, dealing with missing values, normalization, and handling data type conversions.
  • Explore / Validate Data: Get a basic understanding of the data. This exploratory analysis includes data profiling to obtain information about the content and structure of the data. The goal is to both understand the data attributes as well as the quality of the data.
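
In practice, these three steps often collapse into a single pipeline. Below is a condensed pandas sketch of one such pipeline; the file name, URL, and column names are all hypothetical, and a real pipeline would add validation rules, logging, and error handling.

```python
import pandas as pd
import requests

# Obtain: pull one source from a CSV file and another from a JSON web
# service, then merge them on a shared key. (Paths/columns are hypothetical.)
customers = pd.read_csv("customers.csv")
response = requests.get("https://api.example.com/orders")
orders = pd.DataFrame(response.json())
df = customers.merge(orders, on="customer_id", how="left")

# Scrub: remove duplicates, fix types, and impute missing values.
df = df.drop_duplicates()
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["order_total"] = df["order_total"].fillna(df["order_total"].median())

# Explore / validate: profile the content and structure of the data.
print(df.info())          # column types and non-null counts
print(df.describe())      # summary statistics for numeric columns
print(df.isna().mean())   # fraction of missing values per column
```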

Model Engineering

This is the phase that most people associate with building a machine learning model. During this phase, data is used to train and evaluate the model. This is often an iterative task, where different models are tried and each model is tuned.

  • Select & Train Model: The process of identifying an appropriate model, and then building / training the model (on training data). The goal of training is to answer a question or make a prediction correctly as often as possible.
  • Test Model: Run the model on data that the model has not yet seen (such as testing data). In other words, perform model testing by using data that was withheld from training (i.e., backtesting).
  • Evaluate & Interpret Model: Objectively measure the performance of the model. Basic evaluation explores metrics such as accuracy and precision to determine if the model is usable, and which model is best for the specific problem being explored. This evaluation also includes understanding when the model makes mistakes. More generally, validating the trained model helps ensure the model meets the original organizational objectives before the ML model is put into production.
  • Tune Model: This step refers to parameter tuning, which, depending on the model being used, can be more an art than a science. In short, models typically have parameters, often called hyperparameters (i.e., dials for tuning the model), which allow performance to be improved via parameter refinement. Simple examples include the number of training steps and the initialization of certain values. The sketch after this list walks through training, testing, evaluating, and tuning a model.
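
Here is a compact scikit-learn sketch of the model engineering loop: withhold a test set, train a candidate model, evaluate it on unseen data, then tune its hyperparameters. The synthetic dataset and the tiny parameter grid are stand-ins for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a dataset prepared by the data engineering phase.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Withhold a test set the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Select & train: fit a candidate model on the training data.
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Test & evaluate: measure performance on the withheld data.
y_pred = model.predict(X_test)
print(f"accuracy={accuracy_score(y_test, y_pred):.3f}")
print(f"precision={precision_score(y_test, y_pred):.3f}")

# Tune: search over a small (illustrative) hyperparameter grid.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=5)
grid.fit(X_train, y_train)
print("best params:", grid.best_params_)
```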

ML Ops

Broadly defined, machine learning operations (ML Ops) spans a wide set of practices, systems, and responsibilities that data scientists, data engineers, cloud engineers, IT operations, and business stakeholders use to deploy, scale, and maintain machine learning solutions.

  • Deploy Model: Package the model and put it to use (i.e., into production). While deployment varies from one group to another, the team needs to understand the expected model performance, how the model will be monitored, and, in general, the model’s key performance indicators (KPIs). A minimal serving sketch follows this list.
  • Monitor Model: Maintain the model in production. This includes monitoring the KPIs and proactively working to ensure stable and robust predictions.
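
As one illustration of these two steps, below is a minimal deploy-and-monitor sketch using FastAPI. The model file, the feature format, and the choice of “mean recent prediction” as a monitored KPI are all assumptions for the sake of the example; production ML Ops stacks typically add authentication, model versioning, and a proper metrics store.

```python
# Run with: uvicorn serve:app (assuming this file is saved as serve.py).
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # hypothetical pre-trained model file

class Features(BaseModel):
    values: list[float]  # feature vector, in training-time column order

prediction_log: list[float] = []  # in production, ship these to a metrics store

@app.post("/predict")
def predict(features: Features):
    pred = float(model.predict(np.array([features.values]))[0])
    prediction_log.append(pred)  # monitor: record predictions for drift checks
    return {"prediction": pred}

@app.get("/health")
def health():
    # A simple KPI: the mean of recent predictions; a sudden shift in this
    # value can be an early warning of data or concept drift.
    recent = prediction_log[-1000:]
    mean_pred = float(np.mean(recent)) if recent else None
    return {"status": "ok", "recent_mean_prediction": mean_pred}
```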

The Machine Learning Process Coordination Framework

When most people describe the machine learning process, they focus only on the steps required to build a predictive model (i.e., the steps just discussed) or more generally, the machine learning life cycle. This might be appropriate if the work is being done by one person, such as a researcher doing some analysis.

However, creating and using predictive models is increasingly becoming a team sport, and a modern data science team needs to define both the steps of the project and how team members coordinate while working on it.

For example, note that while the arrows in the diagram show a continuous flow, the team might need to go back to a previous phase or step. How does the team determine “when to move forward” and “when to take a step back”? This is where a coordination framework can be useful.

Together, the steps of the project combined with a coordination framework create a comprehensive process that can guide the team toward successful project execution.