AI for formulation

A concise state of the art.

Introduction

Artificial intelligence (AI) is increasingly used by chemists to support a wide range of tasks. Research on AI applied to chemistry was originally driven largely by the need to accelerate drug discovery and to reduce its huge costs and the time to market for new drugs. So far, AI has made significant progress towards accelerating drug discovery R&D. However, the applications of AI in chemistry go much further than drug discovery, as discussed in a recent review. In this article, we provide a general picture of how AI can help formulation scientists be faster and more creative in their research.

AI to develop new formulations

When developing a new formulation, AI-based optimization algorithms such as Bayesian optimization can significantly speed up the process.

Here are the typical steps of an AI-based optimization project:

1. You perform a small number of initial experiments (as few as 2).
2. A machine learning model is built using the experiment results. This model is usually a Gaussian process, which is able to provide an uncertainty with each of its predictions.
3. Using the model built at step 2, the algorithm chooses the next experiment(s) to run by combining the exploitation strategy (trusting the model and running experiments that are predicted to give good results) and the exploration strategy (running experiments for which the model uncertainty is high, in order to reduce that uncertainty and improve the model quality).
4. You run the experiment(s) proposed by the algorithm.
5. You add the result of each new experiment to the dataset.
6. You go back to step 2.

This optimization algorithm stops when you obtain a formulation that satisfies your objectives or when you have spent your experiment budget.
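The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the Gaussian process uses a fixed RBF kernel with hand-picked hyperparameters, exploitation and exploration are combined with an upper-confidence-bound score (one common choice among several), the formulation has a single input scaled to [0, 1], and `run_experiment` is a hypothetical stand-in for a real lab measurement.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of points (one point per row)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Gaussian process posterior mean and standard deviation at x_new."""
    y_mean = y_train.mean()
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tn = rbf_kernel(x_train, x_new)
    solved = np.linalg.solve(k_tt, np.column_stack([y_train - y_mean, k_tn]))
    mean = y_mean + k_tn.T @ solved[:, 0]
    var = 1.0 - np.sum(k_tn * solved[:, 1:], axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def run_experiment(x):
    """Hypothetical lab measurement (higher is better); optimum at x = 0.7."""
    return float(-(x[0] - 0.7) ** 2)

def optimize(n_iterations=10, kappa=2.0):
    """Run the loop for a fixed experiment budget and return the best result."""
    candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)  # possible formulations
    x_data = np.array([[0.0], [1.0]])                       # step 1: two initial experiments
    y_data = np.array([run_experiment(x) for x in x_data])
    for _ in range(n_iterations):                           # stop when the budget is spent
        mean, std = gp_predict(x_data, y_data, candidates)  # step 2: build the model
        score = mean + kappa * std                          # step 3: exploitation + exploration
        x_next = candidates[np.argmax(score)]
        y_next = run_experiment(x_next)                     # step 4: run the experiment
        x_data = np.vstack([x_data, x_next])                # step 5: add it to the dataset
        y_data = np.append(y_data, y_next)                  # step 6: loop back to step 2
    best = np.argmax(y_data)
    return x_data[best, 0], y_data[best]

best_x, best_y = optimize()
```

The `kappa` parameter controls the balance between the two strategies: a larger value favors exploration (high uncertainty), a smaller one favors exploitation (high predicted value).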

As you can imagine, at each iteration of this algorithm, the quality of the machine learning model improves (since you are adding data that the algorithm requested) and the algorithm becomes more and more powerful. We can say that you are capitalizing on your experiment results.

Another nice thing about this Bayesian optimization approach is that it needs very little data to get started. As few as 2 initial experiments will do the job, and the algorithm will take care of requesting the next experiments it needs in order to learn and improve.

AI for reformulation

Now, let's say you want to replace one or several raw materials in an existing formulation. The reason could be a change of supplier, a switch to a bio-based alternative, a switch to a cheaper ingredient, etc. Of course, you want the new formulation to have the same performance and stability as the original formulation (or better).

Starting with your existing formulation and the formulation data that you have acquired while developing that existing formulation, you can use AI-based optimization algorithms (e.g. Bayesian optimization) to reformulate your product in a minimum number of experiments. You will use the same process as if you were developing a new formulation from scratch (see "AI to develop new formulations"), but starting with a larger dataset (all the formulations you have tested while developing the existing formulation). The mission of the optimization algorithm will be to adapt the initial formulation to the new ingredient(s).
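As a rough illustration of what "starting with a larger dataset" can look like, the sketch below encodes historical formulations so that they can seed the optimization, and expresses the replacement as a constraint on new candidates. The ingredient names, the amounts, and the "gloss" property are invented for the example, and treating an absent ingredient as an amount of 0 is an assumption, not the only possible encoding.

```python
# Hypothetical historical records: ingredient amounts (wt%) and one measured property.
history = [
    {"resin": 60.0, "solvent": 30.0, "additive_old": 10.0, "gloss": 82.0},
    {"resin": 65.0, "solvent": 30.0, "additive_old": 5.0, "gloss": 78.0},
    {"resin": 55.0, "solvent": 35.0, "additive_old": 10.0, "gloss": 80.0},
]

# The input space now includes the replacement ingredient as an extra variable.
INPUTS = ["resin", "solvent", "additive_old", "additive_new"]

def to_training_rows(records, inputs, target):
    """Encode records as input rows and targets; absent ingredients count as 0."""
    x = [[record.get(name, 0.0) for name in inputs] for record in records]
    y = [record[target] for record in records]
    return x, y

def is_reformulation_candidate(row, inputs, replaced="additive_old"):
    """Constraint on new experiments: the replaced ingredient must be absent."""
    return row[inputs.index(replaced)] == 0.0

# Starting dataset for the optimization loop (instead of 2 initial experiments).
x_start, y_start = to_training_rows(history, INPUTS, "gloss")
```

With this encoding, the historical experiments inform the model about the ingredients that stay in the formula, while the optimizer only proposes candidates that satisfy `is_reformulation_candidate`.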

AI to test your ideas of new formulations

If you have accumulated formulation data, you can build machine learning models from that data and use the models to predict the properties of hypothetical formulations. This is a more traditional way of doing machine learning than AI-based optimization. If you manage to develop an accurate machine learning model, you can use it to test your ideas of new formulations in silico. Here, contrary to the use of AI-based optimization algorithms, the AI doesn't propose new formulations, but predicts the properties of formulations that you design as a formulation scientist. Thus, you can validate your formulation ideas before spending time and money preparing and testing your new formulas in a laboratory.
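Here is a minimal sketch of this workflow, assuming a simple ridge-regularized linear model written with NumPy. The dataset is synthetic (generated in the code, not real measurements), the three inputs are assumed to be scaled to [0, 1], and in a real project a more flexible model may well be needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for accumulated formulation data (not real measurements):
# 40 past formulations, 3 inputs scaled to [0, 1], one measured property.
x_past = rng.uniform(0.0, 1.0, size=(40, 3))
y_past = 2.0 * x_past[:, 0] - 1.0 * x_past[:, 1] + 0.5 + rng.normal(0.0, 0.05, size=40)

def fit_ridge(x, y, alpha=1e-3):
    """Fit a ridge-regularized linear model; the intercept is the last weight."""
    design = np.column_stack([x, np.ones(len(x))])
    gram = design.T @ design + alpha * np.eye(design.shape[1])
    return np.linalg.solve(gram, design.T @ y)

def predict(weights, x):
    """Predict the property for one or more candidate formulations."""
    x = np.atleast_2d(x)
    return np.column_stack([x, np.ones(len(x))]) @ weights

# Hold out part of the data to check accuracy before trusting any prediction.
weights = fit_ridge(x_past[:30], y_past[:30])
holdout_error = np.abs(predict(weights, x_past[30:]) - y_past[30:]).mean()

# Test an idea in silico: a formulation designed by the scientist, not by the AI.
idea = [0.8, 0.2, 0.5]
predicted_property = predict(weights, idea)[0]
```

The held-out error is what tells you whether the model is accurate enough for its predictions to replace a first round of lab tests.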

The drawback of this traditional machine learning approach is that it requires significant amounts of data. As a rule of thumb, we recommend having a number of data points (experiments) higher than five times (and ideally ten times) the number of inputs (composition variables and process variables). In practice, datasets that contain only human-designed experiments suffer from a lack of diversity that can limit the quality and robustness of the resulting machine learning models. AI-based optimization addresses this problem because the AI itself designs the experiments so that they are as useful as possible for the model.

What kind of formulations can be developed using AI?

At ChemIntelligence, we have used AI to speed up the development of many different kinds of formulations. Here are some examples of formulations that you can develop using AI techniques:

Coatings,
Adhesives,
Plastics,
Vaccines,
Drugs,
Cosmetics,
Perfumes,
Inks,
Cleaning products,
Rubbers,
Concrete,
Food & drinks,
Etc.

Whatever the kind of formulation you are developing, as long as you are able to provide the AI with a list of possible ingredients, process parameters, objectives (i.e. the characteristics of the ideal formulation you would like to obtain) and constraints, AI-based optimization can help you design your ideal formulation in a minimal number of experiments.
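To make these four elements concrete, here is one possible way to describe a formulation problem in code. All class and field names are hypothetical, and the only constraints shown are bounds on each variable plus the common requirement that ingredient amounts sum to 100%; real projects often have more.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Variable:
    """An ingredient amount or a process parameter with its allowed range."""
    name: str
    low: float
    high: float

@dataclass
class Objective:
    """A property of the ideal formulation and the direction to push it."""
    name: str
    goal: str  # "maximize" or "minimize"

@dataclass
class FormulationProblem:
    ingredients: List[Variable]
    process_parameters: List[Variable]
    objectives: List[Objective]
    total_amount: float = 100.0  # constraint: ingredient amounts sum to this value

    def is_feasible(self, candidate: Dict[str, float], tol: float = 1e-6) -> bool:
        """Check the bounds on every variable and the sum constraint on ingredients."""
        for var in self.ingredients + self.process_parameters:
            if not var.low <= candidate[var.name] <= var.high:
                return False
        total = sum(candidate[var.name] for var in self.ingredients)
        return abs(total - self.total_amount) <= tol

# Hypothetical coating example.
problem = FormulationProblem(
    ingredients=[
        Variable("resin", 40.0, 70.0),
        Variable("solvent", 20.0, 50.0),
        Variable("pigment", 0.0, 20.0),
    ],
    process_parameters=[Variable("cure_temperature_c", 20.0, 80.0)],
    objectives=[Objective("gloss", "maximize"), Objective("cost", "minimize")],
)

candidate = {"resin": 60.0, "solvent": 30.0, "pigment": 10.0, "cure_temperature_c": 40.0}
feasible = problem.is_feasible(candidate)
```

An optimization algorithm would only propose candidates for which `is_feasible` returns true.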

How much formulation data do you need in order to use AI?

It depends on which approach you are using:

If you are using the "traditional" machine learning approach (accumulate data, build a model, use the model to make predictions), we recommend having at least five times (and if possible, more) as many experiments as you have composition variables and process variables.
If you are using Bayesian optimization (or a similar approach), you can start with as few as 2 experiments, and use the optimization algorithm to design the next experiments. Overall, you will need to perform fewer experiments than with the traditional machine learning approach, because the Bayesian optimization algorithm requests the experiments that are most useful for improving the quality of the machine learning model.
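The rule of thumb for the traditional approach is simple arithmetic, sketched below. The function names are hypothetical; the factor of 5 (minimum) and 10 (ideal) come from the recommendation in this article.

```python
def minimum_experiments(n_composition_vars, n_process_vars, factor=5):
    """Rule-of-thumb number of experiments for the traditional ML approach."""
    return factor * (n_composition_vars + n_process_vars)

def enough_data_for_traditional_ml(n_experiments, n_composition_vars, n_process_vars, factor=5):
    """True if the dataset meets the rule of thumb for the given factor."""
    return n_experiments >= minimum_experiments(n_composition_vars, n_process_vars, factor)

# Example: 6 composition variables and 2 process variables (8 inputs in total).
recommended_minimum = minimum_experiments(6, 2)           # 5 x 8 = 40 experiments
recommended_ideal = minimum_experiments(6, 2, factor=10)  # 10 x 8 = 80 experiments
```

With 8 inputs, a dataset of 25 experiments would fall short of the rule of thumb, which is a situation where starting with Bayesian optimization is the more practical route.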

How to be successful at using AI to develop formulations

During our collaborations with formulation scientists, we give them plenty of advice. Here, we share with you a few tips that will help you be successful in applying AI techniques to your formulation projects.

Tip 1: Use modern optimization methods that exploit machine learning (also called sequential learning or, sometimes, active learning). This ensures that the data you acquire while doing experiments is as useful as possible for the machine learning algorithms you use.

Tip 2: Don't wait to accumulate data before you start using machine learning (a direct consequence of Tip 1). Start using machine learning-based optimization as early as possible in your project, and let the optimization algorithm suggest the experiments that are needed to improve the machine learning model quality.

Tip 3: Use a modern system to store your formulation data. Paper notebooks and Excel are insufficient for long-term capitalization of experiment results. Using an electronic lab notebook (ELN) or a database system to store and centralize formulation data ensures that you will never waste or lose valuable results. It also means that when you want to retrieve historical data to apply data science or machine learning techniques, you can retrieve that data with minimal effort.

Tip 4: Favor data diversity. For example, if you have numerical variables (e.g. representing the amounts of each ingredient in your formula), acquiring data with diverse values of these variables (and not only two different values for each composition variable) will produce more robust machine learning models and increase your chances of success. Machine learning-based optimization can help you increase the diversity of your dataset by using a strategy called "exploration".
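A quick way to spot a lack of diversity is to count how many distinct values each variable takes in your dataset. The sketch below does this in plain Python; the function names, the example data, and the threshold of 3 distinct values are assumptions chosen to match the "not only two different values" advice above.

```python
def distinct_levels(rows):
    """Count the distinct values taken by each variable (column) across experiments."""
    return [len(set(column)) for column in zip(*rows)]

def low_diversity_variables(rows, names, min_levels=3):
    """Names of the variables that take fewer than min_levels distinct values."""
    return [name for name, count in zip(names, distinct_levels(rows)) if count < min_levels]

# Hypothetical dataset: each row is one experiment, each column one variable.
names = ["resin_wt_pct", "additive_wt_pct", "cure_temperature_c"]
experiments = [
    [10.0, 0.5, 25.0],
    [20.0, 0.5, 25.0],
    [10.0, 1.0, 30.0],
    [20.0, 1.0, 35.0],
]

flagged = low_diversity_variables(experiments, names)
```

In this example the first two variables were only tested at two levels each, so they are the ones to vary more widely in the next experiments.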

Tip 5: Integrate as much domain knowledge as possible in the optimization or model building process. Domain knowledge is involved in the choice of ingredients and process variables, in the choice of constraints applied to the optimization problem, in the choice of the optimization objectives (and the way of combining them), etc. We believe that it is the combination of machine learning and domain knowledge that will bring the greatest acceleration of R&D projects in the coming years.

Tip 6: Collaborate with companies that specialize in AI applied to formulations/chemistry. When the company you are collaborating with understands your language (because they have a chemistry/formulation background), they can propose the AI solutions best suited to your formulation problems. Understanding the data and how it was generated is key to success in applying AI techniques.

Conclusion

Machine learning-based optimization algorithms such as Bayesian optimization can help you develop formulations in a minimal number of experiments. They are more efficient than both traditional trial-and-error (one variable at a time) optimization and Design of Experiments (DoE) methodologies. Moreover, they are a clever way of capitalizing on your R&D experiment results and will get better each time you provide them with new experiment results.