Design of Experiments in Organic Chemistry

You may be wondering why we are making such a fuss about «Design of Experiments» – after all, it’s nothing fancy, right? Choose some conditions, go to the lab, and follow the procedure.

Design of Experiments is actually a fixed term for a very specific way of thinking about experiments – not just chemical ones. You can use this method for virtually anything that has at least one measurable outcome depending on at least two factors. Let me take you through the mental journey we have undergone over the last two years.

For this part, let’s assume we did find some new reaction – the next step will be to optimize it to give the highest possible yield, selectivity and so forth. So, how are optimizations typically carried out in academic organic chemistry? Again, we look at the literature to find out what others have done, and try that first. After some experiments, we usually get a feeling for how the reaction might work, and we start adjusting our reaction parameters, such as catalyst, solvent or temperature (the things we talked about in the beginning). As we learned in 8th grade, we may only vary one of those parameters at a time, since otherwise we would not know which change actually caused the new result. This approach is called OFAT – one factor at a time – and is still the standard in academic organic chemistry today. Here is what it looks like in real life:

The obvious parameters that were modified here are the stoichiometric equivalents of [Co], AgSbF6 and MCO2R, as well as the different types of [Co] and MCO2R. Hidden in the «small print» are three special cases c, d and e, where the reaction was run at a higher concentration (c, entry 10), at 100 °C instead of 120 °C (d, entry 11), and in TFE instead of DCE as the solvent (e, entry 13). So, judging by the way they arranged the table, the authors’ thought process could have been something like this: Starting from the standard conditions they probably got from the literature (entry 1), they screened some alkali carboxylates first (entries 2 and 3). They then continued with the carboxylate that gave the highest yield of the chosen three (although KOAc is almost as good here). The authors then kept the newly found NaOAc constant and started varying [Co], choosing between 4 different types (entries 3 – 6).1 Having found the best one (the one they were already using anyway, Cp*Co(CO)I2), they varied the amounts of NaOAc and [Co], respectively, and finally applied the aforementioned special conditions to some of the examples. Now you should understand what OFAT really means: changing one condition at a time, taking the best result, and continuing with the remaining conditions.

1There is also something going on with the [Co] equivalents at this point, which in part is attributable to the type of chemistry applied here, but in part makes little sense and seems rather arbitrary (e.g. entry 6) – let’s ignore that for the sake of simplicity (it is already complicated enough as it is).

Now, when I look at something like this, two things come to mind: First, this is quite confusing to read (not to criticize these authors specifically; almost all optimization tables in organic chemistry look like this). You have to dig through the numbers manually, and often the only thing you really take from a table like this is the set of conditions at maximum yield (here entry 10). A lot of effort for a single number, right? Second, what do we really know about the reaction now? We only went along a narrow line of conditions, which shows us the effect of each process parameter only for a very restricted part of the so-called reaction space. But what does that mean?

A bad attempt at drawing out an OFAT approach in Chemdraw.

Take a look at the figure above: It shows the first three conditions tested in our example as the axes of a 3D coordinate system. The red arrows indicate the order in which the authors screened their reaction in the table: First the type of carboxylate, then the type of Co complex, then the amount of NaOAc.

If we stick to these first three conditions for clarity, we can imagine the situation as a cuboid. Let’s complete the above diagram:

Note how the investigation only covers narrow lines of the entire cuboid space.

At least now, even if OFAT might intuitively make sense, we can see its biggest flaw: it implies that the factors do not influence each other. We happily assume that for this reaction (whatever that is supposed to mean), NaOAc is the best carboxylate additive, and 10 mol% of it is the best amount to use. The problem is, chemistry isn’t that easy. Who says that with [Cp*CoI2]2, 50 mol% of NaOPiv wouldn’t have worked better? Or that with Cp*Co(CH3CN)3(SbF6)2, no carboxylate at all would have been the best solution, giving the highest overall yield? After all, these specific combinations were never investigated. The factors we put into a chemical reaction may not only change its outcome individually – they may also influence each other tremendously. Using OFAT, these interactions will never be observed, and the global maximum in yield/selectivity/whatever might stay hidden forever.

To illustrate what we can do about that, let’s consider another example from the field of Cp*Co(III) chemistry:

Again, we will choose three factors, so we are able to plot them as a cube. In this case, I chose the amounts of Cp*Co(CO)I2 and Zn(OAc)2, as well as the temperature. All other factors were ignored.

It’s cluttered!

Looking at the resulting picture, it is clear that the choice of conditions strongly biases certain areas of the reaction space, particularly those of high temperature and high amounts of Zn(OAc)2. To be fair, the authors did find some really good conditions – both products, 3aa and 4aa, could be obtained in high yield and selectivity. Still, these 7 runs could have been utilized much more efficiently. If we add only one more reaction, we can, for example, cover all vertices of the cube. Adding yet another, for a total of 9, we can squeeze in a center point:

Evenly dividing reactions across space helps to see the bigger picture.

Using this layout, we would have been able to observe the reaction’s behaviour throughout the entire investigated reaction space, without overrepresenting certain areas. We would still have been able to detect the trends in yield and selectivity, with the added benefit of having an actual mathematical model to visualize the influence of the different factors. This is precisely what DoE does: Instead of randomly poking around, you define a range of values you would like to examine, plan the corresponding number of experiments before you enter the laboratory, conduct all the experiments necessary, and observe the results. The most powerful way of doing so is what we have just described: You take a number of factors and determine a lower and an upper bound for each. For additional information gain, you can consider adding the center point, at which all factors are set to their mean value inside the given boundaries. In our example, the center point would be [Cp*Co(CO)I2] = 0.5 eq., [Zn(OAc)2] = 0.75 eq., and T = 75 °C. The center point is often replicated 3 times to estimate the experimental error of the system.
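To make this concrete, here is a minimal Python sketch of such a plan. The lower and upper bounds are our own illustrative assumptions (the text above only states the resulting center point), chosen so that their midpoints match the values just quoted:

```python
# A minimal sketch of a 2-level full factorial plan with a replicated
# center point, using only the Python standard library. The bounds are
# hypothetical -- chosen so their midpoints match the center point
# quoted in the text.
from itertools import product

factors = [               # (name, low, high) -- bounds are assumptions
    ("Cp*Co(CO)I2 / eq.", 0.25, 0.75),   # center: 0.50 eq.
    ("Zn(OAc)2 / eq.",    0.50, 1.00),   # center: 0.75 eq.
    ("T / °C",            50.0, 100.0),  # center: 75 °C
]

names = [name for name, _, _ in factors]
levels = [(low, high) for _, low, high in factors]

# All 2^3 = 8 corner points of the cube ...
runs = [dict(zip(names, combo)) for combo in product(*levels)]

# ... plus the center point, replicated 3 times to estimate the error
center = {name: (low + high) / 2 for name, low, high in factors}
runs += [center] * 3

for i, run in enumerate(runs, 1):
    print(f"run {i:2d}: {run}")
```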

It does not stop there though: After you have run your pre-designed experiments, the results (called responses) are used to create a mathematical model – simply a function that describes your response in terms of your input factors. With this, you will be able to predict yields (purities, enantiomeric excesses, selectivities, …) for all possible combinations of factors, even those you haven’t tried yet. Sounds complicated? It isn’t! Software exists for this purpose, for example Design Expert® by Stat-Ease®, which does all the math for you! You may also download an incredibly well-made presentation on DoE in Organic Chemistry, created by Luis Sanchez at Michigan State University, which explains the mathematical background in a really understandable way and which served us as a starting point in all this.
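For the curious: for a two-level design, there is no magic behind such a model. It is typically just a multilinear regression on «coded» factor levels (−1 = lower bound, +1 = upper bound), with the pairwise products as interaction terms. A minimal sketch, with completely made-up yields:

```python
# A sketch of the math behind the model: ordinary least squares on
# coded factors (-1 = low level, +1 = high level), including all
# two-factor interaction terms. The eight yields are invented purely
# for illustration.
from itertools import product
import numpy as np

corners = np.array(list(product([-1.0, 1.0], repeat=3)))   # the 2^3 runs
y = np.array([42.0, 55.0, 38.0, 71.0, 45.0, 60.0, 40.0, 88.0])  # yields / %

A, B, C = corners.T
# Model matrix: intercept, main effects, two-factor interactions
X = np.column_stack([np.ones(8), A, B, C, A * B, A * C, B * C])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

for term, c in zip(["1", "A", "B", "C", "AB", "AC", "BC"], coef):
    print(f"{term:>2}: {c:+6.2f}")

# The model predicts untried settings, e.g. A high, B midway, C low:
x_new = np.array([1.0, 1.0, 0.0, -1.0, 0.0, -1.0, 0.0])
print(f"predicted yield: {x_new @ coef:.1f} %")
```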

An example page from Sanchez’ presentation. Check it out, it’s really good!

«Okay, this is nice and all», I hear you say, «but where do the statistics come in? Have we not just iterated over all possible combinations and observed the numbers, just as before? Doesn’t this mean we have to do even more experiments than with OFAT?»

Yes, you are absolutely correct. For small numbers of factors (in this case 3), DoE isn’t operating at its full potential. Still, we can already observe some interesting behavior that would be hard to catch with the naked eye. Using a DoE software package such as Design Expert® (we are not sponsored by them, by the way…), you can just plug in all your values and get a nicely presented statistical analysis of your data, along with plots, graphs and everything. Design Expert® caters specifically to DoE beginners like us, who do not have much expertise (if any) in the field of statistics, but would still like to use its powers. Take a look at what my bachelor student Jan Gerstenberger did in his thesis:

Our own first example of a 2³ full factorial experiment design. Source: Jan Gerstenberger, bachelor thesis.

This is the concept we’ve seen in cube form, only as a table. Three factors – solvent, time and [Co] amount – were set to two levels each. The yields of the resulting 8 (= 2³) experiments were fed back into the software, which gave us, among other things, this graph:

Detecting multifactor interactions is one of DoE’s great strengths. Source: Jan Gerstenberger, bachelor thesis.

What can we see here (if we squint a bit, at least)? In the model, time proved to be irrelevant inside the given boundaries (not shown here). The graph itself shows the solvent on the horizontal axis and the resulting yield on the vertical axis, with two curves for low (black dashed line) and high (red dashed line) amounts of [Co]. Evidently, both the solvent type and the amount of [Co] did matter, with DCE being better than acetone, and high amounts of [Co] resulting in better yields than low ones.

The interesting bit is the multifactor interaction between solvent and [Co] amount that the software found. We can see that DCE is always better than acetone, for both high and low amounts of [Co]. However, with 10 mol% of [Co], the difference isn’t as large as with only 2.5 mol% – it’s even within the error bars. That means we have a choice here: DCE is a highly toxic, carcinogenic solvent banned for industrial use in the European Union. If we increase the amount of [Co], we can get away with using the non-toxic and overall much «greener» acetone instead, while only sacrificing a small fraction of the yield. DoE is really good at detecting these interactions. More than that, since you have created an actual model from your response data, you have statistical validity: there are error bars and significance tests for everything, so you can be sure your interpretation actually means something.
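To see what such an interaction means in plain numbers, here is a tiny sketch. The four yields are invented to mimic the trend in the graph above; the real values are in Jan’s thesis:

```python
# What a two-factor interaction means numerically: the effect of the
# solvent depends on the level of [Co]. The four cell yields below are
# hypothetical, mimicking the trend described in the text.
yields = {
    ("acetone", "2.5 mol%"): 35.0,
    ("DCE",     "2.5 mol%"): 70.0,
    ("acetone", "10 mol%"):  78.0,
    ("DCE",     "10 mol%"):  85.0,
}

gap_low  = yields[("DCE", "2.5 mol%")] - yields[("acetone", "2.5 mol%")]
gap_high = yields[("DCE", "10 mol%")]  - yields[("acetone", "10 mol%")]

print(f"solvent effect at  2.5 mol% [Co]: {gap_low:.1f} points")   # 35.0
print(f"solvent effect at 10   mol% [Co]: {gap_high:.1f} points")  #  7.0
# A model without the solvent x [Co] interaction term would have to
# report a single averaged solvent effect -- and would miss the
# DCE-vs-acetone choice described in the text entirely.
print(f"interaction contrast: {(gap_high - gap_low) / 2:+.1f}")
```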

Common DoE software offers lots of clear diagnostics for your models.

As implied before, the real powers of the DoE approach emerge when we have more than just a few factors to investigate. Imagine you would like to do an extensive screening of a fictional reaction, and you suddenly have, let’s say, 12 factors. These could be the temperature, the amounts of 5 different reagents, the addition rates of 2 reagents, the catalyst pre-formation time, the stir speed, the reaction time and the concentration (I have just made these up). According to our knowledge so far, we would have to conduct 2¹², that is, 4096 reactions.

Surely this can’t be right?

Well, yes and no. Technically, to try all combinations, we would have to carry out all of those, plus 3 center point reactions. We don’t really have to though. The big, big advantage we have is, again, statistics.

Imagine for a moment the full set of those 4096 reactions. Since each factor is set to two extreme levels (e.g. the temperature could be set to 40 °C and 100 °C), each level of each factor would be tested 2048 times. But because in the end we will stuff everything into a statistical model anyway, and we will not be looking at this manually, an important thing happens:

We don’t need to test all combinations – in fact, we can omit most of them. This might seem counterintuitive at first, but bear with me here: Because we are testing so many factors, each level of each factor already has to appear a certain number of times. Think about it: Even if we only did a single reaction per factor (that would be 12 here), each level of each factor would already appear 6 times! Now, how often do you think each level actually has to be tested for a model to be meaningful?

The application of statistical models lets you drastically reduce the number of necessary experiments.

What you see above is the experiment planning tool in Design Expert®, with the number of factors to be investigated on the horizontal axis and the resulting number of necessary runs on the vertical axis. Diagonally in white are the so-called «full factorial» designs, where you consider all 2ⁿ possible reactions (as in our examples above). What you also see are a lot of options where the number of runs is equal to 2ⁿ⁻ˣ (with x < n, of course). This puts actual numbers to the statement we made before: You can drastically reduce the number of experiments while still obtaining relevant information. For our example, a mere 32 reactions (+ 3 center points) will still enable us to use a «Resolution IV design», which is considered adequate for most cases.
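If you would like to see how such a 2ⁿ⁻ˣ design comes about, here is a sketch of a 2¹²⁻⁷ design (12 factors in 32 runs): you run a full factorial on 5 «base» factors and define the other 7 factor columns as products of base columns, the so-called generators. The generator set below is our own illustrative pick, not necessarily the one Design Expert® would use; the final loop verifies the Resolution IV property, namely that no main effect is aliased with any two-factor interaction.

```python
# A sketch of a 2^(12-7) fractional factorial: 12 factors in 32 runs.
# Full factorial on 5 base factors; the remaining 7 columns are
# products of base columns ("generators"). This generator set is an
# illustrative choice, not necessarily what a DoE package would pick.
from itertools import combinations, product
import numpy as np

base = np.array(list(product([-1, 1], repeat=5)))   # 32 x 5
A, B, C, D, E = base.T

generators = {   # each new factor = product of three base factors
    "F": A * B * C, "G": A * B * D, "H": A * B * E,
    "J": A * C * D, "K": A * C * E, "L": A * D * E,
    "M": B * C * D,
}
design = np.column_stack([A, B, C, D, E, *generators.values()])  # 32 x 12
names = list("ABCDE") + list(generators)

# Resolution IV check: no main effect may be (anti-)aliased with any
# two-factor interaction, i.e. no such pair of columns may be equal
# up to sign (|dot product| = 32 would mean aliasing).
for i, j in combinations(range(12), 2):
    interaction = design[:, i] * design[:, j]
    for k in range(12):
        assert abs(interaction @ design[:, k]) < 32, \
            f"main effect {names[k]} aliased with {names[i]}{names[j]}"

print("32 runs, 12 factors: Resolution IV confirmed")
```

The price of a Resolution IV design is that two-factor interactions are aliased with one another (though never with main effects), which is usually acceptable for an initial screening – exactly the «adequate for most cases» trade-off mentioned above.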

So, hold on a minute: We just reduced the required number of experiments from 4096 to 35, eliminating a whopping 4061 runs. Even more, with those 35 reactions, we could investigate up to 16 (!) factors in a Resolution IV design. Now consider that our parallel synthesizer could, in principle, carry out 64 reactions at once in its current configuration. An initial investigation of 16 factors could be done in a single day,2 with statistical significance, with multifactor interactions, including fancy plots and graphs.

2Read more about the current status of our parallel synthesizer project to see how realistic that assumption really is!

For now, this is where we will leave you, as we feel that this is not the right place to go into more detail. If you are interested in more, please feel free to join us at our meetings (to be resumed as soon as this whole pandemic thing is under control, you know) or swing by our labs!

Oh, and one more thing: This text was written by people who «grew up» with the traditional trial-and-error/OFAT way of thinking. As with all aspects of this project, we struggled, googled for solutions, and found this one. In all fairness, any criticism directed at other people’s work applies just as well to our own past contributions. We feel that the Design of Experiments approach could drastically alter the way we think about and approach chemical reactions. Industries of all fields have been happily applying it for a long time, and we hope we can be part of spreading the knowledge and excitement in our own field and institute as well.