
I'm trying to find the optimal blend of 4 ingredients to maximize liking of a beverage.

I can write a function for liking such as:

liking = f(X1, X2, X3, X4)

Liking is on a scale of 0 - 1 (0% - 100%).

I conducted 10 taste tests to gather data, then estimated a simple regression (this is in R):

# Data on blends tested in taste tests
X    <- rbind( c(.93,   .01,   0.03, .03 ),
               c(.92,   .02,   0.03, .03 ),
               c(.94,   .00,   0.03, .03 ),
               c(.94,   .02,   0.02, .02 ),
               c(.91,   .03,   0.03, .03 ),
               c(.97,   .01,   0.01, .01 ),
               c(1,     .00,   0.00, .00 ),
               c(.92,   .01,   0.03, .04 ),
               c(1,     .00,   0.00, .00 ),
               c(.87,   .05,   0.04, .04 )
               )
# Liking results from taste tests
Y <- c(0.841217,
        0.841213,
        0.84121,
        0.84121,
        0.841204,
        0.841201,
        0.8412,
        0.841187,
        0.841187,
        0.841172)

# Regression analysis (no intercept model)
glm(Y~X-1)

The results of the regression analysis were:

Call:  glm(formula = Y ~ X - 1)

Coefficients:
    X1      X2      X3      X4  
0.8412  0.8404  0.8445  0.8386  

Degrees of Freedom: 10 Total (i.e. Null);  6 Residual
Null Deviance:      7.076 
Residual Deviance: 8.169e-10    AIC: -193.9

So this gives me some insight into the relative weight of each ingredient in determining liking, but it doesn't directly tell me which levels of the ingredients to set to maximize liking. Of course, I could specify this model differently (e.g., use a binomial distribution, test interactions, or worry about my small sample size), but I'm trying to keep it simple for now and focus on finding the right methodological framework for what I'm trying to do.

I was thinking that I could set up some sort of optimization function to determine the best levels (ingredient proportions) based on:

  1. The weights from the regression
  2. The constraint that the sum of the ingredients' proportions adds up to 1 (i.e. 100%)
  3. An objective function that specifies maximization of liking
  4. A constraint that liking is between 0 and 1

However, it's not clear to me how I can set up such an optimization model.

Am I heading in a reasonable direction? Is optimization the appropriate tool and, if so, how can such an optimization be set up? (Mapping the examples I found in various optimization packages to this use case was a point of confusion for me.)

Or, is it possible to determine the optimal levels directly from the regression results?

Hack-R
  • Using a no-intercept model is questionable. What results do you get *with* an intercept? If the intercept term is sizable and significant, you ought to include it. It is very strange that all your responses are equal to within three digits. I suppose we need to understand you are not actually conducting "taste tests," but only using them as a metaphor, and that differences in responses in the fourth decimal place are important to you. – whuber Mar 24 '16 at 18:23
  • @whuber Yes, this was just made up data and I suspect that the close results for each is because I made up dependent variable values that were pretty consistent across blends (probably a bad example). In the past I have had this question with actual taste test data, but indeed it was a metaphor for something else when I asked this. I didn't include an intercept for this example because it didn't seem to make good conceptual sense, but it's certainly situational. – Hack-R Nov 30 '17 at 18:04

3 Answers


Mathematical optimization is a tool that is often used to optimize a decision given some objective function and some constraints.

In your case, the decision variables would be the amount of each ingredient (we'll call them $X_1$, $X_2$, $X_3$, and $X_4$), and the optimization would take the form of a linear program:

\begin{align*} \max_{X_1,X_2,X_3,X_4} &~ ~~0.8412X_1 + 0.8404X_2 + 0.8445X_3 + 0.8386X_4 \\ s.t. &~ ~~X_1+X_2+X_3+X_4 \leq 1 \\ &~ ~~0.8412X_1 + 0.8404X_2 + 0.8445X_3 + 0.8386X_4 \leq 1 \\ &~ ~~X_1, X_2, X_3, X_4 \geq 0 \\ \end{align*}

If you had your coefficients from your model in a vector coef in R, this could be implemented with the lpSolve package as:

library(lpSolve)
coef <- c(0.8412, 0.8404, 0.8445, 0.8386)
mod <- lp(direction = "max",
          objective.in = coef,
          const.mat = rbind(rep(1, length(coef)), coef),
          const.dir = c("<=", "<="),
          const.rhs = c(1, 1))
mod$solution
# [1] 0 0 1 0

Rather unsurprisingly in this case, the optimization model simply selects 100% of the ingredient with the maximum coefficient.

To summarize, mathematical optimization could be used to identify ingredient proportions yielding the highest predicted value, but with the current simple constraints you could probably solve the optimization problem of interest by inspection.
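In fact, because the predicted liking is linear in the proportions and the constraints form a simplex, the optimum always sits at a vertex, so the answer can be read off the coefficients directly without a solver. A quick check in R:

```r
# Coefficients from the regression above
coef <- c(0.8412, 0.8404, 0.8445, 0.8386)

# The vertex with the largest coefficient is the linear-programming optimum
which.max(coef)
# [1] 3
```

This also answers the question's final point: for a purely linear model with only a sum-to-one constraint, the optimal levels can indeed be determined directly from the regression results.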

josliber
  • Thanks! That helps a lot. 2 questions -- 1. (very minor) shouldn't it be maximizing Y (liking) instead of X1,X2,X3,X4? 2. (more important) if I don't believe that all 1 ingredient is really optimal based on contextual knowledge, how would I integrate this? by interaction effects somehow? – Hack-R Mar 24 '16 at 18:47
  • @Hack-R $\max_{X_1,X_2,X_3,X_4} 0.8412X_1+0.8404X_2+0.8445X_3+0.8386X_4$ means we are changing the variables $X_1$, $X_2$, $X_3$, and $X_4$ to maximize the specified linear combination of those variables. – josliber Mar 24 '16 at 18:49
  • And to your second question, yes, an interaction term could capture non-linear effects that might change the optimal solution. Please note that it is non-trivial to add the product of two decision variables (so-called "bilinear terms") to mathematical optimization formulations. – josliber Mar 24 '16 at 18:50
  • OK great thanks. I may have to open another question related to that, but in fairness I think you answered the question as it was asked, so I will mark this as the solution. Thanks again! – Hack-R Mar 24 '16 at 18:52

You might want to check out beta regression, which is a form of GLM based on the Beta distribution. A similar question was answered here. The corresponding R package is betareg.

dmb
  • Thanks. Sure, I know betareg but I don't see how that gives me the optimal proportions of the ingredients? – Hack-R Mar 24 '16 at 18:46

A late response, but the issue is interesting.

This problem can easily be transformed into one where the variables are not proportions. Rather than focusing on the proportions of the ingredients, focus on how much of each ingredient you add to a standard measure of vodka. That way, your variables become absolute measures with no upper bound. The vodka variable is then omitted (it is redundant, as it would always be 1).

# Express each non-vodka ingredient per unit of the first (vodka) ingredient
d <- cbind(Y, X[, -1] / X[, 1])
colnames(d) <- c("Like", "peach", "peppermint", "juice")
d <- as.data.frame(d)

Now you can do a regression-type analysis. With the fitted model, you could use a combination of predict() and optimization to find the best mix.

But unfortunately you have too little data to run a meaningful regression, so for now, continue mixing!
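For completeness, the mechanics described above could be sketched as follows, using the made-up X and Y from the question and a hypothetical search grid (the 0 to 0.06 range per unit of vodka is an assumption for illustration; with only 10 observations the fitted model should not be taken seriously):

```r
# Data from the question
X <- rbind(c(.93, .01, .03, .03), c(.92, .02, .03, .03),
           c(.94, .00, .03, .03), c(.94, .02, .02, .02),
           c(.91, .03, .03, .03), c(.97, .01, .01, .01),
           c(1,   .00, .00, .00), c(.92, .01, .03, .04),
           c(1,   .00, .00, .00), c(.87, .05, .04, .04))
Y <- c(0.841217, 0.841213, 0.84121, 0.84121, 0.841204,
       0.841201, 0.8412, 0.841187, 0.841187, 0.841172)

# Re-express additions per unit of vodka, as above
d <- as.data.frame(cbind(Y, X[, -1] / X[, 1]))
colnames(d) <- c("Like", "peach", "peppermint", "juice")

# Fit a simple linear model on the transformed variables
fit <- lm(Like ~ peach + peppermint + juice, data = d)

# Hypothetical grid of candidate additions per unit of vodka
grid <- expand.grid(peach      = seq(0, .06, .01),
                    peppermint = seq(0, .06, .01),
                    juice      = seq(0, .06, .01))

# Pick the candidate mix with the highest predicted liking
best <- grid[which.max(predict(fit, newdata = grid)), ]
best
```

Since the model is linear, the "best" mix here will again be a corner of the grid; the approach only becomes interesting once the model includes curvature (e.g., interactions or quadratic terms).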