A chemical system at equilibrium is described by this equation:
$$
D_{tot} = D + D \cdot \frac {P \cdot X}{D+K_P} + D \cdot \frac {E}{D+K_E}
$$
The only parameters that the experimenter can control (by setting up the experiment with different concentrations of certain substances) are $X$ and $D_{tot}$.
$X$ is a real number in $(0,1]$ and $D_{tot}$ is a positive real number.
Usually, $X$ is varied, whereas $D_{tot}$ is kept constant, but it's possible to vary $D_{tot}$ if needed.
$P, K_P, E, K_E$ are positive real constants whose values are not known.
The main goal is estimating $P$ and $K_P$.
The experimenter can't measure $D$ directly. However, by appropriate sampling of the system, a quantity $f_u$ can be measured, for which it is known that:
$$\frac 1 {f_u} = 1 + \frac {P \cdot X}{D+K_P} $$
The 'usual' approach is to assume (quite arbitrarily) that $E \approx 0$; the first equation then reduces to $D_{tot} = D \cdot \left(1 + \frac {P \cdot X}{D+K_P}\right) = D / f_u$, i.e. $D = D_{tot} \cdot f_u$, thus:
$$D_{tot} \approx D_{tot} \cdot f_u + D_{tot} \cdot f_u \cdot \frac {P \cdot X}{D_{tot} \cdot f_u+K_P} $$
$$D_{tot} \cdot (1 - f_u) \approx D_{tot} \cdot f_u \cdot \frac {P \cdot X}{D_{tot} \cdot f_u+K_P} $$
$$(1 - f_u) \cdot (D_{tot} \cdot f_u + K_P) \approx f_u \cdot P \cdot X $$
$$\frac X {1 - f_u} \approx \frac {D_{tot}} P + \frac 1 {f_u} \cdot \frac {K_P}{P} $$
Auxiliary variables are defined:
$$y = \frac X {1 - f_u}$$ $$x = \frac 1 {f_u}$$
thus:
$$y \approx \frac {D_{tot}} P + x \cdot \frac {K_P}{P} $$
By linear regression on data where $X$ was varied and the corresponding values of $f_u$ were measured, the slope ($K_P/P$) and intercept ($D_{tot}/P$) of this equation are estimated; knowing the experimental value of $D_{tot}$, in theory one can then recover the desired parameter estimates.
Example (in R):
HSA_data <- data.frame(X = c(1, 0.4, 0.1, 0.01),
                       fu = c(0.003, 0.011, 0.028, 0.224))
HSA_data["x"] <- with(HSA_data, 1/fu)
HSA_data["y"] <- with(HSA_data, X/(1-fu))
HSA_lm <- lm(y~x, HSA_data)
summary(HSA_lm)
Call:
lm(formula = y ~ x, data = HSA_data)
Residuals:
1 2 3 4
-0.02211 0.09838 -0.03948 -0.03679
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.036432 0.054364 0.670 0.572
x 0.002966 0.000313 9.476 0.011 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.08087 on 2 degrees of freedom
Multiple R-squared: 0.9782, Adjusted R-squared: 0.9673
F-statistic: 89.8 on 1 and 2 DF, p-value: 0.01095
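(For completeness, this is how the estimates would be recovered from the fitted line; $D_{tot} = 1$ below is only a placeholder for the known experimental value:)

```r
# intercept = Dtot/P and slope = KP/P, so P = Dtot/intercept and KP = slope * P.
# Dtot = 1 is a placeholder; substitute the known experimental concentration.
Dtot <- 1
intercept <- 0.036432   # coef(HSA_lm)["(Intercept)"]
slope     <- 0.002966   # coef(HSA_lm)["x"]
P_hat  <- Dtot / intercept
KP_hat <- slope * P_hat
c(P = P_hat, K_P = KP_hat)
```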
My first question is: do you think that this approach is valid, given that 'linearizing' nonlinear equations has long been criticized?
[BTW, while the slope is generally OK, the intercept is often negative, which is nonsensical given the theory from which the equation is derived. That for me points to a fundamental problem with this approach.]
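(For comparison, the $E \approx 0$ model can also be fitted directly in nonlinear form, avoiding the linearization entirely. Substituting $D = D_{tot} \cdot f_u$ into $1/f_u = 1 + P \cdot X/(D + K_P)$ gives a quadratic in $f_u$ whose positive root serves as an explicit model function. A minimal sketch, assuming $D_{tot}$ is known ($D_{tot} = 1$ below is a placeholder) and taking start values from the linearized fit:)

```r
# E ~ 0 model in nonlinear form: Dtot*fu^2 + (P*X + KP - Dtot)*fu - KP = 0,
# obtained by substituting D = Dtot*fu into 1/fu = 1 + P*X/(D + KP).
# The positive root gives fu as an explicit function of X.
fu_model <- function(X, P, KP, Dtot) {
  b <- P * X + KP - Dtot
  (-b + sqrt(b^2 + 4 * Dtot * KP)) / (2 * Dtot)
}

# Sanity check: the root satisfies the original implicit relation
fu <- fu_model(X = 0.1, P = 28, KP = 0.08, Dtot = 1)
stopifnot(abs(1 / fu - (1 + 28 * 0.1 / (1 * fu + 0.08))) < 1e-8)

# Fit sketch (Dtot and start values are placeholders; starts can be taken
# from the linearized regression, intercept = Dtot/P and slope = KP/P):
# fit <- nls(fu ~ fu_model(X, P, KP, Dtot = 1), data = HSA_data,
#            start = list(P = 28, KP = 0.08))
```

This sidesteps the error-distortion problem of the linearization, since the least-squares criterion is applied on the scale where $f_u$ is actually measured.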
The second, even more important question is: given that the assumption $E \approx 0$ is really arbitrary and often wrong, do you think the data in the above example could be somehow used with the original equation, i.e. without making this assumption?
I can see that there would be 4 parameters to estimate with only 4 data points.
Would you suggest collecting more data? Manipulating the equation in some clever way, considering that $f_u$ is independent of $E$ and $K_E$? Any other suggestions?
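(One possible route for the full model, sketched under the assumption that experiments are run at several $D_{tot}$ values so the four parameters become identifiable, and with hypothetical function names: since $f_u$ depends only on $P$, $K_P$, $X$ and $D$, the measured relation can be inverted to $D = P \cdot X \cdot f_u/(1-f_u) - K_P$, and substituting into the original equation gives $D_{tot} = D/f_u + D \cdot E/(D+K_E)$, which a generic least-squares routine can fit:)

```r
# Full model without the E ~ 0 assumption.
# From 1/fu = 1 + P*X/(D + KP):   D = P*X*fu/(1 - fu) - KP
# Substituting into the equilibrium equation:
#   Dtot = D + D*P*X/(D + KP) + D*E/(D + KE) = D/fu + D*E/(D + KE)
Dtot_pred <- function(X, fu, P, KP, E, KE) {
  D <- P * X * fu / (1 - fu) - KP
  D / fu + D * E / (D + KE)
}

# Residual sum of squares over log-scale parameters (keeps all four positive);
# `data` needs columns X, fu and the measured Dtot for each run
rss <- function(logpar, data) {
  p <- exp(logpar)
  pred <- Dtot_pred(data$X, data$fu, p[1], p[2], p[3], p[4])
  sum((data$Dtot - pred)^2)
}

# fit <- optim(log(c(P = 1, KP = 1, E = 1, KE = 1)), rss, data = my_data)
# exp(fit$par)   # estimates on the original scale
```

With four parameters, at least two distinct $D_{tot}$ levels (each with several $X$ values) would be needed in practice for a stable fit.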