A chemical system at equilibrium is described by this equation:
$$
D_{tot} = D + D \cdot \frac {P \cdot X}{D+K_P} + D \cdot \frac {E}{D+K_E}
$$
The only parameters that the experimenter can control (by setting up the experiment with different concentrations of certain substances) are $X$ and $D_{tot}$.
$X$ is a real number in $(0,1]$ and $D_{tot}$ is a positive real number.
Usually, $X$ is varied, whereas $D_{tot}$ is kept constant, but it's possible to vary $D_{tot}$ if needed.
$P, K_P, E, K_E$ are positive real constants whose values are not known.
The main goal is estimating $P$ and $K_P$.
The experimenter can't measure $D$ directly. However, by appropriate sampling of the system, a quantity $f_u$ can be measured, for which it is known that:
$$\frac 1 {f_u} = 1 + \frac {P \cdot X}{D+K_P} $$
The 'usual' approach is to assume (quite arbitrarily) that $E \approx 0$; the first equation then reduces to $D_{tot} = D \cdot \left(1 + \frac {P \cdot X}{D+K_P}\right) = D / f_u$, i.e. $D = D_{tot} \cdot f_u$, thus:
$$D_{tot} \approx D_{tot} \cdot f_u + D_{tot} \cdot f_u \cdot \frac {P \cdot X}{D_{tot} \cdot f_u+K_P} $$
$$D_{tot} \cdot (1 - f_u) \approx D_{tot} \cdot f_u \cdot \frac {P \cdot X}{D_{tot} \cdot f_u+K_P} $$
$$(1 - f_u) \cdot (D_{tot} \cdot f_u + K_P) \approx f_u \cdot P \cdot X $$
$$\frac X {1 - f_u} \approx \frac {D_{tot}} P + \frac 1 {f_u} \cdot \frac {K_P}{P} $$
Auxiliary variables are defined:
$$y = \frac X {1 - f_u}$$ $$x = \frac 1 {f_u}$$
thus:
$$y \approx \frac {D_{tot}} P + x \cdot \frac {K_P}{P} $$
By linear regression on data where $X$ was varied and the corresponding values of $f_u$ were measured, the slope ($K_P/P$) and intercept ($D_{tot}/P$) of this equation are estimated; knowing the experimental value of $D_{tot}$, in theory one can then recover the desired parameter estimates.
Example (in R):
HSA_data <- data.frame(X = c(1, 0.4, 0.1, 0.01),
                       fu = c(0.003, 0.011, 0.028, 0.224))
HSA_data["x"] <- with(HSA_data, 1/fu)
HSA_data["y"] <- with(HSA_data, X/(1-fu))
HSA_lm <- lm(y~x, HSA_data)
summary(HSA_lm)
Call:
lm(formula = y ~ x, data = HSA_data)
Residuals:
1 2 3 4
-0.02211 0.09838 -0.03948 -0.03679
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.036432 0.054364 0.670 0.572
x 0.002966 0.000313 9.476 0.011 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.08087 on 2 degrees of freedom
Multiple R-squared: 0.9782, Adjusted R-squared: 0.9673
F-statistic: 89.8 on 1 and 2 DF, p-value: 0.01095
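(For completeness, this is how the estimates would be recovered from the fitted line; $D_{tot} = 1$ below is only a placeholder for the known experimental value:)

```r
# intercept = Dtot/P and slope = KP/P, so P = Dtot/intercept and KP = slope * P.
# Dtot = 1 is a placeholder; substitute the known experimental concentration.
Dtot <- 1
intercept <- 0.036432   # coef(HSA_lm)["(Intercept)"]
slope     <- 0.002966   # coef(HSA_lm)["x"]
P_hat  <- Dtot / intercept
KP_hat <- slope * P_hat
c(P = P_hat, K_P = KP_hat)
```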
My first question is: do you think that this approach is valid, given that 'linearizing' nonlinear equations has long been criticized?
[BTW, while the slope is generally OK, the intercept is often negative, which is nonsensical given the theory from which the equation is derived. That for me points to a fundamental problem with this approach.]
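(For comparison, the $E \approx 0$ model can also be fitted directly in nonlinear form, avoiding the linearization entirely. Substituting $D = D_{tot} \cdot f_u$ into $1/f_u = 1 + P \cdot X/(D + K_P)$ gives a quadratic in $f_u$ whose positive root serves as an explicit model function. A minimal sketch, assuming $D_{tot}$ is known ($D_{tot} = 1$ below is a placeholder) and taking start values from the linearized fit:)

```r
# E ~ 0 model in nonlinear form: Dtot*fu^2 + (P*X + KP - Dtot)*fu - KP = 0,
# obtained by substituting D = Dtot*fu into 1/fu = 1 + P*X/(D + KP).
# The positive root gives fu as an explicit function of X.
fu_model <- function(X, P, KP, Dtot) {
  b <- P * X + KP - Dtot
  (-b + sqrt(b^2 + 4 * Dtot * KP)) / (2 * Dtot)
}

# Sanity check: the root satisfies the original implicit relation
fu <- fu_model(X = 0.1, P = 28, KP = 0.08, Dtot = 1)
stopifnot(abs(1 / fu - (1 + 28 * 0.1 / (1 * fu + 0.08))) < 1e-8)

# Fit sketch (Dtot and start values are placeholders; starts can be taken
# from the linearized regression, intercept = Dtot/P and slope = KP/P):
# fit <- nls(fu ~ fu_model(X, P, KP, Dtot = 1), data = HSA_data,
#            start = list(P = 28, KP = 0.08))
```

This sidesteps the error-distortion problem of the linearization, since the least-squares criterion is applied on the scale where $f_u$ is actually measured.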
The second, even more important question is: given that the assumption $E \approx 0$ is really arbitrary and often wrong, do you think the data in the above example could be somehow used with the original equation, i.e. without making this assumption?
I can see that there would be 4 parameters to estimate with only 4 data points.
Would you suggest collecting more data? Manipulating the equation in some clever way, considering that $f_u$ is independent of $E$ and $K_E$? Any other suggestions?
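(One possible route for the full model, sketched under the assumption that experiments are run at several $D_{tot}$ values so the four parameters become identifiable, and with hypothetical function names: since $f_u$ depends only on $P$, $K_P$, $X$ and $D$, the measured relation can be inverted to $D = P \cdot X \cdot f_u/(1-f_u) - K_P$, and substituting into the original equation gives $D_{tot} = D/f_u + D \cdot E/(D+K_E)$, which a generic least-squares routine can fit:)

```r
# Full model without the E ~ 0 assumption.
# From 1/fu = 1 + P*X/(D + KP):   D = P*X*fu/(1 - fu) - KP
# Substituting into the equilibrium equation:
#   Dtot = D + D*P*X/(D + KP) + D*E/(D + KE) = D/fu + D*E/(D + KE)
Dtot_pred <- function(X, fu, P, KP, E, KE) {
  D <- P * X * fu / (1 - fu) - KP
  D / fu + D * E / (D + KE)
}

# Residual sum of squares over log-scale parameters (keeps all four positive);
# `data` needs columns X, fu and the measured Dtot for each run
rss <- function(logpar, data) {
  p <- exp(logpar)
  pred <- Dtot_pred(data$X, data$fu, p[1], p[2], p[3], p[4])
  sum((data$Dtot - pred)^2)
}

# fit <- optim(log(c(P = 1, KP = 1, E = 1, KE = 1)), rss, data = my_data)
# exp(fit$par)   # estimates on the original scale
```

With four parameters, at least two distinct $D_{tot}$ levels (each with several $X$ values) would be needed in practice for a stable fit.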