There are several issues here.
(1) The model needs to be explicitly probabilistic. In almost all cases there will be no set of parameters for which the lhs matches the rhs for all your data: there will be residuals. You need to make assumptions about those residuals. Do you expect them to be zero on the average? To be symmetrically distributed? To be approximately normally distributed?
Here are two models that agree with the one specified but allow drastically different residual behavior (and therefore will typically result in different parameter estimates). You can vary these models by varying assumptions about the joint distribution of the $\epsilon_{i}$:
$$\text{A:}\ y_{i} =\beta_{0} \exp{\left(\beta_{1}x_{1i}+\ldots+\beta_{k}x_{ki} + \epsilon_{i}\right)}$$
$$\text{B:}\ y_{i} =\beta_{0} \exp{\left(\beta_{1}x_{1i}+\ldots+\beta_{k}x_{ki}\right)} + \epsilon_{i}.$$
(Note that these are models for the data $y_i$; there usually is no such thing as an estimated data value $\hat{y}_i$.)
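To make the difference concrete, here is a minimal simulation sketch in Python (the single regressor and all coefficient values are made-up assumptions, purely for illustration): under (A) the spread of $y$ about its conditional mean grows with that mean, while under (B) it stays fixed at the error standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(0, 3, size=n)              # one regressor, for simplicity
beta0, beta1, sigma = 0.5, 0.8, 0.5        # hypothetical parameter values

eps = rng.normal(0, sigma, size=n)
y_A = beta0 * np.exp(beta1 * x + eps)      # model A: error inside the exponential
y_B = beta0 * np.exp(beta1 * x) + eps      # model B: additive error

mu = beta0 * np.exp(beta1 * x)             # the "signal" common to both models
resid_A = y_A - mu * np.exp(sigma**2 / 2)  # E[exp(eps)] = exp(sigma^2/2) for Normal eps
resid_B = y_B - mu                         # under model B this is exactly eps

lo, hi = x < 1, x > 2
print("A: residual sd at low x vs high x:", resid_A[lo].std(), resid_A[hi].std())
print("B: residual sd at low x vs high x:", resid_B[lo].std(), resid_B[hi].std())
```

A fitting procedure tuned to one error structure will mis-weight data generated by the other, which is why the two models typically lead to different parameter estimates.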
(2) The need to handle zero values for the $y$'s implies the stated model (A) is both wrong and inadequate: because the exponential is strictly positive, $\beta_0 \exp(\cdots)$ can equal zero only when $\beta_0 = 0$, which would force every $y_i$ to be zero, no matter what the random error equals. The second model above (B) does allow zero (or even negative) values of the $y$'s. However, one should not choose a model solely on such a basis. To reiterate #1: it is important to model the errors reasonably well.
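The same hypothetical setup as before (repeated so the snippet stands alone) confirms this: (A) stays strictly positive for every realization of the error, while (B) readily produces zeros and negative values.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 3, size=100_000)
eps = rng.normal(0, 0.5, size=x.size)      # same made-up parameters as above

y_A = 0.5 * np.exp(0.8 * x + eps)          # model A
y_B = 0.5 * np.exp(0.8 * x) + eps          # model B

print("model A, smallest simulated y:", y_A.min())          # strictly positive, always
print("model B, share of y <= 0:     ", (y_B <= 0).mean())  # zeros and negatives occur
```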
(3) Linearization changes the model. Typically, it results in models like (A) but not like (B). It is used by people who have analyzed their data enough to know this change will not appreciably affect the parameter estimates, and by people who are ignorant of what is happening. (It is often hard to tell the difference.)
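To see how much this can matter, here is a hedged sketch (made-up parameters again, with data generated from model (B)): it estimates the coefficients once by ordinary least squares on $\log y$ (the linearized model, which implicitly assumes (A)), and once by nonlinear least squares on the original scale. The two sets of estimates generally disagree, noticeably so when the error is not small relative to the signal.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
n = 2_000
x = rng.uniform(0, 2, size=n)
beta0, beta1, sigma = 2.0, 1.0, 1.0        # hypothetical true values

y = beta0 * np.exp(beta1 * x) + rng.normal(0, sigma, size=n)   # data from model (B)

keep = y > 0                               # the log transform forces us to drop y <= 0
x, y = x[keep], y[keep]

# 1) Linearized fit: regress log(y) on x by OLS, which presumes model (A).
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print("linearized OLS:          beta0 =", np.exp(coef[0]), " beta1 =", coef[1])

# 2) Nonlinear least squares on the original scale, which presumes model (B).
popt, _ = curve_fit(lambda t, b0, b1: b0 * np.exp(b1 * t), x, y, p0=[1.0, 0.5])
print("nonlinear least squares: beta0 =", popt[0], " beta1 =", popt[1])
```

(Note, too, that the linearized fit had to discard any non-positive $y$'s, which ties back to issue (2) and is itself a source of bias.)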
(4) A common way to handle the possibility of a zero value is to propose that $y$ (or some re-expression thereof, such as the square root) has a strictly positive chance of equaling zero. Mathematically, we are mixing a point mass (a "delta function") in with some other distribution. These models look like this:
$$\begin{aligned}
f(y_i) &\sim F(\mathbf{\theta}); \\
\theta_j &= \beta_{j0} + \beta_{j1} x_{1i} + \cdots + \beta_{jk} x_{ki}
\end{aligned}$$
where $\Pr_{F_\theta}[f(Y) = 0] = \theta_{j+1} \gt 0$ is one of the parameters implicit in the vector $\mathbf{\theta}$, $F$ is some family of distributions parameterized by $\theta_1, \ldots, \theta_j$, and $f$ is the re-expression of the $y$'s (the "link" function of a generalized linear model: see onestop's reply). (Of course, then, $\Pr_{F_\theta}[f(Y) \le t] = (1 - \theta_{j+1})F_\theta(t)$ when $t \ne 0$.) Examples are the zero-inflated Poisson and negative binomial models.
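For instance, a zero-inflated Poisson regression of exactly this mixture form can be simulated and fit in a few lines. The sketch below leans on statsmodels' `ZeroInflatedPoisson`; the coefficient values, the mixing probability, and the intercept-only inflation model are illustrative assumptions, not anything specified above.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(3)
n = 5_000
x = rng.uniform(0, 1, size=n)
X = sm.add_constant(x)                      # count-model design: intercept and x

lam = np.exp(0.5 + 1.2 * x)                 # Poisson mean on the log-link scale
counts = rng.poisson(lam)
counts[rng.uniform(size=n) < 0.3] = 0       # mix in a point mass at zero (probability 0.3)

# Logit model for the extra zeros (intercept only) plus a Poisson regression for the counts.
zip_model = ZeroInflatedPoisson(counts, X, exog_infl=np.ones((n, 1)), inflation='logit')
zip_fit = zip_model.fit(method='bfgs', maxiter=500, disp=False)
print(zip_fit.params)   # inflation intercept (on the logit scale), then the Poisson coefficients
```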
(5) The issues of constructing a model and fitting it are related but different. As a simple example, even an ordinary regression model $Y = \beta_0 + \beta_1 X + \epsilon$ can be fit in many ways: by means of least squares (which, for Normal errors, gives the same parameter estimates as Maximum Likelihood and almost the same standard errors), iteratively reweighted least squares, various other forms of "robust least squares," etc. The choice of fitting method is often based on convenience, expedience (e.g., availability of software), familiarity, habit, or convention, but at least some thought should be given to what is appropriate for the assumed distribution of the error terms $\epsilon_i$, to what the loss function for the problem might reasonably be, and to the possibility of exploiting additional information (such as a prior distribution for the parameters).
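As a small illustration, here is one and the same straight-line model fit three ways with statsmodels (the heavy-tailed error distribution is an assumption chosen purely to make the methods disagree): ordinary least squares, Huber's M-estimator computed by iteratively reweighted least squares, and least absolute deviations.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.standard_t(df=2, size=n)       # heavy-tailed errors, for contrast
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                               # least squares (= ML under Normal errors)
huber = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit() # Huber M-estimator via IRLS
lad = sm.QuantReg(y, X).fit(q=0.5)                     # least absolute deviations (median fit)

print("OLS:  ", ols.params)
print("Huber:", huber.params)
print("LAD:  ", lad.params)
```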