Are log-log models the same as lognormal models?

Question

I have a dataset that I want to fit according to

$$\log(y) = a + b_1\log(x_1) + b_2\log(x_2) +\cdots + b_k\log(x_k).$$

My statistical package offers both a linear regression option and a lognormal option, and I am not sure which one I should choose.

This is not exactly your question, but this thread: [interpretation-of-log-transformed-predictor](http://stats.stackexchange.com/questions/18480/) may be helpful in thinking about these issues. — gung - Reinstate Monica, Mar 29 '13 at 16:25
Which package? Otherwise we'd just be guessing what the lognormal one is doing with the x-variables. — Glen_b, Mar 30 '13 at 01:12

gung - Reinstate Monica · Accepted Answer · 2013-03-30T06:08:21.150

4

Probably your best bet is just to form two new variables:

ly = log(y)
lx = log(x)

Then you can use those with a regular linear regression.
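As a minimal sketch of the two steps above, here is one way this could look in Python (chosen only because the asker's package is not named). The data are simulated, and the values `A_true`, `b_true`, and the noise level are made up for illustration; the fit uses the closed-form least-squares formulas for a single predictor.

```python
import math
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical data: y = A * x^b * exp(noise), with made-up A = 2.0 and b = 1.5.
A_true, b_true = 2.0, 1.5
x = [random.uniform(1.0, 10.0) for _ in range(200)]
y = [A_true * xi ** b_true * math.exp(random.gauss(0.0, 0.1)) for xi in x]

# Step 1: form the two new variables.
lx = [math.log(xi) for xi in x]
ly = [math.log(yi) for yi in y]

# Step 2: ordinary least squares of ly on lx (closed form for one predictor).
n = len(lx)
mean_lx = sum(lx) / n
mean_ly = sum(ly) / n
sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
sxx = sum((u - mean_lx) ** 2 for u in lx)
b_hat = sxy / sxx                   # estimate of the exponent b
a_hat = mean_ly - b_hat * mean_lx   # estimate of a = log(A)

print(f"b ~ {b_hat:.3f}, a ~ {a_hat:.3f}, exp(a) ~ {math.exp(a_hat):.3f}")
```

The slope of the fit on the log scale estimates the exponent, and exponentiating the intercept gives the multiplicative constant.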

edited Mar 30 '13 at 06:08

answered Mar 29 '13 at 16:27

gung - Reinstate Monica


So I should avoid the lognormal function? – MultiplyImputed Mar 29 '13 at 16:31
That might be best. – gung - Reinstate Monica Mar 29 '13 at 18:07
Thanks, gung. However, if I have a lot of independent variables, it can be tedious to log transform each variable before I run the set through the statistical package. – MultiplyImputed Mar 29 '13 at 19:12

I don't see why. – gung - Reinstate Monica Mar 29 '13 at 19:13
Consider the following (in R): `N = 30;` `x = matrix(runif(N*5), ncol=5);` `y = runif(N);` `X = cbind(y,x);` `lX = apply(X, 2, log);` (Note that this would scale up to any number of columns.) – gung - Reinstate Monica Mar 29 '13 at 21:06
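The R snippet in the last comment logs every column in one call. A rough Python counterpart is sketched below, extended to also fit the multi-predictor model from the question. Everything here is an assumption for illustration: the helper names `log_columns` and `ols`, the simulated data, and the made-up coefficients in `true_coef`. The fit solves the normal equations directly, which is fine for a small demo but is not how a statistical package would usually do it.

```python
import math
import random


def log_columns(rows):
    """Apply the natural log to every entry of a row-major table."""
    return [[math.log(v) for v in row] for row in rows]


def ols(X, y):
    """Least-squares coefficients, intercept first, via the normal equations."""
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(A[0])
    # Augmented system [A'A | A'y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical data: log(y) = a + b1*log(x1) + b2*log(x2) + b3*log(x3) + noise.
true_coef = [0.5, 1.2, -0.7, 0.3]  # a, b1, b2, b3 (made up for the demo)
N = 200
x = [[random.uniform(1.0, 10.0) for _ in range(3)] for _ in range(N)]
y = [
    math.exp(
        true_coef[0]
        + sum(b * math.log(v) for b, v in zip(true_coef[1:], row))
        + random.gauss(0.0, 0.1)
    )
    for row in x
]

# One pass logs y and every x column, however many columns there are.
table = [[yi] + row for yi, row in zip(y, x)]
log_table = log_columns(table)
ly = [row[0] for row in log_table]
lX = [row[1:] for row in log_table]

coef = ols(lX, ly)  # estimates of a, b1, b2, b3
print([round(c, 3) for c in coef])
```

The transformation step is a single call regardless of the number of predictors, which is the point of the comment above.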

score 0 · Answer 2 · edited Mar 29 '13 at 18:03


Your original model will be non-linear.

$y = c x^b \quad (1)$

If you take the natural log on both sides: $\ln(y) = \ln(c) + b\ln(x) \quad (2)$

So, in your model: $\ln(c)=a$

You can fit equation 1 directly with the lognormal option [actually, it should be log-linear], with no transformation of variables needed, or you can fit equation 2 with linear regression. To implement the latter, you need to log-transform the x and y variables as mentioned by @gung, i.e. $ly=\ln(y)$ and $lx=\ln(x)$, where $lx$ and $ly$ are the new variables created from $x$ and $y$.

Note that you can't fit a lognormal or log-linear model if either your $x$ or $y$ has zero or negative values, because the logarithm is undefined there.
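One way to guard against this before transforming is sketched below. The helper `log_transform` is hypothetical, not part of any package; it treats zeros as invalid too, since the logarithm is undefined at zero as well as for negative numbers.

```python
import math


def log_transform(values, name="variable"):
    """Return natural logs of values, refusing zeros and negatives."""
    bad = [v for v in values if v <= 0]
    if bad:
        raise ValueError(
            f"{name} has {len(bad)} non-positive value(s); log is undefined there"
        )
    return [math.log(v) for v in values]


ly = log_transform([1.0, 2.5, 10.0], name="y")  # fine: all values positive
```

Calling it on data containing `0.0` or `-2.0` raises a `ValueError` rather than failing later inside the regression.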

edited Mar 29 '13 at 18:03

dimitriy


answered Mar 29 '13 at 18:01

Metrics


Thanks. Is lognormal a generalized type of model that includes log-linear? Or is log-linear a type of logit regression? – MultiplyImputed Mar 29 '13 at 19:40
I am not sure whether there is such a term as `lognormal model`; I only know the term `lognormal distribution`. If $x$ is normally distributed then $\log x$ is lognormally distributed. A logit model is for the case where your outcome takes a value of 1 or 0; I assume that your outcome is a continuous variable. – Metrics Mar 29 '13 at 21:12
@user1493368 You have that backwards: if $x$ is normally distributed, then $e^x$ is log-normally distributed, since $\ln(e^x) = x$ is normal. $\ln x$ for normal $x$ is ill-defined, since you can't take the log of $x \le 0$, which is true with positive probability under the normal model. – Danica Mar 30 '13 at 00:17
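The corrected definition in the last comment can be checked with a small simulation. This is only a sketch: `mu` and `sigma` are made-up values, and the comparison values use the standard lognormal facts that the median of $e^x$ is $e^{\mu}$ and its mean is $e^{\mu + \sigma^2/2}$.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed for reproducibility

mu, sigma = 0.0, 0.5  # hypothetical parameters of the normal variable x
x = [random.gauss(mu, sigma) for _ in range(100_000)]
y = [math.exp(xi) for xi in x]  # y = e^x is lognormal by definition

min_y = min(y)                   # strictly positive, so log(y) is always defined
median_y = statistics.median(y)  # close to exp(mu)
mean_y = statistics.fmean(y)     # close to exp(mu + sigma^2 / 2)

print(f"min {min_y:.4f}, median {median_y:.4f}, mean {mean_y:.4f}")
print(f"exp(mu) = {math.exp(mu):.4f}, exp(mu + sigma^2/2) = {math.exp(mu + sigma**2 / 2):.4f}")
```

The gap between the mean and the median is the reason exponentiating a fitted value from a log-scale regression does not, by itself, give the mean of $y$.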

EngrStudent · Answer 3 · 2013-04-02T16:37:16.407

-2

Transformation of coordinates

Initial form (linear): ln(y) = a + b*ln(x)

With algebra this becomes a power law: y = exp(a)*x^b = A*x^b

So don't choose lognormal, choose to do a linear fit on transformed coordinates.

Some general "good practices"

If you aren't sure about your data or your method, then stick to the road "more traveled", the tried and true.
If you are going to spend money based on the result of an analysis, spend quality time making sure the quality of the result matches the amount of money it will inform.

A lot of "real world" data fits a linear analytic form in theory, but in practice this almost never happens. Things are almost always more complex. The high-value things are always more complex.

EDIT: whuber is right. I am expressing this in Engineering terms. Implicit in my notation is that all the expressions are y_approximation where:

y_true = y_approximation + error

Statistics folks consider it rigorous to append an explicit error term to the expression. The symbol they usually use for the error is epsilon.

edited Apr 02 '13 at 16:37

answered Mar 29 '13 at 18:36

EngrStudent



Your algebra overlooks the terms that are truly crucial to a careful analysis of this problem: the "errors." As such I think this answer misses a key point. – whuber Mar 29 '13 at 19:10

Re the edit: Including the errors explicitly is not merely a matter of convention, taste, or technical rigor. The reason you need to write them down is that doing so will clearly show where you have made an algebraic mistake in your reasoning. (The transformed model does not have the additive error structure you claim it does.) – whuber Apr 02 '13 at 17:59
I am suggesting that when I say " y = A*x^b" what I mean is "y_approximation = A*x^b". I am exactly correct when saying what I meant, and in fact am the best person to say what it was that I meant. This is the difference between "Implicit" and "Explicit" because I am saying what the norm was that I was following. Would you mind explicitly articulating what would be required for an answer that did not miss your key point? – EngrStudent Apr 04 '13 at 23:59

"Mistake in your reasoning" did not refer to the faithfulness with which you expressed what you meant, but rather to the incorrectness of your answer. Upon exponentiation, $\log(y)=a+b\log(x)+\varepsilon$ becomes $y=A x^b \exp(\varepsilon)$ which does *not* have the structure of "$y = A x^b + \text{error}$." I hope it is clear to you how the two models differ. The "key point" to which I referred is this awareness that statistical models, by their very nature, must represent variability, and that ignoring the variability leads to mistakes, paradoxes, and errors of all sorts. – whuber Apr 05 '13 at 12:34
Thank you. This is a good thought. The error is not in the domain, but the range. Typically the x-values are known nearly perfectly. The values known/measured are the "x" and "y" and not the "true model" if such a thing exists. Shouldn't the errors be applied as additive to them? log(y+epsilon) = a + b*log(x) – EngrStudent Apr 05 '13 at 13:28
I'm not sure I follow your last comment, but it does seem to be picking up what I believe is the crux of the matter: whether it is more appropriate (in this instance) to model the errors as additive or multiplicative (or perhaps something else). But please note that a model of the form "$\log(y+\varepsilon)=\ldots$" is a difficult one to fit or to interpret; one certainly wouldn't use standard regression techniques to fit it in this form. (Among other things, note that the expectation of $\log(y+\varepsilon)$ will not equal the expectation of $\log(y)$.) – whuber Apr 05 '13 at 13:35
When I measure temperature vs. time, my computer-driven DA converter has the time accurate to nanoseconds. The error in measurement of time is very small. The error in the temperature measurement is much larger. My true model that uses the time (aka domain) to predict the temperature (aka range) is going to have its overall error driven by temperature measurement, not time measurement. Btw: how do you use LaTeX or whatever in comments, questions and answers? I feel like my notation is stuck in ascii. – EngrStudent Apr 05 '13 at 18:43
With additive error in temperature measurements, you would wish to model temperature ($y$) versus time ($x$) in the form $y=f(x;\beta)+\varepsilon$. That's perfectly standard. If you think error is multiplicative you might posit $y=f(x;\beta)\varepsilon$ and transform that to $\log(y)=\log(f(x;\beta))+\delta$ where $\delta=\log(\varepsilon)$. When $\delta$ is approximately normal, then $\varepsilon$ is (by definition) *lognormal*. Even more generally you might posit $y\sim F(x;\theta)$ for a parametric family of zero-mean distributions $F$. (Enclose $\TeX$ in dollar signs, exactly as in Q's & A's.) – whuber Apr 05 '13 at 18:50
There is uncertainty in the measure of the temperature. $T=\hat{T}+\delta T$. There is uncertainty in the measure of time. $t=\hat{t}+\delta t$. Although I use an exponential form there can be convection, losses, other true physics going on that isn't accounted for in my model so the model itself is going to have an error term that is different than the measurements. $\hat{T}=F\left (t,\theta \right ) + \delta F $. So my true system is $\hat{T}+\delta T=F\left (t+\delta t,\theta \right)+\delta F \left ( t+\delta t,\theta\right)$. Question: How to handle not knowing "true" model? – EngrStudent Apr 05 '13 at 20:00
For a start, see the references on this site to "Deming regression" and "errors in variables" models. (They don't really apply to your example, though, because as a practical matter your temporal uncertainty is of no consequence.) Also investigate threads tagged [tag:model-selection] and [tag:model-comparison]. – whuber Apr 05 '13 at 20:22
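To make the distinction in this comment thread concrete, here is the algebra written out with the error term kept explicit. It assumes, purely for illustration, that the error on the log scale is normal with mean zero and variance $\sigma^2$; the symbol $\eta$ for an additive error is introduced here and does not appear in the thread.

$$\log(y) = a + b\log(x) + \varepsilon,\quad \varepsilon \sim N(0,\sigma^2) \;\Longrightarrow\; y = A\,x^b\,e^{\varepsilon},\quad A = e^{a}.$$

Under that assumption $e^{\varepsilon}$ is lognormal, so

$$\operatorname{median}(y \mid x) = A\,x^b, \qquad E[y \mid x] = A\,x^b\,e^{\sigma^2/2}.$$

The additive-error model is a different model:

$$y = A\,x^b + \eta,\quad E[\eta] = 0 \;\Longrightarrow\; E[y \mid x] = A\,x^b.$$

In the first model the spread of $y$ grows in proportion to $A x^b$; in the second it does not depend on $x$. Fitting a straight line to $\log(y)$ against $\log(x)$ corresponds to the first model, not the second.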
