
I have a model where the volume ($V$) of a finger is normally distributed, with mean $\mu = \beta_0 L^{\beta_1}D^{\beta_2}$ (where $L=$ length, $D=$ diameter and $\beta_i \in \Bbb R$ for $i=0,1,2$) and some variance $\sigma^2$.

I was thinking that if this is a GLM (although I am not sure if we have an exponential family here), a log link function would be appropriate since $\ln (\mu)$ is a linear function of the natural log of the predictors: $$\ln (\mu) = \ln(\beta_0 L^{\beta_1}D^{\beta_2}) = \ln(\beta_0) + \beta_1\ln(L) + \beta_2\ln(D) $$

However, a Poisson distribution is not ideal since $V, L, D \in \Bbb R$ are not counts.

So, I am not sure if I should proceed with this, or if another link function would be more appropriate. Any suggestions would be appreciated.

Amara
  • I'm confused by your mention of a Poisson GLM: do you mean that $\log V$ is normally distributed, or $V$ itself? – Tim Sep 17 '21 at 08:00
  • @Tim I mean that $V$ itself is normally distributed. – Amara Sep 17 '21 at 08:06
  • What you suggest is a glm that can be fitted with `glm(V ~ log(L) + log(D), family=gaussian(link="log"))`. But I would expect `lm(log(V) ~ log(L) + log(D))` (assuming that $V$ conditional on $L$ and $D$ follows a lognormal distribution) to give a better fit to the data. This is also (for good reasons) how most allometric data such as this is analysed, see https://en.wikipedia.org/wiki/Allometry – Jarle Tufto Sep 17 '21 at 10:32
  • @JarleTufto Just to clarify, are you saying that my model $V$ ~ $N(\beta_0 L^{\beta_1}D^{\beta_2}, \sigma^2)$ is fitted with `glm(V ~ log(L) + log(D), family=gaussian(link="log"))`, but you think it would be better fit by `lm(log(V) ~ log(L) + log(D))`? – Amara Sep 17 '21 at 10:46
  • @Manuel Yes, that's what I'm saying. – Jarle Tufto Sep 17 '21 at 10:48
  • @Jarle The model as described in the question is of the form $V\sim\mathcal{N}(\beta_0L^{\beta_1}D^{\beta_2},\sigma^2).$ (The $\sigma^2$ is implicit in the statement that $V$ is normally distributed.) Taking logarithms will not produce the second model you write. This is a nonlinear regression with (conditionally) Normal response, which is your first model (but not the second). One can decide between them by viewing a spread-vs-level plot of the residuals in each case. – whuber Sep 17 '21 at 13:26
  • @whuber How do you know for sure it is a GLM? Can this be shown by proving that the probability density function of $N(\beta_0 L^{\beta_1}D^{\beta_2},\sigma^2)$ is a part of the exponential family (specifically the one-parameter exponential family, as we have a known mean and unknown variance)? – Amara Sep 17 '21 at 13:46
  • It is the first model expressed by @Jarle. – whuber Sep 17 '21 at 14:29
  • @JarleTufto If I define the model as `glm(V ~ 0 + log(L) + log(D), family=gaussian(link="log"))` would it still satisfy $V$ ~ $N(\beta_0 L^{\beta_1} D^{\beta_2},\sigma^2)$ ? I was just thinking that an intercept term other than 0 doesn't really make sense - a finger with zero length and diameter would also have zero volume. – Amara Sep 17 '21 at 15:28
  • @Manuel No, that would imply that what you call $\beta_0=1$, so that is not something you would want to do. You already achieve zero expected volume for zero length and diameter (or as their logs go to minus infinity) for both model alternatives. – Jarle Tufto Sep 17 '21 at 15:34
  • @whuber Yes, I fully agree that these are different models (and not different ways of fitting the same model as perhaps implied by @Manuel). – Jarle Tufto Sep 17 '21 at 15:38
  • Manuel, the question expressed in your latest comment about whether to include an intercept is answered at https://stats.stackexchange.com/questions/7948. The short answer is that you ought to include it. In fact, in what way would a nonzero intercept make no sense? In your model its exponential is a multiplicative factor in $V$ and so it would *have* to vary if you re-expressed $V$ with different units of measurement. That suggests including an intercept is necessary. – whuber Sep 17 '21 at 15:58
  • @whuber Upon further consideration, I realised that a nonzero intercept does make sense. Thank you for clarifying this further. – Amara Sep 17 '21 at 16:18
  • @JarleTufto I am curious about your recommendation of a different model. Does `lm(log(V) ~ log(L) + log(D))` still incorporate an expected value of $\beta_0 L^{\beta_1}D^{\beta_2}$ ? – Amara Sep 17 '21 at 16:24
  • @Manuel Under the alternative model $V$ is lognormal so the expected value would be $E V=e^{\mu+\sigma^2/2}=\beta_0 L^{\beta_1}D^{\beta_2}e^{\sigma^2/2}$ so yes, you are correct except for the additional factor $e^{\sigma^2/2}$ having to do with the skew of the log-normal. There is a long (and rather tiresome) discussion about these issues in the literature, see https://scholar.google.com/scholar?&q=packard+allometry. – Jarle Tufto Sep 17 '21 at 16:35
  • @JarleTufto Do you want to summarize your comments in an answer? I think that would be a great answer to the question. – COOLSerdash Sep 18 '21 at 14:53

1 Answer


I have a model where the volume ($V$) of a finger is normally distributed, with mean $\mu = \beta_0 L^{\beta_1}D^{\beta_2}$

You can rewrite this as

a model where the volume ($V$) of a finger is normally distributed, with mean $$\mu = \exp \left( \beta_0^\prime + \beta_1 L^\prime + \beta_2 D^\prime \right)$$ where $L^\prime = \log(L)$, $D^\prime = \log(D)$, and $\beta_0^\prime = \log(\beta_0)$.

So the mean is a linear predictor in the log-transformed independent variables, $\beta_0^\prime + \beta_1 L^\prime + \beta_2 D^\prime$, wrapped inside a non-linear (inverse link) function.

That classifies as a GLM (provided that the dispersion parameter $\sigma$ does not depend on $L$ and $D$, i.e. it is constant).

although I am not sure if we have an exponential family here

The normal distribution is in the exponential family: for fixed $\sigma^2$ it is a one-parameter exponential family in $\mu$, with $\sigma^2$ playing the role of the dispersion parameter.
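To make this concrete (this is the standard exponential-dispersion rewriting, not anything specific to this problem), the normal density can be written as $$f(v \mid \mu, \sigma^2) = \exp\left(\frac{v\mu - \mu^2/2}{\sigma^2} - \frac{v^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right),$$ which matches the form $\exp\left(\frac{v\theta - b(\theta)}{\phi} + c(v,\phi)\right)$ with natural parameter $\theta = \mu$, cumulant function $b(\theta) = \theta^2/2$, and dispersion $\phi = \sigma^2$.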


Instead of the Poisson distribution you can use other distributions; the log link does not restrict this. Jarle Tufto's comment shows how you can do it in R: `glm(V ~ log(L) + log(D), family=gaussian(link="log"))`.
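As an illustration, here is a minimal R sketch (the data are simulated; the parameter values 0.6, 1.1, 1.9 and the object names are invented purely for the example) fitting both the Gaussian log-link GLM and the log-normal alternative from the comments:

```r
set.seed(1)

# Simulate data from the model in the question:
# V ~ Normal(beta0 * L^beta1 * D^beta2, sigma^2) with constant sigma
n  <- 200
L  <- runif(n, 5, 10)                 # lengths
D  <- runif(n, 1, 2)                  # diameters
mu <- 0.6 * L^1.1 * D^1.9             # beta0 = 0.6, beta1 = 1.1, beta2 = 1.9
V  <- rnorm(n, mean = mu, sd = 0.5)   # additive normal noise on the original scale
dat <- data.frame(V = V, L = L, D = D)

# Gaussian GLM with log link: mean is beta0 * L^beta1 * D^beta2, variance is constant
fit_glm <- glm(V ~ log(L) + log(D), family = gaussian(link = "log"), data = dat)

# Log-normal alternative from the comments: linear model for log(V)
fit_lm <- lm(log(V) ~ log(L) + log(D), data = dat)

# In the GLM, the intercept estimates log(beta0) and the slopes estimate beta1 and beta2.
# Under the log-normal model, E[V] carries an extra factor exp(sigma^2/2)
# (see Jarle Tufto's comment above).
coef(fit_glm)
coef(fit_lm)
```

The GLM keeps the additive, constant-variance error of the model in the question, while the `lm` version assumes multiplicative (log-normal) error; a spread-vs-level plot of the residuals, as whuber suggests in the comments, can help decide between them.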

See also *What is the objective function to optimize in glm with gaussian and poisson family?*

Sextus Empiricus