
I have a dependent variable C and an independent variable VPT. VPT is the average volume per tree (in cubic feet) of a timber stand. C is the cost produced by a cost function that calculates the harvest cost per cubic foot (in $).

The plot of the two looks like this: [image not included]

The abrupt change at roughly 28 cubic feet results from an interaction within the cost function. I want to create a regression model for the two variables. Based on the graph, I assume it must be something like this:

fit <- lm(C ~ I(VPT^(-1)))

Is there a way to determine what the best fit is? Do I just have to try different models until I get something that fits, or is there a more efficient way?
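To make the question concrete, here is a minimal sketch of what "trying different models" could look like when done systematically: fit a few candidate forms and compare them by AIC and a residual plot. The data below are simulated purely for illustration (they are not my real data), and the three candidate forms are just assumptions about what might be plausible.

```r
# Synthetic stand-in data (NOT the real data): cost falls roughly as 1/VPT.
set.seed(1)
VPT <- runif(200, min = 5, max = 100)
C <- 0.2 + 3 / VPT + rnorm(200, sd = 0.02)
dat <- data.frame(VPT = VPT, C = C)

# Candidate functional forms, all with the same untransformed response C.
fits <- list(
  linear     = lm(C ~ VPT, data = dat),
  reciprocal = lm(C ~ I(1 / VPT), data = dat),
  log        = lm(C ~ log(VPT), data = dat)
)

# Lower AIC suggests a better trade-off between fit and complexity.
# The comparison is only meaningful because the response is identical in all models.
aic <- sapply(fits, AIC)
print(sort(aic))

# Residuals against fitted values for the candidate with the lowest AIC.
best <- fits[[names(which.min(aic))]]
plot(fitted(best), resid(best), xlab = "Fitted C", ylab = "Residual")
abline(h = 0, lty = 2)
```

A comparison like this only ranks the candidates that were tried; it would not by itself account for the abrupt change near 28 cubic feet.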


The cost function takes in four parameters:

  • Volume per tree (VPT)
  • Trees per acre (TPA)
  • Skidding Distance (SD)
  • Slope (S)

Ultimately I want to create a regression that represents the cost function. Something like this:

fit <- lm(C ~ S + SD + TPA + VPT)  # lm() estimates the intercept (ß5) and the coefficients ß1..ß4
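As a rough sketch of that idea: the function `cost_fn` below is a made-up placeholder, not my actual cost function, and the input ranges are assumptions chosen only so the example runs. It evaluates the placeholder on random inputs, then fits the purely additive model next to a variant with reciprocal terms.

```r
# Hypothetical stand-in for the real cost function (an assumption, for illustration only).
cost_fn <- function(VPT, TPA, SD, S) {
  0.1 + 3 / VPT + 0.0005 * SD + 0.002 * S + 5 / TPA
}

# Random inputs over assumed ranges, plus a little noise on the response.
set.seed(42)
n <- 500
dat <- data.frame(
  VPT = runif(n, 5, 100),
  TPA = runif(n, 50, 400),
  SD  = runif(n, 100, 2000),
  S   = runif(n, 0, 40)
)
dat$C <- with(dat, cost_fn(VPT, TPA, SD, S)) + rnorm(n, sd = 0.01)

# Additive model: one coefficient per predictor plus an intercept.
fit <- lm(C ~ S + SD + TPA + VPT, data = dat)

# Same predictors, but with reciprocal terms for TPA and VPT.
fit_recip <- lm(C ~ S + SD + I(1 / TPA) + I(1 / VPT), data = dat)

print(coef(fit))
print(summary(fit)$r.squared)
print(summary(fit_recip)$r.squared)
```

With the placeholder function the reciprocal variant fits better by construction; whether that carries over to the real cost function is exactly what I am unsure about.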
ustroetz
  • This ends up being a FAQ, see [In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?](http://stats.stackexchange.com/a/3530/1036). Although it is asked about the log transformation instead of the reciprocal, much of the advice applies the same. I've also added the [tag:data-transformation] tag, see the highest voted questions in that tag. – Andy W Apr 13 '14 at 12:51
  • These data exhibit some special characteristics: the response appears to be bounded below by some function of VPT and there is an abrupt change in variability near VPT=28. Merely transforming the response will not address these important features. If you would edit the question to describe the data (and what they represent and how they were collected) and explain *why* you want to create a regression model, you might get answers that are ultimately more helpful in achieving your aims. – whuber Apr 13 '14 at 15:03
  • While transformations won't tend to address those features, they may help to make them more obvious. That effect in the variability whuber notes may also be due to some other variable (though not necessarily one you have to hand); on a more nearly linear scale, two groups might 'stand out'. – Glen_b Apr 13 '14 at 17:09
  • @whuber: I updated my question regarding your comment. – ustroetz Apr 16 '14 at 13:21
  • Thanks; that new information is helpful. It immediately suggests you should be regressing cost, rather than cost per cubic foot, against the other four variables. After you have done that you can compute cost per cubic foot. Any further advice would require an entire textbook in regression analysis, touching on aspects that have been thoroughly covered in thousands of threads on our site. So that we don't just go over all that old ground again, please try to edit your question to focus on some new issue specific to your data. – whuber Apr 16 '14 at 14:32
  • What do you mean by `regressing cost`? – ustroetz Apr 16 '14 at 15:22

0 Answers