
I have (x,y) data where x takes on the integers from 2 to 100 and y is a continuous variable. I want to smooth the data with a polynomial regression of y on x, but I know that varying x from 2 to 3 affects y much more than varying it from 99 to 100. I am thinking of trying various monotonic transformations of x, such as 1/x and log(x), and seeing for which transformation f(x) the polynomial regression of y onto f(x) has the smallest RMSE. Has this problem been studied?
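A minimal sketch of the comparison described above, assuming NumPy is available. The data are synthetic placeholders (y is generated as roughly 3/x plus noise purely for illustration), and the names `fit_rmse`, `transforms`, and `results` are hypothetical. Picking the transformation by in-sample RMSE alone tends to be optimistic, so the sketch also reports a leave-one-out RMSE, computed from the leverages of the least-squares fit.

```python
import numpy as np

def fit_rmse(t, y, degree=2):
    """Return (in-sample RMSE, leave-one-out RMSE) for a polynomial fit of y on t."""
    X = np.vander(t, degree + 1)           # design matrix: columns t^degree, ..., t, 1
    Q, _ = np.linalg.qr(X)                 # thin QR; the hat matrix is Q @ Q.T
    resid = y - Q @ (Q.T @ y)              # ordinary least-squares residuals
    leverage = np.sum(Q**2, axis=1)        # diagonal of the hat matrix
    loo_resid = resid / (1.0 - leverage)   # exact leave-one-out residuals for OLS
    in_sample = float(np.sqrt(np.mean(resid**2)))
    loo = float(np.sqrt(np.mean(loo_resid**2)))
    return in_sample, loo

# Synthetic stand-in data (an assumption, not the real data set):
# x = 2..100, with y changing fastest at small x.
rng = np.random.default_rng(0)
x = np.arange(2, 101, dtype=float)
y = 3.0 / x + rng.normal(scale=0.01, size=x.size)

# Candidate monotonic transformations of x.
transforms = {
    "x": lambda v: v,
    "sqrt(x)": np.sqrt,
    "log(x)": np.log,
    "1/x": lambda v: 1.0 / v,
}

results = {name: fit_rmse(f(x), y, degree=2) for name, f in transforms.items()}

for name, (in_sample, loo) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(f"{name:8s} in-sample RMSE = {in_sample:.4f}   LOO RMSE = {loo:.4f}")
```

The loop prints the candidates ordered by leave-one-out RMSE. Comparing many transformations on the same data and then doing inference on the winner is still a form of model selection, so the resulting fit is best treated as a descriptive smoother.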

kjetil b halvorsen
Fortranner
  • There are over 1600 questions on [data transformation](https://stats.stackexchange.com/questions/tagged/data-transformation) on this site. Is there some reason why you need to do polynomial regression, or might [spline](https://stats.stackexchange.com/questions/tagged/splines) modeling work better? Showing a plot of your data might help get an answer that is more on-point for your study. – EdM May 15 '19 at 16:12
  • Would you please post example data? I would like to run it through the "function finder" on my curve fitting web site zunzun.com to see what it might suggest in terms of candidate equations. – James Phillips May 15 '19 at 16:26
  • Yes, this has been studied extensively. The account I gave of transformations in regression at https://stats.stackexchange.com/a/4833/919 recommends first dealing with $y$ to make its response homoscedastic if possible. Then look at re-expressions of $x$ that might linearize its relationship with the response. – whuber May 15 '19 at 17:18
  • @Fortranner I'd highly recommend against using "function finders" - or in typical cases, even automated transformation-finders; it is better to come from an understanding of your variables. Throwing a gigantic laundry list of models at a set of data is rarely a suitable thing to do (if you're doing it on the same data you're fitting the final model to, you won't be doing meaningful inference on the results). If you have no good model, some kind of additive model or GAM is likely to be more useful. What are you trying to achieve with this modelling? – Glen_b May 16 '19 at 03:10

0 Answers