6

How can I derive the formula and regression coefficients for a regression model of the form $y(x)= A + B\, x + C\, \cos (2 \pi x) + D\, \sin (2 \pi x)$? I know that there are automatic tools that can do this if I provide the data, but I need a formula and a procedure. Thank you in advance.

Glen_b
  • 257,508
  • 32
  • 553
  • 939
Kata Rina
  • 61
  • 1
  • 1
  • 3
  • possible duplicate of [Fit a sinusoidal term to data](http://stats.stackexchange.com/questions/60994/fit-a-sinusoidal-term-to-data) – Sycorax Jul 29 '15 at 22:06
  • @user777 While it could be argued to be answered there (I considered pointing to that one before responding to this), I think perhaps the focus of the question there is different enough that someone looking for a simple answer to how to do regression on transformed predictors could miss that point there. – Glen_b Jul 29 '15 at 22:09
  • @user777 Not only is it worth having the link, it's a good question to raise in any case. People may well disagree with me. The question may close via the normal vote. – Glen_b Jul 29 '15 at 22:25
  • @user777 I had generalized the title away from specific reference to trig transformations quite deliberately, so that we didn't need to deal individually with every transformation future readers might come up with; that's why my answer here discusses the more general case. Now the question is back to being just about trig transformations. – Glen_b Jul 29 '15 at 22:27
  • @Glen_b The last thing I want to do is get into an edit war, but I do note that the question body is expressly about trigonometric transformations. Perhaps an extension of the question body to match a more expansive title is in order. IMHO, the title should summarize the question content, but because both can be calibrated to better match each other, this is easily fixed. – Sycorax Jul 29 '15 at 22:32
  • @user777 I think that having the question body about a specific transformation is fine with the more general title (my aim was to catch people who search for things like "transformed x"); the advantage of the more specific question body is that some people are happier while things are concrete - so I'd rather leave the Q. body concrete, but this one could still serve as a question for which others would be duplicates. Like you I hesitate to edit again, at least until we agree on the best course. – Glen_b Jul 29 '15 at 22:48
  • 1
    @Glen_b So as not to encumber this question with our comments, I've created a meta post for further discussion. http://meta.stats.stackexchange.com/questions/2630/what-is-the-proper-balance-between-question-title-specificity – Sycorax Jul 29 '15 at 23:05
  • I am sorry if I caused any trouble with my title; this is the first time I am asking a question. I took a look at the question "Fit a sinusoidal term to data" before posting mine, but it didn't satisfy what I need because it has only trigonometric terms, unlike my curve, so I don't think it's a duplicate. In any case, thank you both for offering help. – Kata Rina Aug 01 '15 at 14:51
  • Years later, yes, but you might be interested in a recent answer of mine: https://stats.stackexchange.com/a/477773/247274. – Dave Jul 20 '20 at 16:32

2 Answers

10

You simply compute $x_c=\cos(2\pi x)$ and $x_s=\sin(2\pi x)$ and perform a plain multiple linear regression of $y$ on $x, x_c,$ and $x_s$.

That is, you supply the original $x$ and the two calculated predictors as if you had three independent variables for your regression, so your now-linear model is:

$$Y = \alpha + \beta x +\gamma x_c + \delta x_s+\varepsilon$$
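As a concrete sketch (not part of the original answer), here is a minimal NumPy example that builds the two transformed columns and fits the model by ordinary least squares. The data are simulated and all names and coefficient values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 4, n)

# True coefficients A, B, C, D (chosen for illustration)
A, B, C, D = 1.5, 0.8, 2.0, -1.2
y = A + B*x + C*np.cos(2*np.pi*x) + D*np.sin(2*np.pi*x) + rng.normal(0, 0.1, n)

# Compute the two transformed predictors
x_c = np.cos(2*np.pi*x)
x_s = np.sin(2*np.pi*x)

# Design matrix: intercept, x, and the two new columns --
# from here on it is just a plain multiple linear regression
X = np.column_stack([np.ones(n), x, x_c, x_s])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef holds estimates of (A, B, C, D)
```

Any OLS routine (R's `lm`, MATLAB's backslash operator, statsmodels, ...) would give the same coefficients, since after the transformation this is an ordinary linear regression in the new columns.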

This same idea applies to any transformation of the predictors. You can fit a regression of the form $y = \beta_0 + \beta_1 s_1(x_1) + \beta_2 s_2(x_2) +...+ \beta_k s_k(x_k)+\varepsilon$ for transformations $s_1, \ldots, s_k$ by supplying $s_1(x_1), s_2(x_2), ..., s_k(x_k)$ as predictors.

So for example, $y = \beta_0 + \beta_1 \log(x_1) + \beta_2 \exp(x_1) + \beta_3 (x_2\log x_2) + \beta_4 \sqrt{x_3x_4} +\varepsilon$ would be fitted by supplying $\log(x_1),$ $\exp(x_1),$ $(x_2\log x_2),$ and $\sqrt{x_3x_4}$ as predictors (IVs) to linear regression software.
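A hedged sketch of fitting that general example, again with simulated data (variable names, coefficient values, and the data-generating ranges are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.uniform(0.5, 2.0, n)
x2 = rng.uniform(0.5, 2.0, n)
x3 = rng.uniform(0.5, 2.0, n)
x4 = rng.uniform(0.5, 2.0, n)

# Supply the transformed columns as ordinary predictors
X = np.column_stack([np.ones(n),        # intercept (beta_0)
                     np.log(x1),        # beta_1 term
                     np.exp(x1),        # beta_2 term
                     x2*np.log(x2),     # beta_3 term
                     np.sqrt(x3*x4)])   # beta_4 term

beta = np.array([0.5, 1.0, -0.3, 2.0, 0.7])  # illustrative true values
y = X @ beta + rng.normal(0, 0.05, n)

# Plain OLS on the transformed design matrix recovers the coefficients
bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
```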

The regression is just fitted as normal to the new set of predictors and the coefficients are those for the original equation.

See, for example, the answer here: regression that creates $x\log(x)$ functions, which details a different specific example.

Glen_b
  • 257,508
  • 32
  • 553
  • 939
  • Thank you for answering, but unfortunately I still don't know how to solve it. Does this mean that I have to use the least squares method with three predictors instead of one? In the link that you provided, the regression curve is calculated with a MATLAB built-in function, so I can't see the derivation of the formulas for the regression coefficients, or whether it is simpler than without transforming the predictors. – Kata Rina Aug 01 '15 at 14:49
  • The answer to your specific problem is in the first sentence above. [The rest of my answer deals with the general case of any transformed predictors. You can safely ignore the rest if you wish.] So yes, exactly as it explicitly states in that first sentence, you supply those 3 predictors to the regression. What are you using to compute your regression? – Glen_b Aug 02 '15 at 00:51
0

You can find a list of methods used for solving linear regression problems in this article by Do Q Lee:

Numerically efficient methods for solving Least-Squares problems

The most commonly used methods for this kind of problem are:

  1. Normal equations via Cholesky factorization. This is the fastest method, but it is numerically unstable. The normal equations are simply a system of linear equations: you obtain them by taking the partial derivative of the sum of squared errors with respect to each coefficient and setting each derivative to zero, which corresponds to finding the global minimum of the error term.

  2. QR factorization. More accurate and broadly applicable, but it may fail when the design matrix is nearly rank-deficient.

  3. Singular value decomposition (SVD). It is expensive to compute, but it is numerically stable and can handle rank deficiency. You can use a tool like MATLAB to compute the SVD of a chosen matrix. If you are deploying a customized solution, you can use a software package like LAPACK, or Intel's heavily x86-optimized implementation of it, which has been completely free for everyone since September 2015.

In all three cases you need to solve a system of linear equations. Apart from very simple cases such as straight-line fitting, there are no simple scalar formulas for the regression coefficients; in general the least-squares solution is expressed in matrix form as $\hat\beta = (X^\top X)^{-1} X^\top y$.
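As an illustrative sketch (not part of the original answer), the three approaches can be compared on one small simulated, well-conditioned problem; all three give the same solution up to numerical precision:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta = np.array([1.0, -2.0, 0.5, 3.0])  # illustrative true coefficients
y = X @ beta + rng.normal(0, 0.1, n)

# 1. Normal equations via Cholesky: (X'X) b = X'y, with X'X = L L'
L = np.linalg.cholesky(X.T @ X)
z = np.linalg.solve(L, X.T @ y)      # forward substitution: L z = X'y
b_chol = np.linalg.solve(L.T, z)     # back substitution: L' b = z

# 2. QR factorization: X = QR, then solve R b = Q'y
Q, R = np.linalg.qr(X)
b_qr = np.linalg.solve(R, Q.T @ y)

# 3. SVD: X = U S V', so b = V S^{-1} U'y
#    (an SVD-based solver is also what numpy.linalg.lstsq uses)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
b_svd = Vt.T @ ((U.T @ y) / s)
```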

truthseeker
  • 231
  • 1
  • 6