Functional Forms of Independent Variables

Question

If our objective is to ascertain the relationship (specifically, the sign and significance of the beta coefficient) between the independent variables and the dependent variable in an OLS regression (cross-sectional, time-series, or panel), should we actually care about choosing an appropriate functional form for the independent variables? I have observed that, in order to establish such relationships and to explain variation in the dependent variable, research papers keep suggesting fresh sets of variables as potential determinants of the dependent variable without discussing the issue of functional form. For example, in the majority of corporate finance research, some financial variables (explanatory variables) are picked from the financial statements to explain variation in another financial variable (the dependent variable) without any mention of functional form.

However, I have read in Gujarati (2009) that one should choose a transformation for an independent variable by looking at its graphical relationship with the dependent variable, and only then include it in the model. Further, I have not come across a research paper that reports the results of the Ramsey RESET test to support the appropriateness of the functional forms used in its model.
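To make the RESET idea concrete, here is a minimal sketch of a RESET-style F test in plain Python. It is an illustration under stated assumptions, not a reference implementation: the data are simulated, the helper names `ols` and `reset_f` are hypothetical, and only the squared fitted values are added (applications often add the cube as well). The statistic compares the original linear fit with a fit that also includes the squared fitted values; a large F suggests the linear form is missing curvature.

```python
import random

def ols(X, y):
    """OLS via the normal equations. Returns (coefficients, fitted values, RSS)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, fitted, rss

def reset_f(x, y):
    """F statistic for adding squared fitted values to y = a + b*x (hypothetical helper)."""
    n = len(x)
    X0 = [[1.0, xi] for xi in x]                      # restricted model: intercept + x
    _, fitted, rss0 = ols(X0, y)
    X1 = [row + [fi ** 2] for row, fi in zip(X0, fitted)]  # add fitted^2
    _, _, rss1 = ols(X1, y)
    q, k1 = 1, 3                                      # one added term, three coefficients
    return ((rss0 - rss1) / q) / (rss1 / (n - k1))

# Simulated data (an assumption for illustration only)
rng = random.Random(0)
x = [i / 6 for i in range(60)]
noise = [rng.gauss(0.0, 1.0) for _ in x]
y_linear = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y_curved = [1.0 + 2.0 * xi + 0.5 * xi ** 2 + e for xi, e in zip(x, noise)]

f_linear = reset_f(x, y_linear)
f_curved = reset_f(x, y_curved)
print(f"F (truly linear data): {f_linear:.2f}")
print(f"F (curved data):       {f_curved:.2f}")
```

Each statistic would be compared with an F(1, n - 3) critical value, which is about 4 at the 5% level for n = 60. The sketch reports only the statistic because the standard library has no F distribution.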

Thanks!

Could you please explain what you mean by "functional form of the independent variables"? — whuber, Dec 24 '18 at 16:17
By functional form, I mean the mathematical transformation that is generally used to redefine a variable: for example, taking the log of a variable (log transformation), taking its square (quadratic transformation), taking its inverse (reciprocal transformation), etc. — Prateek Bedi, Dec 24 '18 at 17:45
You are therefore asking "should we actually care about [the mathematical transformation which is generally used to redefine a variable]?" This seems tautological, because one re-expresses a variable for some particular reason, not just for fun! For some information about this, please see the hits at https://stats.stackexchange.com/search?tab=votes&q=box-cox%20score%3a5. That leaves your question about the RESET test: are you trying to ask whether this test might be used to find appropriate ways to re-express variables? — whuber, Dec 24 '18 at 17:53
Let me re-frame the question. As mentioned here: https://stats.stackexchange.com/questions/298/in-linear-regression-when-is-it-appropriate-to-use-the-log-of-an-independent-va/3530#3530, one of the reasons for a log transformation is to linearize a relationship. Since we are trying to fit a linear relationship between the DV and the IV in a regression, when the graph does not suggest one, we transform the IV in order to achieve a linear relation graphically. My question is: why don't we always verify and attempt to achieve this graphical linear relationship between DV and IV? — Prateek Bedi, Dec 24 '18 at 18:29
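As a small illustration of what "linearizing" means here, the sketch below fits a straight line to the same response twice, once against x and once against log(x), and compares R². The data are hypothetical and noise-free, constructed so that y depends on log(x); `simple_ols` is an illustrative helper, not a library function.

```python
import math

def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for y = a + b*x fitted by OLS."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    r2 = (sxy ** 2) / (sxx * syy)
    return a, b, r2

# Hypothetical, noise-free data: y depends on log(x), not on x itself.
x = [float(i) for i in range(1, 51)]
y = [2.0 + 3.0 * math.log(xi) for xi in x]

_, _, r2_raw = simple_ols(x, y)
_, slope_log, r2_log = simple_ols([math.log(xi) for xi in x], y)

print(f"R^2 with x:      {r2_raw:.3f}")
print(f"R^2 with log(x): {r2_log:.3f}")
```

With real, noisy data the gap between the two fits is usually less clear-cut, which is part of why the choice is not automatic.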
It's worth doing if you can--but there are many possible mitigating factors. There might not be enough data to identify a nonlinearity. A linear approximation might be accurate enough (for the range of observed independent variables) and easier to interpret. The analyst might not be aware of the possibility of a transformation. The analyst might elect to adopt a statistical procedure that handles the nonlinearity in a different way (such as through a GLM with a nonlinear link function). The linearization might ruin other valued properties of the model. — whuber, Dec 24 '18 at 18:42
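The point about the range of the observed independent variable can be sketched numerically. The example below assumes, purely for illustration, that the true relationship is y = log(x) with no noise, and measures how well a straight line fits over a narrow range of x versus a wide one. The function names and the ranges are hypothetical choices.

```python
import math

def r_squared(x, y):
    """R^2 of the simple OLS line of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return (sxy ** 2) / (sxx * syy)

def linear_fit_quality(lo, hi, n=50):
    """R^2 of a straight line fitted to y = log(x) on n evenly spaced points in [lo, hi]."""
    x = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    y = [math.log(xi) for xi in x]
    return r_squared(x, y)

narrow = linear_fit_quality(10.0, 12.0)   # x varies by 20%
wide = linear_fit_quality(1.0, 100.0)     # x varies by a factor of 100

print(f"R^2 of linear fit, x in [10, 12]: {narrow:.4f}")
print(f"R^2 of linear fit, x in [1, 100]: {wide:.4f}")
```

Over the narrow range the curvature is nearly invisible, so a linear term may be an adequate and more interpretable approximation there; over the wide range it is not.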
So this means that we should always try to achieve this graphical linear relationship between DV & IV. Now, the literature (corporate finance, specifically) I am dealing with does not seem to do so. In particular, independent variables are introduced by papers (in top ranked journals) to explain the variation in DV with no mention of an attempt to achieve a graphical linear relationship. Actually I am working on a review paper & I intend to highlight this practice as a weakness of the extant literature. I just want to assure myself that this indeed is a weakness & I am not missing something. — Prateek Bedi, Dec 24 '18 at 19:01
I think you misread me. Indeed, regression in general does not require any kind of "graphical linear relationship," nor is failure to check on one necessarily a "weakness." You need to be more specific about the context, the assumptions, and the objectives of the analysis before you can justify such aspersions. — whuber, Dec 24 '18 at 19:35
Got your point. Now the issue I have is this: I have observed that research papers do not explicitly check for the correct functional form that should ideally be used to model the relationship between Y and X (it could be quadratic, cubic, inverse, etc.). This 'correct' form can be observed by looking at the graph of these two variables. Gujarati (2009) discusses this in Section 13.4 and points out that an incorrect functional form results in biased and inconsistent parameter estimates. Hence, I believe it is necessary to check for a correct functional form by looking at the graph and to transform the variable accordingly. — Prateek Bedi, Dec 25 '18 at 14:27
Some quick reactions: first, research papers never report all the modeling and diagnostic procedures that are needed. This is understandable because they typically focus on matters of scientific, rather than statistical, interest; but it is also unfortunate because many (most?) researchers come to think they needn't do any more than is typically reported in the papers they are familiar with. Second, "looking at the graph" can be insightful but is typically a poor way to identify the form of a relationship. Third: "bias" is not inherently bad: it's important to *quantify* it. — whuber, Dec 26 '18 at 16:11
Thanks for the reply. I agree with your first point. Regarding the second point, could you please suggest another way to identify the form of a relationship? I agree with the third point: a minimal bias can often be ignored. — Prateek Bedi, Dec 26 '18 at 19:16
Some methods I have described on this site are *careful modeling* (https://stats.stackexchange.com/a/64039/919 and https://stats.stackexchange.com/a/34186/919), [Tukey's three-point method](https://stats.stackexchange.com/a/35717/919), and [spread vs. level plots](https://stats.stackexchange.com/a/74594/919). There are many more methods, ranging from a "fractional polynomial" method for logistic regression described by Hosmer & Lemeshow all the way to machine-learning models like [SVMs](https://stats.stackexchange.com/questions/23391). — whuber, Dec 26 '18 at 19:54
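For readers unfamiliar with the three-point idea, here is a loose sketch in its spirit, not a reproduction of the linked answer. It assumes that only x is re-expressed, that the three summary points are simply the first, middle, and last observations, and that candidate powers come from a short "ladder" (with 0 standing for the log). The data and the names `box_cox` and `slope_ratio` are hypothetical. The power whose two half-slopes are most nearly equal is taken as the most linearizing.

```python
import math

def box_cox(x, lam):
    """Power re-expression of x; lam == 0 is treated as the natural log."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def slope_ratio(points, lam):
    """Ratio of right-hand to left-hand slope after re-expressing x with power lam."""
    (x1, y1), (x2, y2), (x3, y3) = [(box_cox(x, lam), y) for x, y in points]
    left = (y2 - y1) / (x2 - x1)
    right = (y3 - y2) / (x3 - x2)
    return right / left

# Hypothetical data following y = 2 + 3*log(x)
xs = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
data = [(x, 2.0 + 3.0 * math.log(x)) for x in xs]

# Three summary points: first, middle, last (an assumption of this sketch)
three = [data[0], data[len(data) // 2], data[-1]]

# Pick the power on the ladder whose slope ratio is closest to 1
ladder = [-1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
best = min(ladder, key=lambda lam: abs(math.log(slope_ratio(three, lam))))

for lam in ladder:
    print(f"power {lam:5.1f}: slope ratio {slope_ratio(three, lam):.3f}")
print("most nearly linear power:", best)
```

The sketch relies on the data being monotone, so that both slopes share a sign and the ratio is positive.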

0 Answers