
Please confirm whether a Normal Q-Q plot can be used to check the normality of continuous variables, when the independent variable is plotted against the dependent variable, before conducting a regression analysis. If a straight line is obtained, as in the attached plot, does this confirm that the sample data are normally distributed?

To double-check this, I ran a Shapiro-Wilk normality test. I obtained p < 0.05, implying that the data were not normally distributed.

How should I proceed?

[Image: Normal Q-Q plot]

Vyas
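
A normal Q-Q plot involves a single variable: its sorted values are plotted against the corresponding quantiles of a standard normal distribution. The stdlib-only Python sketch below shows how those points can be computed. The function names are hypothetical, the plotting positions (i - 0.5)/n are one common convention (packages such as SPSS may use a different one), and the correlation of the points is offered only as a rough summary of straightness, not as a formal test.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Return (theoretical, observed) pairs for a normal Q-Q plot of ONE variable.

    Uses the plotting positions p_i = (i - 0.5) / n; other conventions
    (e.g. Blom's) shift the theoretical quantiles slightly.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    observed = sorted(values)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def qq_correlation(values):
    """Pearson correlation of the Q-Q points; near 1 means 'close to a straight line'."""
    pts = normal_qq_points(values)
    xs = [t for t, _ in pts]
    ys = [o for _, o in pts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if syy == 0:
        raise ValueError("all values are identical")
    return sxy / (sxx * syy) ** 0.5
```

With values that can only be the integers 2 to 8, the sorted observations contain ties, which show up as horizontal runs of points on the plot. For a formal test in Python, `scipy.stats.shapiro` performs the Shapiro-Wilk test; in a regression setting it is the model's residuals, not the raw variables, that it should be applied to.
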
  • The only thing ever assumed normal in regression is the residuals of the model. Not the independent variables, and not the dependent variable. – Matthew Drury Sep 04 '17 at 14:46
  • Your observed values are clearly discrete at 2(1)8 (the integers 2 through 8) and so not continuous at all; hence it is unclear why you expect them to be normally distributed. That said, the approximation looks about as good as is likely. Yet it is also irrelevant, as regression makes no assumptions about marginal distributions. – Nick Cox Sep 04 '17 at 14:47
  • There is also an implication that linear regression may not be a good choice for you. Could the response ever be zero or negative? – Nick Cox Sep 04 '17 at 14:49
  • So I should not use the Q-Q plot to determine whether the residuals are normally distributed in a linear regression? – Vyas Sep 04 '17 at 15:01
  • On the contrary, @Matthew Drury's comment implies that is a fair thing to do. – Nick Cox Sep 04 '17 at 15:56
  • Correct: the Q-Q plot above was generated when I plotted the independent variable versus the dependent variable. OK, I see. @Matthew pointed out that I should plot the residuals, not the independent and dependent variables. I believe inspecting a plot of the unstandardized or standardized residuals against the standardized predicted values should suffice. Thanks – Vyas Sep 04 '17 at 16:09
  • There is some potential for confusion here. Small points first. (1) A Q-Q plot is for one variable only and has nothing to do with what you are also plotting. Perhaps you are using SPSS or some other software that automatically gives you extra plots if you ask for them. (2) I am confident that @Matthew Drury was not advising against a plot of the data. (3) More trivially, see https://stats.stackexchange.com/questions/146533/versus-vs-how-to-properly-use-this-word-in-data-analysis on the term "versus". – Nick Cox Sep 04 '17 at 16:17
  • Most important: As for what suffices for your purposes, I don't think we can advise at all, as you give absolutely no information on the model and how well it fits. You could usefully show us a scatter plot with regression line added. – Nick Cox Sep 04 '17 at 16:18
  • Confirming after a long plane ride: a plot of the raw data is certainly more important than any Q-Q plot one could devise. – Matthew Drury Sep 04 '17 at 20:12
  • Voting to close as unclear. OP is asking what to do next, but this plot alone cannot tell us. – Nick Cox Sep 05 '17 at 06:22
  • I'm voting to close this question as off-topic because the post asks for advice without giving the information needed in order to give that advice. – Peter Flom Sep 05 '17 at 10:59
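
Following the comments, the normality check belongs on the residuals of a fitted model, alongside a plot of the raw data. The stdlib-only Python sketch below assumes a simple linear regression with one predictor fitted by ordinary least squares; the function names are hypothetical, and any statistics package would give the same fitted values and residuals.

```python
from statistics import NormalDist, fmean


def fit_simple_ols(x, y):
    """Least-squares intercept a and slope b for the model y = a + b*x."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need equal-length x and y with at least 3 points")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return a, b


def residual_diagnostics(x, y):
    """Return fitted values, residuals, and normal Q-Q points of the residuals."""
    a, b = fit_simple_ols(x, y)
    fitted = [a + b * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    n = len(residuals)
    z = NormalDist()  # standard normal
    # Q-Q points: standard normal quantiles at (i - 0.5)/n vs sorted residuals
    theoretical = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    qq = list(zip(theoretical, sorted(residuals)))
    return fitted, residuals, qq
```

Plotting `residuals` against `fitted` can reveal curvature or changing spread, and plotting the `qq` pairs shows whether the residuals look roughly normal. Neither replaces the scatter plot of y against x with the fitted line that the comments ask for.
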
