
I want to build a simple regression model with non-time-series data, such as Client ID Number. When testing the validity of this model, do I need to check for autocorrelation, heteroskedasticity and normality?

Richard Hardy
    Homoskedasticity and normality of errors are ideal conditions for many kinds of regression model (but not all!). Any kind of cluster structure (e.g. people within families or schools; firms within industries or countries) can give rise to dependence issues regardless of whether the data are time series (or spatial series). – Nick Cox Oct 06 '20 at 16:10

1 Answer


Dependence problems can only be checked if you know which observations could depend on which others. In a time series setting this is easy: observations that are close in time will normally be correlated. In a non-time-series situation such information may or may not exist. The client ID number may carry a time meaning, or it may mean something else, such as somebody having become a client of a particular department; I don't know. Occasionally dependence can work along the values of one of the variables in the dataset, or even along an external variable not used in the regression. If there is background knowledge and data allowing you to check for a specific dependence pattern, checking it is a good thing to do. But in some situations such information doesn't exist.
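If you suspect that the client ID carries an ordering (for instance a time meaning), one way to check is to sort the residuals by ID and look for correlation between neighbours. The following is only a sketch on simulated data, using the Durbin-Watson statistic as one possible measure. The data, the variable names and the helper functions `ols_residuals` and `durbin_watson_along` are all made up for illustration.

```python
import random

def ols_residuals(x, y):
    """Fit y = a + b*x by least squares and return the residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def durbin_watson_along(order_key, residuals):
    """Durbin-Watson statistic of the residuals after sorting by order_key.

    Values near 2 suggest no lag-1 correlation along that ordering;
    values well below 2 suggest positive correlation between neighbours.
    """
    pairs = sorted(zip(order_key, residuals), key=lambda p: p[0])
    e = [res for _, res in pairs]
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den

# Simulated example (assumption: IDs 1..200, one predictor x).
rng = random.Random(0)
client_id = list(range(1, 201))
x = [rng.uniform(0, 10) for _ in client_id]

# Case 1: errors independent of the ID ordering.
err_indep = [rng.gauss(0, 1) for _ in client_id]
y_indep = [2 + 0.5 * xi + ei for xi, ei in zip(x, err_indep)]

# Case 2: errors that drift along the ID ordering (AR(1) with rho = 0.9).
err_dep = [rng.gauss(0, 1)]
for _ in client_id[1:]:
    err_dep.append(0.9 * err_dep[-1] + rng.gauss(0, 1))
y_dep = [2 + 0.5 * xi + ei for xi, ei in zip(x, err_dep)]

dw_independent = durbin_watson_along(client_id, ols_residuals(x, y_indep))
dw_dependent = durbin_watson_along(client_id, ols_residuals(x, y_dep))
print(f"DW, independent errors: {dw_independent:.2f}")
print(f"DW, errors correlated along ID: {dw_dependent:.2f}")
```

The same function can be pointed at any other candidate ordering (a date, a department code turned into a number, one of the predictors) by passing it as `order_key`. A plot of residuals against that variable is usually worth a look as well.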

Getting an impression of the extent to which the residuals (not the original data!) are approximately normal and homoscedastic tells you something about your data, so it is worth checking. However, some violations of these assumptions (not all) are actually quite harmless. Outliers are less harmless, particularly leverage outliers (i.e., outliers in the x-direction). You should definitely look out for them!
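To make "leverage outliers" concrete: in a simple regression with one predictor, the leverage of observation i is h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)², so it depends only on how far x_i lies from the mean of x. The sketch below computes this on made-up data and flags points above 2p/n (with p = 2 coefficients), which is a common rule of thumb and not a hard threshold. The function name and the data are illustrative assumptions.

```python
def leverages(x):
    """Hat values h_i for a simple regression y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Made-up predictor: twenty ordinary values and one far out in x-direction.
x = list(range(1, 21)) + [100]

h = leverages(x)
p = 2                      # intercept and slope
threshold = 2 * p / len(x)  # rule-of-thumb cut-off, an assumption

flagged = [i for i, hi in enumerate(h) if hi > threshold]
print("high-leverage indices:", flagged)
print("largest leverage:", round(max(h), 3))
```

A flagged point is not necessarily a problem, but it is one whose y-value can pull the fitted line strongly, so it deserves a closer look.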

Christian Hennig