However there are often buys in odd circumstances which factor into the price, that is not (and cannot be) addressed directly in the features of the analysis.
- Isn't that what error terms in a regression are supposed to capture: variation in the outcome variable that isn't explained by the features of your model?
If your question is how to deal with outliers in general, under the assumption that extreme observations are probably bad data, some standard approaches are:
- Trimming the data. E.g., ignore the 1% most extreme observations.
- Winsorizing the data. Replace observations above or below some cutoff with the value of the cutoff. (This isn't quite as extreme as trimming the data, which deletes extreme observations entirely.)
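To make the difference between the two concrete, here is a minimal sketch in plain Python. The function names, the toy `prices` list, and the choice to cut the same fraction from each tail are my own assumptions for illustration, not a standard API.

```python
def tail_cutoffs(values, fraction):
    """Return (low, high) cutoffs that leave `fraction` of the points in each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # number of points treated as extreme per tail
    return ordered[k], ordered[len(ordered) - 1 - k]


def trim(values, fraction=0.01):
    """Drop observations outside the cutoffs entirely."""
    low, high = tail_cutoffs(values, fraction)
    return [v for v in values if low <= v <= high]


def winsorize(values, fraction=0.01):
    """Keep every observation, but clamp extreme ones to the cutoffs."""
    low, high = tail_cutoffs(values, fraction)
    return [min(max(v, low), high) for v in values]


# Hypothetical sale prices with one suspiciously low and one suspiciously high value.
prices = [100, 102, 98, 101, 99, 103, 97, 100, 5, 900]

trimmed = trim(prices, fraction=0.1)
winsorized = winsorize(prices, fraction=0.1)
print(trimmed)     # [100, 102, 98, 101, 99, 103, 97, 100]
print(winsorized)  # [100, 102, 98, 101, 99, 103, 97, 100, 97, 103]
```

Trimming shrinks the sample from 10 to 8 observations, while winsorizing keeps all 10 and pulls 5 up to 97 and 900 down to 103.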
Some fancier approaches to outliers (ignore if this is at all confusing):
- You can do things like ellipsoidal peeling. Find the minimum volume ellipsoid which encloses your data, then remove the observations along its surface.
- Estimate the regression with a Huber loss function, or something else less sensitive to outliers than OLS. Or maybe use a maximum likelihood estimator with t-distributed rather than normally distributed errors, etc.
- Quantile regression.
- You could adopt some Bayesian view as to whether an observation is bad data.
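As a rough illustration of the Huber loss idea, here is a sketch that fits a straight line by iteratively reweighted least squares, where observations with residuals larger than `delta` get down-weighted by `delta / |residual|`. The helper names, the fixed `delta=1.0`, the iteration count, and the toy data are all assumptions made for the example. In practice `delta` is usually scaled by a robust estimate of the residual spread, and you would likely reach for a library implementation instead.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least squares fit of y = intercept + slope * x."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def huber_line_fit(xs, ys, delta=1.0, iterations=50):
    """Approximate the Huber-loss fit via iteratively reweighted least squares."""
    ws = [1.0] * len(xs)  # the first pass is plain OLS
    for _ in range(iterations):
        intercept, slope = weighted_line_fit(xs, ys, ws)
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Huber weights: full weight for small residuals, shrinking weight for large ones.
        ws = [1.0 if abs(r) <= delta else delta / abs(r) for r in residuals]
    return intercept, slope


# Toy data on the line y = 1 + 2x, with the last point replaced by a wild value.
xs = list(range(10))
ys = [1 + 2 * x for x in xs]
ys[-1] = 100  # the "odd circumstances" observation

ols_intercept, ols_slope = weighted_line_fit(xs, ys, [1.0] * len(xs))
huber_intercept, huber_slope = huber_line_fit(xs, ys)
print("OLS slope:  ", ols_slope)
print("Huber slope:", huber_slope)
```

On this toy data the single bad point drags the OLS slope up to roughly 6.4, while the Huber fit should stay close to the true slope of 2. Note that the outlier is never deleted here; it just has less influence on the fit.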
Beware the problems of mishandling outliers...
In many cases, such as returns for financial securities, removing or ignoring outliers can be hugely problematic. Oftentimes, all the action is in the outliers! Major stock market crashes, company bankruptcies, etc. are hugely important.
For situations involving safety (e.g., auto crashes), ignoring bad outliers can be even worse! You don't want to winsorize observations such that observations where people die get replaced with observations where people are mildly injured. That would perhaps be criminal negligence.