Suppose we have two databases, `Database_1` and `Database_2`. `Database_1` has 300 samples and `Database_2` has 700 samples. `Database_all` is the combination of the two databases.
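Because `Database_all` is the union of the two databases, its mean and standard deviation for a column X can be written in terms of the per-database statistics using the standard pooling identities. This assumes `std` is the sample standard deviation with N-1 normalization, which is what MATLAB's `std` uses by default. Here x̄_k, s_k and n_k denote the mean, standard deviation and size of `Database_k`, with n_1 = 300 and n_2 = 700:

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = 0.3\,\bar{x}_1 + 0.7\,\bar{x}_2$$

$$(N-1)\,s^2 = (n_1-1)\,s_1^2 + (n_2-1)\,s_2^2 + n_1(\bar{x}_1-\bar{x})^2 + n_2(\bar{x}_2-\bar{x})^2, \qquad N = n_1 + n_2 = 1000$$

The pooled rule flags a sample when |x - x̄| ≥ 1.9 s, while the separate rule flags a sample of `Database_k` when |x - x̄_k| ≥ 1.9 s_k. The pooled s includes the between-database terms n_k(x̄_k - x̄)^2, so when the two means differ, the pooled threshold is generally centered and scaled differently from either per-database threshold.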
Is finding outliers with `abs(X-mean(X))>=1.9*std(X)` in `Database_all` equivalent to finding them separately in `Database_1` and `Database_2`? In other words, will we remove the same samples in both cases or not?

Suppose we remove outliers based on column 1 of the databases above. Columns 2 and 3 contain data from which we want to create a label for every sample, like this:

`(column_2_value-column_3_value)/mean(column_2)`

where `mean(column_2)` is the average of all values in column 2. We want to calculate this label for all samples after removing outliers. Do we get the same label values if we remove outliers from `Database_all` as we do if we remove outliers separately in `Database_1` and `Database_2` and then calculate the labels?
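A small numeric check can make the question concrete. The sketch below uses tiny hypothetical data (not the real 300- and 700-sample databases), the sample standard deviation (N-1, as in MATLAB's default `std`), and the illustrative helper names `outlier_mask`, `remove_outliers` and `labels`, which are assumptions rather than anything from the question. Because the question does not say whether `mean(column_2)` is taken per database or over the combined data after separate removal, both readings are computed.

```python
from statistics import mean, stdev


def outlier_mask(values, k=1.9):
    """True where abs(x - mean) >= k * std.

    stdev() is the sample (N-1) standard deviation, matching MATLAB's default std.
    """
    m, s = mean(values), stdev(values)
    return [abs(x - m) >= k * s for x in values]


def remove_outliers(rows, k=1.9):
    """Drop rows flagged as outliers on column 1; rows are (col1, col2, col3)."""
    mask = outlier_mask([r[0] for r in rows], k)
    return [r for r, is_outlier in zip(rows, mask) if not is_outlier]


def labels(rows):
    """(col2 - col3) / mean(col2), with mean(col2) taken over the given rows."""
    m2 = mean([r[1] for r in rows])
    return [(r[1] - r[2]) / m2 for r in rows]


# Hypothetical data: column 1 is centered differently in the two databases.
db1 = [(x, 2.0, 1.0) for x in [0] * 9 + [10]]
db2 = [(x, 4.0, 1.0) for x in [19, 21] * 5]

# Pooled: remove outliers once from Database_all, then label.
kept_all = remove_outliers(db1 + db2)
labels_all = labels(kept_all)

# Separate: remove outliers within each database.
kept_1, kept_2 = remove_outliers(db1), remove_outliers(db2)
labels_sep_union = labels(kept_1 + kept_2)         # one mean(col2) over both kept sets
labels_sep_each = labels(kept_1) + labels(kept_2)  # mean(col2) per database

print(len(kept_all), len(kept_1) + len(kept_2))  # 20 19
```

In this toy case the value 10 is flagged inside `Database_1` (its own mean is 1 and its std is about 3.16) but not in the pooled data (mean 10.5, std about 10.0), so the kept sets differ. The labels differ too, under either reading, because `mean(column_2)` is computed over different rows. One counterexample like this shows the two procedures are not equivalent in general; it does not characterize when they happen to agree.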
Please prove the answer mathematically if you have time.
Thanks.