I have looked at several questions on similar topics, but none of them answers my question exactly, and I am still confused.
I am working with big data that is bursty and sampled at a high frequency.
I treat each feature separately as a function of time (so each feature is one time series) and want to remove outliers from selected time series.
I am implementing this in Java using Weka. I have read a lot about this problem, but I have not found which method works best for arbitrary time series. It would also be great if the outlier method could smooth the time series as well, because in the end I need to feed multiple time series into PCA to find correlations. As you know, PCA is sensitive to outliers, noise, and missing values.
I know I probably cannot get all three things from one algorithm, but for me outlier detection is the hard part.
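To make the question concrete, here is a minimal sketch of the kind of per-series filter I have in mind: a Hampel filter, which flags a point as an outlier when it lies more than a chosen number of scaled median absolute deviations (MAD) from the median of a sliding window, and replaces it with that median. It is written in plain Java rather than against the Weka API, and the class name `HampelFilter`, the `halfWindow` size, and the `threshold` value are my own assumptions, not something I have tuned for bursty data. It does not handle missing values.

```java
import java.util.Arrays;

public class HampelFilter {

    // Scale factor that makes the MAD comparable to a standard deviation
    // when the data in the window is roughly Gaussian.
    private static final double MAD_SCALE = 1.4826;

    // Median of an array; the input is not modified.
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1)
                ? sorted[n / 2]
                : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    }

    /**
     * Returns a copy of the series in which every point that deviates from
     * its local window median by more than threshold * scaled MAD has been
     * replaced by that median. Windows are truncated at the series edges.
     */
    public static double[] clean(double[] series, int halfWindow, double threshold) {
        double[] cleaned = series.clone();
        for (int i = 0; i < series.length; i++) {
            int from = Math.max(0, i - halfWindow);
            int to = Math.min(series.length, i + halfWindow + 1);
            // Always read the window from the original series, not the cleaned one.
            double[] window = Arrays.copyOfRange(series, from, to);

            double med = median(window);
            double[] deviations = new double[window.length];
            for (int j = 0; j < window.length; j++) {
                deviations[j] = Math.abs(window[j] - med);
            }
            double sigma = MAD_SCALE * median(deviations);

            if (Math.abs(series[i] - med) > threshold * sigma) {
                cleaned[i] = med; // treat as outlier and replace
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        double[] series = {1, 2, 3, 100, 5, 6, 7};
        // halfWindow = 2 and threshold = 3 are illustrative values only.
        System.out.println(Arrays.toString(clean(series, 2, 3.0)));
    }
}
```

Each feature would be passed through `clean` on its own before the series are combined for PCA. Because only flagged points are replaced, this removes spikes but smooths very little, which is part of why I am asking whether a better approach exists.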
Please share your views on this.
Thanks