Machine learning on data with lots of fluctuation

Question

I have CSV files that contain cache performance data for a source under different workloads over a particular time period. Data is recorded for each time interval and includes columns such as ReadHits, WriteHits, Cacheusage, ReadMiss, etc.

Example of CSV file contents:

    Interval,ReadHits,WriteHits,Cacheusage,ReadMiss
    1,150,0,15474,12
    2,0,0,700375,245
    3,15426,1546,45121,195

Note: Each interval covers the same time period, e.g. 1 interval = 40 seconds.

In each column the values range from 0 to 60k+, and they vary from interval to interval.

Example:

    Interval   7    8     9   10    11
    ReadHits   0  240  1680    0  2091

So the data fluctuates heavily, ranging between 0 and 60k+.

Suppose I have data up to interval 60. How can I predict the data for intervals 61 to 70?

I have used an ARIMA model, random forest, k-means, and various other machine learning algorithms, but have never been able to predict values close to the actual ones.

Which algorithm would work better on this kind of data for predicting the next intervals?

Apart from prediction, what other useful and innovative things can I do with machine learning algorithms on this kind of data that would be useful for the user?

Did you try a log transform? It focuses on the magnitude of the values and is often indicated when the values differ wildly. Tell us if that helped! — kjetil b halvorsen, Apr 14 '17 at 11:50
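
As a rough illustration of the log transform suggested in the comment above, here is a minimal Python sketch. It uses `log1p` (that is, log(1 + x)) rather than a plain logarithm, which is an assumption made here because the ReadHits sample contains zeros and log(0) is undefined. The function names `log_transform` and `inverse_log_transform` are hypothetical helpers, not part of any library.

```python
import math

def log_transform(values):
    """Compress large values with log(1 + x); zeros stay at zero."""
    return [math.log1p(v) for v in values]

def inverse_log_transform(values):
    """Map transformed values (or predictions) back to the original scale."""
    return [math.expm1(v) for v in values]

# ReadHits for intervals 7 to 11, taken from the question
read_hits = [0, 240, 1680, 0, 2091]

transformed = log_transform(read_hits)
restored = inverse_log_transform(transformed)
```

A model would then be fitted on `transformed`, and its predictions passed through `inverse_log_transform` to get back to hit counts. Whether this actually improves the forecasts depends on the data.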

score 0 · Answer 1 · answered Apr 14 '17 at 09:05

Machine learning works if there is a correlation between the given data and the class (the value you want to predict).

From what I can discern here, you are trying to infer 4 values at the same time using the same values measured on previous intervals. I could be wrong, but I think you are missing some information: how can you predict the number of reads at time 61 if you don't know what the machine is actually doing? If there is no pattern in intervals 1-60, then in my opinion you can't do much.

Anyway, for the fluctuations in the data, I think the best way to handle them is to normalize every column to values between 0 and 1 (see: How to normalize data to 0-1 range?) so that you can measure the real fluctuation.

You can try normalizing them and then run your algorithms again. Let us know!

I hope this helped somehow.
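
For reference, the 0-1 normalization described above could look something like the following Python sketch. It applies min-max scaling per column to the three sample rows from the question. The helper name `min_max_normalize` and the choice to map a constant column to zeros are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Rescale a sequence of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column has no variation, so map it to zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Sample rows from the question: ReadHits, WriteHits, Cacheusage, ReadMiss
rows = [
    (150, 0, 15474, 12),
    (0, 0, 700375, 245),
    (15426, 1546, 45121, 195),
]

# Transpose to columns, normalize each column on its own, transpose back
columns = list(zip(*rows))
normalized_columns = [min_max_normalize(col) for col in columns]
normalized_rows = list(zip(*normalized_columns))
```

When forecasting, the minimum and maximum would normally be taken from the known intervals only (1 to 60 here) and reused for later intervals, so that no information from the intervals being predicted leaks into the scaling.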

Best of luck
