I have a small question. My real data look like this: the Y values are more or less random integers from 0 to 2000, and the X values run 1, 2, 3, 4, 5, ... up to 2 million.
Now, my task is to identify significant peaks and remove background noise.
To achieve this, I used 2 methods:

Method 1: I use a sliding window of 50 elements and divide y[50] by the mean of y[1] to y[49]. The modified values are then plotted and the significant peaks are recorded.

Method 2: I use a sliding window of 50 elements, compute y[50]/y[49], y[50]/y[48], and so on down to y[50]/y[1], and take the average of these ratios. The modified values are then plotted and the significant peaks are recorded.
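To make the two methods concrete, here is a minimal sketch in Python/NumPy. Only the window size of 50 and the two formulas come from the description above; the function names (method1, method2), the zero-background guards, and the toy data at the bottom are placeholders for illustration, not my actual pipeline.

```python
import numpy as np

def method1(y, w=50):
    """Method 1: divide y[i] by the mean of the previous w-1 values."""
    out = np.full(len(y), np.nan)
    for i in range(w - 1, len(y)):
        background = y[i - w + 1:i].mean()   # mean of the previous 49 values
        if background > 0:                   # guard added to avoid division by zero
            out[i] = y[i] / background
    return out

def method2(y, w=50):
    """Method 2: average the ratios y[i]/y[j] over the previous w-1 values."""
    out = np.full(len(y), np.nan)
    for i in range(w - 1, len(y)):
        prev = y[i - w + 1:i]
        prev = prev[prev > 0]                # guard added to avoid division by zero
        if prev.size:
            out[i] = np.mean(y[i] / prev)
    return out

# Toy data standing in for the real ~2 million coverage values
y = np.random.randint(0, 2001, size=10_000).astype(float)
m1 = method1(y)
m2 = method2(y)
```

(For the full 2 million positions one would presumably vectorize this, e.g. with a rolling mean, but the loops above are just meant to show the arithmetic.)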
Which is the best method to use? You can also suggest other methods.
Background info:
The 2 million numbers are the nucleotide positions of an entire genome. For example, if the nucleotide sequence is AUGCAUC... , the 1st position corresponds to A, the 2nd to U, and so on. Each position has a coverage value, which is important for predicting other things. First I calculate these modified values and see whether the peaks correspond to the results. After visual inspection I have to empirically choose a cut-off value for the peaks, say 100. I am more interested in the method itself.
The first method does not seem to work out well: if the mean of the previous 49 elements is low, the final value shoots up. For example, a modest value of 20 over a background averaging 1 scores 20, which looks like a bigger peak than a value of 500 over a background averaging 100, which scores only 5.
The second one seems to be fine, but I would like a good opinion from a mathematician.