Find density peak extremities in genomic data

Question

I have ~150,000 genomic positions that seem to be clustered in specific genomic regions (hotspots). However, these "hotspots" may have different sizes, from very small (~10,000 bp) to very large (~500,000 bp), where bp = base pair. Could someone give me some advice on how to detect such peaks? My idea was to use a small-window approach and find adjacent small windows where the number of positions is significantly higher than random (using simulation).

Here's a subset of my data, focused on a portion of one chromosome. The top panel shows the individual genomic positions of interest (one vertical bar represents one site). The bottom panel shows the density computed with ggplot's stat_density using adjust=0.001 and bw=1000. I manually added the red lines to show the information I want to extract from such data. An important point is to extract only peak regions that are denser than expected by chance. I was thinking of performing a simulation where I randomly distribute 150,000 genomic sites and compute a kind of background density to compare with my real data. Any advice?

Edit: I added the same plot with 5 random sets of genomic sites (same size as the real dataset). My idea is to extract the regions that rise above this background.

Thanks
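
The window-plus-simulation idea described in the question could be sketched roughly as follows. This is only an illustration under stated assumptions, not an established method: it is written in plain Python rather than R, the function names (`window_counts`, `simulated_threshold`, `hotspots`) are hypothetical, the chromosome length and toy data are invented, and the choice to threshold on the maximum window count per simulation (to guard against testing many windows at once) is one possible design among several.

```python
import random

def window_counts(positions, window, n_windows):
    """Count sites falling in each fixed-size window."""
    counts = [0] * n_windows
    for p in positions:
        counts[min(p // window, n_windows - 1)] += 1
    return counts

def simulated_threshold(n_sites, chrom_len, window,
                        n_sim=100, quantile=0.99, seed=0):
    """Null threshold: scatter n_sites uniformly at random n_sim times,
    record the largest window count of each run, and return the
    requested quantile of those maxima."""
    rng = random.Random(seed)
    n_windows = -(-chrom_len // window)  # ceiling division
    maxima = []
    for _ in range(n_sim):
        sim = [rng.randrange(chrom_len) for _ in range(n_sites)]
        maxima.append(max(window_counts(sim, window, n_windows)))
    maxima.sort()
    return maxima[min(int(quantile * n_sim), n_sim - 1)]

def hotspots(positions, chrom_len, window, threshold):
    """Merge runs of adjacent windows whose count exceeds the threshold
    and return them as (start, end) intervals in bp."""
    n_windows = -(-chrom_len // window)
    counts = window_counts(positions, window, n_windows)
    regions, start = [], None
    for i, c in enumerate(counts + [0]):  # trailing 0 closes an open run
        if c > threshold and start is None:
            start = i
        elif c <= threshold and start is not None:
            regions.append((start * window, min(i * window, chrom_len)))
            start = None
    return regions

# Toy data (invented): uniform background plus one planted 50 kb hotspot.
rng = random.Random(1)
chrom_len, window = 1_000_000, 10_000
sites = [rng.randrange(chrom_len) for _ in range(500)]
sites += [rng.randrange(400_000, 450_000) for _ in range(300)]

threshold = simulated_threshold(len(sites), chrom_len, window)
regions = hotspots(sites, chrom_len, window, threshold)
print(threshold, regions)
```

Because adjacent significant windows are merged, the reported region boundaries are only as precise as the window size. A smaller window gives sharper extremities but noisier counts, so hotspots spanning 10,000 bp to 500,000 bp may need several window sizes.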

You could compute the kernel density of the physical position of your markers. That way you could detect the peaks by using the baseline density as a comparison point. — Riff, Dec 12 '16 at 09:07
OK, so in R use density() on the data. But how do I estimate the baseline density? — Nicolas Rosewick, Dec 12 '16 at 09:16
Density=region where the number of positions is higher than expected by chance — Nicolas Rosewick, Dec 13 '16 at 07:11
For the baseline density you could perhaps use a [Poisson](https://stats.stackexchange.com/tags/poisson-process/info) null model? (This could give a "p value" then ... but is there a standard way to do this in your field?) — GeoMatt22, Apr 28 '17 at 01:22
I know [this guy](https://arxiv.org/pdf/1405.1400.pdf) does some work on it. You may want to look at it. — Josh, May 01 '17 at 14:56
See https://stats.stackexchange.com/questions/36309/how-do-i-find-peaks-in-a-dataset, https://stats.stackexchange.com/questions/175648/how-to-determine-if-there-is-a-peak-in-the-data — kjetil b halvorsen, Jul 27 '21 at 20:00
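
The Poisson null model suggested in the comments could be sketched as below. If the N sites were scattered uniformly over a genome of length G, the count in a window of w bp would be approximately Poisson with mean λ = N·w/G, and the upper-tail probability P(X ≥ k) gives a per-window "p value" for an observed count k. This is a minimal sketch under assumptions: the helper name `poisson_upper_tail` is hypothetical, the genome length of 3e9 bp and the observed count of 8 are made-up values for illustration (the thread gives neither), and a homogeneous rate ignores real-world effects such as unmappable regions.

```python
import math

def poisson_upper_tail(k, lam, tol=1e-15):
    """P(X >= k) for X ~ Poisson(lam), summing the upper tail directly
    so that very small p-values are not lost to rounding."""
    if k <= 0:
        return 1.0
    # First term is the pmf at k, computed in log space for stability.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i  # pmf(i) = pmf(i - 1) * lam / i
        # Stop once past the mode and the terms no longer contribute.
        if i > lam and term <= tol * total:
            break
    return min(total, 1.0)

n_sites = 150_000           # from the question
genome_len = 3_000_000_000  # hypothetical genome length in bp
window = 10_000             # smallest hotspot size mentioned
lam = n_sites * window / genome_len  # expected sites per window

observed = 8                # hypothetical count in one window
p_value = poisson_upper_tail(observed, lam)
n_windows = genome_len // window
# Bonferroni correction for having tested every window.
p_adjusted = min(1.0, p_value * n_windows)
print(lam, p_value, p_adjusted)
```

The Bonferroni step is just one simple way to account for testing many windows; it is conservative, and a false discovery rate procedure would be a reasonable alternative.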
