
I have histograms produced from two sets of recorded data: one of background noise with values $X$, and another, $Z$, recorded with a signal of values $Y$ present, such that $Z = X + Y$.

How can I estimate a histogram of the distribution of $Y$ given that I can only record $X$ and $Z$?

$X$ and $Z$ are not recorded simultaneously, so the noise samples contained in $Z$ are not the same samples that were recorded as $X$.

I learned that if I already had $X$ and $Y$, I could convolve their histograms to predict the histogram of $Z = X + Y$, using the convolution theorem to calculate $Z_{hist} = f(\hat{f}(X_{hist}) \cdot \hat{f}(Y_{hist}))$ with a pointwise product, where $\hat{f}$ and $f$ are the Fourier and inverse Fourier transforms, respectively. I inverted this to get $Y_{hist} = f(\hat{f}(Z_{hist}) / \hat{f}(X_{hist}))$ and found that it worked when the data sampled from $X$ was exactly the data summed into $Z$, but when the data is sampled separately, the resulting histogram has negative bin counts and a non-integer sum.
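To make the two formulas concrete, here is a minimal sketch of what I mean, using NumPy. The bin values in `x_hist` and `y_hist` are made up for illustration and are treated as probabilities on unit-width bins starting at 0. The inversion comes out exact here only because `z_hist` is built directly from the same `x_hist`.

```python
import numpy as np

# Made-up bin probabilities (illustration only, not real data).
x_hist = np.array([0.5, 0.3, 0.2])
y_hist = np.array([0.6, 0.4])

# Zero-pad to the full output length so the circular convolution
# computed by the FFT matches the ordinary (linear) convolution.
n = len(x_hist) + len(y_hist) - 1

# Forward direction: pointwise product in the Fourier domain.
z_hist = np.fft.irfft(np.fft.rfft(x_hist, n) * np.fft.rfft(y_hist, n), n)

# Inverse direction: pointwise division in the Fourier domain.
# This assumes no Fourier coefficient of x_hist is zero.
y_rec = np.fft.irfft(np.fft.rfft(z_hist, n) / np.fft.rfft(x_hist, n), n)

print(z_hist)  # matches np.convolve(x_hist, y_hist)
print(y_rec)   # y_hist followed by (near-)zero padding bins
```

The division step is where things can go wrong: any Fourier coefficient of `x_hist` that is close to zero amplifies whatever noise is present in `z_hist`.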

Is there a better way than the convolution theorem, or a way to massage the data to produce a workable answer?

An example of the situation might be as follows.

  1. Simulate arbitrary, differing distributions $X$ and $Y$.
  2. Produce $n$ samples of $X$ and store a histogram $X_{hist}$.
  3. Produce $m$ pairs of samples from $X$ and $Y$ and sum them, calling it $Z = X + Y$. Store these sums into a histogram $Z_{hist}$.
  4. I would like to produce an approximate histogram $Y_{hist}$ using only $X_{hist}$ and $Z_{hist}$ and the knowledge that $Z_{hist}$ was generated by $Z = X + Y$. I am trying to estimate the distribution of $Y$.
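The steps above can be sketched as follows. The Poisson and binomial distributions, the sample sizes, and the seed are arbitrary stand-ins I chose for illustration; the real $X$ and $Y$ are unknown. The histograms are normalized to sum to 1, and the last step is the naive Fourier division, which is the part that misbehaves for me.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
n, m = 5000, 5000                # hypothetical sample sizes

def sample_x(size):
    # Stand-in for the background noise X (assumed distribution).
    return rng.poisson(3.0, size)

def sample_y(size):
    # Stand-in for the signal Y (assumed distribution).
    return rng.binomial(6, 0.3, size)

# Step 2: record X on its own.
x = sample_x(n)

# Step 3: record Z = X + Y using fresh, independent draws of X.
z = sample_x(m) + sample_y(m)

# Integer-valued data, so one bin per integer value.
n_bins = int(max(x.max(), z.max())) + 1
x_hist = np.bincount(x, minlength=n_bins) / n
z_hist = np.bincount(z, minlength=n_bins) / m

# Step 4 (naive attempt): divide in the Fourier domain.
# Zero-pad to reduce wrap-around from the circular convolution.
n_fft = 2 * n_bins
y_est = np.fft.irfft(np.fft.rfft(z_hist, n_fft) / np.fft.rfft(x_hist, n_fft), n_fft)

print("smallest bin:", y_est.min())
print("sum of bins: ", y_est.sum())
```

In runs like this the estimate typically contains negative bins, because the sampling noise in `x_hist` and `z_hist` is divided by small Fourier coefficients.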
fuzzyTew
  • Your description is too vague for the question to be answerable. Could you provide an example or an illustration? – whuber Jan 13 '19 at 22:44
  • @whuber thank you for the feedback. Is this good enough now? – fuzzyTew Jan 13 '19 at 23:16
  • note to self if I ever dig this up in the future: if needed, I could probably do this by estimating each output bin separately, possibly with error determined with a weighted sum of errors from the possible bins that can produce that particular output. there's a description of histogram error at https://stats.stackexchange.com/a/214496/233920 – fuzzyTew Jan 13 '19 at 23:24
  • I got a downvote after adding my example. I'm new here, and it would be great to know a next step to improve this question further. – fuzzyTew Jan 14 '19 at 03:19
  • I am sorry about the downvote. It's best to ignore anonymous, unexplained downvotes: they mean nothing when applied to posts by new users. – whuber Jan 14 '19 at 14:37
  • Have you made any progress on this? – NEN Mar 13 '21 at 21:34
  • I have not worked on this further, but I'm still curious about it. A machine learning algorithm could probably solve this from the data if needed. – fuzzyTew Mar 14 '21 at 11:22
  • A fruitful set of search terms is [histogram deconvolution](https://www.google.com/search?q=histogram+deconvolution) – whuber Mar 14 '21 at 16:18

0 Answers