While I mostly have a good understanding of the mathematical aspects of statistics, I find myself struggling with seemingly arbitrary choices.

For example, I have a 2D matrix of chemical compounds showing whether or not they co-locate (binary 0/1).

To reduce the noise, I want to remove outcomes (0/1) that would occur with a probability above 90% purely by chance (for example, based on concentrations and random movement).

Now, how can I ever justify why I chose a 90% cut-off rather than 75% or 99%? Do you have any resources that help with such "arbitrary" choices and with defending against potential claims of data manipulation? I wonder how you statistics veterans deal with such decisions!

(Basically whatever I do, it will look like I screened through every cut-off and selected whichever looks best for the scientific publication.)

KaPy3141

1 Answer

I would like to provide my take on this from academic experience. You are absolutely right that such criteria are often arbitrary. However, over time I have observed two popular ways of dealing with this.

First, you could search relevant literature on the topic, and see what criteria researchers in your field use. Usually, such choices come with justifications, which can help you to think about the appropriateness of their choices. You could also use statistical textbooks as an alternative.

The second method might simply be to develop a theoretical justification for your choice. I will provide an example from psychometric research. Often when modeling latent constructs, two choices are possible: a) assume that the latent construct causes specific observed behaviors; or b) conversely, assume that the observed behaviors cause the latent construct. Depending on which choice the theoretical justification favors, one would proceed with a different statistical model.

Note: @nope provided a very sensible idea, and you should, of course, check how robust your results are when you remove outcomes at varying probability cut-offs.
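A robustness check of this kind could be sketched roughly as follows. This is only a minimal illustration, not a prescribed method: the function name `cutoff_sensitivity`, the synthetic data, and the summary statistic (the co-location rate among retained entries) are all assumptions made for the example, and you would substitute your own chance model and whatever quantity you actually report.

```python
import numpy as np

def cutoff_sensitivity(observed, p_chance, cutoffs):
    """For each cut-off, drop entries whose observed outcome had a chance
    probability above the cut-off, then summarise what is left."""
    rows = []
    for c in cutoffs:
        keep = p_chance <= c          # entries that survive this cut-off
        n_kept = int(keep.sum())
        # Placeholder summary statistic (assumption): mean of retained 0/1 values.
        rate = float(observed[keep].mean()) if n_kept else float("nan")
        rows.append((c, n_kept, rate))
    return rows

# Synthetic stand-in data (assumption): replace with your own matrices.
rng = np.random.default_rng(0)
p_coloc = rng.uniform(0.0, 1.0, size=(20, 20))   # chance probability of co-location
observed = (rng.uniform(size=p_coloc.shape) < p_coloc).astype(int)
# Probability, under the chance model, of the outcome that was actually observed.
p_chance = np.where(observed == 1, p_coloc, 1.0 - p_coloc)

cutoffs = [0.75, 0.80, 0.90, 0.95, 0.99]
results = cutoff_sensitivity(observed, p_chance, cutoffs)
for c, n_kept, rate in results:
    print(f"cut-off {c:.2f}: kept {n_kept:3d} entries, co-location rate {rate:.3f}")
```

If the reported quantity barely changes across the whole range of cut-offs, the specific choice of 90% matters little, and showing the full table in a supplement makes that visible to reviewers.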

PsychometStats