16

I recently started reading about the Bayesian criticism of the p-value, and it seems there is a lot of discussion around the claim that the frequentist approach does not behave well when the null hypothesis is true.

For instance in this paper the authors write that "p-values overstate the evidence against the null [...] this does not have to do with type-I or type-II errors; it is an “independent” property of p-value."

To illustrate this point, the authors show that when the null is true, the p-value has a uniform distribution.

What I do not get is that even when the null is true, a frequentist approach, thanks to the Central Limit Theorem, is still able to construct confidence intervals that include 0 (non-significance) at the appropriate $\alpha$ level.

I do not get why the fact that the p-value is uniform when the null is true shows that a frequentist approach is biased. And what is meant by an "independent" property of the p-value?

[figure: output of the simulation code below]

library(tidyverse)
library(broom)

# Simulate a population in which the true effect of x on y is exactly 0,
# i.e. the null hypothesis is true
n <- 1000
x <- rnorm(n, 100, 30)
d <- 0                        # true slope
y <- x * d + rnorm(n, 0, 20)
df <- data.frame(y, x)
plot(x, y)
abline(lm(y ~ x), col = 'red')

# Draw 1000 samples of size 50 and fit a regression to each
r <- replicate(1000, sample_n(df, size = 50), simplify = FALSE)
m <- r %>% map(~ lm(y ~ x, data = .)) %>% map(tidy)

# Central Limit Theorem: the slope estimates are approximately normal, centred on 0
bind_rows(m, .id = 'sample') %>%
  filter(term == 'x') %>%
  ggplot(aes(estimate)) + facet_grid(~term) + geom_histogram()

# False positive rate at alpha = 0.05
s <- bind_rows(m, .id = 'sample') %>% filter(term == 'x')
s$false_positive <- ifelse(s$p.value < 0.05, 1, 0)
prop.table(table(s$false_positive))

# uniform distribution of p-values under the null
hist(s$p.value, breaks = 50)
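
As a check on the confidence-interval point, reusing the data frame `s` above: the complement of the false positive rate is the coverage of the per-sample intervals, so roughly 95% of the intervals for the slope should contain the true value 0. (The `qt()`-based interval construction below is a sketch I am adding, not part of the original simulation.)

# Sketch: proportion of per-sample 95% confidence intervals that cover the true
# slope of 0; with samples of size 50 the t quantile has 48 degrees of freedom
crit <- qt(0.975, df = 50 - 2)
s$covers_zero <- (s$estimate - crit * s$std.error <= 0) &
                 (s$estimate + crit * s$std.error >= 0)
mean(s$covers_zero)   # close to 0.95
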
giac
    Deborah Mayo (and discussants) on this issue: https://errorstatistics.com/2017/01/19/the-p-values-overstate-the-evidence-against-the-null-fallacy-2/ Mayo thinks that the idea that p-values "overstate the evidence against the null" is a fallacy, and I tend to agree. – Christian Hennig Aug 04 '21 at 23:33
  • In the errorstatistics article they are complaining about the realism of the .5 prior on the perinull, that maybe it should be lower. But in many applications, such as genomic screening, where a priori most coding regions are unrelated to the phenomenon of interest, the perinull prior is actually much higher than .5. And then there are ESP studies ... – BigBendRegion Aug 05 '21 at 00:49
  • 10
    "authors show that when the null is true, the p-value has a uniform distribution." This is not some suprising feature discovered by the authors. It is *by design* (it's also not universally the case, but when it's true it's because it is meant to do just that). This would be as strange as saying someone who is opposed to cars has shown that "brakes serve to impede the forward progress of the car" (Yes, that's exactly what they're supposed to do, this is not news). – Glen_b Aug 05 '21 at 03:52
  • 3
    In order for the p-value to be uniform under $H_0$ the test statistic must be continuous and the test exact. – BruceET Aug 05 '21 at 05:57
  • A lot of problems come from people interpreting p-values as probabilities of $H_0$ being true (or 1-p as probability of the $H_A$ being true etc.), or when people want to interpret a failure to reject as $H_0$ as $H_0$ being true (see also terrible takes like "The drug did not affect all cause mortality compared with a placebo: hazard ratio of 0.10 (95% CI 0.01 to 1.01; p=0.051)."). And people don't get that with a tiny sample size (=no power for realistic effect sizes), you have that uniform distribution even under the alternative (limiting case N=0 and just drawing P ~ U(0,1)). – Björn Aug 05 '21 at 06:45
  • 3
    Let me contribute a recent paper by [Sander Greenland](https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1529625) in which he addresses some common criticism of $p$-values (including that they overstate the evidence). – COOLSerdash Aug 05 '21 at 08:42
  • @Lewian thanks for the link, it's super interesting. "P-value denier: If I update the prior of .5 that I give to the null hypothesis (asserting toxin levels are of no concern), my posterior for H0 is still not all that low, not as low as .05 for sure." ^^ – giac Aug 05 '21 at 11:12

6 Answers

19

The point that the authors are trying to make is a subtle one: they see it as a failure of NHST that, as $n$ gets arbitrarily large, the $p$-value doesn't tend to 1. It's a bit surprising that the paper doesn't contain any discussion of equivalence testing. To me it's somewhat obvious and reasonable that the p-value keeps its uniform distribution when the null is true, no matter how large $n$ gets. Large $n$ means having sensitivity to detect smaller and smaller effects, while the false positive error rate remains fixed. So under the somewhat constrained setting of the null being exactly true, the behavior of the $p$-value distribution doesn't depend on $n$ at all.
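
A quick simulation sketch of that last point (my own illustration, not from the paper): under an exactly true null, the p-value distribution of a one-sample t-test looks the same whether $n$ is 20, 200 or 2000, and about 5% of p-values fall below 0.05 at every $n$.

# Sketch: the null (mean 0) is exactly true; the p-value distribution is
# (approximately) uniform and does not change with n
set.seed(1)
sapply(c(20, 200, 2000), function(n) {
  p <- replicate(5000, t.test(rnorm(n))$p.value)
  c(n = n, below_0.05 = mean(p < 0.05), below_0.50 = mean(p < 0.50))
})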

  1. NHST is, in my mind, desirable specifically because there's no way of declaring a null hypothesis to be true, as my experimental design is set up specifically to disprove it. A non-significant result may mean that my experiment was underpowered or the assumptions were wrong, so there are risks associated with accepting the null that I'd rather not incur.

  2. We never actually believe that the null hypothesis is true. Typically failed designs arise because the truth is too close to the null to be detectable. Having too much data is kind of a bad thing in this case; rather, there's a subtle art in designing a study to obtain just enough sample size to reject the null when a meaningful difference is present.

  3. One can design a frequentist procedure that first tests for a difference (one- or two-tailed) and, given a non-significant result, performs an equivalence test, so that declaring the null to be (practically) true is itself a significant result. In the latter case one can show that the power of the equivalence test goes to 1 when the null is in fact true (a small simulation sketch follows below).
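
The sketch (my own addition; the ±0.1 equivalence margin and the TOST, two-one-sided-tests, formulation are arbitrary choices for illustration): when the "no effect" null is exactly true, the proportion of experiments declaring equivalence approaches 1 as $n$ grows.

# Sketch: TOST equivalence test for a mean with margin +/- 0.1.
# Equivalence is declared when the 90% CI lies entirely inside (-0.1, 0.1),
# which is the same as both one-sided tests rejecting at alpha = 0.05.
set.seed(1)
tost_reject <- function(n, margin = 0.1, alpha = 0.05) {
  x  <- rnorm(n)   # the null of "no effect" is exactly true
  ci <- t.test(x, conf.level = 1 - 2 * alpha)$conf.int
  ci[1] > -margin && ci[2] < margin
}
sapply(c(100, 1000, 10000), function(n) mean(replicate(1000, tost_reject(n))))
# roughly 0, then ~0.9, then ~1: power to declare "equivalent to zero" tends to 1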

AdamO
  • 2
    I suggest replacing normal -> usual to avoid above confusion – innisfree Aug 05 '21 at 08:51
  • I’m not sure this really addresses the criticism - is p a useful measure of evidence? The criticisms say that it isn’t. – innisfree Aug 05 '21 at 10:19
  • 2
    I struggle with the point that we never believe the null to be true. I am a social scientist and in many cases you seriously test the null to be true, for instance when you study if racial or sex differences in earnings have converged over time. – giac Aug 05 '21 at 10:52
  • 1
    "Large n means having sensitivity to detect smaller and smaller effects, while the false positive error rate remains fixed", how can that be true!? it is so un-intuitive! What about the confidence intervals getting smaller as $n$ grows? This surely must reduce the false positive rate?? – giac Aug 05 '21 at 10:55
  • @giac for a fixed sample, a shorter confidence interval will tend to have a lower confidence level, not a higher one. Increasing $n$ makes it possible to shorten the confidence interval without reducing the confidence level. – fblundun Aug 05 '21 at 15:21
  • @giac To your 1st pt: I mentored several US-based social sciences investigators seeking K-awards (5 years gov't supported research/career grant). To stand out in a proposal, you need to be strategic and identify a meaningful problem. Given enough money and enough time, almost any hypothesis can be disproven. For instance, it is not enough that men and women simply earn different amounts of money, but that the income gap is more than 20% after controlling for education and tenure is remarkable. – AdamO Aug 05 '21 at 15:22
  • @giac to your 2nd pt: I'm not sure you've seriously thought (yet) about what is meant when we conduct a test "with a fixed alpha of... (e.g. 0.05)". Based on the expected behavior of the test statistic when the null is true **and for a fixed n**, we choose the critical value so that only $\alpha$ probability-area remains beyond the critical value. POWER comes when the null is false and the distribution shifts beyond the critical value in some measurable way. As we consider larger $n$, the choice of critical value changes. – AdamO Aug 05 '21 at 15:24
  • 1
    @AdamO it seems from your comments (which is expected) that there is a huge gap between statistical theory and applied quantitative social science. First, I take your point that any hypothesis can be disproven, but even top economists never lay out numbers as precise as your 20%; (racial, sex, class, ...) gaps are almost never specified in that much detail. Often you see "statistical" differences of very small effect size, which leads to your second point. In practice the critical value is rarely adapted to $n$; it's just 0.05 by default, which might be problematic, but that's what is done in practice – giac Aug 05 '21 at 15:59
  • @giac you're right, I made an error. the critical value doesn't change with $n$, but rather the test statistic always converges to a stable distribution under the null, often $\mathcal{N}(0, 1)$ or $\chi^2_1$. What this means is that if you expressed the *significance boundary* in terms of the hazard ratio, sample mean, or sample proportion, you expect the boundary to converge toward the null arbitrarily for large $n$. – AdamO Aug 05 '21 at 16:17
  • @giac I disagree with your remarks about "precise numbers [as] your 20%" - you need to invoke an actual effect when presenting a power analysis and sample size justification. You don't see this level of discussion in the published research, but it's very much a part of planning. Running analyses higgledy-piggledy and without oversight partly explains the ever pervading problem of publication bias. – AdamO Aug 05 '21 at 16:18
  • @AdamO I don't disagree with you, it would be ideal if researchers would talk more openly about effect size and sample size justification. I just rarely see it. And journals favour so much "significant" results. It's difficult to change this system I guess. – giac Aug 05 '21 at 16:46
8

I think you’re conflating two different arguments against p-values. Let me spell them out.

  • By definition, p is distributed uniformly under the null hypothesis (or as close to uniformly as possible in discrete settings). So p isn’t going to be a useful measure of evidence in favour of the null when it’s true. Evidence in favour of the null isn’t going to accumulate as we collect data, as p just makes a random walk bounded by 0 and 1. I don’t see where your source mentions bias in this context.

  • There exist mathematical theorems showing that p is much smaller than the posterior probability of the null for any choice of prior over the parameters of the alternative hypothesis. This is what’s meant by overstating the evidence. For example, under some assumptions p = 0.05 corresponds to a null hypothesis that is still at least about 30% plausible (a worked version of this calibration is sketched just below).
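
The calibration sketch (my own, using the $-e\,p\log p$ lower bound on the Bayes factor from Sellke, Bayarri and Berger, and assuming equal prior probabilities for the null and the alternative):

# Sketch: Sellke-Bayarri-Berger bound: the Bayes factor in favour of the null is
# at least -e * p * log(p) (valid for p < 1/e); with 50/50 prior odds this gives
# a lower bound on the posterior probability of the null
p  <- 0.05
bf <- -exp(1) * p * log(p)   # about 0.41, i.e. at best ~2.5 : 1 against the null
bf / (1 + bf)                # about 0.29: the null is still roughly 30% plausible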

Let me note that p isn’t necessary at all in frequentist inference, and its use isn’t supported by any coherent framework. Indeed, there are really two kinds of frequentism here.

  • Fisherian, in which p is computed and used as a measure of evidence. It is this approach that the above arguments attack.

  • Neyman-Pearson, in which if p is used at all it’s used only to define a rejection region. The connection - or lack thereof - between p and evidence is neither here nor there.

Lastly, let me clarify confusion about the connection between NHST and proof by contradiction. In NHST, when faced with a small p-value, we face the dichotomy

  • we observed a small p and the null was true or
  • we observed a small p and the null was false

which evidently reduces to nothing more than the fact that we observed a small p. It gets us nowhere deductively. Proof by contradiction, on the other hand, obviously does allow us to deductively prove non-trivial things. What's required here isn’t deductive logic, like proof by contradiction, but inductive logic.

innisfree
  • "So [the p-value] isn’t going to be a useful measure of evidence in favor of the null when it’s true," except that if the null is true we would see a p-value less than or equal to that observed 100(p)% of the time. The p-value shows the plausibility of the hypothesis given the data. See confidence curves. – Geoffrey Johnson Aug 05 '21 at 22:37
  • 1
    The p value doesn’t show the plausibility of the null given the data by any non-trivial definitions or meanings of plausibility, or of given that I’ve ever heard. – innisfree Aug 06 '21 at 05:48
  • So if I test one hypothesis and it has a p-value of 0.40, I use the same experimental result to test another hypothesis and it has a p-value of 0.04, and a third hypothesis has a p-value of 0.0000000000000001, all of these hypotheses do an equally good job explaining the observed data? All of these hypotheses are equally plausible? See Fisherian frequentism and confidence curves. You even listed Fisher in your response. – Geoffrey Johnson Aug 06 '21 at 13:36
  • @GeoffreyJohnson There is also a question of how likely the hypothesis is to start with. This is part of the Bayesian criticism of the p-value. – Andrea Aug 06 '21 at 16:35
  • The hypothesis is either true or it is false, unless the parameter is an unrealized random variable that exists in a state of superposition. See quantum mechanics. – Geoffrey Johnson Aug 06 '21 at 16:38
  • @GeoffreyJohnson it is not unusual for a hypothesis with a lower p-value to be more plausible than another hypothesis with a higher p-value. e.g. let $H_1$ be the hypothesis that [Paul the psychic octopus](https://en.wikipedia.org/wiki/Paul_the_Octopus) is better at predicting football results than me, let $H_2$ be the hypothesis that I am better, and suppose that I successfully predict 6/10 winners while he successfully predicts 10/10 winners. There must be another ingredient beyond the p-value feeding into our assessment of these hypotheses - something like a prior. – fblundun Aug 10 '21 at 13:26
  • We can view a prior as a normalized likelihood from a previous experiment. The likelihood, and therefore the p-value, comparing the hypotheses in your experiment above points to the octopus being a better psychic than you. That is all anyone can say. – Geoffrey Johnson Aug 10 '21 at 13:57
  • @GeoffreyJohnson you don't think I can say with very high confidence that I'm better at predicting winners than the octopus, because I have knowledge of which teams are considered better, while the octopus has no concept of football? (For the sake of this argument let's assume we are satisfied that the octopus's keepers are conducting the experiment fairly and not influencing him towards the team they think will win.) – fblundun Aug 11 '21 at 11:56
  • If we have a historical likelihood regarding your previous predictions we can include that in the analysis. This is precisely what the Bayesian would do in the form of a prior and interpret this as probability statements about the parameter instead of probability of the experiment given the parameter. We could use data collected on someone else and assume their predictive ability is the same as yours and incorporate that likelihood into the analysis. We could also use a likelihood based on hypothetical experimental evidence. In each case the Bayesian would interpret these as priors. – Geoffrey Johnson Aug 11 '21 at 12:03
6

For me a core issue here is that the Bayesian criticism of the p-value is based on Bayesian reasoning that a frequentist would not normally accept. For the Bayesian, the "true parameter" is a random variable (as probability distributions are about formalising uncertainty, and there is uncertainty about the value of the true parameter), whereas for the frequentist the "true parameter" is fixed and the basis of probability calculations (as probability distributions are about how data will distribute under idealised infinite replication).

The Bayesians start from a prior distribution over the parameter, which according to frequentist logic does not normally exist (unless we're in a situation where various "true" parameters are indeed generated in some kind of repeatable experiment as in "empirical Bayes" situations).

Updating the prior with the data, the Bayesian will produce a posterior and can then make statements about the probability that the true parameter lies in a certain set or takes a certain value. Such statements cannot be made in frequentist logic, and the p-value certainly doesn't do such a thing.

What's behind the "p-values overstate the evidence" issue is that some Bayesians actually interpret the p-value as (some kind of approximation of) the probability that the null hypothesis is true, in which case they can compare it with the same probability computed by Bayesian logic. Depending on the prior, one could then come to the conclusion that the p-value is too low or too high.

(This paragraph added after comments:) The connection between this and the statement about "evidence" is that some Bayesians tend to think that only probabilities that hypotheses are true (and certain quantities derived from them) qualify as valid measurements of evidence. This means that in order to accept the p-value as a measure of evidence, they need to interpret the p-value in this way. A frequentist can still think of a p-value as a measurement of evidence, but this measurement would then be something essentially different, as probabilities of hypotheses being true don't normally make sense in frequentist logic.

The problem here is that this (a probability of the null hypothesis being true) is not what the p-value is; according to frequentist logic there is no such thing as a "true prior" that could be used as a basis for this, and the p-value is a probability computed assuming the null hypothesis to be true, rather than a probability that the null hypothesis is true. Therefore a frequentist shouldn't accept the Bayesian computation as "what the p-value should be". The Bayesian argument (not shared by all Bayesians!) here is that a Bayesian interpretation of the p-value isn't as good as proper Bayesian analysis, but the frequentists can say that the p-value shouldn't be interpreted in this way in the first place.

The Bayesians have a point though in the sense that the p-value is often misinterpreted as a probability of the null hypothesis being true, so their criticism, although not applicable to a correct understanding of the p-value, applies correctly to what some people make of it. (Furthermore Bayesians can claim that parameters should be treated as random variables rather than as fixed, which is a more philosophical discussion and doesn't concern p-values in particular but the whole of the frequentist logic.)

Christian Hennig
  • 2
    Many, many frequentists and practitioners interpret p as a measure of evidence, including Fisher. It’s not just some Bayesians – innisfree Aug 05 '21 at 10:46
  • @innisfree I am not arguing against interpreting p as a measure of evidence. The issue is that Bayesians tend to think that only a "probability of a hypothesis" qualifies as "measure of evidence" (as far as they think so, I guess not all of them do). For a frequentist, something different can be a measure of evidence. – Christian Hennig Aug 05 '21 at 10:48
  • I also struggle with the point that frequentists have no prior. My field is social science, and it seems that in most studies people test specific priors. It seems to me that a lot of priors are other "groups of reference". For instance, to study the gender pay gap, you take the men's earnings distribution as a prior and compare the women's earnings distribution. How is that not a prior? – giac Aug 05 '21 at 10:49
  • 2
    @giac It is not a prior in the Bayesian sense of being a distribution over possible true parameters. The men's earnings distribution is a distribution of (potential) observations, not of true parameters. – Christian Hennig Aug 05 '21 at 10:51
  • I am not sure I get it... I am wondering now if in social science we actually are not interested in "true" parameters but rather in comparing group distributions, whatever the true parameters are. – giac Aug 05 '21 at 11:15
  • I also feel that in most complex research questions it is nearly impossible to get informative priors. In most cases, priors will come from empirically observed values from group comparisons. Imagine trying to get a prior on something like the effect of education on the racial distribution of earnings. I am not sure why priors would be helpful here, except when you create counterfactual "priors", which is again a null distribution constructed with reference to empirical observations of other groups. – giac Aug 05 '21 at 11:17
  • @giac "rather in comparing group distributions, whatever the true parameters are" - well, you can model a difference between groups as a difference in true parameters. Regarding informative priors, that goes far beyond the original question. I'm not advertising the use of priors here, you need them only if you want to go Bayesian, and then you better have a proper Bayesian introduction on them, that hopefully also will give some guidance on their choice. – Christian Hennig Aug 05 '21 at 12:12
  • But the arguments that the OP mentions are against p as a measure of evidence. So we must try to settle whether the arguments are reasonable and whether p indeed is a reasonable measure of evidence. This isn't a moot point: it isn't just a few Bayesians misconstruing p as evidence, it's used as evidence throughout science – innisfree Aug 05 '21 at 13:08
  • @innisfree I don't dispute that. Neither am I saying that Bayesians "misconstrue" the p-value as evidence. What I'm saying is that Bayesians should not *at the same time* hold that the only valid measures of evidence are posterior probabilities of hypotheses being true (or quantities derived from them), and then take the p-value as a measure of evidence. It is fine by me to do the latter, but then you shouldn't stick to the former. – Christian Hennig Aug 05 '21 at 13:20
  • No one is using p-values and posteriors of hypotheses at the same time. Berger et al are indeed comparing them to show that p-values could give misleading impressions about the strength of evidence, if one wishes to use p that way – innisfree Aug 06 '21 at 05:51
  • Obviously the Bayesians don't *use* the p-value in this way, because they think a Bayesian approach is superior. However they criticise the p-value on the basis that it doesn't do well if interpreted in this way. – Christian Hennig Aug 06 '21 at 09:40
  • Your comment "For the Bayesian, the "true parameter" is a random variable ..." see [Would a Bayesian admit that there is one fixed parameter value?](https://stats.stackexchange.com/questions/83731/would-a-bayesian-admit-that-there-is-one-fixed-parameter-value) – kjetil b halvorsen Oct 11 '21 at 13:35
5

The discussion here is excellent, but at the heart of the matter is that, in an attempt to "let the data speak for themselves", i.e., to be objective, the frequentist approach jettisons the desire to obtain measures of evidence in favor of assertions. In Bernoulli's Fallacy, Clayton eloquently takes us step by step through statistical history to explain how this happened and what harm it has done. One of the harms is that outside information was prevented from being brought into the analysis. One of his excellent examples is ESP research, where he shows that obtaining a low p-value is meaningless without factoring in the low likelihood that the laws of physics are allowed to be suspended. So one can analyze all day the amount of evidence in a p-value, but I don't think that is quite worth the trouble.

Frank Harrell
  • What do you mean by "outside information"? – giac Aug 05 '21 at 12:51
  • I presume he means prior information that we might have about the plausibility of the null and alternative – innisfree Aug 05 '21 at 13:00
  • Using a frequentist approach prevents "irrelevant outside information" from being brought into the analysis. There is nothing preventing a meta-analysis of relevant information. – Geoffrey Johnson Aug 05 '21 at 13:41
  • I think there is. Meta analysis is a special case of outside information that usually rests on an exchangeability assumption and uses data from other studies. I'm speaking of more general cases than that, including outside information where there are no data, i.e., no previous studies. – Frank Harrell Aug 05 '21 at 14:40
  • So meta-analysis using hypothetical data, or making the exchangeability assumption and using other studies. It comes down to whether we want to measure the experimenter or the experiment. – Geoffrey Johnson Aug 05 '21 at 17:06
  • We need outside science + experiment. Without that, NHST has helped to cause the reproducibility crisis. And I'm nitpicking about the definition of _meta analysis_. – Frank Harrell Aug 05 '21 at 19:17
  • 3
    The apparent inability to incorporate or respond to "outside information" lies ENTIRELY with the Neyman-Pearson hypothesis test. Fisherian (or should I say neo-Fisherian?) p-values are simply indices of evidence (against the null according to the data and model) and that index of evidence can be (should be) used in conjunction with all other sources of relevant information for a scientific inference. Do not conflate p-values with an automatic decision rule hypothesis test. See this open access chapter for a full explanation https://link.springer.com/chapter/10.1007/164_2019_286 – Michael Lew Aug 05 '21 at 21:58
1

The idea that p-values overstate the evidence against the null hypothesis is partly due to a misunderstanding about the nature of p-values (as others have mentioned). A p-value can be regarded as a measure to address the hypothesis testing question: Is the data consistent with the null hypothesis? Bayesian posterior probabilities are Bayesian answers to a different question: Given the data, what is the relative plausibility of each hypothesis? The key word here is relative. It should hardly be surprising then that answers to such different questions can differ, even within the one school. In short, P-values are about consistency with one hypothesis, and not about the relative plausibility of two hypotheses.

0

The key to appreciating frequentist inference is to frame it as something similar to a proof by contradiction. It is by design that the p-value follows a uniform distribution under the null, i.e. $20\%$ of the time you get a p-value less than or equal to $0.2$. It answers the question, "If this other hypothesis is true, how often would I get a result like the one I've just witnessed?" It's like playing devil's advocate, a reality check. If the p-value were not uniformly distributed under the null, you would not get this reality check. There is no bias, nor is there an "independent property", because the population parameters do not randomly change from one repeated experiment to the next according to anyone's belief.
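
A two-line check of that specific claim (my own sketch, using a one-sample t-test with the null exactly true):

# Sketch: with the null exactly true, P(p <= 0.2) is about 0.2, by construction
p <- replicate(10000, t.test(rnorm(25))$p.value)
mean(p <= 0.2)   # approximately 0.20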

Bayesians have a very different way of thinking, so it is not surprising that these criticisms would be levied. Bayesian "probability" measures the belief of the experimenter, and beliefs are not facts. Any claim that the p-value overstates the evidence compared to a posterior probability is mistaking beliefs for facts.

There are one-to-one analogs on everything between the two paradigms. Bayesians often criticize frequentism for not incorporating outside information into an analysis, but Bayesian updating of a prior into a posterior maps to a frequentist meta-analysis with p-values and confidence intervals [1] [2]. Bayesians often criticise frequentism for not accounting for all uncertainty in a model when making predictions, but Bayesian predictive distributions map to frequentist prediction intervals [3]. However, this one-to-one mapping is not a reason to use an unfalsifiable subjective definition of probability.

Even in non-normal models, central limit theorem aside, it is possible to construct confidence intervals by inverting a hypothesis test or a cumulative distribution function to find a range of plausible values of a parameter.
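
As a concrete sketch of that last point (my own example, not from the answer): the exact Clopper-Pearson interval for a binomial proportion is what you get by inverting the exact one-sided binomial tests at level $\alpha/2$, and it matches what binom.test() reports.

# Sketch: 95% CI for a binomial proportion by test inversion -- keep every p0
# for which neither one-sided exact test rejects at 0.025
x <- 7; n <- 20
grid <- seq(0.001, 0.999, by = 0.001)
keep <- pbinom(x, n, grid) > 0.025 &                        # P(X <= x | p0)
        pbinom(x - 1, n, grid, lower.tail = FALSE) > 0.025  # P(X >= x | p0)
range(grid[keep])
binom.test(x, n)$conf.int   # Clopper-Pearson interval: essentially the same endpoints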

Sycorax
Geoffrey Johnson
  • 2
    There’s so much to unpack here it’s hard to know where to start. I’ll address the similarity with a proof by contradiction in my answer. – innisfree Aug 05 '21 at 04:33
  • Bayesians always say that frequentists do not use informed priors or a "prior distribution over parameters" (such as the answer below), but when you write "To the frequentist a predictive p-value or prediction interval uses historical data to test hypotheses", does this contradict this Bayesian claim? – giac Aug 05 '21 at 12:59
  • [@giac](https://stats.stackexchange.com/users/83585/giac) a predictive p-value is testing a hypothesis about a future as of yet unobserved observation. It is not testing a hypothesis about an unknown parameter using an observed observation. See my paper linked above on prediction intervals and this [wikipedia entry](https://en.wikipedia.org/wiki/Prediction_interval). – Geoffrey Johnson Aug 05 '21 at 13:17
  • 2
    There is no contradiction or inconsistency between modeling uncertainty about parameters using random variables and believing that those parameters don't change from one experiment to the next. – fblundun Aug 05 '21 at 15:51
  • It is not an inconsistency as the intention is the same in both paradigms. However, when parameters are treated as random variables the resulting "probability" statements are unfalsifiable unless we can verify the parameters were actually sampled from the prior (reference class problem). Therefore the only interpretation is that "probability"=belief, and beliefs aren't facts. If the prior is chosen in such a way that the posterior is dominated by the likelihood, Bayesian belief is more objectively viewed as confidence based on frequency probability of the experiment. See the links above. – Geoffrey Johnson Aug 05 '21 at 17:02
  • It comes down to whether we want to measure the experimenter or the experiment. – Geoffrey Johnson Aug 05 '21 at 17:09
  • 3
    Hi @GeoffreyJohnson. I notice that you've spent the morning editing links to your other answers into each of your answers and questions. There's no need to do this -- people can find the answers you've written by looking at your profile (https://stats.stackexchange.com/users/307000/geoffrey-johnson). Additionally, I see you're also linking to your published works at the end, but without any apparent connection to how they directly answer the question. This seems a lot like self-promotion and not about answering the question. – Sycorax Aug 07 '21 at 18:01
  • Hi @Sycorax. Is there any harm in leaving the links? Certainly people can view my profile to see all the questions I have answered, but the links are intended as a reference to related threads. – Geoffrey Johnson Aug 07 '21 at 18:04
  • 3
    This is strictly a Q&A site. Content that does not directly answer the question is subject to removal. The links might be of interest, but do they answer the question? How? This website has search and tagging features which can turn up these and many other relevant pages. In particular, linking to your published works because they are articles about Bayesian statistics, with no content in the answer relating the content of the papers to the question, seems over the line. More information: https://meta.stackexchange.com/questions/57497/limits-for-self-promotion-in-answers – Sycorax Aug 07 '21 at 18:18
  • 1
    Hi @Sycorax. The three publications compare Bayesian and frequentist methods for inference on a parameter, construct ancillary pivotal quantities for prediction, and use meta-analysis for combining information and transfering inference. I pulled directly from these papers when providing my answers. In my opinion they are not simply tangentally relevant to the questions. Do I need to explain this above each link? Are you suggesting I remove the links instead? Thanks for you help. – Geoffrey Johnson Aug 07 '21 at 19:09
  • 1
    If you feel that the content in the links directly answers the question, then you can summarize the key points of the links, explain how they answer the question, and include the link as a citation. Right now, it is not at all clear how these links answer the question; for instance, how does a question about Bayesian criticism of p-values relate to "Transfer Learning via Meta-analysis"? Alternatively, you can simply remove the links. – Sycorax Aug 07 '21 at 19:19
  • @Sycorax, done! – Geoffrey Johnson Aug 07 '21 at 19:36