
Suppose that I did an experiment and collected a dataset. In previous literature, this type of dataset is explained by a well-accepted model, called Model M. I believe that Model M cannot explain my data, so I need to test it.

One procedure to accomplish this is through a hypothesis test:

$H_0:$ The data are distributed according to the distribution predicted by Model M.

$H_1:$ The data are not distributed according to Model M.

Is this procedure reasonable? Has anyone set the null hypothesis to be an existing model? If someone has done this, then I think I can justify this procedure, too.

At first glance, I think this null should be fine. However, usually the $H_0$ is something like "the data are random", and the $H_1$ is something like "the data are explained by the model". I am basically swapping the positions of $H_0$ and $H_1$.


All details below are taken from the real-world problem, with exact numbers:

The dataset contains seven columns and 1246 rows. Of the seven columns, six are independent variables, $x\in\mathbb R^6$, and one is the dependent variable $y\in \{-1,1\}$.

For each $x$, the variables are inputs to the experiment, and the resulting $y$ is either -1 or 1.

The experiment is run 1246 times, each time with a different $x$.

Model M takes the form $\hat y=\operatorname{sgn} [f(x,\theta)+\epsilon]$, where $f$ is a deterministic real function and $\epsilon$ is an error term. When $\operatorname{sgn} [f(x,\theta)+\epsilon]=0$, set $\hat y=1$.

Of course, to make this a statistical model, there has to be an error term: $\epsilon$ follows either a normal distribution or an Extreme Value Type I (Gumbel) distribution (because this is basically a binary choice problem).
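As a concrete sketch of this data-generating process (assuming, purely for illustration, a linear $f$ and standard-normal $\epsilon$; neither is specified above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the deterministic part f(x, theta);
# the real f from Model M is not specified in the question.
def f(x, theta):
    return x @ theta

n, d = 1246, 6                  # rows and independent variables, as in the question
x = rng.normal(size=(n, d))     # placeholder covariates
theta = rng.normal(size=d)      # placeholder parameter vector

eps = rng.normal(size=n)        # normal latent error -> a probit-style model
latent = f(x, theta) + eps
y_hat = np.where(latent >= 0, 1, -1)  # sgn, with the tie sgn(.) = 0 mapped to +1
```

With a logistic latent error this same construction gives logistic regression, and with a Gumbel error a cloglog model, as discussed in the comments.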

Then, there are two different ways of finding $\theta$:

  1. $\theta$ is estimated through MLE using the entire dataset. In this case, $\theta$ is the same for all 1246 rows.

  2. The 1246 rows of data are equally divided into 89 groups based on the person who did the experiment, and 89 different values of $\theta$ are estimated, one per group.
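The two estimation approaches can be sketched with a probit log-likelihood, assuming $f$ is linear in $\theta$ and using simulated data in place of the real experiment (both assumptions; the question specifies neither $f$ nor the data):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

# Simulated stand-in for the real data; f(x, theta) = x @ theta is an assumption.
n, d = 1246, 6
x = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = np.where(x @ theta_true + rng.normal(size=n) >= 0, 1, -1)

def neg_log_lik(theta, x, y):
    # Probit model: P(y = 1 | x) = Phi(x @ theta), so P(y | x) = Phi(y * x @ theta)
    # for y in {-1, 1}. Clipping guards against log(0).
    p = norm.cdf(y * (x @ theta))
    return -np.sum(np.log(np.clip(p, 1e-12, None)))

# Approach 1: a single theta estimated by MLE on the whole dataset.
theta_mle = minimize(neg_log_lik, np.zeros(d), args=(x, y), method="BFGS").x

# Approach 2: one theta per experimenter (89 groups of 14 rows = 1246 rows);
# the group labels here are placeholders for the real experimenter IDs.
groups = np.repeat(np.arange(89), n // 89)
thetas = [
    minimize(neg_log_lik, np.zeros(d),
             args=(x[groups == g], y[groups == g]),
             method="BFGS", options={"maxiter": 100}).x
    for g in range(89)
]
```

With only 14 rows per group, the per-group estimates under approach 2 will be noisy and may not converge well; the sketch is only meant to illustrate the structure of the two options.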

High GPA
  • There are lots of goodness-of-fit tests. The null hypothesis is that the model is correct and we are looking for evidence to the contrary. – Geoffrey Johnson Nov 28 '21 at 23:55
  • 1. You have it the "usual" way around for a hypothesis test. "The data are random" is too vague for a hypothesis (random, how, specifically?). Once you pin down what that exactly means, you'll usually see it relate more closely to the sort of null you have in your question. 2. This is the second recent post I've seen from you where you appear to be *excessively* coy about your models. This is very likely to lead to a perfectly reasonable answer to the wrong question. Very often the specifics yield important details that get left out of a vague generalization ...and which may change the answer. – Glen_b Nov 29 '21 at 03:36
  • @Glen_b Thanks for your comment. Those two questions are about different models. I was trying to keep the question as general as possible, because giving too many details would make it too specific, which is against the rules of MSE (not sure about StatsSE). To my limited knowledge, MSE explicitly discourages questions that are too specific, benefit only one person, or are homework questions. Probably I have misunderstood. Still, I much appreciate your efforts; I will attach specific examples next time. – High GPA Nov 29 '21 at 04:54
  • Probably most questions at MSE are rather general. Cross Validated gets many applied questions from academic researchers and professional data analysts, where details are important. I think it’s fine for CV and MSE to diverge in this way. – Dave Nov 29 '21 at 05:01
  • @HighGPA Stats in real-world situations (and indeed sometimes in relatively theoretical situations) tends to have counterintuitive subtleties, and there are very often details which may seem trivial but which turn out not to be. This is less common in mathematics overall. I'm not suggesting large amounts of detail, just enough that we can at least consider whether all the information we might need is present (i.e. for us to ask better questions). I understand the two models were different, but the concern (that some important detail needed for a good answer may be omitted) is the same. – Glen_b Nov 29 '21 at 05:10
  • @Glen_b Details are added. Please let me know if further details are needed and I will be more than happy to provide any. Thanks again Glen and Dave. – High GPA Nov 29 '21 at 05:38
  • 1. By "1246 ranks" do you mean "1246 rows"? 2. I'd suggest that an additive error for a binary response is not usually an ideal choice. Further I don't see how either suggested error distribution would result in the response consisting only of 0's and 1's. Your problem sounds very similar to a common statistical problem, which is not approached the way you're coming at it. – Glen_b Nov 29 '21 at 06:00
  • @Glen_b Trying to give an update now. A sign function must be outside of the error. – High GPA Nov 29 '21 at 06:36
  • I'm sorry, I'm not quite sure what you mean, do you mean you're adding latent error term and then dichotomizing? That works, though perhaps the logistic distribution would be the most common choice then. – Glen_b Nov 29 '21 at 07:17
  • @Glen_b You are probably right. I am not an expert in statistics; logistic distribution could be more popular. – High GPA Nov 29 '21 at 09:30
  • 1. I don't wish in any way to dissuade you from the distributions you mentioned, just to point out that there are other choices. Logistic-distributed latent error corresponds to [logistic regression](https://en.wikipedia.org/wiki/Logistic_regression#As_a_latent-variable_model), the Gaussian to [probit regression](https://en.wikipedia.org/wiki/Probit_model). 2. $\text{sgn}$ moves values to $\{-1,1\}$ (except that exact $0$ would usually go to $0$), not to $\{0,1\}$. – Glen_b Nov 29 '21 at 10:01
  • @Glen_b You are right. Just updated. Sorry for being unclear. – High GPA Nov 29 '21 at 10:07
  • Oh, and if the extreme value distribution you mentioned was the Gumbel, you'd get a cloglog model. There's other choices but those three would cover well over 99% of the GLM models that people fit to binary data. (Edit: yep, type I is Gumbel ...I don't remember the numbers) – Glen_b Nov 29 '21 at 10:11

1 Answer


That’s fine. A standard example of this is the Shapiro-Wilk test, which has a null hypothesis that the data come from a normal distribution and an alternative hypothesis that the data do not come from a normal distribution.
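For instance, using `scipy.stats.shapiro` (the seed and sample sizes below are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Sample drawn from the null (a normal distribution): the p-value
# should typically be large, failing to reject normality.
stat_norm, p_normal = stats.shapiro(rng.normal(size=200))

# Sample drawn from a clearly non-normal (exponential) distribution:
# the p-value should be tiny, rejecting normality.
stat_exp, p_skewed = stats.shapiro(rng.exponential(size=200))
```

Rejecting the null here means "not normal", which is exactly the structure you describe: the established model sits in the null hypothesis.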

Keep in mind, however, that hypothesis testing is literal and, given a large sample size, will (correctly) indicate statistical significance even when there is no practical significance.

Is normality testing 'essentially useless'?

Dave