Testing the goodness-of-fit for the binomial distribution

Question

As the title says, I'm trying to understand how to test goodness-of-fit for the binomial distribution; to this end, I followed the approach suggested in this link.

In particular, I'm investigating whether the dependent variable, here denoted $Y$, can be modeled with a binomial distribution. $Y$ is a dummy variable that takes the value 0 (failure) or 1 (success); moreover, it is related to a categorical variable taking values from 0 to 9, and the probability of success increases with these values.

I need to test whether $Y$ follows a binomial distribution $B(p,n)$, where:

$p$: probability of success in a single trial;
$n$: number of trials (repeated experiments).

I set $n = 1$, since each experiment is performed only once.
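With $n = 1$, $B(p, 1)$ is the Bernoulli distribution. The short derivation below, where $N$ (a symbol not used in the original) is the number of observations $y_1, \dots, y_N$ and $0 < \sum_i y_i < N$ is assumed, shows its probability function and the maximum-likelihood estimate of $p$, which is simply the observed proportion of successes:

$$
P(Y = y) = \binom{1}{y}\, p^{y} (1 - p)^{1 - y} = p^{y} (1 - p)^{1 - y}, \qquad y \in \{0, 1\}
$$

$$
\ell(p) = \sum_{i=1}^{N} \left[ y_i \log p + (1 - y_i) \log(1 - p) \right], \qquad
\frac{d\ell}{dp} = \frac{\sum_i y_i}{p} - \frac{N - \sum_i y_i}{1 - p} = 0
\;\Rightarrow\; \hat p = \frac{1}{N} \sum_{i=1}^{N} y_i
$$

Since the logit is a one-to-one reparametrization of $p$, an intercept-only logistic (binomial) model fitted by maximum likelihood gives this same $\hat p$.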

So, I implemented the following SAS code to estimate the parameter $p$:

```sas
proc genmod data = dataset;
    model Y = /dist=binomial;
    output out = predbin
                 p = p; /* p: binomial parameter estimate */
run;

data expected_binomial_distribution;
    set predbin;
    do Y = 0 to 1;
        prob_bin = pdf("binomial",Y,p,1);
        output;
    end;
    stop;
run;
```

and the following code to test the goodness of fit:

```sas
proc means sum nway data = expected_binomial_distribution;
    class Y; 
    var prob_bin;
    output out = goodness_of_fit sum=_testp_;
run;

ods output onewaylrchisq = LR_SpecifiedProportions
           lrchisqMC = LR_Exact_MC;
proc freq data = dataset;
    table Y / chisq(testp = goodness_of_fit
                        df = 1
                        lrchisq lrchi);
run;
ods output close;
```

I set the degrees of freedom to 1 because $Y$ can take only two values.
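To make the fitting and testing steps concrete without SAS, here is a minimal Python sketch (a swapped-in language, not the poster's code; the data and all function names are hypothetical). It fits the intercept-only logit model by Newton-Raphson, uses the fitted $\hat p$ to build the expected Bernoulli proportions for $Y = 0$ and $Y = 1$, and computes the Pearson and likelihood-ratio chi-square statistics against the observed counts. Because $\hat p$ equals the sample proportion, the expected counts reproduce the observed ones and both statistics are zero up to rounding; the fixed $p_0 = 0.5$ case is a hypothetical contrast in which the 1-df reference distribution applies.

```python
import math

def expit(x: float) -> float:
    """Inverse logit: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def fit_intercept_only_logit(y: list[int], tol: float = 1e-12, max_iter: int = 50) -> float:
    """Newton-Raphson fit of logit(P(Y=1)) = b0; returns the fitted P(Y=1)."""
    n, s = len(y), sum(y)
    if s in (0, n):
        raise ValueError("need both 0s and 1s for a finite estimate")
    b0 = 0.0
    for _ in range(max_iter):
        p = expit(b0)
        step = (s - n * p) / (n * p * (1.0 - p))  # score / Fisher information
        b0 += step
        if abs(step) < tol:
            break
    return expit(b0)

def gof_tests(observed: list[int], probs: list[float]) -> tuple[float, float]:
    """Pearson X^2 and likelihood-ratio G^2 of a one-way table against probs."""
    n = sum(observed)
    expected = [n * q for q in probs]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # cells with o == 0 contribute 0 to G^2 (0 * log 0 is taken as 0)
    g2 = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return x2, g2

def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square with 1 df: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

y = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]          # hypothetical 0/1 data
counts = [y.count(0), y.count(1)]           # observed frequencies of Y = 0, 1
p_hat = fit_intercept_only_logit(y)         # equals the sample proportion, 0.7
x2, g2 = gof_tests(counts, [1.0 - p_hat, p_hat])
print(round(p_hat, 6), x2, g2)              # both statistics are ~0

# Contrast: a fixed, pre-specified p0 = 0.5 (hypothetical), where 1 df applies
x2_0, _ = gof_tests(counts, [0.5, 0.5])
print(x2_0, round(chi2_sf_df1(x2_0), 4))
```

This is consistent with the usual rule that a chi-square goodness-of-fit test with $k$ categories and $m$ parameters estimated from the same data has $k - 1 - m$ degrees of freedom: with $k = 2$ values of $Y$ and $m = 1$ estimated parameter, none remain.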

Is this a correct interpretation of the procedure the link suggests for fitting and testing the binomial distribution?

Moreover, since the test rejects the null hypothesis, it seems that $Y$ cannot be modeled with a logistic regression model. What alternatives are there for modeling the dummy variable $Y$, in terms of either the distribution (negative binomial, Poisson, ...) or the link function (logit, ...)?
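When comparing alternatives, note that in a logistic (or any binomial) regression the Bernoulli assumption concerns $Y$ conditional on the predictors. A minimal Python sketch, with hypothetical data and a hypothetical helper `level_proportions`, computes the observed success rate within each level of the 0 to 9 categorical variable. If that variable is entered as a full set of dummy variables, a binomial GLM with the logit link (or any other one-to-one link) has exactly these proportions as its fitted probabilities, so they are a simple baseline; this assumes no level is all 0s or all 1s, where the logit-scale estimates diverge.

```python
from collections import defaultdict

def level_proportions(x: list[int], y: list[int]) -> dict[int, float]:
    """Observed P(Y = 1) within each level of a categorical predictor x.

    With x entered as dummy variables, a binomial GLM with a one-to-one link
    reproduces these proportions as its fitted probabilities.
    """
    successes: dict[int, int] = defaultdict(int)
    totals: dict[int, int] = defaultdict(int)
    for level, outcome in zip(x, y):
        totals[level] += 1
        successes[level] += outcome
    return {level: successes[level] / totals[level] for level in sorted(totals)}

# Hypothetical data: x is the 0-9 category (only some levels shown), y the 0/1 outcome
x = [0, 0, 0, 0, 3, 3, 3, 3, 6, 6, 6, 6, 9, 9, 9, 9]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0]
props = level_proportions(x, y)
print(props)  # {0: 0.25, 3: 0.5, 6: 0.75, 9: 0.75}
```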

Hi @knrumsey, thanks for the comment! In your view, do such models make no assumptions about the distribution of the dependent variable? Thanks for your help! — Quantopik, Mar 08 '19 at 14:49
I have to retract my earlier comment. CLMs are designed for cases where the *response variable* ($y$) is ordinal. [This question](https://stats.stackexchange.com/questions/195246/how-to-handle-ordinal-categorical-variable-as-independent-variable) provides several methods for handling a binary response with ordinal covariates. The first two answers are excellent if you believe that the probability of success relates monotonically to the ordinal predictor variable. The third answer is more flexible and requires no such assumption (and may be less powerful as a result). — knrumsey, Mar 11 '19 at 17:20
