I am trying to check the importance of the gender variable. I know that if the p-value is less than 0.05 the variable is considered important, and otherwise not, but what does it mean when the output gives p-value < 2.2e-16? I have tried other methods too, but I get the same result for all of the categorical variables. I have pasted the output for only one variable. So, how should I interpret this? (Should the variable be considered or not?)

Frequency of my data points

 table(data$gender,data$target)

 Output: 
          N       Y
  F 2107566 2560932
  M 1307442 1567399
  U       3      16

To test statistical significance:

chisq.test(table(data$gender,data$target))

Output:

        Pearson's Chi-squared test

    data:  table(data$gender, data$target)
    X-squared = 86.9407, df = 2, p-value < 2.2e-16

Note: I think one possible reason is that the data has 7.5 million rows. So, would sampling the data before checking significance solve this?

Vignesh Prajapati
  • This means $p < 2.2 \times 10^{-16}$ and is effectively close to zero (actually numerically indistinguishable from 0). Are you also asking how to interpret such a small p-value? There are many questions here asking that very thing. – Momo Sep 23 '15 at 21:29
  • `<2.2e-16` means less than $0.00000000000000022$. It is (very much) less than $0.05$. On a different note, what kind of gender is `U`? – gung - Reinstate Monica Sep 23 '15 at 21:31
  • Why do you think p values below 5% are important? – Michael M Sep 23 '15 at 21:36
  • @gung, U means unknown (or intersex). – Vignesh Prajapati Sep 23 '15 at 23:17
  • Relevant: http://stats.stackexchange.com/questions/78839/how-should-tiny-p-values-be-reported-and-why-does-r-put-a-minimum-on-2-22e-1 – Glen_b Sep 24 '15 at 02:05
  • @gung Either it's "unknown" for missing data, or it's some indicator for nonbinary genders. New Zealand officially recognizes several genders, for example. – Sycorax Sep 24 '15 at 02:53

1 Answer

The important parts of the question are answered in the comments. A p-value this small lets you reject the null hypothesis that gender and target are independent with far more confidence than a p-value near the conventional threshold of 0.05 would.
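
As a rough worked check using only the numbers in the question (the formulas are standard results, not something stated in the original posts): with $df = 2$ the chi-squared tail probability has a closed form, so the p-value can be written out directly, and Cramér's $V$ gives a sense of how strong the association is for the $n = 7543358$ rows in the table.

$$p = P(\chi^2_2 > 86.9407) = e^{-86.9407/2} \approx 1.3 \times 10^{-19}$$

$$V = \sqrt{\frac{X^2}{n \cdot \min(r-1,\, c-1)}} = \sqrt{\frac{86.9407}{7543358 \times 1}} \approx 0.0034$$

Here $r = 3$ and $c = 2$ are the numbers of rows and columns in the table. The p-value is indeed far below $0.05$, while the association it detects is very weak, which is consistent with the asker's suspicion that the 7.5 million rows are driving the result.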