I have two variables, both categorical and binary (dichotomous), and I need to determine whether there is any 'correlation' between them. I am using this term loosely: I know 'correlation' technically measures how one variable changes with another, which does not apply directly to categorical, dichotomous variables. I simply want to know whether there is a link between my two sets of data.
One variable is whether a gene is a 'pseudogene' or not (1 for pseudogene, and 0 for non-pseudogene), and the other is whether the gene is a 'complement' gene or not (1 for complement, and 0 for non-complement).
An example of the data is as follows, where each row is a single gene (imagine this but on a scale of about 500,000 rows):
pseudo complement
0 1
0 0
1 1
0 1
0 1
1 0
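To make the structure of the data concrete, here is a minimal sketch (in Python, purely for illustration) that cross-tabulates the six example rows above into a 2x2 table of counts. The names `rows`, `counts` and `table` are made up for this example, and the six rows are only the sample shown here, not the full data set.

```python
from collections import Counter

# The six example rows from above, as (pseudo, complement) pairs.
rows = [(0, 1), (0, 0), (1, 1), (0, 1), (0, 1), (1, 0)]

# Count how many genes fall into each of the four combinations.
counts = Counter(rows)

# Arrange the counts as a 2x2 table:
# table[p][c] = number of genes with pseudo == p and complement == c.
table = [[counts[(p, c)] for c in (0, 1)] for p in (0, 1)]

print("          complement=0  complement=1")
for p in (0, 1):
    print(f"pseudo={p}  {table[p][0]:>12}  {table[p][1]:>12}")
```

With the full data, the same four counts would summarise all of the roughly 500,000 rows.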
My extensive Google searches suggest that, for two categorical variables, the chi-square test is appropriate. I've tried using it, but my results don't seem very reliable. Further reading suggests that the context of my data is also not appropriate, as the test concerns comparing different populations, whilst my variables are unrelated. So chi-square is probably not suitable at all.
Similarly, I have seen suggestions that the phi coefficient is designed for comparing two dichotomous variables - however, it again seems that the context of the test is not appropriate for my data.
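For reference, this is a minimal sketch of the arithmetic behind the two quantities mentioned above, using the 2x2 counts from the six example rows. It is written in Python for illustration only, the variable names are made up, and it makes no claim that either quantity is the right choice for this data. With only six rows the counts are far too small for a chi-square approximation to mean anything, so the point is just to show what is being computed.

```python
import math

# Counts from the six example rows (illustration only):
# n[p][c] = number of genes with pseudo == p and complement == c.
n = [[1, 3],
     [1, 1]]

# Marginal totals of the 2x2 table.
row0 = n[0][0] + n[0][1]   # pseudo == 0
row1 = n[1][0] + n[1][1]   # pseudo == 1
col0 = n[0][0] + n[1][0]   # complement == 0
col1 = n[0][1] + n[1][1]   # complement == 1
total = row0 + row1

# Phi coefficient for a 2x2 table.
phi = (n[1][1] * n[0][0] - n[1][0] * n[0][1]) / math.sqrt(row0 * row1 * col0 * col1)

# Pearson chi-square statistic (no continuity correction);
# for a 2x2 table it equals total * phi**2.
chi2 = total * phi ** 2

print(f"phi = {phi:.3f}, chi-square = {chi2:.3f}")
```

The sign of phi depends only on how the 0/1 labels are assigned, while the chi-square statistic is unaffected by that choice.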
Which statistical test should I be using for testing any correlation/link between these two variables? (Bonus if you can tell me how to do this in R, but my main concern is just deciding which test is appropriate.)