I'm trying to compare which features of a website are used by more active and by less active users. I've divided the users into "active" and "inactive", and there are several page types (features) they can visit. This gives me a contingency table like this:
user type | feature1 | feature2 | feature3 | feature4 | feature5
---------------------------------------------------------------
active    |     1000 |     2000 |     3000 |     4000 |     5000
inactive  |    50000 |    40000 |    30000 |    20000 |    10000
I want to figure out which features are over-represented in the usage patterns of active users compared to inactive users.
Is comparing the conditional probabilities of the cells in each column a reasonable way to do this? For example:
P(feature5 | active) / P(feature5 | inactive) = (5000/15000) / (10000/150000) = (1/3) / (1/15) = 5
So in this case active users seem to be 5 times more likely to make use of feature5.
Is that a fair interpretation of the odds, and if not, what are the problems with that interpretation?
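For reference, here is a minimal Python sketch of the calculation I'm describing, applied to every column rather than just feature5. The counts are copied from the table above; the function name `conditional_probability_ratios` is just something I made up for illustration, and this only computes the ratios, it does not test whether they are statistically meaningful.

```python
# Counts copied from the contingency table above.
features = ["feature1", "feature2", "feature3", "feature4", "feature5"]
active = [1000, 2000, 3000, 4000, 5000]
inactive = [50000, 40000, 30000, 20000, 10000]


def conditional_probability_ratios(active_counts, inactive_counts):
    """Return P(feature | active) / P(feature | inactive) for each column.

    Hypothetical helper for illustration: each probability is the cell
    count divided by its row total.
    """
    active_total = sum(active_counts)
    inactive_total = sum(inactive_counts)
    return [
        (a / active_total) / (i / inactive_total)
        for a, i in zip(active_counts, inactive_counts)
    ]


ratios = dict(zip(features, conditional_probability_ratios(active, inactive)))

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

With the counts above, the ratios come out to 0.2, 0.5, 1, 2 and 5 for feature1 through feature5, so the feature5 value matches the hand calculation.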