I have X and Y variables, as well as a cluster variable (State). X and State are derived from Database A, while Y and State are derived from Database B.
X is a sentiment score ranging between -1 and 1, while Y is a yes or no (0 or 1) response.
In Database A, I aggregate X into average-X by state, while in Database B, I aggregate Y into percentage-Y by state. Then I combine the two datasets as follows:
In the combined data, my new outcome is percentage-Y, and for each state I also have the numerator (the count of Y = 1) and the denominator (the number of responses) that produce percentage-Y.
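To make the structure concrete, here is a minimal sketch of the aggregate-then-merge step. The rows are made-up toy values, not my real data, and the names `db_a`, `db_b`, `avg_x`, `y_yes`, `y_n`, and `pct_y` are hypothetical labels chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for the two databases (made-up values).
db_a = [("CA", 0.4), ("CA", -0.2), ("TX", 0.1), ("TX", 0.5)]    # (State, X)
db_b = [("CA", 1), ("CA", 0), ("CA", 1), ("TX", 0), ("TX", 1)]  # (State, Y)


def mean_x_by_state(rows):
    """Average the sentiment score X within each state."""
    sums, counts = defaultdict(float), defaultdict(int)
    for state, x in rows:
        sums[state] += x
        counts[state] += 1
    return {s: sums[s] / counts[s] for s in sums}


def y_counts_by_state(rows):
    """Return (number of Y == 1, number of responses) for each state."""
    yes, total = defaultdict(int), defaultdict(int)
    for state, y in rows:
        yes[state] += y
        total[state] += 1
    return {s: (yes[s], total[s]) for s in total}


avg_x = mean_x_by_state(db_a)
y_counts = y_counts_by_state(db_b)

# Keep only states present in both sources (an inner join on State).
combined = [
    {
        "state": s,
        "avg_x": avg_x[s],
        "y_yes": y_counts[s][0],                 # numerator
        "y_n": y_counts[s][1],                   # denominator
        "pct_y": y_counts[s][0] / y_counts[s][1],
    }
    for s in sorted(avg_x.keys() & y_counts.keys())
]

for row in combined:
    print(row)
```

Each row of `combined` is one state, carrying average-X together with the numerator and denominator, so percentage-Y can always be recomputed from the counts.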
I came across the following advice: "The most natural way fractional responses arise is from averaged 0/1 outcomes. In such cases, if you know the denominator, you want to estimate such models using standard probit or logistic regression".
Since I do have the denominator, it seems I can skip fractional outcome regression and use standard logistic regression instead.
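If I understand the advice correctly, the model it points to would look roughly like the following. The notation is mine, not from the quoted source: for state $s$, $k_s$ is the numerator, $n_s$ is the denominator, and $\bar{x}_s$ is average-X. It also assumes the individual responses within a state are independent given $\bar{x}_s$.

$$
k_s \sim \mathrm{Binomial}(n_s,\, p_s), \qquad \log\frac{p_s}{1 - p_s} = \beta_0 + \beta_1 \bar{x}_s
$$

Under this formulation percentage-Y is just $k_s / n_s$, and states with larger $n_s$ contribute more to the likelihood.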
However, how exactly do I specify and fit a logistic regression that uses the denominator information?