I have already checked other posts in this area, but could not find one that fits my issue. I have the following preconditions:
Software: preferred SPSS v21, possibly R
Sample size: 5655 (will get around 8 times larger in 2 weeks, but I want to identify statistical pitfalls now)
DV: binary; 1 = success, 0 = fail (sales conversion)
IV1: categorical; 5 different conditions - different designs
IV2: categorical; 3 different conditions - different discounts (0, 5, 10)
and the following problem:
DV = 1 has only 36 cases. IV2 is naturally related to DV, since one of the 3 conditions of IV2 is only applied when DV = 1, meaning that for all cases with DV = 0, IV2 also equals 0. Conditions of IV1 are randomly assigned to each case. Essentially, I want to find out whether IV2 is a moderator of IV1 -> DV; however, SPSS gives me very strange pseudo-R²s, odds ratios and so on.
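To make the structure of the data concrete, here is a minimal sketch in Python (used only for illustration; my actual analysis is in SPSS/R). The split of the 36 conversions across the discount levels is made up, and the names `conversions_by_discount`, `empty_cells` and `perfectly_predicting_levels` are hypothetical helpers, not part of any package. It only shows that the IV2 x DV cross-table has empty cells by construction:

```python
from collections import Counter

N_TOTAL = 5655  # sample size from my data

# HYPOTHETICAL split of the 36 conversions across discount levels
# (made-up numbers, not my real counts).
conversions_by_discount = {0: 20, 5: 10, 10: 6}

# Every non-converting case is recorded with IV2 = 0, as described above.
n_fail = N_TOTAL - sum(conversions_by_discount.values())
records = [(0, 0)] * n_fail  # (iv2, dv) pairs
for discount, n_success in conversions_by_discount.items():
    records += [(discount, 1)] * n_success

table = Counter(records)  # cell counts of the IV2 x DV cross-table
levels = sorted(conversions_by_discount)


def empty_cells(table, levels):
    """Return the (iv2, dv) cells that contain no observations."""
    return [(iv2, dv) for iv2 in levels for dv in (0, 1) if table[(iv2, dv)] == 0]


def perfectly_predicting_levels(table, levels):
    """Return the IV2 levels at which only one DV outcome ever occurs."""
    return [iv2 for iv2 in levels if table[(iv2, 0)] == 0 or table[(iv2, 1)] == 0]


for iv2 in levels:
    print(f"IV2={iv2:>2}: DV=0 -> {table[(iv2, 0)]:>4}, DV=1 -> {table[(iv2, 1)]:>2}")
print("empty cells:", empty_cells(table, levels))
print("levels that perfectly predict DV:", perfectly_predicting_levels(table, levels))
```

Whatever the real split is, the cells for IV2 = 5 and IV2 = 10 with DV = 0 stay empty, so those two levels always go together with DV = 1.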
After googling, I found out that this might be a problem of quasi-complete separation. I then tried to run a Firth logistic regression in SPSS, which never stopped running, and in R, which gave me the following error:
Error in chol.default(x) : the leading minor of order 15 is not positive definite
So I have the following questions:
- is my issue solely related to the sample size?
- how can I fix the leading minor error?
- how can I approach the problem in general (methods)?
Additional info: The different discounts (IV2) are applied to specific sets of products (let's call them set A (0 discount), set B (5 discount) and set C (10 discount)). A visitor arrives and is randomly assigned to one of the categories of IV1, which affects how A, B and C are displayed to the visitor.
It is only tracked to which category of IV1 the visitor was assigned and which product (from set A, B, C or none of them) was purchased.
I only want to find out the conversion rates of A, B and C, and especially which category of IV1 predicts a higher conversion. On top of that, I assume the discount (IV2) has a moderating effect on the effect of IV1.
I know it's probably not optimal to do it like this, but it was not possible any other way. Maybe my reasoning is incorrect, but I hoped to test the above for a moderation effect.
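As a simple baseline for the per-category conversion rates mentioned above, this is a minimal sketch in Python (again only for illustration) that computes the rate for each IV1 category with a Wilson score interval. The counts are made up: I assume equal group sizes of 1131 (5655 / 5) and an arbitrary split of the 36 conversions. `wilson_interval` and the category labels `design_1` to `design_5` are hypothetical names I chose for this example:

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# HYPOTHETICAL counts per IV1 category: (conversions, visitors).
# Made-up numbers that only match my totals (36 conversions, 5655 visitors).
counts_by_design = {
    "design_1": (5, 1131),
    "design_2": (9, 1131),
    "design_3": (6, 1131),
    "design_4": (10, 1131),
    "design_5": (6, 1131),
}

results = {}
for design, (conversions, visitors) in counts_by_design.items():
    rate = conversions / visitors
    lower, upper = wilson_interval(conversions, visitors)
    results[design] = (rate, lower, upper)
    print(f"{design}: rate={rate:.4f}, 95% CI=({lower:.4f}, {upper:.4f})")
```

With so few conversions per category I would expect the intervals to overlap heavily, which is part of why I am asking whether the sample size alone is the issue.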
I ran a bayesglm today, and it did drive down the odds ratios, so that is already a small success, although the results are mostly not significant, which is probably due to the small sample size. However, I am not even sure whether bayesglm applies to my situation. Further, I am unsure about the prior means and scales, which I am trying to understand at the moment. Any feedback on how I can approach my situation is welcome, thanks!