If you want to justify the use of BIC: you can replace the maximum likelihood with the maximum a posteriori (MAP) estimate, and the resulting BIC-type criterion remains asymptotically valid (in the limit as the sample size $n \to \infty$). As mentioned by @probabilityislogic, Firth's logistic regression is equivalent to using a Jeffreys prior, so what you obtain from your regression fit is the MAP estimate.
The BIC is a pseudo-Bayesian criterion which is (roughly) derived from a Laplace approximation of the marginal likelihood $$p_y(y) = \int L(\theta; y)\pi(\theta)\,\mathrm{d}\theta,$$ i.e. a Taylor expansion of the log-integrand around the maximum likelihood estimate $\hat{\theta}$. It therefore ignores the prior, but the prior contributes only an $O(1)$ term, so its effect vanishes relative to the retained terms (of order $n$ and $\log n$) as information concentrates in the likelihood.
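To see this numerically, here is a minimal sketch (my own illustration, not from the thread) using a conjugate Gaussian model $y_i \sim N(\theta, 1)$ with prior $\theta \sim N(0, \tau^2)$, chosen because the marginal likelihood is available in closed form; the values of `n`, `tau` and `theta_true` are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau, theta_true = 2000, 1.0, 1.0
y = rng.normal(theta_true, 1.0, size=n)

# Exact log marginal likelihood: marginally, y ~ N(0, I + tau^2 * 11^T),
# with det(I + tau^2 * 11^T) = 1 + n*tau^2 and a rank-one inverse.
s1, s2 = y.sum(), (y**2).sum()
log_marg = (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.log(1 + n * tau**2)
            - 0.5 * (s2 - tau**2 * s1**2 / (1 + n * tau**2)))

# BIC = -2 * max log-likelihood + k * log(n), with k = 1 free parameter.
theta_hat = y.mean()
max_loglik = -0.5 * n * np.log(2 * np.pi) - 0.5 * ((y - theta_hat)**2).sum()
bic = -2 * max_loglik + 1 * np.log(n)

# BIC tracks -2 log p(y) up to a prior-dependent O(1) remainder,
# while both quantities themselves grow like O(n).
print(bic, -2 * log_marg, bic - (-2 * log_marg))
```

The printed gap stays bounded as $n$ grows while BIC itself grows linearly, which is exactly why dropping the prior is asymptotically harmless.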
As a side remark, Firth's correction also removes the first-order term of the asymptotic bias of the maximum likelihood estimator in exponential families.
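For concreteness, here is a minimal sketch of Firth-penalized logistic regression (my own toy implementation, not a substitute for a production package such as R's `logistf`), using Newton steps on Firth's modified score $U^*(\beta) = X^\top\!\big(y - p + h\,(1/2 - p)\big)$, where $h$ are the leverages of the weighted hat matrix; the toy data are an assumption chosen to exhibit perfect separation:

```python
import numpy as np

def firth_logistic(X, y, n_iter=200, tol=1e-9):
    """Firth-penalized logistic regression, i.e. the MAP under the
    Jeffreys prior. Newton iterations on the modified score
    U*(b) = X^T (y - p + h * (0.5 - p))."""
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                    # Fisher information weights
        XtWX = X.T @ (W[:, None] * X)        # expected information matrix
        XtWX_inv = np.linalg.inv(XtWX)
        # leverages h_i = W_i * x_i^T (X^T W X)^{-1} x_i
        h = W * np.einsum('ij,jk,ik->i', X, XtWX_inv, X)
        step = XtWX_inv @ (X.T @ (y - p + h * (0.5 - p)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Perfectly separated toy data: the ordinary MLE diverges to infinity,
# but the Jeffreys-prior MAP (Firth) estimate stays finite.
X = np.column_stack([np.ones(4), np.array([-2.0, -1.0, 1.0, 2.0])])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(firth_logistic(X, y))   # finite intercept and slope
```

The separated example is the classic motivation: the penalty keeps the estimate finite precisely where the unpenalized likelihood has no maximizer.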