I wanted to mention a few more solutions that are quite straightforward and have not been mentioned yet.
Bayesian logistic regression
We can use Bayesian logistic regression and set prior distributions that put no prior weight on negative coefficient values. Of course, your exact choice of prior distribution may have an influence on the inference, but since you seem to have some prior information that may be a good thing. A plus of this approach is that you get credible intervals for parameters.
For example, if you use R and the brms package, something like this should work in theory, but runs into some challenges:
library(tidyverse)
library(brms)
set.seed(1234)
mydata = tibble(y=rep(c(0,1), each=25),
                x1=rnorm(n=50, mean=y, sd=1),
                x2=rnorm(n=50, mean=y, sd=1))

fit1 = brm(data=mydata,
           formula=y~0+x1+x2,
           family = bernoulli(link = "logit"),
           prior=set_prior(class="b", prior="uniform(0,10)"),
           control=list(adapt_delta=0.999))
We can of course debate whether a uniform(0,10) prior is sensible, but a log-odds ratio of 10 is huge, so I thought I'd cut off the distribution there. However, we run into sampling problems, because we did not tell the sampler that the parameter is always >=0. With e.g. an exponential(0.01) prior on the coefficients, that issue still occurs. There are two solutions to this (and possibly a third, if you can tell brms about the constraint, as sketched right below):
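I have not tested the third option, but newer brms versions let set_prior() take lb/ub bounds for class "b" priors, so the sampler would know about the constraint directly; fit3 is just a placeholder name:

# Untested sketch (assumes a brms version where set_prior() accepts lb/ub for class "b")
fit3 = brm(data=mydata,
           formula=y~0+x1+x2,
           family = bernoulli(link = "logit"),
           prior=set_prior("uniform(0, 10)", class="b", lb=0, ub=10),
           control=list(adapt_delta=0.999))

The two solutions below do not rely on that feature.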
Firstly, one of the cool things about MCMC sampling is that you can introduce constraints retrospectively, e.g. by sampling without a constraint and then subsetting the samples:
fit2 = brm(data=mydata,
           formula=y~x1+x2,
           family = bernoulli(link = "logit"),
           prior=c(prior(class="b", prior="normal(0,10)"),
                   prior(class="Intercept", prior="normal(0,10)")),
           control=list(adapt_delta=0.999))

library(tidybayes) # Only needed for the median_qi function

as.data.frame(fit2) %>%
  filter(b_x1>=0 & b_x2>=0) %>%
  dplyr::select(-lp__) %>%
  median_qi()
This gets us:
b_Intercept b_Intercept.lower b_Intercept.upper b_x1 b_x1.lower b_x1.upper b_x2 b_x2.lower b_x2.upper .width .point .interval
1 -0.7483035 -1.638089 0.03658298 1.136726 0.2859683 2.247933 1.011623 0.354213 1.851013 0.95 median qi
The main limitation of this approach is that when you have lots of coefficients (curse of dimensionality), or when the data favor a negative coefficient for some of them, you may end up with few (or no) posterior samples in which all coefficients are positive.
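With the fit2 object from above you can check directly what fraction of the posterior draws survives the truncation (the draws name below is just for illustration):

draws = as.data.frame(fit2)
mean(draws$b_x1 >= 0 & draws$b_x2 >= 0) # share of draws meeting the constraint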
The more general solution is to tell the sampler about the constraint (which is easy enough to do in rstan by coding a Stan model ourselves):
library(rstan)
scode = "
data {
int n;
int<lower=0, upper=1> y[n];
real x1[n];
real x2[n];
}
parameters {
real<lower=0> beta[2];
}
model {
beta ~ uniform(0,10);
for (record in 1:n){
y[record] ~ binomial_logit(1, beta[1]*x1[record] + beta[2]+x2[record]);
}
}
"
smodel = stan_model(model_code = scode)
sfit = sampling(object = smodel,
                data=list(n=dim(mydata)[1], y=mydata$y, x1=mydata$x1, x2=mydata$x2))
summary(sfit)$summary
Note that this is rather inefficiently coded (it loops over records instead of putting the covariates into a matrix and using a single matrix multiplication to get the linear predictor); a vectorized sketch follows after the output below.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta[1] 1.166326 0.012648042 0.5134742 0.215388737 0.80786712 1.12813511 1.5047327 2.2471028 1648.1265 1.001271
beta[2] 0.122852 0.002284301 0.1072845 0.004358814 0.04249053 0.09305318 0.1763618 0.4019377 2205.8067 1.002139
lp__ -33.951458 0.041777118 1.1712864 -37.180331160 -34.42739042 -33.57252903 -33.1131970 -32.8095330 786.0483 1.004368
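For reference, a vectorized version of the model could look like the sketch below (untested; scode_vec, X and p are names I'm introducing here). You would add X = cbind(mydata$x1, mydata$x2) and p = 2 to the data list, and bernoulli_logit is equivalent to binomial_logit with one trial:

scode_vec = "
data {
  int n;
  int p;
  int<lower=0, upper=1> y[n];
  matrix[n, p] X;
}
parameters {
  vector<lower=0, upper=10>[p] beta;
}
model {
  beta ~ uniform(0,10);
  y ~ bernoulli_logit(X * beta); // one matrix multiplication instead of a loop
}
"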
General minimization tools such as PyTorch
If you don't mind not having credible intervals/standard errors, then using any generic minimization tool and defining a suitable averaging function works. E.g. you can use a tool like PyTorch like this, if you want no intercept term and not just positivity, but also a sum-to-one constraint on the weights:
import torch
from torch import nn

class MyAverager(nn.Module):
    def __init__(self, n_models, n_targets):
        super(MyAverager, self).__init__()
        self.betas = nn.Parameter(torch.randn(size=(n_models, n_targets)))
        self.softmax = nn.Softmax(dim=0)

    def forward(self, inputs):
        # Assume input tensors indexed as sample, model, target
        # and self.betas for weights indexed as model, target
        wgts = self.softmax(self.betas)
        x = torch.mul(inputs, wgts).sum(dim=1)
        return x
And like this if you only want positive coefficients (but no sum-to-one constraint) and no intercept:
class MyAverager(nn.Module):
    def __init__(self, n_models, n_targets):
        super(MyAverager, self).__init__()
        self.betas = nn.Parameter(torch.randn(size=(n_models, n_targets)))

    def forward(self, inputs):
        # Assume input tensors indexed as sample, model, target
        # and self.betas for weights indexed as model, target
        wgts = torch.exp(self.betas)
        x = torch.mul(inputs, wgts).sum(dim=1)
        return x
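Neither class includes the actual fitting step; with either version you would pick a loss and optimizer and minimize as usual. Below is a minimal sketch, where the data tensors, shapes, learning rate and number of steps are made-up placeholders:

# Fitting sketch, assuming the imports and one of the MyAverager classes above.
# preds: predictions of 3 models for 1 target, indexed (sample, model, target);
# obs: observed outcome per sample. Both are placeholders here.
preds = torch.randn(200, 3, 1)
obs = torch.randn(200, 1)

averager = MyAverager(n_models=3, n_targets=1)
optimizer = torch.optim.Adam(averager.parameters(), lr=0.05)
loss_fn = nn.MSELoss()  # swap in whatever loss fits your targets

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(averager(preds), obs)
    loss.backward()
    optimizer.step()

# For the softmax version the fitted weights are averager.softmax(averager.betas),
# for the exp version torch.exp(averager.betas).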
Obviously, there's nothing unique about PyTorch that enables this, and you can just as easily do this with Keras / TensorFlow or, if you are not a Python user, with the corresponding packages in R (or whatever else you are using).