There are two good choices for you. The first, and it is the lesser of the two, is stepwise regression. It starts from some model and gradually alters it, adding or dropping variables in search of the best model as measured by some criterion such as the AIC or BIC. The second is better, for theoretical reasons among others: an exhaustive, combinatorial search over all possible models.
You could construct a sequence of do loops or for-next loops that includes or excludes every possible combination of variables. This does the same job as stepwise, except that you cover every combination rather than one greedy path through them. The added value is that if there is a theoretical reason for some variable to be in the model, you can force it in: instead of letting a loop include or exclude it, it is simply always there. A sketch of this is below.
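Here is a minimal sketch of that exhaustive search in Python, assuming the predictors sit in a pandas DataFrame `X`, the response is `y`, and statsmodels does the fitting; the function name, the `forced` argument, and the default choice of BIC are just for illustration.

```python
from itertools import combinations

import pandas as pd
import statsmodels.api as sm


def best_subset(X: pd.DataFrame, y, forced=(), criterion="bic"):
    """Fit every subset of the columns of X (always keeping the `forced`
    columns) and return the fit with the smallest AIC or BIC."""
    optional = [c for c in X.columns if c not in forced]
    best = None
    for r in range(len(optional) + 1):
        for combo in combinations(optional, r):
            cols = list(forced) + list(combo)
            # intercept column plus whichever regressors are in this subset
            design = pd.DataFrame({"const": 1.0}, index=X.index)
            if cols:
                design = pd.concat([design, X[cols]], axis=1)
            fit = sm.OLS(y, design).fit()
            score = fit.bic if criterion == "bic" else fit.aic
            if best is None or score < best[0]:
                best = (score, cols, fit)
    return best  # (criterion value, chosen columns, fitted model)
```

Calling something like `best_subset(X, y, forced=["age"])` keeps `age` in every candidate model while the loops toggle the rest.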
You should then calculate either the AIC or the BIC. The model with the smallest value of the information criterion you chose would win. They differ in how they penalize model complexity. With small sample sizes, the AIC will tend to overfit.
What matters with the AIC or the BIC is only the relative ranking of the set of models fit to the same data. If you had different data sets and different problems, an AIC of -23 for one problem couldn't be compared to an AIC of -22 for another. With the same data set and the same problem, however, -23 indicates a better model than -22.
The difference in calculation between the two methods is how they penalize added model structure. The AIC adds a penalty of $2k$ to its value, where $k$ is the number of independent variables: a model with two independent variables carries a penalty of 4, while one with three carries a penalty of 6. The BIC adds a penalty of $\log(n)\,k$, where $n$ is the number of observations in the sample, so once $n$ is at least 8, $\log(n) > 2$ and the BIC penalizes each added variable more heavily than the AIC does. As a rule of thumb, though, their rankings will be highly concordant.
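To make those penalties concrete, here is the arithmetic in Python, assuming the usual definitions $AIC = 2k - 2\ln\hat{L}$ and $BIC = k\log(n) - 2\ln\hat{L}$, with $k$ taken as whatever count of terms you are penalizing; the numbers are made up.

```python
import math

def aic(loglik: float, k: int) -> float:
    # AIC = 2k - 2 * log-likelihood
    return 2 * k - 2 * loglik

def bic(loglik: float, k: int, n: int) -> float:
    # BIC = log(n) * k - 2 * log-likelihood
    return math.log(n) * k - 2 * loglik

# Same log-likelihood, one extra variable: the AIC penalty grows by 2,
# while the BIC penalty grows by log(n), which exceeds 2 once n >= 8.
print(aic(-50.0, 2), aic(-50.0, 3))            # 104.0, 106.0
print(bic(-50.0, 2, 100), bic(-50.0, 3, 100))  # ~109.2, ~113.8
```

In practice the fitting library usually reports both values directly (as with the `fit.aic` and `fit.bic` used earlier), so helpers like these only serve to make the penalty explicit.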
A way to think about the trade-off between the AIC and the BIC is in terms of what they are trying to accomplish. The BIC grants equal prior weight to each model, while the AIC grants lower prior weight to models with more parameters. The AIC does this by penalizing structure directly, with a fixed penalty per parameter; the BIC instead trades off sample size against structure.
If you have a small sample, adding a variable may improve the predictive power of the model, because you are giving it more of the natural variation to work with. On the other hand, once the sample size becomes large, two variables that covary may be providing mostly the same information. Adding the second can then make the model worse: whatever information it carries is offset by collinearity, to the point that it amounts to added noise. The small simulation below illustrates the large-sample case.
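A rough illustration of that large-sample case, using simulated data and the statsmodels BIC; the seed, sample size, and noise scales are arbitrary assumptions, and the outcome is typical rather than guaranteed.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly a copy of x1
y = 2.0 * x1 + rng.normal(size=n)

small = sm.OLS(y, sm.add_constant(np.column_stack([x1]))).fit()
large = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# With x2 carrying almost no new information, its coefficient is mostly
# noise and the log(n) penalty is not repaid: on most seeds the
# two-variable model has the higher (worse) BIC.
print(small.bic, large.bic)
```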
The reason to use stepwise is that it is almost certainly a built-in function of your software; the reason to go combinatorial is that it is better. In either case, do not use p-values as a selection criterion, even though they will be correlated with the AIC and the BIC.