For nested models, the AIC bears a very close relationship to likelihood-ratio testing. When two nested models differ by a single fitted parameter, the relationship is exact. Putting aside the delta-AIC of 2 rule of thumb for now: if you simply choose between those 2 models based on which has the lower AIC, that is equivalent to basing the choice on a $\chi^2$ test with a p-value cutoff of 0.157. (Note that this is less stringent than the classic p < 0.05 cutoff.) For any difference in the number of fitted parameters between nested models, you can use the definition of AIC to convert the AIC difference into the corresponding log-likelihood difference for a $\chi^2$ test. So with nested models one should expect some relationship between delta-AIC values and the statistics underlying standard significance tests.
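To make that conversion concrete, here is a minimal sketch (not part of the original argument) using Python with scipy; the function name `aic_diff_to_lrt` and its inputs are purely illustrative:

```python
from scipy import stats

def aic_diff_to_lrt(delta_aic, df):
    """Convert an AIC difference between nested models into the equivalent
    likelihood-ratio statistic and chi-squared p-value.

    With AIC = 2k - 2*logLik, if the larger model has `df` extra parameters,
    then delta_aic = AIC_small - AIC_big = LR - 2*df, so LR = delta_aic + 2*df.
    """
    lr_stat = delta_aic + 2 * df
    return lr_stat, stats.chi2.sf(lr_stat, df)

# Choosing whichever nested model has the lower AIC means accepting the larger
# model whenever delta_aic > 0, i.e. whenever LR > 2*df.  For a 1-parameter
# difference the implied significance cutoff is:
print(stats.chi2.sf(2, df=1))   # ~0.157, less stringent than 0.05
```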
It's not completely clear from your question, however, whether the models being evaluated are actually nested. With 3 predictors A, B, and C you could in principle evaluate the null model, each predictor individually, the combinations A+B, A+C, and B+C, and the full A+B+C model. If you are evaluating all of those possibilities, then the models can't all be nested one inside the next.
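As a purely illustrative aside (the predictor names A, B, C are just placeholders), you can enumerate the candidate subsets and see which pairs are nested:

```python
from itertools import combinations

predictors = ["A", "B", "C"]
# all 8 candidate models: null, single predictors, pairs, and the full model
candidates = [set(c) for r in range(len(predictors) + 1)
              for c in combinations(predictors, r)]

for m1, m2 in combinations(candidates, 2):
    nested = m1 <= m2 or m2 <= m1   # nested only if one subset contains the other
    print(sorted(m1), sorted(m2), "nested" if nested else "not nested")
# e.g. {A, B} vs {A, C}: not nested, so the AIC/LRT equivalence above doesn't apply
```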
The parsimony/AIC issue is more complicated. A model close to the minimum-AIC model could have either more or fewer fitted parameters than the minimum-AIC model. Under the interpretation of a delta-AIC value as estimating the relative probability that a model minimizes the information loss, models with the same delta-AIC from the minimum-AIC model carry the same strength of evidence. So choosing the most parsimonious model within a delta-AIC of 2 imposes an additional restriction that isn't necessarily in keeping with the information-theoretic basis of the AIC. For example, if you were to do model weighting as proposed at the above link, there would be no reason to prefer a more-parsimonious over a less-parsimonious model having the same delta-AIC.
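A small sketch of that weighting calculation (with made-up delta-AIC values) shows why: two models at the same delta-AIC receive identical weights no matter how many parameters each fits.

```python
import numpy as np

delta_aic = np.array([0.0, 1.5, 1.5, 4.0])   # hypothetical differences from the minimum AIC
rel_lik = np.exp(-delta_aic / 2)             # relative likelihood of each candidate model
weights = rel_lik / rel_lik.sum()            # Akaike weights, summing to 1
print(weights)                               # the two models at delta = 1.5 get the same weight
```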
The Wikipedia page on AIC provides one clue to what might be going on with your observations:
> Note that AIC tells nothing about the absolute quality of a model, only the quality relative to other models. Thus, if all the candidate models fit poorly, AIC will not give any warning of that. Hence, after selecting a model via AIC, it is usually good practice to validate the absolute quality of the model. Such validation commonly includes checks of the model's residuals (to determine whether the residuals seem to be random) and tests of the model's predictions.
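For instance, one possible version of those residual checks, sketched with statsmodels on stand-in data (`X` and `y` here are placeholders for your actual predictors and response):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # stand-in predictors
y = X @ [1.0, 0.5, 0.0] + rng.normal(size=100) # stand-in response

fit = sm.OLS(y, sm.add_constant(X)).fit()
resid = fit.resid

# rough checks that the residuals "seem to be random"
print(stats.shapiro(resid))      # normality of residuals
print(durbin_watson(resid))      # ~2 when there is no autocorrelation
```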
That said, you need to be careful in interpreting p-values for individual coefficients in a multiple-regression model. In some circumstances (e.g., highly correlated predictors) a model can be significantly different from the null overall even though none of the individual regression coefficients passes the standard p < 0.05 cutoff. And as noted in a comment, if any predictor is categorical with more than 2 levels, you have to be very careful about just looking at the p-values of coefficients: each coefficient's p-value then refers only to a single level's difference from the reference level, not to the predictor as a whole.
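Here's a toy demonstration of that collinearity situation (simulated data, not your model): the overall F-test is decisive while neither coefficient on its own looks significant.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)   # nearly a copy of x1
y = x1 + x2 + rng.normal(size=200)           # both predictors genuinely matter

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(fit.f_pvalue)   # overall model vs. null: essentially zero
print(fit.pvalues)    # individual coefficients: typically well above 0.05
```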
So one should certainly make sure that a model developed this way is better than a null model, whether in terms of standard significance tests or in the ability to make useful predictions on new data. The p-values for the individual predictors, however, should be of less concern.
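One way to make that overall comparison, sketched here as a likelihood-ratio test of a full OLS model against the intercept-only model (again on simulated stand-in data):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))
y = X @ [0.8, 0.4, 0.0] + rng.normal(size=150)

full = sm.OLS(y, sm.add_constant(X)).fit()
null = sm.OLS(y, np.ones((len(y), 1))).fit()   # intercept-only (null) model

lr_stat = 2 * (full.llf - null.llf)            # likelihood-ratio statistic
df = full.df_model - null.df_model             # difference in fitted parameters
print(stats.chi2.sf(lr_stat, df))              # small p-value: better than the null
```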