Remember that ROC curves are constructed by considering all possible thresholds, while metrics like accuracy, sensitivity, specificity, precision, and recall use only a single threshold. If you configure your software to calculate the precision and recall of both models as the threshold is varied, I would expect you to find that the high-AUC model tends to outperform the low-AUC model.
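As a quick sketch of what that sweep looks like in code (a synthetic setup of my own, not the models from the question): two hypothetical models recover the same latent risk with different amounts of noise, so one has a higher AUC than the other, and precision and recall are then computed for both over a grid of thresholds.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process: a latent "risk" drives the outcome.
risk = rng.normal(size=n)
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-risk))).astype(int)

def to_prob(score):
    return 1 / (1 + np.exp(-score))

# Two hypothetical models: both recover the latent risk, with different noise.
p_high = to_prob(risk + rng.normal(scale=0.5, size=n))  # high-AUC model
p_low = to_prob(risk + rng.normal(scale=2.0, size=n))   # low-AUC model

print("AUC, high-AUC model:", roc_auc_score(y, p_high))
print("AUC, low-AUC model: ", roc_auc_score(y, p_low))

# Sweep the classification threshold and compare precision and recall.
for t in (0.3, 0.5, 0.7):
    for name, p in (("high-AUC", p_high), ("low-AUC", p_low)):
        yhat = (p >= t).astype(int)
        print(f"t={t}, {name}: precision={precision_score(y, yhat):.2f}, "
              f"recall={recall_score(y, yhat):.2f}")
```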
However, it is usually preferable to evaluate the probability predictions themselves rather than applying a threshold at all. Two common ways of doing this are the log loss ("cross-entropy loss" in many neural network circles) and the Brier score, both written out below. Frank Harrell has two good blog posts on this topic:
Damage Caused by Classification Accuracy and Other Discontinuous Improper Accuracy Scoring Rules
Classification vs. Prediction
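For reference, with observed labels $y_i \in \{0, 1\}$ and predicted probabilities $\hat{p}_i$ for $i = 1, \dots, n$ cases, the two scores (lower is better for both) are

$$\text{log loss} = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i)\Big], \qquad \text{Brier score} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{p}_i - y_i\right)^2.$$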
Stephan Kolassa wrote a nice answer to a question of mine that gets at this topic, too.
Note that even strictly proper scoring rules like log loss and Brier score need not agree about which model performs better (this is fairly easy to simulate; see the sketch below), so there is no reason to expect AUC and precision, or AUC and recall, to agree on the better model, either.
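To make that last point concrete, here is a small hand-built example (my own numbers, chosen only to produce a disagreement): one model is excellent on most cases but very confidently wrong on a single case, the other gives uniformly moderate predictions, and the two scoring rules rank them differently.

```python
import numpy as np

# Hypothetical example: 30 cases, all with y = 1, scored by two models.
y = np.ones(30)

# Model A: very good on 29 cases, very confidently wrong on one.
p_a = np.array([0.99] * 29 + [1e-4])
# Model B: uniformly moderate predictions.
p_b = np.full(30, 0.8)

def log_loss(y, p):
    # Average negative log-likelihood of the observed labels.
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def brier(y, p):
    # Mean squared difference between predicted probability and outcome.
    return np.mean((p - y) ** 2)

print(f"Model A: log loss = {log_loss(y, p_a):.3f}, Brier = {brier(y, p_a):.3f}")
print(f"Model B: log loss = {log_loss(y, p_b):.3f}, Brier = {brier(y, p_b):.3f}")
# Model A wins on the Brier score (about 0.033 vs 0.040), while Model B wins
# on the log loss (about 0.223 vs 0.317): two strictly proper scoring rules
# disagreeing about which model is better.
```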