I noticed that the image-classification models in the torchvision package don't have a softmax layer as their final layer. For instance, the following snippet shows that the resnet18 outputs don't sum to 1 across the class dimension, so no softmax is being applied at the end.
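from torchvision import models
import torch
# Randomly initialized ResNet-18 (no pretrained weights)
model = models.resnet18(pretrained=False)
# Batch of 8 random RGB images, 200x200 pixels
x = torch.rand(8, 3, 200, 200)
y = model(x)  # shape (8, 1000): one raw score per class
# Sum over the class dimension; softmax outputs would sum to 1 per row
print(y.sum(dim=1))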
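For comparison, here is a minimal sketch of what applying softmax by hand does to scores of this shape. To keep it light, it uses a random tensor named `logits` as a stand-in for `y = model(x)`, and the `targets` labels are made up for illustration; neither comes from the snippet above. It also checks that `F.cross_entropy`, which takes raw scores as input, gives the same value as `log_softmax` followed by `nll_loss`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Assumption: random scores standing in for `y = model(x)` (8 samples, 1000 classes).
logits = torch.randn(8, 1000)
# Assumption: hypothetical ground-truth labels, only for the loss comparison.
targets = torch.randint(0, 1000, (8,))

# Softmax over the class dimension turns scores into probabilities.
probs = F.softmax(logits, dim=1)
print(probs.sum(dim=1))  # each row sums to 1, up to floating-point rounding

# Softmax is monotonic, so the predicted class does not change.
same_prediction = torch.equal(probs.argmax(dim=1), logits.argmax(dim=1))
print(same_prediction)

# cross_entropy works on raw scores; it applies log-softmax internally.
loss_from_logits = F.cross_entropy(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(loss_from_logits.item(), loss_manual.item())
```

In this sketch the softmax only rescales the scores: the argmax is the same before and after, and the loss is computed from the raw scores directly.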
So my questions are: why doesn't torchvision put a softmax layer at the end of these models? How much would adding a softmax layer improve performance, if at all, and why?