If you are going to use SMOTE, it should only be applied to the training data. This is because you are using SMOTE to gain an improvement in operational performance, and both the validation and test sets are there to provide an estimate of operational performance. In the case of the validation set, this is so that we can choose the hyper-parameters that give the best operational performance; in the case of the test set, it is so that we have an unbiased estimate of how well the system will perform in operational use.
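If you use cross-validation for model selection, the safest way to keep SMOTE away from the evaluation folds is to put it inside a pipeline. Here is a minimal sketch, assuming the imbalanced-learn (imblearn) package is available; imblearn's Pipeline applies the sampler at fit time only, so each training fold is oversampled while the corresponding evaluation fold is left untouched:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy imbalanced problem (90% majority / 10% minority).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# The sampler runs only when the pipeline is fitted, so within each
# cross-validation split SMOTE touches the training fold but never the
# fold used for evaluation.
pipe = Pipeline([("smote", SMOTE(random_state=0)),
                 ("clf", LogisticRegression(max_iter=1000))])
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
```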
If you retrain on the amalgamated training, validation and test sets afterwards, you need to apply SMOTE to the amalgamated data in the same way that you did for the training set. However, if any of your hyper-parameters are sensitive to the size of the training set (and regularisation parameters will be), then you need to do the model selection again. So I would just amalgamate the training and test sets (and apply SMOTE to them) and perform the model selection again, using the un-SMOTEd validation set for tuning the hyper-parameters, so that you can still estimate operational performance.
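A minimal sketch of that procedure, using toy data in place of the original splits (the dataset, the logistic regression model and the grid of C values are all just placeholders for illustration):

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=0)
# X_fit plays the role of the amalgamated train + test data.
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Oversample only the amalgamated fitting data; the validation set is untouched.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_fit, y_fit)

# Redo model selection: regularisation strength is sensitive to training-set size.
best_C, best_loss = None, np.inf
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_res, y_res)
    loss = log_loss(y_val, model.predict_proba(X_val))  # un-SMOTEd estimate
    if loss < best_loss:
        best_C, best_loss = C, loss
```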
SMOTE basically does two things. Firstly, it resamples the dataset to give greater prominence to the minority class. This is effectively saying that misclassifying a minority-class example as belonging to the majority class is a worse kind of error than misclassifying a majority-class example as belonging to the minority class; in other words, it is a form of cost-sensitive learning. Most modern classifiers (and a lot of old ones as well) can deal with unequal misclassification costs more directly, either by weighting the examples from each class differently in the cost function, or by using a probabilistic classifier and moving the threshold away from 0.5 to some lower probability (assuming the minority class is the "positive" class). That is likely to be rather more efficient, as it doesn't increase the size of the training set.
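For example, in scikit-learn both alternatives are one-liners (the class weights and the 0.1 threshold below are arbitrary values for illustration, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Option 1: reweight the classes in the cost function (no extra data needed).
weighted = LogisticRegression(class_weight={0: 1, 1: 10}, max_iter=1000).fit(X, y)

# Option 2: fit an unweighted probabilistic classifier and lower the decision
# threshold for the minority ("positive") class instead of resampling.
plain = LogisticRegression(max_iter=1000).fit(X, y)
p = plain.predict_proba(X)[:, 1]
y_pred = (p > 0.1).astype(int)  # threshold < 0.5 favours the minority class
```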
The other thing SMOTE does is to apply some regularisation. It "blurs" the training data by adding synthetic examples that conceal the exact locations of the training examples and make them harder to memorise (i.e. it mitigates overfitting, which can be a problem if you heavily weight a small number of minority-class examples). However, again, most modern classifier systems (and a lot of old ones) have built-in forms of regularisation that are likely to be better. The form of regularisation used in SMOTE is a bit odd, in that the interpolation implies linear structures in the data that are not part of the data-generating process; just adding noise to the training examples would have a similar effect, with a more even blurring.
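To make the difference concrete, here is a simplified sketch (real SMOTE interpolates towards one of the k nearest minority-class neighbours; a random minority neighbour is used here to keep it short):

```python
import numpy as np

rng = np.random.default_rng(0)
X_min = rng.normal(size=(20, 2))  # toy minority-class examples

# SMOTE-style synthesis: interpolate between a point and another minority
# point, "blurring" the data along straight lines between real examples.
i, j = rng.integers(20, size=100), rng.integers(20, size=100)
lam = rng.uniform(size=(100, 1))
synth_smote = X_min[i] + lam * (X_min[j] - X_min[i])

# Noise-based alternative: jitter each example, giving a more even blurring
# that does not impose linear structure between training points.
k = rng.integers(20, size=100)
synth_noise = X_min[k] + rng.normal(scale=0.1, size=(100, 2))
```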
In short, if you are using a modern classifier system (or a good old one like regularised logistic regression), and using it well, then SMOTE is probably not going to help much (and may make things much worse - YMMV).
The real question is deciding what performance criterion is relevant for your application: rather than maximising accuracy, minimise the expected misclassification loss (taking into account the different false-positive and false-negative costs). If the misclassification costs are unequal, accuracy is not a good performance metric‡!
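As a sketch of what that means in practice (the 1:20 cost ratio is just an assumed example): if the classifier outputs calibrated probabilities, the threshold that minimises the expected loss follows directly from the costs, since predicting positive is optimal whenever p * c_fn > (1 - p) * c_fp:

```python
import numpy as np

c_fp, c_fn = 1.0, 20.0  # assumed costs: a false negative is 20x worse

# Bayes-optimal rule: predict positive when p exceeds c_fp / (c_fp + c_fn).
threshold = c_fp / (c_fp + c_fn)  # ~0.048 here, well below 0.5

def expected_loss(y_true, p_pos):
    """Average misclassification loss for cost-thresholded predictions."""
    y_pred = (p_pos > threshold).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return (c_fp * fp + c_fn * fn) / len(y_true)
```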
‡ For applications where the false-positive and false-negative costs are equal and minimising the expected loss is the goal, accuracy is a good criterion for performance evaluation (but perhaps not for model selection, which is not the same thing; see my answer here: Why is accuracy not the best measure for assessing classification models?).