Feature selection should be done before CV. If you select features inside the CV loop, the selected features will change with each fold's training data -- there are techniques that exploit this, but at the beginner level you should start by selecting features before CV.
Splitting the data into a single fixed portion for training and a fixed portion for testing is also inefficient, since each object is then used for only one of the two.
Instead, do this:
Select the features that best predict class membership (or best predict the function) using the entire dataset. Note that I always like to use a separate feature $filtering$ method to identify informative features prior to, and separately from, classification, in order to minimize selection bias. Recursive feature selection, or $wrapping$, uses the classifier itself to select features and commonly carries a greater risk of selection bias, so filtering is the less biased choice. Separating the feature-selection filtration from the classification step is very beneficial when generalizing results to future data not used for training/testing, so I always keep the two ``far removed'' from one another (that is, I don't want the classifier to select any features). Use, for example, statistical hypothesis tests (t-test, Mann-Whitney test, F-test, Kruskal-Wallis test), information gain (entropy), or the Gini index for feature filtration (selection); a sketch follows.
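As an illustration, here is a minimal Python sketch of such a univariate filter, assuming a two-class problem with a data matrix `X` (objects in rows, features in columns) and integer labels `y`; the names `filter_features` and `n_keep` are my own, and the t-test could be swapped for any of the tests listed above.

```python
# Feature filtration with a univariate hypothesis test, applied once to the
# entire dataset and kept entirely separate from the classifier.
import numpy as np
from scipy.stats import ttest_ind

def filter_features(X, y, n_keep=20):
    """Rank features by two-sample t-test p-value and keep the n_keep best.

    X : (n_objects, n_features) data matrix
    y : (n_objects,) binary class labels (0/1)
    """
    p_values = np.array([
        ttest_ind(X[y == 0, j], X[y == 1, j]).pvalue
        for j in range(X.shape[1])
    ])
    # Smallest p-values = most informative features under this filter.
    return np.argsort(p_values)[:n_keep]
```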
Divide the objects uniformly into ten folds $\mathcal{D}_1, \mathcal{D}_2,\ldots,\mathcal{D}_{10}$.
First, train with objects in the 9 folds $\mathcal{D}_2, \mathcal{D}_3,\ldots,\mathcal{D}_{10}$ and test the trained system on objects in fold $\mathcal{D}_1$.
Next, train with objects in the 9 folds $\mathcal{D}_1, \mathcal{D}_3,\ldots,\mathcal{D}_{10}$ and test the trained system on objects in fold $\mathcal{D}_2$.
Repeat the above until objects in fold $\mathcal{D}_{10}$ are tested, with the 9 folds $\mathcal{D}_1, \mathcal{D}_2,\ldots,\mathcal{D}_{9}$ used for training; a sketch of this loop follows.
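A minimal sketch of one such 10-fold pass, assuming a classifier object `clf` with `fit`/`predict` methods (scikit-learn style); the helper names are my own.

```python
import numpy as np

def ten_fold_partition(n_objects, rng):
    """Uniformly assign the object indices to ten folds D_1, ..., D_10."""
    return np.array_split(rng.permutation(n_objects), 10)

def run_10fold_cv(X, y, clf, folds):
    """Train on 9 folds and test the trained system on the held-out fold,
    cycling so that every fold serves as the test fold exactly once."""
    y_pred = np.empty_like(y)
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        clf.fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = clf.predict(X[test_idx])
    return y_pred
```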
For each object in each test fold, increment the confusion matrix $\mathbf{C}$ (with dimensions $\Omega \times \Omega$) by one in element $c_{\omega\hat{\omega}}$, where $\omega$ is the true class of the object and $\hat{\omega}$ is the predicted class.
After each 10-fold CV, total accuracy for classification is then the sum of the diagonal elements of $\mathbf{C}$ divided by the total number of objects, i.e., $Acc=\sum_{\omega=1}^{\Omega} c_{\omega\omega}/n$.
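In code, this bookkeeping might look like the following sketch, assuming integer class labels $0,\ldots,\Omega-1$ (function name mine):

```python
import numpy as np

def confusion_and_accuracy(y_true, y_pred, n_classes):
    """Accumulate the Omega x Omega confusion matrix C and compute
    total accuracy as the diagonal sum divided by n."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for omega, omega_hat in zip(y_true, y_pred):
        C[omega, omega_hat] += 1          # increment c_{omega, omega-hat}
    acc = np.trace(C) / len(y_true)       # sum of diagonal elements / n
    return C, acc
```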
Note that the above procedure is called a 10-fold CV. You should next $repartition$ the objects into 10 folds, this time after randomly shuffling (permuting) the order of all objects, and then repeat the above 10-fold CV. This ensures that the objects assigned to each fold differ between repetitions. Repartition ten times, each time performing a 10-fold CV, then calculate total accuracy. This is called a ``ten 10-fold CV''; a sketch follows.
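Putting the pieces together, a ten 10-fold CV could be sketched as below, reusing the helpers from the previous snippets and accumulating one confusion matrix across all ten repartitions (the wrapper name and the seed are mine):

```python
import numpy as np

def ten_times_10fold_cv(X, y, clf, n_classes, seed=0):
    """Repartition ten times, each with a fresh shuffle, running a full
    10-fold CV per partition and summing the confusion matrices."""
    rng = np.random.default_rng(seed)
    C_total = np.zeros((n_classes, n_classes), dtype=int)
    for _ in range(10):
        folds = ten_fold_partition(len(y), rng)   # new shuffled partition
        y_pred = run_10fold_cv(X, y, clf, folds)
        C_pass, _ = confusion_and_accuracy(y, y_pred, n_classes)
        C_total += C_pass
    return C_total
```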
Once you have performed a ten 10-fold CV, you can select features again after transforming their values, for example by mean-zero standardizing, normalizing into the range $[0,1]$, or fuzzifying. The key point is that, on average, classification accuracy will change with the features used. First get a handle on classification accuracy using the initial group of features; then, whenever you select features a different way (perhaps after transforming their values), run a complete ten 10-fold CV for the changed feature set.
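For concreteness, two of the transforms mentioned could be sketched as follows (fuzzification, which maps feature values to fuzzy-set membership grades, is omitted because its form is problem-specific):

```python
import numpy as np

def standardize(X):
    """Mean-zero standardization: subtract each feature's mean and
    divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def normalize_01(X):
    """Rescale each feature into the range [0, 1]
    (assumes no feature is constant)."""
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    return (X - col_min) / (col_max - col_min)
```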
For accuracy determination following ten 10-fold CV, use $Acc=\sum_{\omega=1}^{\Omega} c_{\omega\omega} \big/ \sum_{\omega=1}^{\Omega} \sum_{\hat{\omega}=1}^{\Omega} c_{\omega\hat{\omega}}$, which is equal to the sum of the diagonal elements of $\mathbf{C}$ divided by the sum of all elements of $\mathbf{C}$.
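Applied to the accumulated matrix from the sketch above, this is simply:

```python
import numpy as np

def total_accuracy(C):
    """Diagonal sum of C divided by the sum of all elements of C."""
    return np.trace(C) / C.sum()
```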