I was looking at this question to help determine a good range of values to grid search when optimizing a support vector machine classifier. In the second answer the poster links to this paper (section 2.3.2), which states that a good default value for the cost parameter of an SVM is "the inverse of the empirical variance in feature space, which can be calculated by $s^2 = \frac{1}{n} \sum_i K_{ii} - \frac{1}{n^2}\sum_{i,j} K_{ij}$ from an $n \times n$ kernel matrix $K$."
I'm not a mathematician, and I am really struggling to understand what exactly is symbolized in this equation. When they say "feature space" I assume they are talking about the matrix of features and their values for each case, i.e. if I have 10 features and 10 cases, then I have a 10×10 matrix in feature space. Is that a correct interpretation? I believe that $\sum_i$ is a summation over $i$ from 1 to $n$, but I am not sure what is meant by the double subscript in $K_{ii}$ or $K_{ij}$. Perhaps this has something to do with the kernel matrix that is used in SVMs that I am not understanding. Any help is greatly appreciated.
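Here is my attempt to make the formula concrete in NumPy, in case it helps clarify what I'm asking. The data, the choice of a linear kernel, and all variable names are just my own illustration, not something from the paper:

```python
import numpy as np

# Toy data: n cases, d features (values are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))  # n = 10 cases, d = 5 features

# Kernel (Gram) matrix K, shape (n, n). For a linear kernel,
# K[i, j] = <x_i, x_j>, the dot product of cases i and j.
K = X @ X.T
n = K.shape[0]

# s^2 = (1/n) * sum_i K_ii  -  (1/n^2) * sum_{i,j} K_ij
# K_ii are the diagonal entries (the trace sums them);
# the second term sums every entry of K.
s2 = np.trace(K) / n - K.sum() / n ** 2

# Suggested default cost: the inverse of the empirical variance.
C = 1.0 / s2
```

If I've read the formula right, for a linear kernel this $s^2$ should just be the average squared distance of each case from the mean of the data, which is why I think $K_{ii}$ means the $i$-th diagonal entry and $K_{ij}$ means the entry in row $i$, column $j$.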