I'm trying to find a measure of "linearity" across different simulated data sets, and a way to control for such differences.
This idea came from studying Hastie and Tibshirani's work on variable selection using the LASSO. A measure they use a lot is the signal-to-noise ratio (SNR), which helps show in which situations a procedure or algorithm might perform better. It is defined as:
$$ SNR = \frac{{\rm Var}(f(X))}{{\rm Var}(\varepsilon)} $$
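For concreteness, here is a minimal sketch of how the SNR can be fixed in a simulation: generate the signal $f(X)$ first, then choose the noise variance so the ratio above hits a target. The linear form of `f`, the coefficients in `beta`, the target value of 3, and the helper name `noise_sd_for_snr` are all assumptions made for illustration, not taken from the cited sources.

```python
import numpy as np


def noise_sd_for_snr(signal, target_snr):
    """Noise standard deviation giving Var(signal) / Var(eps) = target_snr."""
    return np.sqrt(np.var(signal) / target_snr)


rng = np.random.default_rng(0)
n, p = 1000, 5

X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0, 0.5, 0.0, 0.0])  # hypothetical coefficients
f = X @ beta                                  # signal f(X), here purely linear

sigma = noise_sd_for_snr(f, target_snr=3.0)   # population-level noise sd
eps = rng.normal(scale=sigma, size=n)
y = f + eps

# Sample version of the ratio; it fluctuates around the target.
empirical_snr = np.var(f) / np.var(eps)
print(f"sigma = {sigma:.3f}, empirical SNR = {empirical_snr:.3f}")
```

The target is met exactly at the population level (`np.var(f) / sigma**2`), while the sample ratio only approximates it because `eps` is a finite draw.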
I'm wondering if there is a measure similar in spirit to the SNR that could quantify how non-linear a data set is, for use in simulation studies (meaning it could rely on constructions that are only available in simulation settings).
The two sources for the SNR that I have read are 1 and 2, in case anyone is interested.
Edit:
Here, by non-linear I mean a model that cannot be explained by the main effects alone (a linear combination of the predictors). In that sense, non-linear terms would be added to the data-generating process for the simulation, e.g. log transformations, spline transformations, indicator variables, and interactions.
What I am looking for is a way of quantifying the non-linearity of such data sets, and of generating them at a chosen level of it, in the same way the SNR lets one control the noise level.
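To make the setting concrete, the following is a rough sketch of the kind of data-generating process described above: main effects plus a log transformation, an indicator variable, and an interaction (a spline term is left out for brevity). Every variable name, coefficient, and distribution here is a hypothetical choice for illustration only; the noise is scaled to an assumed target SNR of 2 using the definition above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.uniform(0.5, 5.0, size=n)  # strictly positive so log(x3) is defined

# Main effects: a linear combination of the predictors (hypothetical weights).
main_effects = 1.0 * x1 - 0.5 * x2 + 0.3 * x3

# Non-linear additions to the data-generating process (hypothetical weights).
nonlinear_terms = (
    0.8 * np.log(x3)    # log transformation
    + 1.2 * (x1 > 0)    # indicator variable
    + 0.7 * x1 * x2     # interaction
)

f = main_effects + nonlinear_terms

# Scale the noise so that Var(f) / Var(eps) equals the assumed target SNR.
target_snr = 2.0
sigma = np.sqrt(np.var(f) / target_snr)
y = f + rng.normal(scale=sigma, size=n)
```

Here the SNR is controlled by a single number, while the amount of non-linearity is only implicit in the hand-picked coefficients, which is exactly the gap the question is about.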