Optimum approximate theory D-Optimal design

Question

I am using optFederov, from package AlgDesign, to create D-Optimial designs. However, I am a bit confused on what measure to use as efficiency measure. I'd like to use D-Optimal but I don't know which one is exactly. In the vignette of the package they explain that the determinant of the design is 3.675919 and the optimum approximate theory D is 3.8 making this design about 97% efficient.

1.- But how does he come up with this 3.8? How can I compute it?

2.- When and why should I use D, Ge and Dea?

accoring to the manual D is:

The kth root of the generalized variance: det(M)^(1/k), where det(M) is the determinant of the normalized dispersion matrix M – i.e. M=Z'Z/n, where Z=X[rows,]

kwadratens · Answer 1 · 2021-03-25T12:58:08.547

OK, this is a bit complicated, but i will try to explain some issues here.

First of all, you need to know that you can calculate the theoretical (i.e. the maximum possible) and practically obtained (i.e. the ones that you get in a given configuration of cards and their sets) values for D-efficiency. The "quality" of the plan (for example in DCE method) is assessed with these coefficients, where the practical and theoretical value give information about the extent to which the reduced (partial) card plan will give you similar possibilities of calculating statistical coefficients on the results obtained by full plan method. The case is therefore very complicated; here we have a plan that allows us to "theoretically obtain D = 3.8" and practical D for a specific variant, D = 3.68, which gives an approximate percentage result of 3.38 / 3.80 = 97% of the plan effectiveness.
The most complicated issue is obtaining the maximum theoretical plan, or in other words, the maximum D for a particular plan. The calculation of the maximum D is related to how many main effects and interaction effects in the resulting analisys plan then we want to use. There are many examples of such information obtaining, but all of them are very illegible. Why? Well - we do not know what analysis plan the researcher set up for himself and for what plan he calculated the maximum possible D that could be obtained. In the example above, we can create a matrix only for main effects, for interactive effects, or for first or second order interactive effects. The maximum value of D will vary depending on these findings. I admit that I do not know for what plan this indicator was obtained and I hope that someone has figured out this example with D = 3.8.
Anyway - the matter is complicated by the fact that different programs have different algorithms for measuring D and often the same plans in r or other software will give completely different D and other indicators. The reason is often an encoding issue: orthogonal encoding, dummy encoding, or effect encoding changes the practical D values (you will get different wr values if you encode your data -1, 0 and 1 than if you encode them 1 2 and 3)... which should be interpreted in the context of theoretical D... which in turn depends on the size of your research plan of analyzes.
Due to this chaos, for some time published articles do not interpret D or other indicators, but simply give their raw values. Instead, the researcher tries to justify somehow the research plan he chooses and maximize the practically obtained indicators simply for such a plan. Now it's time for the r exercise in the example above.

For Your example we got in r

library(AlgDesign)
dat <- gen.factorial(5,3)
des <- optFederov(~quad(.),dat,nTrials=15,evaluateI=TRUE)
des$D

which should give us the value of d

[1] 3.675919

Knowing that

The D-efficiency values are a function of the number of points in the design, the number of independent variables in the model, and the maximum standard error for prediction over the design points.

and

The best design is the one with the highest D-efficiency. Other reported efficiencies (e.g. A, G, I) help choose an optimal design when various models produce similar D-efficiencies.

we can check all of posibilities and use best one. In "dat" variable we have full plan of 125 possibilities/cards/research conditions/whatever. So we can check any variant between 5 cards for optFederov (minimal number) and all cards used. We will create an array variable that will allow us to save all the possibilities.

min <- 10
max <- 125
eff_table <- matrix(ncol = 6, nrow = max)
colnames(eff_table) <- c("nTrials","D","A","I","Ge","Dea")
for (loop_num in min:max) {
des <- optFederov(~quad(.),dat,nTrials=loop_num,evaluateI=TRUE,crit = "D",nRepeats = 100)
eff_table[loop_num,1] <- loop_num
eff_table[loop_num,2] <- des$D
eff_table[loop_num,3] <- des$A
eff_table[loop_num,4] <- des$I
eff_table[loop_num,5] <- des$Ge
eff_table[loop_num,6] <- des$Dea
}

Counting will take a while. Check differences with nRepeats and without; In my opinion, we should always use multiple retry, which is with this option. Then we get the best possible configuration in this regard. In

eff_table

You should get all informations for subsequent options for the number of cards. Something like this

 nTrials D A I Ge Dea
(...)
 [14,] 14 3.704358 0.8132031 7.820313 0.893 0.887
 [15,] 15 3.675919 1.2555973 8.848874 0.775 0.749
 [16,] 16 3.666758 1.2910678 8.807469 0.742 0.706
 [17,] 17 3.667087 1.3115654 8.693003 0.704 0.657
 [18,] 18 3.674348 1.1586909 8.544911 0.778 0.752
(...)

You can now choose the best practical option for your research plan and compare the ratios. Note: one can also get with this method estimates for the full factorial plan, which are the maximum values you can get at all with mentioned method. On this basis, you can decide which plan is the best... and calculate the percentage effectiveness ratio in relation to the full plan.

Hope that will help You a bit. Feel free to ask and comment on this.

Additional informations and issues:

The D-efficiency values are a function of the number of points in the design, the number of independent variables in the model, and the maximum standard error for prediction over the design points. The best design is the one with the highest D-efficiency. Other reported efficiencies (e.g. A, G, I) help choose an optimal design when various models produce similar D-efficiencies. https://itl.nist.gov/div898/handbook/pri/section5/pri521.htm
interesting example of D counting: https://legacy.sawtoothsoftware.com/forum/23896/calculate-design-efficiency-manually
D-Efficiency definition and general formulas: D-efficiency is the relative number of runs (expressed as a percent) required by a hypothetical orthogonal design to achieve the same determinant value. It provides a way of comparing designs across different sample sizes. https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/PASS/D-Optimal_Designs.pdf
Other ideas about maximum D: How to derive variance-covariance matrix of coefficients in linear regression

Optimum approximate theory D-Optimal design

1 Answers1

Linked