
I have morphological data from two predefined groups (It and Nd), where the variables are heterogeneous (continuous, semi-qualitative, binomial). I want to know whether the groups differ morphologically and, if so, which variables contribute most to the difference between them.

For this I thought of using gowdis() {FD} and then performing a Principal Coordinates Analysis (PCoA, which is equivalent to classical metric MDS) using cmdscale() {stats}. The Gower distance should allow me to use the complete data set, with all the different kinds of variables. Most of the examples I found, including those in the R help, concern ecological data (like dune and dune.env).
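The workflow described above can be sketched roughly as follows. This is only a minimal illustration under assumptions: the data frame `morph` and its columns are invented toy data, and `daisy()` from {cluster} is swapped in for `gowdis()` {FD} so that the example runs with the packages shipped with R.

```r
library(cluster)  # daisy(): Gower dissimilarity for mixed-type data

set.seed(1)
# Hypothetical toy data: 10 specimens per group, three variable types
morph <- data.frame(
  len   = c(rnorm(10, 10), rnorm(10, 12)),               # continuous
  score = factor(sample(1:3, 20, replace = TRUE),
                 levels = 1:3, ordered = TRUE),          # ordered scores
  spot  = factor(sample(c("no", "yes"), 20, replace = TRUE),
                 levels = c("no", "yes"))                 # binary
)
group <- factor(rep(c("It", "Nd"), each = 10))

# Gower dissimilarity between all pairs of specimens
d <- daisy(morph, metric = "gower")

# Classical (metric) MDS, i.e. PCoA, on the dissimilarity matrix
pco <- cmdscale(d, k = 2, eig = TRUE)

# Group centroids in the two-dimensional ordination space
centroids <- aggregate(as.data.frame(pco$points),
                       by = list(group = group), FUN = mean)
print(centroids)
```

Something like `plot(pco$points, col = as.integer(group), pch = 19)` would then show whether the two groups separate along the first axes.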

However, I have the following questions:

  1. Is my basic thinking correct? Somehow, I would have expected to use some kind of constrained multivariate analysis.

  2. I have quite a lot of missing values, which are not linked only to a specific sample or variable (part of my data set is from a previous study). How can I handle them? From my reading, I thought that Gower distance can handle missing values, but see 3).

  3. How can I manage the warning about missing species scores? Because of it, I can neither plot the centroids of my variables nor draw arrows showing the impact of the variables. As far as I know, this issue is linked to the missing values.
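To make the missing-value question concrete, here is a small sketch of how a Gower dissimilarity treats NA entries. It uses `daisy()` from {cluster} on an invented four-row data frame (an assumption; `gowdis()` may differ in details). A variable is simply skipped for any pair in which either value is missing, and a pair with no shared observed variable gets an NA dissimilarity, which `cmdscale()` does not accept.

```r
library(cluster)

# Hypothetical toy data with scattered missing values
x <- data.frame(a = c(1, 2, NA, 4),
                b = c(10, NA, 30, 40))

d  <- daisy(x, metric = "gower")
dm <- as.matrix(d)

# Rows 1 and 2 share only variable 'a' (observed range 4 - 1 = 3),
# so their dissimilarity is based on 'a' alone
manual_12 <- abs(1 - 2) / 3
print(c(daisy = dm[1, 2], manual = manual_12))

# Rows 2 and 3 have no variable observed in both: the result is NA
print(dm[2, 3])

# cmdscale() stops when the dissimilarity matrix contains NA
res <- try(cmdscale(d, k = 1), silent = TRUE)
print(inherits(res, "try-error"))
```

So with many scattered NAs, some pairwise dissimilarities rest on very few variables, and a pair with none in common blocks the ordination altogether.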

ttnphns
Bettina
  • Although nobody can prohibit you from doing PCoA with the Gower coefficient, it is not warranted from the standpoint of geometric correctness. Gower isn't a Euclidean or even a metric distance. PCoA is basically a metric MDS based on PCA; it is appropriate mainly for Euclidean distance. I would recommend using nonmetric MDS with Gower: you are likely to get better, more interpretable results. – ttnphns Dec 05 '13 at 14:05
  • Gower "handles" missing data in a manner similar to "pairwise deletion" (as far as I know). If you have many missing values, that will distort the matrix of distances. Consider imputing the missing values (for example, by hot-deck imputation). – ttnphns Dec 05 '13 at 14:09
  • But I was not questioning your intention of doing MDS. It might be that MDS is not what you really need. – ttnphns Dec 05 '13 at 14:11
  • Hello all! Thanks for these answers. So maybe you can help me... I'm pretty new to R and far from being an expert in multivariate analysis. – Bettina Dec 09 '13 at 09:18
  • 1) I can understand your remark about the compatibility between Gower and metric MDS. I will try to use nMDS instead. 2) For the missing values, I was thinking of replacing the NAs with the group mean in cases where less than 25% of the values are missing. I will take a closer look at the variables with a lot of missing values and maybe reject some (>50% NA). – Bettina Dec 09 '13 at 09:24
  • 3) However, I'm wondering which other method you have in mind? Even though it would be great to handle all the variables (including the descriptive ones), I'm actually considering leaving them out of my classification/discrimination to allow a simpler method. I was thinking of a DFA (lda or qda?), but most of my variables are not normally distributed, which is why I went for non-parametric tests. As you see, I still welcome remarks and help! #Bettina – Bettina Dec 09 '13 at 09:28
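The plan discussed in these comments (drop variables with more than 50% NA, replace NAs by the group mean where less than 25% are missing, then run a nonmetric MDS on Gower dissimilarities) might look like the sketch below. All data and column names are invented for illustration, only numeric variables are handled (factors would need something other than a mean), and `isoMDS()` from {MASS} is just one of several nMDS implementations.

```r
library(cluster)  # daisy()
library(MASS)     # isoMDS()

set.seed(42)
n <- 20
group <- factor(rep(c("It", "Nd"), each = n / 2))

# Hypothetical numeric measurements with different amounts of missing data
morph <- data.frame(len = rnorm(n, 10), wid = rnorm(n, 5),
                    dep = rnorm(n, 3),  ang = rnorm(n, 45))
morph$len[c(2, 13)]    <- NA   # 10% missing
morph$wid[c(5, 6, 16)] <- NA   # 15% missing
morph$dep[sample(n, 12)] <- NA # 60% missing

na_frac <- colMeans(is.na(morph))

# Reject variables with more than 50% missing values
kept <- morph[, na_frac <= 0.5, drop = FALSE]

# Replace NAs by the group mean where less than 25% are missing
impute_group_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}
for (nm in names(kept)[na_frac[names(kept)] < 0.25]) {
  kept[[nm]] <- ave(kept[[nm]], group, FUN = impute_group_mean)
}

# Nonmetric MDS on the Gower dissimilarities
d    <- daisy(kept, metric = "gower")
nmds <- isoMDS(d, k = 2, trace = FALSE)
print(nmds$stress)
```

Note that group-mean imputation pulls specimens toward their own group centroid, so it can make the groups look more distinct than they are; this is worth keeping in mind when interpreting the ordination.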

0 Answers