
I have some weird data that I don’t know how to handle!

I have metabolite measurements for different groups of samples, with 5 replicates in each group:

Groups      treatment   diet
Group1:     yes         A
Group2:     no          A
Group3:     yes         B
Group4:     no          B

Each sample has been measured for 500 different metabolites, but the measured values are odd because:

  1. The measured values are signal intensities, not concentrations, so the scale differs between metabolites (i.e. a value of 2 means something completely different for metabolite 1 than for metabolite 2).
  2. There are some missing values. They mean that the metabolite could not be detected in that specific sample, not that its value is zero! An example is shown below.

    samples         metabolite1
    Group1          12374
    Group1          NA
    Group1          NA
    Group1          NA
    Group1          46091
    Group2          128025
    Group2          90689
    Group2          129950
    Group2          76813
    Group2          66439
    

What I want to do:

  1. First, I would like to do a principal component analysis to see if there is any clear separation between the groups.
  2. Then I would like to study whether any of the factors (treatment, diet, or their interaction) has an effect on each metabolite.

What do you suggest I do with these data?

P.S. I analyze my data in R!

Rozita
  • Regarding your "missing data", these are *censored* because the values are below the limit of detection. You might find this thread helpful: [How small a quantity should be added to x to avoid taking the log of zero?](http://stats.stackexchange.com/q/30728/7290) More generally, what are you trying to find out from these data? What model do you want to fit? – gung - Reinstate Monica Nov 23 '16 at 15:24

1 Answer

Missing values

As @gung says, a first step would be to go and find out why there are NAs:
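
A quick way to start is to tabulate where the NAs occur. The sketch below is only an illustration in base R, using the `metabolite1` values shown in the question; the data frame `d` and the object names are made up for the example. If the NAs pile up in the group whose detected signals are lowest, that would be consistent with values below the detection limit, but it does not prove it.

```r
# Example values copied from the question (metabolite1, Group1 and Group2).
# The data frame and column names are assumptions made for this sketch.
d <- data.frame(
  group = rep(c("Group1", "Group2"), each = 5),
  metabolite1 = c(12374, NA, NA, NA, 46091,
                  128025, 90689, 129950, 76813, 66439)
)

# Number of missing values per group
na_per_group <- tapply(is.na(d$metabolite1), d$group, sum)
print(na_per_group)

# Smallest detected signal per group, ignoring the NAs
min_detected <- tapply(d$metabolite1, d$group, min, na.rm = TRUE)
print(min_detected)
```

Repeating this over all 500 metabolites (for instance with `sapply` over the columns) would show whether missingness is tied to particular groups or particular metabolites.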

Signals instead of concentrations and scaling
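
Because the signal scale differs between metabolites, one common option (not the only one) is to log-transform and then autoscale each metabolite before PCA. The sketch below uses simulated stand-in data, so every object name and the choice of `log2` are assumptions for illustration. Note that `prcomp` does not accept NAs, so missing values have to be dealt with before this step.

```r
set.seed(42)

# Simulated stand-in data: 20 samples x 6 metabolites whose signals
# live on very different scales (all names here are hypothetical)
groups <- rep(paste0("Group", 1:4), each = 5)
scales <- c(1e2, 1e3, 1e4, 1e5, 1e6, 1e7)
signal <- sapply(scales, function(s) rlnorm(20, meanlog = log(s), sdlog = 0.3))
colnames(signal) <- paste0("metabolite", 1:6)

# Log transform, then centre each metabolite and scale it to unit variance
log_signal <- log2(signal)
autoscaled <- scale(log_signal, center = TRUE, scale = TRUE)

# PCA on the autoscaled matrix (already centred and scaled above)
pca <- prcomp(autoscaled, center = FALSE, scale. = FALSE)
scores <- pca$x[, 1:2]
print(head(data.frame(group = groups, scores)))
```

The first two score columns can then be plotted and coloured by group to look for separation.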

Study factors

ASCA (ANOVA Simultaneous Component Analysis), PCA-ANOVA and rMANOVA (regularized MANOVA) could be starting points.
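
To give an idea of what ASCA does, here is a minimal base-R sketch for a balanced 2 x 2 design like the one in the question. It splits the centred data matrix into effect matrices for treatment, diet and their interaction plus residuals, and then runs a PCA (via SVD) on one effect matrix. The data are simulated, the helper `level_means` is a hypothetical name, and significance testing (usually done by permutation) is left out, so treat it as an illustration rather than a complete implementation.

```r
set.seed(1)

# Balanced 2 x 2 design with 5 replicates per cell, as in the question
design <- expand.grid(rep = 1:5,
                      treatment = c("yes", "no"),
                      diet = c("A", "B"),
                      stringsAsFactors = FALSE)
n <- nrow(design)   # 20 samples
p <- 10             # hypothetical number of metabolites

# Simulated data: noise plus a treatment shift in the first 3 metabolites
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
is_treated <- design$treatment == "yes"
X[is_treated, 1:3] <- X[is_treated, 1:3] + 2

# Remove the overall mean of each metabolite
Xc <- sweep(X, 2, colMeans(X))

# Replace every row by the column means of its factor level
level_means <- function(M, f) {
  out <- M
  for (l in unique(f)) {
    idx <- f == l
    out[idx, ] <- matrix(colMeans(M[idx, , drop = FALSE]),
                         nrow = sum(idx), ncol = ncol(M), byrow = TRUE)
  }
  out
}

# Effect matrices: main effects, interaction, and residuals
cell <- paste(design$treatment, design$diet)
X_treatment <- level_means(Xc, design$treatment)
X_diet <- level_means(Xc, design$diet)
X_interaction <- level_means(Xc, cell) - X_treatment - X_diet
E <- Xc - X_treatment - X_diet - X_interaction

# Fraction of the total sum of squares attributed to each part
ss_fraction <- c(treatment = sum(X_treatment^2),
                 diet = sum(X_diet^2),
                 interaction = sum(X_interaction^2),
                 residual = sum(E^2)) / sum(Xc^2)
print(round(ss_fraction, 3))

# PCA (via SVD) of the treatment effect matrix; with two levels
# it has a single non-trivial component
sv <- svd(X_treatment)
treatment_loadings <- sv$v[, 1]
```

Because the design is balanced, the four parts are orthogonal and their sum-of-squares fractions add up to 1. The loadings of each effect matrix indicate which metabolites drive that effect.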

cbeleites unhappy with SX