
I am not a professional R user (though I am interested in it). I have a data set of 900 patients, of whom only 19 were positive for the analysed variable (disease progression). I have many parameters for both the progressed and the non-progressed patients; the majority of them are categorical.

I need to analyse the specific features of the progressed group. Please advise me: what statistical methods are appropriate for such a disproportionate data set?

  • 2
    Can you please tell us some more context? How many, and what kind of variables did you measure? What is the goal? Which disease? ... – kjetil b halvorsen Jan 29 '20 at 23:57
  • 2
    No matter what you do, there is a limit to how much you can learn from only 19 positive cases. Related: [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352) – Stephan Kolassa Jan 30 '20 at 08:43
  • 1
    If you are only interested in the progressed group, the number in the unprogressed group is irrelevant. But what do you want to figure out? Usually, you would want to compare groups. – Peter Flom Jan 30 '20 at 13:59
  • Dear colleagues, good day, and thank you for your responses. I need to describe the database more precisely. I measured 35 parameters in 900 cancer patients. The variables are classic ones such as age, menopausal status (yes/no), tumor size, lymph node metastases, surgery type, cancer type and grade, etc. As you can see, some of the variables are numerical and some are categorical. 19 patients progressed after surgery. My aim is not to create a predictive model but to test the hypothesis that progressed patients differ from non-progressed patients in some parameters. – trotsenkoivan Jan 30 '20 at 20:54
  • And finally, the question is: how do I compare groups correctly in an unbalanced data set? And if I find differences, how do I check that they are not due to chance? Thank you. – trotsenkoivan Jan 30 '20 at 21:00
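
For the comparison asked about in the comments (19 progressed versus 881 non-progressed patients, about 35 parameters), one possible approach for the yes/no variables is a Fisher exact test per variable, followed by a multiplicity correction such as Holm's so that a few small p-values among many tests are not over-read. The sketch below is an illustration only, not a method proposed by anyone in this thread. It is written in Python with the standard library rather than R, the function names `fisher_exact_two_sided` and `holm_adjust` are hypothetical, and every count in the example is made up to show the call pattern.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g. progressed / non-progressed),
    columns are the two levels of a binary variable.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all margins are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), cap at 1,
        # and enforce monotonicity with a running maximum.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    # HYPOTHETICAL counts, for illustration only:
    # (progressed yes, progressed no, non-progressed yes, non-progressed no)
    tables = {
        "postmenopausal": (12, 7, 450, 431),
        "high_grade": (11, 8, 260, 621),
        "radical_surgery": (9, 10, 400, 481),
    }
    raw = [fisher_exact_two_sided(*cells) for cells in tables.values()]
    for name, p, p_adj in zip(tables, raw, holm_adjust(raw)):
        print(f"{name}: raw p = {p:.4f}, Holm-adjusted p = {p_adj:.4f}")
```

The exact test is used here because, with only 19 progressed patients, expected cell counts are often too small for a chi-squared approximation. In R, `fisher.test` and `p.adjust(method = "holm")` cover the same two steps. Numerical variables such as age or tumor size would need a different test and are not handled by this sketch.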
