I do not think this is a difficult question, but I guess answering it takes experience. Variants of it are asked a lot here, but I have not found an answer that explains the reasoning behind choosing an appropriate ML algorithm.
So, suppose we have a dataset, and suppose I want to do clustering (it could be classification or regression if I also had labels or target values for my training data).
What should I consider before choosing an algorithm? Or do I just pick one at random?
In addition, how do I choose which preprocessing to apply to my data? I mean, are there rules of the form "IF feature X has property Z THEN do Y"?
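To make the question concrete, here is a minimal Python sketch of the kind of rule I have in mind. The two rules (log-transform a non-negative, strongly right-skewed feature; standardize any feature that varies) and the skewness threshold of 1.0 are my own assumptions for illustration, not an established standard, and `choose_preprocessing` / `apply_steps` are hypothetical names.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores (0 for a constant feature)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def choose_preprocessing(values, skew_threshold=1.0):
    """Return preprocessing steps for one numeric feature.

    The rules and the threshold below are illustrative assumptions, not a standard.
    """
    steps = []
    # IF the feature is non-negative and strongly right-skewed THEN log-transform it.
    if min(values) >= 0 and skewness(values) > skew_threshold:
        steps.append("log1p")
    # IF the feature varies at all THEN standardize it, because distance-based
    # methods such as k-means are sensitive to the scale of each feature.
    if statistics.pstdev(values) > 0:
        steps.append("standardize")
    return steps


def apply_steps(values, steps):
    """Apply the chosen steps in order and return the transformed feature."""
    out = list(values)
    for step in steps:
        if step == "log1p":
            out = [math.log1p(v) for v in out]
        elif step == "standardize":
            mean = statistics.fmean(out)
            sd = statistics.pstdev(out)
            out = [(v - mean) / sd for v in out]
    return out
```

For example, `choose_preprocessing([1, 2, 2, 3, 3, 3, 4, 100])` returns `["log1p", "standardize"]`, while a constant feature gets no steps at all. What I would like to know is whether rules like these exist in practice and how they are justified.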
Finally, is there anything besides preprocessing and selecting my data that I am missing and that you would advise me about?
For example, suppose I want to do clustering. Is saying "I will apply k-means to this problem" the best approach? What could improve my results?
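For reference, this is roughly what "apply k-means" means to me today, written as a minimal pure-Python sketch: run Lloyd's k-means for several values of k and keep the k with the highest mean silhouette score. It assumes numeric points given as tuples; the deterministic farthest-point seeding is a simplification I chose (it is not k-means++), and `best_k` is a hypothetical helper.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding (a simplification, not k-means++): start at points[0],
    then repeatedly add the point farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm on tuples of numbers; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest center (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(coords) / len(members) for coords in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; a point alone in its cluster scores 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")

    def mean_dist(i, cluster):
        ds = [math.dist(points[i], q) for j, q in enumerate(points)
              if labels[j] == cluster and j != i]
        return sum(ds) / len(ds) if ds else None

    scores = []
    for i in range(len(points)):
        a = mean_dist(i, labels[i])  # mean distance to the rest of its own cluster
        if a is None:
            scores.append(0.0)
            continue
        # Mean distance to the nearest other cluster.
        b = min(mean_dist(i, c) for c in clusters if c != labels[i])
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points, k_values):
    """Hypothetical helper: fit k-means for each k and keep the highest silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

On three well-separated groups of four points each, `best_k(points, [2, 3, 4])` picks k = 3. Is tuning k this way a reasonable starting point, or is the choice of k-means itself the bigger question?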
I will accept the answer that is best justified and that explains everything one should consider.