
In my project I want to compare different classification algorithms on a specific problem with a specific dataset. To do this, I divided the dataset into two parts.

  1. With the first (bigger) part, I perform cross-validation. In this step I try to find the best parameters for each algorithm (for example, the number of neurons in the hidden layer of a neural network), as well as the best classification threshold.

  2. I use the second part of the dataset to evaluate and compare the algorithms, using the parameters found in the previous step.
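The two-step protocol described above can be sketched as follows. This is a minimal illustration under assumptions that are not in the question: a synthetic dataset and a hand-written k-nearest-neighbours classifier stand in for the real data and the neural network. The hyperparameter `k` plays the role of the parameter being tuned (like the number of hidden neurons), and a cut-off on the predicted class-1 fraction plays the role of the classification threshold. All function names (`make_data`, `split_dataset`, `cross_validate`, `run_protocol`, etc.) are hypothetical.

```python
import random

# Hypothetical stand-ins: a synthetic dataset and a k-NN classifier replace the
# real data and neural network; k plays the role of "number of hidden neurons".

def make_data(n, seed=0):
    """Synthetic 2-D, two-class data: class 1 centred at (1, 1), class 0 at (-1, -1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        c = 1.0 if label else -1.0
        data.append(((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), label))
    return data

def split_dataset(data, dev_fraction=0.7, seed=1):
    """Shuffle, then split into a bigger development part and a held-out test part."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(dev_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def knn_score(train, x, k):
    """Fraction of the k nearest training points that belong to class 1."""
    nearest = sorted(train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2)[:k]
    return sum(label for _, label in nearest) / k

def accuracy(train, test, k, threshold):
    """Predict class 1 when the score reaches the threshold; return the hit rate."""
    hits = sum((knn_score(train, x, k) >= threshold) == bool(y) for x, y in test)
    return hits / len(test)

def cross_validate(dev, k, threshold, folds=5):
    """Mean validation accuracy over `folds` folds of the development part only."""
    scores = []
    for i in range(folds):
        val = dev[i::folds]
        train = [p for j, p in enumerate(dev) if j % folds != i]
        scores.append(accuracy(train, val, k, threshold))
    return sum(scores) / folds

def run_protocol(n=300):
    dev, holdout = split_dataset(make_data(n))
    # Step 1: tune (k, threshold) jointly by cross-validation on `dev`; `holdout` is never touched.
    grid = [(k, t) for k in (1, 3, 5, 9, 15) for t in (0.3, 0.5, 0.7)]
    best_k, best_t = max(grid, key=lambda kt: cross_validate(dev, *kt))
    # Step 2: "refit" on all of `dev` with the chosen settings and evaluate once on `holdout`.
    test_acc = accuracy(dev, holdout, best_k, best_t)
    return best_k, best_t, test_acc, len(dev), len(holdout)

if __name__ == "__main__":
    k, t, acc, n_dev, n_test = run_protocol()
    print(f"chosen k={k}, threshold={t}; held-out accuracy={acc:.3f} ({n_dev} dev / {n_test} test)")
```

The important design point is that the held-out part is used exactly once, after both the hyperparameter and the threshold have been fixed by cross-validation on the development part; choosing the threshold on the held-out part would make its accuracy an optimistic estimate. Comparing several algorithms amounts to running Step 1 separately for each one and then scoring each tuned model on the same held-out part.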

My question: how can these two steps (parameter tuning and generalization testing) be mapped onto a data mining process (KDD, SEMMA or CRISP-DM)?

Should the two steps be two different iterations of the process, or should each step be a separate project/process (with the initial stages in common)?

gung - Reinstate Monica
  • Can you explain how you think KDD or SEMMA or CRISP-DM is related to the very general procedure of train-test-validate splits? – Sycorax Aug 15 '18 at 23:22
  • @Sycorax At that time I was starting an academic project, and I was wondering whether I would need to use and describe a data mining process. Specifically, I wanted to understand the relationship between KDD and what I was doing (your question). In the end, I did not use any of them directly. My master's dissertation is in Portuguese, but if you want, you can check my paper: [http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1516-31802017000300234&lng=en&nrm=iso&tlng=en] – andrebrujah Aug 17 '18 at 11:47
  • 1
    It sounds like you know the answer to your question; it's perfectly acceptable to write an answer to your own question – Sycorax Aug 17 '18 at 14:32
