Cross-validation for parameter tuning in data mining process (KDD)

Question

In my project I want to compare different classification algorithms on a specific problem with a specific dataset. To do this, I divided the dataset into two parts.

With the first (bigger) part I am doing cross-validation. In this step I try to find the best parameters for each algorithm (for example, the number of neurons in the hidden layer of a neural network), and I also search for the best classification threshold.
I use the second part of the dataset to evaluate and compare the algorithms, using the parameters found in the previous step.
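For concreteness, the two steps described above could be sketched roughly as follows. This is only an illustration, not the code used in the project: it assumes scikit-learn, a synthetic dataset from `make_classification` in place of the real one, a small hypothetical grid of hidden-layer sizes, and F1 as the metric used to pick both the parameters and the threshold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, cross_val_predict, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real dataset (assumption for illustration).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Split once: the bigger part is used for tuning, the smaller part is held out.
X_tune, X_test, y_tune, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1a: cross-validated search over the hidden layer size (hypothetical grid).
search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid={"hidden_layer_sizes": [(5,), (20,)]},
    cv=5,
    scoring="f1",
)
search.fit(X_tune, y_tune)

# Step 1b: choose the threshold from out-of-fold probabilities on the tuning part,
# so the held-out part is never touched while tuning.
oof_proba = cross_val_predict(
    search.best_estimator_, X_tune, y_tune, cv=5, method="predict_proba"
)[:, 1]
thresholds = np.linspace(0.1, 0.9, 17)
cv_f1 = [f1_score(y_tune, (oof_proba >= t).astype(int)) for t in thresholds]
best_threshold = float(thresholds[int(np.argmax(cv_f1))])

# Step 2: evaluate once on the held-out part with the frozen parameters and threshold.
test_proba = search.best_estimator_.predict_proba(X_test)[:, 1]
test_f1 = f1_score(y_test, (test_proba >= best_threshold).astype(int))

print("best parameters:", search.best_params_)
print("best threshold:", best_threshold)
print("held-out F1:", round(test_f1, 3))
```

The same two steps would be repeated for each algorithm being compared, with only the held-out scores used for the final comparison.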

My question: How can these two steps (parameter tuning and generalization test) be mapped onto a data mining process (KDD, SEMMA or CRISP-DM)?

Should the two steps be two different iterations of the process, or should each step be a separate project/process (with the initial stages in common)?

Can you explain how you think KDD or SEMMA or CRISP-DM is related to the very general procedure of train-test-validate splits? — Sycorax, Aug 15 '18 at 23:22
@Sycorax At that time I was starting an academic project and was wondering whether I would need to use and describe a data mining process. Specifically, I wanted to understand the relationship between KDD and what I was doing (your question). In the end, I did not use any of them directly. My master's dissertation is in Portuguese, but if you want, you can read my paper here: [http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1516-31802017000300234&lng=en&nrm=iso&tlng=en] — andrebrujah, Aug 17 '18 at 11:47
It sounds like you know the answer to your question; it's perfectly acceptable to write an answer to your own question — Sycorax, Aug 17 '18 at 14:32

