
I have a dataset with two features and one outcome. I was asked to separate the data into three parts such that 70% of the data is a training set, 20% is for validation and 10% for testing. The model will be linear regression.

Why would I need both a validation set and a test set here? I am not selecting a type of model or tuning hyperparameters.

There is no model to select: it is fixed as a linear regression of the form $y = b_1a_1 + b_2a_2 + b_3$, where $b_1$, $b_2$, and $b_3$ are estimated from the training set. I will then evaluate the fitted model on the test set and report the error. So what is the need for a validation set?
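To make the workflow concrete, here is a rough sketch of what I have in mind, written in plain Python rather than MATLAB. The data are synthetic and the helper names (`solve`, `fit`, `mse`) are made up for illustration; the true coefficients 2, -1 and 5 are assumptions chosen only so the fit can be checked. The validation slice is created because the 70/20/10 split was requested, but nothing in the procedure ever uses it.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit(rows):
    """Least-squares fit of y = b1*a1 + b2*a2 + b3 via the normal equations."""
    X = [[a1, a2, 1.0] for a1, a2, _ in rows]
    y = [row[2] for row in rows]
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(3)] for i in range(3)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(3)]
    return solve(XtX, Xty)

def mse(coef, rows):
    """Mean squared prediction error of the fitted plane on the given rows."""
    b1, b2, b3 = coef
    return sum((y - (b1 * a1 + b2 * a2 + b3)) ** 2 for a1, a2, y in rows) / len(rows)

# Synthetic data (assumption): y = 2*a1 - 1*a2 + 5 plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(200):
    a1, a2 = rng.uniform(0, 10), rng.uniform(0, 10)
    data.append((a1, a2, 2.0 * a1 - 1.0 * a2 + 5.0 + rng.gauss(0, 0.5)))

# Shuffle once, then cut into 70% / 20% / 10%.
rng.shuffle(data)
n = len(data)
n_train = n * 7 // 10
n_val = n * 2 // 10
train = data[:n_train]
val = data[n_train:n_train + n_val]   # never used below: nothing is tuned
test = data[n_train + n_val:]

coef = fit(train)            # b1, b2, b3 come from the training set only
test_mse = mse(coef, test)   # reported once, on data the fit never saw
print("coefficients:", coef)
print("test MSE:", test_mse)
```

With a single fixed model, the `val` rows are simply left idle, which is exactly what prompts the question.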

  • http://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set – Alex R. Aug 30 '15 at 02:39
  • Welcome to the site, @SwethaTanamala. I believe you will find the information you need in the linked thread. Please read it. If you still have a question afterwards, come back here & edit your Q to state what you've learned & what you still need to know. Then we can provide the information you need without simply duplicating material elsewhere that already didn't help you. – gung - Reinstate Monica Aug 30 '15 at 02:55
  • @gung, I did read the answer you posted, and I understood it. I have now edited the question, so please answer it. – Swetha Tanamala Aug 30 '15 at 03:04
  • Can you clarify what you are asking that is distinct from the linked thread? I have trouble following your question. Can you state what you understand from there & what you still need to know? – gung - Reinstate Monica Aug 30 '15 at 03:07
  • @gung My data set has two features and one response, and I need to fit a linear regression to it. I use regress (a MATLAB built-in function) to fit y = b0 + b1*a1 + b2*a2, so I get b0, b1 and b2 from the training set alone. I then check the fitted model against the test set. In that case there is no need for a validation set, right? – Swetha Tanamala Aug 30 '15 at 03:14
  • I think you are right: if you have only one model and do not have to tune hyperparameters (e.g. the capacity of a support vector machine), then training and test sets are sufficient, but only in that context. See http://stats.stackexchange.com/questions/168807/why-splitting-the-data-into-the-training-and-testing-set-is-not-enough/168815#168815 –  Aug 30 '15 at 08:10
