What kind of model to use if the datasets are too similar?

Asked Jul 12 '19 at 07:11

I am kinda stumped by a problem that I am currently having.

So let's say I have data from two categories, labelled Normal State and Anomalous State. Both contain continuous values ranging from 500 to 800 kg.

What happens if the data from the Normal State is extremely similar, if not identical, to the data from the Anomalous State? Does that mean that my training data is bad and I should just ignore it?

asked Jul 12 '19 at 07:11

Axois

It simply means that the weights are similar between the two states, so you will not easily be able to predict the weight from the state, or vice versa. – Stephan Kolassa Jul 12 '19 at 07:23
@StephanKolassa Would that mean my model runs the risk of underfitting because of the noise present? If you refer to my other post ([link](https://stats.stackexchange.com/questions/416740/how-to-better-separate-the-data-points?noredirect=1#comment777710_416740)), I believe that is the problem I am facing right now. Is there any way around it? – Axois Jul 12 '19 at 07:28

I think [How to know that your machine learning problem is hopeless?](https://stats.stackexchange.com/q/222179/1352) may be helpful. – Stephan Kolassa Jul 12 '19 at 07:32
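To make the point in the comments concrete, here is a minimal sketch of one way to check whether a single continuous feature separates the two states at all. It is not from the thread: the data are simulated, the sample sizes and the shifted contrast case are assumptions for illustration, and `auc` is a hypothetical helper that computes the rank-based AUC (the probability that a random anomalous value exceeds a random normal one). A value near 0.5 would suggest that no model can tell the states apart using this feature alone.

```python
import random


def auc(normal, anomalous):
    """Probability that a random anomalous value exceeds a random normal one.

    Ties count as half a win. A result near 0.5 means the feature carries
    essentially no information about the state.
    """
    wins = 0.0
    for a in anomalous:
        for n in normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal) * len(anomalous))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed data: both states drawn from the same 500-800 kg range.
normal = [rng.uniform(500, 800) for _ in range(500)]
anomalous_same = [rng.uniform(500, 800) for _ in range(500)]

# Assumed contrast case: anomalous values shifted upward by 200 kg.
anomalous_shifted = [rng.uniform(700, 1000) for _ in range(500)]

auc_same = auc(normal, anomalous_same)
auc_shifted = auc(normal, anomalous_shifted)

print(f"AUC, overlapping states: {auc_same:.3f}")    # expected close to 0.5
print(f"AUC, shifted states:     {auc_shifted:.3f}")  # expected well above 0.5
```

If the real data behave like the overlapping case, the training data are not necessarily "bad"; this one feature simply does not distinguish the states, and any progress would have to come from additional features rather than from a different model.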
