I'm building a logistic regression model and am unsure whether the following is valid.

I'm trying to predict whether a person will book a high-cost hotel, defined as `hotel_spend > 250`. I know columns such as `flight_spend` and `vehicle_spend` are acceptable inputs to the model, but I'm unsure whether I can use `total_spend`, since it contains information about `hotel_spend`, which is used to create the target. The last row highlights this: there `hotel_spend == total_spend`, and both exceed 250. My instinct is that I shouldn't do this, because I would effectively be using `hotel_spend` to predict whether someone spends a certain amount on a hotel.

Is this acceptable or not? I'd appreciate other opinions.

    flight_spend  hotel_spend  vehicle_spend  total_spend  hotel_spend_high_spend_label
              20           49             33          102                             0
               0           59              0           59                             0
              65          100             40          205                             0
             150          250             50          450                             1
               0          300              0          300                             1
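
To make the concern concrete, here is a minimal sketch in plain Python of why `total_spend` worries me. It assumes `total_spend` is exactly `flight_spend + hotel_spend + vehicle_spend`, which holds for the five rows shown but may not hold for the full dataset. The helper `reconstruct_hotel_spend` is a name made up for this illustration. It also assumes the label is 1 when `hotel_spend >= 250`, because the fourth row has a spend of exactly 250 and a label of 1.

```python
# Rows copied from the table above:
# (flight_spend, hotel_spend, vehicle_spend, total_spend, label)
rows = [
    (20, 49, 33, 102, 0),
    (0, 59, 0, 59, 0),
    (65, 100, 40, 205, 0),
    (150, 250, 50, 450, 1),
    (0, 300, 0, 300, 1),
]

# Assumption: >= matches the label column (the 250 row is labelled 1).
THRESHOLD = 250


def reconstruct_hotel_spend(flight, vehicle, total):
    """Recover hotel_spend from the candidate input features alone."""
    return total - flight - vehicle


# Use only the would-be inputs; hotel_spend itself is never read here.
recovered = [reconstruct_hotel_spend(f, v, t) for f, _, v, t, _ in rows]
predicted = [int(h >= THRESHOLD) for h in recovered]
actual = [label for *_, label in rows]

print(recovered)
print(predicted == actual)
```

If that holds on the real data, then any model given `flight_spend`, `vehicle_spend`, and `total_spend` together can recover the target exactly, which is what makes me suspect this is target leakage.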