1

I have data from a random sample of hotel bookings. I want to study the cause-and-effect relationship between the number of days in advance that the hotel was booked (the booking lag, stored as nday_booking_early) and the price paid per night for the booking.

Both booking lag and price differ depending on whether the booking was made on a weekend or on a weekday.

How should I study this relationship? Do I need to run multiple regression only, or experimental design as well?

[Image: table of the booking data]

AdamO
  • Well, please start by explaining what the variable names mean and what they represent. – Jim May 20 '18 at 14:03
  • @Jim I have updated the variable names. Thank you in advance. –  May 20 '18 at 14:32
  • 3
    You cannot study cause and effect because you do not measure confounding variables. I know straight away that income is a confounder: full time employed people are more likely to book on the weekend because that's their free time, and they are more likely to spend more. Maybe "cause-and-effect" is too ambitious. Why not describe the differences you alluded to earlier with simple mean differences in a regression model? – AdamO May 22 '18 at 12:50
  • In addition to @AdamO 's point it is not clear what "run experimental design" means. – Peter Flom Jul 23 '19 at 11:27

1 Answer

0

If you simply want to predict "Price per night" from "nday_booking_early", a simple linear regression is sufficient.

In this case, the prediction ŷ ("Price per night") is computed from a value of x ("nday_booking_early") multiplied by a coefficient w (the slope), plus an intercept b, yielding the prediction formula

ŷ = w * x + b
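As a minimal sketch of such a fit, the snippet below estimates the slope w together with an intercept by least squares using `np.polyfit`. The numbers are made up purely for illustration and are not taken from the table in the question; the helper `predict` is likewise a hypothetical name.

```python
import numpy as np

# Made-up illustrative data, NOT taken from the question's table.
nday_booking_early = np.array([1.0, 3.0, 7.0, 14.0, 30.0, 60.0])
price_per_night = np.array([120.0, 118.0, 110.0, 105.0, 95.0, 90.0])

# Degree-1 least-squares fit: polyfit returns the slope first, then the intercept.
w, b = np.polyfit(nday_booking_early, price_per_night, deg=1)

def predict(x):
    """Predicted price per night for a booking lag of x days."""
    return w * x + b

print(f"slope w = {w:.3f}, intercept b = {b:.3f}")
print(f"predicted price at a 10-day lag: {predict(10.0):.2f}")
```

The fitted slope describes how the predicted price changes per extra day of booking lag in this sample; on its own it says nothing about why the two move together.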

This would, of course, neglect all the other information in your table. If you want to predict your variable "Price per night" also by "country_code" and/or "check-in_dayname", you would need to run multiple regression.
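To see why the extra predictors matter, here is a hedged sketch on simulated data. All numbers are assumptions chosen for illustration: weekend bookings are made earlier and cost more, and the effect of booking lag on price is set to -0.5 by construction. The helper `ols` is a hypothetical name wrapping `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data (assumed, not the question's): weekend bookings have a
# longer lag and a higher price; the true lag effect on price is -0.5.
weekend = rng.binomial(1, 0.3, size=n).astype(float)
lag = 20.0 + 15.0 * weekend + rng.normal(0.0, 5.0, size=n)
price = 100.0 + 20.0 * weekend - 0.5 * lag + rng.normal(0.0, 10.0, size=n)

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

simple = ols([lag], price)             # [intercept, lag]
multiple = ols([lag, weekend], price)  # [intercept, lag, weekend]

print(f"lag slope, simple regression:   {simple[1]:.3f}")
print(f"lag slope, multiple regression: {multiple[1]:.3f}")
print(f"weekend coefficient:            {multiple[2]:.3f}")
```

In this simulation the simple regression gives a positive lag slope, while the multiple regression recovers a value close to the -0.5 that was built in. That recovery works here only because the one variable driving both lag and price was included in the model; with real bookings, unmeasured variables can bias the coefficient in the same way.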

Uyt Poit
  • Thank you for your answer. I have run a multiple regression (price_per_night~Hotel+(checkin_date:n_night)+nday_booking_early). However, price per night is correlated with whether the booking was made on a weekend, and nday_booking_early is also higher for weekend bookings than for weekday bookings. I am confused about how to draw a cause-and-effect conclusion about nday_booking_early and price per night. –  May 20 '18 at 15:34
  • Can you explain what you mean by "conclude the cause and effect"? It's not clear where your confusion lies, which makes it difficult to give a helpful answer. – The Laconic May 20 '18 at 16:08
  • With regard to "cause and effect", a regression on observational data does not by itself give you causation, only association. Ultimately, following Karl Popper, you cannot _prove_ a hypothesis but only _disprove_ it. IMO your problem is **_multicollinearity_**, which can make the coefficient estimates of your multiple regression unstable; if your goal is prediction, a machine learning approach may suit you better. A good discussion of multicollinearity in "traditional statistics" vs. ML can be found here: [link](https://stats.stackexchange.com/questions/168622/why-is-multicollinearity-not-checked-in-modern-statistics-machine-learning) – Uyt Poit May 20 '18 at 20:39
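
One common way to gauge how strongly predictors overlap is the variance inflation factor (VIF), which the comment above does not mention; it is brought in here only as an illustration. The sketch below computes it by regressing each predictor on the others, using simulated data (an assumption, not the question's data) in which weekend bookings have a longer lag. The function name `vif` is hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on an intercept plus all other predictors.
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

rng = np.random.default_rng(1)
n = 5_000

# Simulated predictors (assumed): booking lag depends on the weekend indicator.
weekend = rng.binomial(1, 0.3, size=n).astype(float)
lag = 20.0 + 15.0 * weekend + rng.normal(0.0, 5.0, size=n)

vifs = vif(np.column_stack([lag, weekend]))
print(f"VIF for lag: {vifs[0]:.2f}, VIF for weekend: {vifs[1]:.2f}")
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values mean its coefficient is estimated less precisely. Correlated predictors widen the uncertainty of the estimates but are a separate issue from whether a coefficient can be read causally.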