I'm currently doing an ML project for a class (the goal is simply to clean the data set, apply some of the models we learned, like Random Forests, ensemble learning, etc., and test the results), and I'm at the data-cleaning stage. The data set is about hotels/homestays and it has several columns that hold review scores for different parameters (cleanliness, location, etc.). It has another column that holds the number of reviews each hotel has.
The problem is that several hotels/homestays have a value of 0 in the number-of-reviews column (maybe because they are new places?), so for those hotels/homestays the review score columns (cleanliness, etc.) contain NaN.
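To show the pattern I mean, here is a small check on toy data using pandas. The column names (`number_of_reviews`, `review_scores_cleanliness`, `review_scores_location`) and the values are made up for illustration, so treat this as a sketch rather than my actual data set.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are hypothetical stand-ins.
df = pd.DataFrame({
    "number_of_reviews": [12, 0, 5, 0],
    "review_scores_cleanliness": [9.0, np.nan, 8.0, np.nan],
    "review_scores_location": [10.0, np.nan, 9.0, np.nan],
})
score_cols = ["review_scores_cleanliness", "review_scores_location"]

# Rows with no reviews at all
no_reviews = df["number_of_reviews"] == 0
# Rows where every score column is NaN
all_scores_missing = df[score_cols].isna().all(axis=1)

# True only if the missing scores line up exactly with the zero-review rows
nan_matches_zero_reviews = bool((no_reviews == all_scores_missing).all())
# Fraction of rows affected
share_no_reviews = float(no_reviews.mean())

print(nan_matches_zero_reviews, share_no_reviews)
```

If the two masks match, the NaNs are not random gaps: they simply mean "this place has never been reviewed".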
I'm really torn on how to deal with these NaNs. Dropping the rows seems like a bad idea, as these observations make up about 10% of the data, which is quite a lot.
My question is:
1. Should I just assign a fixed value (like -1 or 0) to the cells that are NaN because the corresponding hotel/homestay has 0 reviews, and thereby kind of "group" all of the places that are new/have no reviews? OR
2. Should I try to fill those cells with the column mean, with interpolation, or with a prediction algorithm? If I do this, though, it would make sense to also turn the 0's in the number of reviews into NaN and fill them with another value, right? Otherwise I'd have values in the review score columns while the last_review column would indicate that the place has 0 reviews, which wouldn't make much sense.
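To make the two options concrete, this is roughly what I have in mind, again on toy data with pandas. The column names are placeholders, -1 is just one possible sentinel, and using the column mean for option 2 (including for the review count) is only an assumption to illustrate the idea, not something I have settled on.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are hypothetical stand-ins.
df = pd.DataFrame({
    "number_of_reviews": [12, 0, 5, 0],
    "review_scores_cleanliness": [9.0, np.nan, 8.0, np.nan],
    "review_scores_location": [10.0, np.nan, 9.0, np.nan],
})
score_cols = ["review_scores_cleanliness", "review_scores_location"]

# Option 1: mark "no reviews yet" with a sentinel value (-1 here),
# which groups all unreviewed places together.
opt1 = df.copy()
opt1[score_cols] = opt1[score_cols].fillna(-1)

# Option 2: fill the scores with each column's mean (computed over
# the rows that do have reviews, since NaN is skipped by default).
opt2 = df.copy()
opt2[score_cols] = opt2[score_cols].fillna(opt2[score_cols].mean())

# To keep the row consistent, also treat a count of 0 as missing
# and fill it (the mean is an arbitrary choice for this sketch).
opt2["number_of_reviews"] = opt2["number_of_reviews"].replace(0, np.nan)
opt2["number_of_reviews"] = opt2["number_of_reviews"].fillna(
    opt2["number_of_reviews"].mean()
)

print(opt1)
print(opt2)
```

With option 1 the unreviewed places stay identifiable in the data; with option 2 they become indistinguishable from places with average scores, which is part of why I'm unsure.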
Sorry for the long question and thanks in advance for taking the time to read this!!