Please help with the following problem: I have a dataset of temperatures recorded every hour of each day, together with the daily energy consumption. Given the temperature forecast for the next day, I need to find the most similar past days by temperature, in order to estimate the probable energy consumption.
The dataset covers 3 years and has missing data. I have tried time series analysis, but the estimated values are pretty far from the real ones, so I need another approach.
My thoughts:
- using some kind of similarity distance to find the most similar day, and using the coefficient to adjust the probable energy consumption. Do I need cosine similarity, or is the Euclidean distance enough?
- clustering the days somehow, but what kind of clustering?
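To make the first idea concrete, here is a minimal sketch of what I have in mind, not a tested solution. It assumes the history is arranged as one row of 24 hourly temperatures per day, with NaN marking missing readings; the names `most_similar_days` and `cosine_similarity`, the choice of `k`, and the inverse-distance weighting are all my own placeholders. Days with any missing value are simply skipped.

```python
import numpy as np


def most_similar_days(history, consumption, forecast, k=3):
    """Find the k past days closest to the forecast (Euclidean distance)
    and return their indices plus a distance-weighted consumption estimate.

    history:     (n_days, 24) hourly temperatures, NaN where missing
    consumption: (n_days,) daily energy consumption, NaN where missing
    forecast:    (24,) hourly temperature forecast for the next day
    """
    history = np.asarray(history, dtype=float)
    consumption = np.asarray(consumption, dtype=float)
    forecast = np.asarray(forecast, dtype=float)

    # Assumption: drop every day with a missing temperature or consumption.
    usable = ~np.isnan(history).any(axis=1) & ~np.isnan(consumption)
    idx = np.flatnonzero(usable)

    # Euclidean distance between each usable day and the forecast profile.
    dist = np.linalg.norm(history[idx] - forecast, axis=1)

    # Keep the k closest days and map back to indices in the full history.
    order = np.argsort(dist)[:k]
    nearest = idx[order]

    # Closer days get a larger weight; the small constant avoids division by zero.
    weights = 1.0 / (dist[order] + 1e-9)
    estimate = float(np.sum(weights * consumption[nearest]) / np.sum(weights))
    return nearest, estimate


def cosine_similarity(a, b):
    """Cosine of the angle between two temperature profiles."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

One thing I noticed while writing this: cosine similarity ignores magnitude, so a day whose temperatures are all exactly double those of another day gets a similarity of 1 even though it is much warmer. That makes me suspect the Euclidean distance is the better fit here, but I would like confirmation.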
Please advise.
Thank you,