Please help with the following problem: I have a data set of temperatures, recorded every hour, together with the daily energy consumption. Given the temperature forecast for the next day, I need to find the past days with the most similar temperatures in order to estimate the probable energy consumption.

The dataset consists of 3 years of data, with some values missing. I have tried time series analysis, but the estimated values are pretty far from the real ones, so I need another approach.

My thoughts:

  • using some kind of similarity distance to find the most similar day, and using the resulting coefficient to adjust the probable energy consumption. Do I need cosine similarity, or is the Euclidean distance enough?
  • clustering, somehow... but what kind?

Please advise.

Thank you,

Catalin

1 Answer

Don't blindly try functions like cosine.

Figure out what is most appropriate for your problem, in particular with respect to missing data.

You also need to decide whether or not you want squared errors, and whether and how to normalize.

I'd try something like $$d^2(x,y):=\sum_i \begin{cases} |x_i-y_i|^2 & \text{if }x_i\text{ and }y_i\text{ are both defined}\\ p^2 & \text{if either is a missing value} \end{cases}$$ where $p$ is a penalty term for missing values, e.g. the average difference between any two defined terms in the data set.
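A minimal Python sketch of this distance, assuming each day is stored as a NumPy vector of hourly temperatures with `NaN` marking missing readings. The function names (`penalized_sq_distance`, `estimate_penalty`, `most_similar_days`) are made up for illustration, and `estimate_penalty` is only one possible reading of "average difference": the mean absolute difference at the same hour over all pairs of days.

```python
import numpy as np

def penalized_sq_distance(x, y, p):
    """Squared distance d^2(x, y); every position where x or y is NaN costs p**2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    both_defined = ~np.isnan(x) & ~np.isnan(y)
    # Where a value is missing, (x - y)**2 is NaN and is replaced by the penalty.
    terms = np.where(both_defined, (x - y) ** 2, p ** 2)
    return float(terms.sum())

def estimate_penalty(days):
    """Assumed choice of p: mean |difference| at the same hour over all pairs of days.

    Builds an (n_days, n_days, n_hours) array, so subsample the days if the
    data set is large.
    """
    days = np.asarray(days, dtype=float)
    diffs = np.abs(days[:, None, :] - days[None, :, :])
    pairs = np.triu_indices(len(days), k=1)  # each unordered pair of days once
    return float(np.nanmean(diffs[pairs]))   # NaN differences are ignored

def most_similar_days(forecast, days, p, k=3):
    """Return the indices and squared distances of the k days closest to the forecast."""
    d2 = np.array([penalized_sq_distance(forecast, day, p) for day in days])
    order = np.argsort(d2)[:k]
    return order, d2[order]
```

With a matrix `days` of past temperatures and a matching vector of daily consumption (here called `energy`, a hypothetical name), one simple estimate would be `energy[idx].mean()`, where `idx, _ = most_similar_days(forecast, days, estimate_penalty(days))`. Whether to average, weight by distance, or normalize first is exactly the kind of decision mentioned above.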

Has QUIT--Anony-Mousse