
Suppose I have a dataset of temperature data sampled every 5 minutes and I want to find its mean. If we assume that the data was sampled from a discrete process, we can use the arithmetic mean:

$\frac{1}{n}\sum_{i=1}^{n}{x_i}$

However, if we assume that the underlying process is continuous, the mean would be the definite integral divided by the length of the time interval:

$\frac{1}{t_n-t_1}\int_{t_1}^{t_n}{f(t)\,dt}$

where $t$ represents the time and $f(t)$ the corresponding temperature at that time.

My question is: assuming I can approximate $f(t)$ quite well, is it more reasonable to assume a continuous process and calculate the mean accordingly, or to assume a discrete process and use the arithmetic mean?

T. Tim

1 Answer


Well, you have to decide which model you want to assume behind your discrete data-points!

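If you simply draw straight lines between your points, then averaging the discrete data points is almost exactly the same as calculating area/width, i.e. the integral divided by the time span. The only difference is that, for evenly spaced samples, the two outermost data points get half weight in the integral (this is the trapezoidal rule).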
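To make the half-weight remark concrete, here is a minimal sketch in Python with NumPy. The temperature values are made up for illustration, and the variable names are my own, not anything from the question.

```python
import numpy as np

# Hypothetical data: temperatures sampled every 5 minutes (values are made up).
t = np.arange(0, 60, 5, dtype=float)          # minutes: 0, 5, ..., 55
x = 20.0 + 3.0 * np.sin(2 * np.pi * t / 60)   # degrees Celsius

# Discrete model: plain arithmetic mean, every sample has weight 1/n.
arithmetic_mean = x.mean()

# Continuous model with straight lines between samples: integrate with the
# trapezoidal rule and divide by the covered time span t_n - t_1.
area = np.sum(0.5 * (x[1:] + x[:-1]) * np.diff(t))
time_weighted_mean = area / (t[-1] - t[0])

# For evenly spaced samples, the trapezoidal mean is a weighted mean in which
# the two end points get half weight and all inner points get full weight.
w = np.ones_like(x)
w[0] = w[-1] = 0.5
half_weight_mean = np.sum(w * x) / np.sum(w)

print(arithmetic_mean, time_weighted_mean, half_weight_mean)
```

The last two numbers agree, and both differ from the arithmetic mean only through the reduced weight of the first and last sample, so the gap shrinks as the number of samples grows.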

So it's the method of fitting that makes the difference!

Maybe read this post, where people discuss probability driven fitting of discrete data-points.

KaPy3141
  • Yes, indeed: when we simply assume a straight line between the points, the mean will be very similar. However, this is not the case for other statistics, e.g. the variance. _"Well, you have to decide which model you want to assume behind your discrete data-points!"_ This is exactly the question. In the case of my temperature data, I would say it is continuous, and I would assume a line between the points. However, most people just assume a discrete process and use discrete methods, so I wonder whether there is a deeper reason for this. And thanks for the post, I'll read it. – T. Tim Feb 25 '20 at 11:51
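The point about the variance in the comment above can be checked numerically. The following is only a sketch under the same straight-line assumption, again with made-up temperature values and variable names of my own choosing. It uses the fact that the integral of a squared straight line going from $a$ to $b$ over one interval of width $\Delta t$ is $\Delta t\,(a^2 + ab + b^2)/3$.

```python
import numpy as np

# Hypothetical data: temperatures sampled every 5 minutes (values are made up).
t = np.arange(0, 60, 5, dtype=float)          # minutes
x = 20.0 + 3.0 * np.sin(2 * np.pi * t / 60)   # degrees Celsius

# Discrete model: population variance of the samples (ddof=0).
discrete_var = x.var()

# Continuous model: f(t) is the piecewise-linear interpolant of the samples.
dt = np.diff(t)
a, b = x[:-1], x[1:]            # left and right end point of each interval
span = t[-1] - t[0]

# Time-averaged mean of f (trapezoidal rule, exact for straight lines).
mean_c = np.sum(0.5 * (a + b) * dt) / span

# Time-averaged mean of f^2. Per interval this is (a^2 + a*b + b^2) / 3,
# which is smaller than the end-point average (a^2 + b^2) / 2 by (a - b)^2 / 6.
second_moment = np.sum(dt * (a**2 + a * b + b**2) / 3.0) / span

continuous_var = second_moment - mean_c**2

print(discrete_var, continuous_var)
```

With these values the two variances are visibly different, even though the two means are close, which matches the observation in the comment. How large the gap is depends on the data and the sampling interval.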