
There is a standard relationship between the probability density function (PDF) and the cumulative distribution function (CDF) of a continuous random variable:

  • We can obtain the CDF by integrating the PDF.

  • We can obtain the PDF by differentiating the CDF.
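In symbols, for a continuous random variable X with PDF f and CDF F (symbols introduced here only for illustration), the two bullets read:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad f(x) = \frac{d}{dx}F(x) \quad \text{wherever } F \text{ is differentiable.}$$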

However, I would like to know whether these relationships also hold for empirical distribution functions, i.e. non-smooth functions estimated from data, for which derivatives and integrals are not as clearly defined. I will illustrate my question using the R programming language.
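For reference, these are the standard definitions of the two estimates discussed below. They assume observations X_1, ..., X_n, a kernel K that is itself a density (for example the standard normal), and a bandwidth h. The empirical CDF F_n is a step function that jumps by 1/n at each observation (by more if values are tied). Its ordinary derivative is therefore 0 between observations and undefined at them. Integrating the kernel density estimate term by term with the substitution s = (t - X_i)/h gives a smoothed version of F_n, where \mathcal{K} denotes the CDF of the kernel:

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}, \qquad \hat f_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)$$

$$\int_{-\infty}^{x} \hat f_h(t)\,dt = \frac{1}{n}\sum_{i=1}^{n} \mathcal{K}\!\left(\frac{x - X_i}{h}\right), \qquad \mathcal{K}(u) = \int_{-\infty}^{u} K(s)\,ds$$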

Suppose I measure the weight (lbs) of 100 students:

```r
set.seed(123)

# Simulate 100 weights (lbs) from a normal distribution with mean 200 and sd 10
weight <- rnorm(100, mean = 200, sd = 10)

weight
```

```
  [1] 194.3952 197.6982 215.5871 200.7051 201.2929 217.1506 204.6092 187.3494 193.1315 195.5434 212.2408 203.5981 204.0077 201.1068 194.4416 217.8691 204.9785
 [18] 180.3338 207.0136 195.2721 189.3218 197.8203 189.7400 192.7111 193.7496 183.1331 208.3779 201.5337 188.6186 212.5381 204.2646 197.0493 208.9513 208.7813
 [35] 208.2158 206.8864 205.5392 199.3809 196.9404 196.1953 193.0529 197.9208 187.3460 221.6896 212.0796 188.7689 195.9712 195.3334 207.7997 199.1663 202.5332
 [52] 199.7145 199.5713 213.6860 197.7423 215.1647 184.5125 205.8461 201.2385 202.1594 203.7964 194.9768 196.6679 189.8142 189.2821 203.0353 204.4821 200.5300
 [69] 209.2227 220.5008 195.0897 176.9083 210.0574 192.9080 193.1199 210.2557 197.1523 187.7928 201.8130 198.6111 200.0576 203.8528 196.2934 206.4438 197.7951
 [86] 203.3178 210.9684 204.3518 196.7407 211.4881 209.9350 205.4840 202.3873 193.7209 213.6065 193.9974 221.8733 215.3261 197.6430 189.7358
```

1) Based on this data, I can compute the empirical cumulative distribution function (ECDF):

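```r
# ecdf() returns a right-continuous step function; naming the result Fn
# avoids reusing the name of the ecdf() function itself
Fn <- ecdf(weight)
plot(Fn, verticals = TRUE, do.points = FALSE)
```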
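As a quick sanity check on what `ecdf()` returns, here is a minimal sketch. It regenerates the same sample so it runs on its own. It shows that the ECDF at a point is just the proportion of observations at or below that point, and that the function jumps by 1/n at each data value. The evaluation point `x0 = 200` is an arbitrary choice for illustration.

```r
# Minimal sketch: regenerate the sample so this block runs on its own
set.seed(123)
weight <- rnorm(100, mean = 200, sd = 10)

Fn <- ecdf(weight)   # a step function of class "ecdf"

# The ECDF at x0 is the fraction of observations <= x0
x0 <- 200
c(ecdf = Fn(x0), by_hand = mean(weight <= x0))

# The jump locations (knots) are the sorted distinct data values; with no ties,
# every jump has height 1/n = 0.01
kn <- knots(Fn)
jumps <- diff(c(0, Fn(kn)))
range(jumps)
```

Because every jump has height 1/n and the function is flat in between, this is exactly the "non-smooth" object the question is about.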


2) Using kernel density estimation (KDE), I can also approximate the probability density function of this data:

```r
library(ks)

# Kernel density estimate; positive = TRUE flags the 1-d data as positive
fhat <- kde(x = weight, positive = TRUE)
plot(fhat, col = 3)
```
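`ks::kde()` chooses its own bandwidth, so the sketch below does not reproduce `fhat` exactly. As an assumption for illustration, it swaps in base R's `density()` with a Gaussian kernel and the `bw.nrd0()` rule-of-thumb bandwidth. It then checks that the estimate is simply the average of normal densities centred at each observation. `kde_by_hand` is a hypothetical helper written only for this example.

```r
# Sketch assuming a Gaussian kernel and base R's density() in place of ks::kde()
set.seed(123)
weight <- rnorm(100, mean = 200, sd = 10)

h <- bw.nrd0(weight)                               # rule-of-thumb bandwidth
d <- density(weight, bw = h, kernel = "gaussian")  # evaluated on a 512-point grid

# Gaussian KDE written out directly (hypothetical helper):
# f_hat(x) = (1/n) * sum over i of dnorm(x, mean = X_i, sd = h)
kde_by_hand <- function(x, data, h) {
  vapply(x, function(xx) mean(dnorm(xx, mean = data, sd = h)), numeric(1))
}

# density() bins the data and convolves via an FFT, so agreement is approximate
max_abs_diff <- max(abs(d$y - kde_by_hand(d$x, weight, h)))
max_abs_diff
```

The printed difference should be small but not exactly zero, because `density()` approximates the sum on a grid instead of evaluating it directly.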


My question: given some observed data, we can estimate both the empirical cumulative distribution function and the probability density function, as above.

But does the same integral-derivative relationship hold between these two estimates? That is, can one be obtained from the other by differentiation or integration?
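One way to explore this numerically is sketched below. It assumes a Gaussian kernel, again swapped in for `ks::kde()`. For a Gaussian kernel, the integral of the KDE from -Inf to x has the closed form (1/n) * sum of pnorm((x - X_i)/h). The block first checks that closed form against `integrate()`. It then measures the largest gap between the integrated KDE and `ecdf()` on a grid for a few bandwidths. The helper names `kde_pdf` and `kde_cdf` and the bandwidth values are hypothetical choices for illustration.

```r
# Sketch assuming a Gaussian kernel; regenerate the sample so the block runs alone
set.seed(123)
weight <- rnorm(100, mean = 200, sd = 10)
Fn <- ecdf(weight)

# Gaussian KDE: f_hat(x) = (1/n) * sum dnorm(x, X_i, h)   (hypothetical helper)
kde_pdf <- function(x, data, h) {
  vapply(x, function(xx) mean(dnorm(xx, mean = data, sd = h)), numeric(1))
}

# Its integral from -Inf to x in closed form: (1/n) * sum pnorm((x - X_i) / h)
kde_cdf <- function(x, data, h) {
  vapply(x, function(xx) mean(pnorm((xx - data) / h)), numeric(1))
}

# Check the closed form against numerical integration at x = 200; the lower
# limit sits 10 bandwidths below the smallest observation, so the mass
# ignored there is negligible
h0 <- bw.nrd0(weight)
num <- integrate(kde_pdf, lower = min(weight) - 10 * h0, upper = 200,
                 data = weight, h = h0)$value
c(closed_form = kde_cdf(200, weight, h0), numerical = num)

# Largest gap between the integrated KDE and the ECDF on a fine grid
grid <- seq(min(weight) - 5, max(weight) + 5, length.out = 2000)
hs <- c(h0, 1, 0.1, 0.01)
gaps <- sapply(hs, function(h) max(abs(kde_cdf(grid, weight, h) - Fn(grid))))
data.frame(bandwidth = hs, max_gap = gaps)
```

As h shrinks, the integrated Gaussian KDE approaches the ECDF at every x that is not an observation. At an observation, that point's term tends to pnorm(0) = 1/2, halfway up the jump. So the `max_gap` column should generally get smaller as the bandwidth decreases.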

Thanks!

stats_noob
  • Does anyone know why this question was closed? I don't think it is related to the tagged question. Thank you – stats_noob Nov 09 '21 at 17:12
