
I have an integral equation of the form $$ T_1(x) = \int_0^x g(T_1(y)) \ d\hat{F}_n(y) $$ where $\hat{F}_n$ is the empirical cdf and $g$ is a function. The associated operator is a contraction mapping, so I am trying to solve the integral equation with the fixed-point iteration from the Banach fixed-point theorem.

However, this runs very slowly in R, and I think it is because I am integrating with the sum() function over the sample points of $\hat{F}_n$ over and over again.

Is there a faster way to integrate with respect to the empirical distribution, for example with a function such as integrate()?

mpiktas
Newbie
    Although this is really an R question rather than a stats question (and therefore probably belongs on stackoverflow)... could you post your code? In R, there are often multiple opportunities to obtain great runtime performance improvements, and w/o seeing the code, it's hard to tell which, if any, might apply. – jbowman Nov 12 '13 at 18:36

1 Answer


Defining the empirical distribution function $$ \hat{F}_n(t)=\frac{1}{n}\sum_{i=1}^n I_{[x_i,\infty)}(t) \, , $$ it follows that $$ \int_{-\infty}^\infty g(t)\,d\hat{F}_n(t) = \frac{1}{n} \sum_{i=1}^n g(x_i) \, . $$ Hence, you don't need to use integrate() to solve this problem. This kind of R code

x <- rnorm(10^6)
g <- function(t) exp(t) # say
mean(g(x))

should be super fast because it is vectorized.
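The same identity holds on a sub-interval: only the $x_i$ that fall inside it contribute to the sum. For the equation in the question this suggests evaluating $T_1$ only at the sorted sample points, where one pass of the fixed-point iteration reduces to a single cumsum(). The following is a minimal sketch of that idea, not the asker's code: the sample, the choice of g, the tolerance, and the names step, T_vals and T1 are all assumptions made for illustration, and it assumes a nonnegative sample without ties.

```r
# Hypothetical setup: placeholder sample and g, chosen only for illustration.
set.seed(1)
x <- sort(rexp(1000))            # nonnegative sample, sorted so cumsum() follows the order of x
n <- length(x)
g <- function(t) 0.5 * cos(t)    # Lipschitz constant 0.5, so the map below is a contraction

# One application of T -> int_0^x g(T(y)) dF_n(y), evaluated at the sorted
# sample points: the k-th entry is (1/n) * sum of g(T(x_i)) over i <= k.
step <- function(T_vals) cumsum(g(T_vals)) / n

T_vals <- numeric(n)             # starting guess T = 0
for (iter in seq_len(200)) {
  T_new <- step(T_vals)
  delta <- max(abs(T_new - T_vals))   # sup-norm change between iterates
  T_vals <- T_new
  if (delta < 1e-12) break
}

# T_1 at arbitrary points: the value at the largest sample point <= x0,
# and 0 when x0 lies below every sample point.
T1 <- function(x0) c(0, T_vals)[findInterval(x0, x) + 1]
```

Each iteration costs one vectorized pass over the sample, so there is no explicit loop over $x$ and no call to integrate(). Whether the iteration converges for a different g depends on that g actually giving a contraction.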

Zen
    Please note, I have added a related question asking why the integral of a function with respect to the empirical distribution is the average of the function evaluated at the observed points. https://math.stackexchange.com/questions/2340290/proof-that-integral-of-a-function-with-respect-to-the-empirical-distribution-is – texmex Jun 30 '17 at 03:59