Consider a number $n$ of independent, normally distributed variables $Y_1$, $Y_2$, ..., $Y_n$. Consider also the $n$ variables defined by the average of the last $h$ terms, $X_h = \frac{\sum^{n}_{k = n -h + 1} Y_k}{h}$ for $h \in \{1,.., n\}$. The variables $\sqrt{h}X_h$ are normally distributed.
I'd like to understand what is the probability of having $P\{\sqrt{h}X_h > T \text{ or } \sqrt{h-1}X_{h-1} > T \text{ or .. or } X_1>T \}$, in words having one or more of the $\sqrt{h}X_h$ to output above a threshold value.
Do you have any reference or suggestions about this problem? I can work out the $n = 2$ case through geometry working on the expression $P\{Y_1 + Y_2 > \sqrt{2}T \, |\, Y_1<T \}$. However I soon get lost into integral hell when dimension increases.
Thank you
EDIT:
I add my solution when $n = 2$.The probability we are looking for is equal to one minus the probability for all $\sqrt{h}X_h$ to output under threshold $T$, so:
$$ P = 1 - p(Y_1 + Y_2 < \sqrt{2}T\,|\,Y_1< T)p(Y_1<T) $$
We can calculate the term $ p(Y_1 + Y_2 < \sqrt{2}T\,|\,Y_1< T) $ geometrically (see picture), through:
$$ p(Y_1 + Y_2 < \sqrt{2}T\,|\,Y_1< T) = \int_{-\infty}^{\sqrt{2}T -T} \text{d}x N(x) + \int^{\infty}_{\sqrt{2}T -T}\text{d}x N(x)\Big[ 1 - \int^{T}_{\sqrt{2}T -x}\text{d}y N(y)\Big]=\\ = 1 - \int^{\infty}_{\sqrt{2}T -T}\text{d}x N(x)\Big( F(T) - F(\sqrt{2} T - x) \Big)=\\ = 1 - \Phi(T)\big(1 - \Phi(\sqrt{2}T - T)\big) + \int^{\infty}_{\sqrt{2}T -T}\text{d}x N(x) \Phi(\sqrt{2} T - x) $$
Naming $H(T)$ the last integral term, the probability we are looking for become:
$$P = 1 - \Phi(T)\Big( 1 -\Phi(T)\big(1-\Phi(\sqrt{2}T - T)\big) + H(T) \Big)$$
If $T = 3$, we find a probability $P \sim 2.5\%$ which implies a value larger than the probability $1 - \Phi(T)$ by a factor $\frac{P(T)}{1-\Phi(T)} \sim 1.823$.
Where we noted with $\Phi$ and $N$ the normal partition and probability density function respectively. This seems confirmed by simulations, in python:
import scipy.stats as sts
import numpy as np
N = 1000000
M = 2
host_matrix = np.zeros((N,M))
for n in range(N):
arr = sts.norm(0).rvs(M)
wind = np.cumsum(arr)/np.sqrt(np.arange(M) + 1)
host_matrix[n] = wind
Here a routine for calculating the value of the formula above:
def fp_prb(thr):
import scipy.integrate as integrate
def hard_term_f(x, T):
return sts.norm.pdf(x)*sts.norm.cdf(sqrt(2)*T - x)
hard_term = integrate.quad(lambda x: hard_term_f(x,thr),thr*(sqrt(2) - 1),+np.inf)[0]
F_t = F(thr)
F_sqrt = F(thr*(sqrt(2)-1))
prb = 1 - F_t*( 1 - F_t*(1 - F_sqrt) + hard_term )
return prb
The observed frequency and probability ratio can be computed through:
thr = 3
no_fp = len(np.argwhere((host_matrix[:,0] < thr ) & (host_matrix[:,1] < thr )))
fp_freq = 1 - no_fp/N
norm_freq = 1 - sts.norm.cdf(thr)
print('FP frequency: {:.4f}'.format(fp_freq))
print('Normal detections frequency: {:.4f}'.format(norm_freq))
print('Ratio: {:.3f}'.format(fp_freq/norm_freq))
An example output:
FP frequency: 0.0025
Normal detections frequency: 0.0013
Ratio: 1.841