There is a very large number of data points ($\sim 10^{100}$), which form a (discrete) joint distribution $(X,Y)$, where $X$ and $Y$ are discrete random variables. Note that we have no knowledge of these distributions. We sample a small number $n$ of data points at random and calculate the Pearson correlation coefficient (PCC) $\rho_s$ of the sample. How can I then infer the PCC $\rho_p$ of the population? In general, what is the relation between $\rho_s$, $\rho_p$, and $n$? If I sample, say, $100n$ data points, will the sample PCC be close to $\rho_p$? We cannot assume that any of the above distributions is normal. Can we say anything interesting if $n$ is less than 20 (or, if not, less than 50)?
Edit: If you think the PCC is not very useful when normality of the distributions cannot be assumed, feel free to use another quantity instead, such as Spearman's rank correlation coefficient.
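To make the setup concrete, here is a minimal sketch of the experiment I have in mind. The joint pmf `weights`, the grids `xs` and `ys`, and the helper names `population_pcc` and `sample_pcc` are all made up for illustration; my real population is unknown and far too large to enumerate. The sketch compares the spread of $\rho_s$ around $\rho_p$ for samples of size $n$ and $100n$.

```python
import numpy as np

def population_pcc(pmf, xs, ys):
    """Exact Pearson correlation of a discrete joint pmf (rows index xs, columns index ys)."""
    px = pmf.sum(axis=1)                      # marginal of X
    py = pmf.sum(axis=0)                      # marginal of Y
    mx = (xs * px).sum()
    my = (ys * py).sum()
    cov = ((xs[:, None] - mx) * (ys[None, :] - my) * pmf).sum()
    vx = ((xs - mx) ** 2 * px).sum()
    vy = ((ys - my) ** 2 * py).sum()
    return cov / np.sqrt(vx * vy)

def sample_pcc(pmf, xs, ys, n, rng):
    """Draw n i.i.d. points from the pmf and return the sample Pearson correlation."""
    flat = rng.choice(pmf.size, size=n, p=pmf.ravel())
    i, j = np.unravel_index(flat, pmf.shape)
    return np.corrcoef(xs[i], ys[j])[0, 1]

rng = np.random.default_rng(0)
xs = np.arange(4, dtype=float)
ys = np.arange(4, dtype=float)

# Hypothetical non-normal joint distribution (assumption, not real data).
weights = np.array([[8, 2, 1, 1],
                    [2, 4, 2, 1],
                    [1, 2, 3, 2],
                    [1, 1, 2, 6]], dtype=float)
pmf = weights / weights.sum()

rho_p = population_pcc(pmf, xs, ys)

n = 20
reps = 500
small = np.array([sample_pcc(pmf, xs, ys, n, rng) for _ in range(reps)])
large = np.array([sample_pcc(pmf, xs, ys, 100 * n, rng) for _ in range(reps)])

print("population PCC:", rho_p)
print("n samples:    mean, std of rho_s =", small.mean(), small.std())
print("100n samples: mean, std of rho_s =", large.mean(), large.std())
```

In this toy case I can compute $\rho_p$ exactly because the pmf is known; the question is what can be said about $\rho_p$ when only a single value of $\rho_s$ and $n$ are available.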