
Consider the following simulation experiment:

To estimate the area ($\theta$) of a circle drawn somewhere on a square of known side 10, a collection of random vectors $(X_i,Y_i)$, $i=1,\ldots,10$, is generated. The $X_i$ and $Y_i$ are iid $\text{Uniform}[0,10]$.

From this, iid random variables $Z_i$ are obtained. Each $Z_i$ is $1$ if $(X_i,Y_i)$ is within the circle and $0$ otherwise.


I asked a version of this question earlier, and the comments below were very helpful in getting me to an answer. However, while reviewing it again today, I encountered an issue that I would like to ask about.

From the comments below, I learned that $Z_i \sim \text{Bern}(\theta/100)$, where $\theta$ is the area of the circle. Using this, I was able to get the estimator $100\frac{\sum_{i=1}^{10} Z_i}{10}$ and prove that it is unbiased for $\theta$.

I was also able to show that $\sum_{i=1}^{10} Z_i$ is a complete sufficient statistic for $\theta$, and so $100\frac{\sum_{i=1}^{10} Z_i}{10}$ is the UMVUE for $\theta$.
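As a rough check of the unbiasedness claim, here is a minimal simulation sketch. The circle's centre and radius (`cx`, `cy`, `r`), the seed, and the helper name `estimate_area` are assumptions chosen for illustration; the problem itself does not specify where the circle sits.

```python
import math
import random

SIDE = 10.0  # side of the square, from the problem statement


def estimate_area(n, cx, cy, r, rng):
    """Return 100 * (fraction of n uniform points that land inside the circle)."""
    hits = 0
    for _ in range(n):
        x = rng.uniform(0.0, SIDE)
        y = rng.uniform(0.0, SIDE)
        # Z_i = 1 when (x, y) is inside the circle, 0 otherwise
        if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2:
            hits += 1
    return SIDE * SIDE * hits / n


rng = random.Random(0)      # fixed seed so the run is repeatable
cx, cy, r = 5.0, 5.0, 4.0   # hypothetical circle, fully inside the square
theta = math.pi * r ** 2    # true area, known here only because we chose r

# Repeat the 10-point experiment many times and average the estimates.
estimates = [estimate_area(10, cx, cy, r, rng) for _ in range(20000)]
mean_estimate = sum(estimates) / len(estimates)
print(f"true area {theta:.2f}, mean of estimates {mean_estimate:.2f}")
```

Each individual estimate is a multiple of 10, so any single run can be far from $\theta$; it is only the average over many repetitions that should sit close to the true area.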

However, if I use this estimator I can end up with an estimate of the area that is not possible (one larger than the largest possible circle). That is, if all or almost all of my points happen to fall in the circle (8, 9, or 10 of them), I get an estimate greater than $25\pi$.
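To see how often this happens, the probability of an impossible estimate can be computed from the binomial distribution. This is a sketch only: the values of `theta` in the loop are hypothetical, and `prob_at_least` is a helper name introduced here.

```python
import math


def prob_at_least(k_min, n, p):
    """P(K >= k_min) for K ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


n = 10
max_area = 25 * math.pi  # area of the largest circle that fits in the square

# Smallest count k whose estimate 100*k/n exceeds 25*pi.
k_min = min(k for k in range(n + 1) if 100 * k / n > max_area)

for theta in (25.0, 50.0, max_area):  # hypothetical true areas
    p_bad = prob_at_least(k_min, n, theta / 100)
    print(f"theta={theta:6.2f}  P(estimate > 25*pi) = {p_bad:.4f}")
```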

My question is this: why isn't $25\pi\frac{\sum_{i=1}^{10} z_i}{10}$ a better estimator of $\theta$? This never gives an impossible value.

If I let my $Z_i$'s be $\text{Bern}(\frac{\theta}{25\pi})$, I think the rest of the solution for all of the requirements of the problem will follow, but I can't seem to justify that when the instruction of the problem is to have $Z_i$ be the indicator function that the point $(X_i,Y_i)$ is in the circle.

Glen_b
user164144
  • Are you sure the indicators are not intended to signal if the random point is inside the circle? – Matthew Drury Jun 20 '17 at 00:43
  • If this is an inscribed circle then the radius is 5 and the area is 25 $\pi$. If you want to estimate it generate pairs of independent uniforms on [0, 10]. So what you are describing is to count the number falling inside the circle dividing by the number of pairs and then multiplying by 100 (the area of the square). You don't need to find the center to estimate the area. The points you generate fall uniformly inside the square. So as Matthew says you need to find the number inside the circle. – Michael R. Chernick Jun 20 '17 at 00:57
  • I am not sure. I did not think of it that way and may have made a mistake. The notation just says $(X_i,Y_i) \in C$, so I assumed that it meant the point was on the circle. What would be a good approach if the function did signal that the point was in the circle? – user164144 Jun 20 '17 at 00:59
  • So the estimator is $W = 100\,\frac{\sum Z_i}{10} = 10\sum Z_i$? Shouldn't I multiply by $25\pi$ instead of 100, since $25\pi$ is the maximum area possible for the circle? – user164144 Jun 20 '17 at 01:10
  • Also, does this mean that my $Z_i$'s are Bernoulli$(A/100)$? – user164144 Jun 20 '17 at 01:37
  • Besides the direct information in the linked duplicate (which is not an identical question but contains information which answers yours) there's some useful information in [this question](https://stats.stackexchange.com/questions/193990/approximate-e-using-monte-carlo-simulation). If you need something not covered by the duplicate or the wikipedia introduction, please ask a specific question. – Glen_b Jun 20 '17 at 01:47
  • That is true but it doesn't get you anywhere in computing A. – Michael R. Chernick Jun 20 '17 at 01:48
  • @Glen_b Unfortunately, this is not in a textbook that I have with me. It's a sample problem that I am working on as part of a review for some exams that I need to take in a couple of months. I already took graduate courses of prob and stat years ago but my work is mainly on application (stat consulting) so it's been a while since I did something like this. Thankfully, you and the others have been very helpful and I am very grateful for this community. – user164144 Jun 20 '17 at 01:49
  • @MichaelChernick. If the $Z_i$'s are Bernoulli$(A/100)$ then that means I can find an MLE for $A$ directly, correct? – user164144 Jun 20 '17 at 01:55
  • The method described already gives you the MLE. I think the value is that you can get approximately the standard deviation for the estimate to see how accurately you estimate the area. Which could also be used to determine how many points you need to reach a predetermined accuracy. – Michael R. Chernick Jun 20 '17 at 02:03
  • Wait -- you're studying for exams without any study material to learn the curriculum from? – Glen_b Jun 20 '17 at 04:24
  • @Glen_b Not exactly. I've taken the subject before but it has been years since I last encountered stat problems at this level (my work is much more basic). So I do have my old Casella and Berger as my main reference alongside the internet and you guys (God bless you). – user164144 Jul 09 '17 at 12:30
  • This is a much better version of the new question, thanks. I've made some additional edits. Please check it still conveys your intent. – Glen_b Jul 10 '17 at 03:12

1 Answer


While it never gives an impossible value for the area, it's certainly no longer unbiased (its expectation will be $\frac{\pi}{4}\, \theta$, about 22% too small on average). Bias isn't necessarily a bad thing, but here you introduce a lot of bias even in cases where the original estimate was not a problem.
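That expectation follows directly from $Z_i \sim \text{Bern}(\theta/100)$, so that $E[Z_i] = \theta/100$:

$$E\left[25\pi\,\frac{\sum_{i=1}^{10} Z_i}{10}\right] = 25\pi\cdot\frac{\theta}{100} = \frac{\pi}{4}\,\theta \approx 0.785\,\theta ,$$

so the bias is $\left(\frac{\pi}{4}-1\right)\theta \approx -0.215\,\theta$, whatever the true area is.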

For example you might easily have six points in the circle (a reasonably likely number to come up if the circle isn't small - say about half the area or more).

[Figure: ten points in a square, with six landing inside a circle that's roughly half the area of the square]

The unbiased estimator you started with would estimate the area to be 60, but your scaled down estimator would say it was about 47.

Note further that you're much more likely (about 1.4 times as likely) to get 6 points in there when the area is near 60 than when it's near 47.
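The comparison of likelihoods can be reproduced with a few lines. This is just a sketch; `binom_pmf` is a helper name introduced here, and the two areas are the two estimates from the six-point example.

```python
import math


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, k = 10, 6
area_unbiased = 100 * k / n         # 60, from the unbiased estimator
area_scaled = 25 * math.pi * k / n  # about 47, from the scaled-down estimator

# Probability of seeing exactly 6 hits if each estimate were the true area.
lik_unbiased = binom_pmf(k, n, area_unbiased / 100)
lik_scaled = binom_pmf(k, n, area_scaled / 100)
ratio = lik_unbiased / lik_scaled
print(f"{lik_unbiased:.4f} / {lik_scaled:.4f} = {ratio:.2f}")
```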

For fewer than 8 points it doesn't seem to make a lot of sense to scale it down like that - certainly not that much, though you may be able to reduce MSE if you shrink it a little (I haven't checked this though). For 8+ points falling within the circle, there's a clear argument for not saying the circle is larger than $25\pi$ (you know for sure it isn't) -- and your estimate cannot be worse (cannot be further from $\theta$) for doing so. You would lose unbiasedness by never saying more than $25\pi$ but you'd reduce the variance substantially.
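One way to check these trade-offs is to compute the exact MSE of each candidate, since $\sum Z_i$ is binomial. The sketch below compares the unbiased estimator, the scaled-down one, and a version capped at $25\pi$. The function names and the true areas in the loop are assumptions for illustration, and the sketch does not cover the "shrink it a little" variant.

```python
import math

MAX_AREA = 25 * math.pi  # largest possible circle in the square


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_mse(estimator, theta, n=10):
    """Exact mean squared error of estimator(k, n) when the true area is theta."""
    p = theta / 100
    return sum(
        binom_pmf(k, n, p) * (estimator(k, n) - theta) ** 2 for k in range(n + 1)
    )


def unbiased(k, n):
    return 100 * k / n


def scaled(k, n):
    return MAX_AREA * k / n


def capped(k, n):
    # Same as the unbiased estimator, but never reports more than 25*pi.
    return min(100 * k / n, MAX_AREA)


for theta in (20.0, 50.0, 70.0, MAX_AREA):  # hypothetical true areas
    row = ", ".join(
        f"{f.__name__}={exact_mse(f, theta):7.2f}" for f in (unbiased, scaled, capped)
    )
    print(f"theta={theta:6.2f}: {row}")
```

Because the cap only ever moves an estimate towards a value that is at least as close to $\theta$, the capped version cannot have a larger MSE than the unbiased one for any feasible $\theta$.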

(You might like to consider what the MLE of the area would be.)


Of course in large samples -- ones large enough for this to be a practical tool -- impossible values will become so rare as to make it essentially a nonissue. I do a lot of simulations, and for me $n=10^4$ when estimating a proportion like this is usually something I see as "too small", except as a rough feasibility check. I often do $10^6$ or $10^7$ simulations for proportions if they're not really close to a boundary - sometimes more.
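To give a sense of scale, the standard error of the area estimate $100\,\hat p$ is $100\sqrt{p(1-p)/n}$ with $p=\theta/100$. A minimal sketch, assuming a hypothetical true area of 50:

```python
import math


def standard_error(theta, n):
    """Standard error of 100 * (hits / n) when the true area is theta."""
    p = theta / 100
    return 100 * math.sqrt(p * (1 - p) / n)


theta = 50.0  # hypothetical true area, chosen only for illustration
for n in (10, 10**4, 10**6, 10**7):
    print(f"n={n:>8}: standard error of the area estimate = {standard_error(theta, n):.4f}")
```

Inverting the same formula gives the number of points needed to reach a target standard error.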

Glen_b