Simulation: Generate random numbers that cluster around an average?

Question

I want to simulate a simple event whose empirical outcome varies: generate random numbers that cluster around an average.

For example, let's say we collect the data for how far people can throw a ball. The data may or may not be distributed normally. I want my code to generate a hypothetical throwing distance based on that distribution.

For example, say the mean throw distance is 10 feet, with a standard deviation of 2 feet. The simulator should generate most throws around 10 feet, but once in a while it can generate a 20 ft distance. Can the probability of each distance be calculated? Any idea how I can start to model this? I'm not sure what to search for.

I don't want to use a canned package like R; I want to understand this by generating the numbers manually, in Excel, Python, etc.

Is this one approach? Area under the curve? If $f(x)$ is the "bell curve" function describing the frequency distribution (histogram) of throwing distances, and $g(x) = \int_0^x f(t)\,dt$ is some kind of cumulative distribution function, generate a random number from 0 to 1 and see where it intersects $g(x)$?
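The idea in this paragraph (draw a uniform number, then find the distance at which the cumulative curve reaches it) can be sketched in Python as follows. This is only a minimal sketch under assumptions: it takes the distribution to be normal with the mean of 10 ft and standard deviation of 2 ft from the example, uses the standard normal CDF written with `math.erf` in place of $g(x)$, and inverts it by bisection. The function names `normal_cdf`, `inverse_cdf`, and `simulate_throw` are made up for illustration.

```python
import math
import random

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def inverse_cdf(u, mean, sd, tol=1e-9):
    """Find x such that normal_cdf(x) is approximately u, by bisection."""
    # Assumption: searching within 10 standard deviations of the mean is enough.
    lo, hi = mean - 10.0 * sd, mean + 10.0 * sd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid, mean, sd) < u:
            lo = mid  # the target lies to the right of mid
        else:
            hi = mid  # the target lies at or to the left of mid
    return 0.5 * (lo + hi)

def simulate_throw(mean=10.0, sd=2.0, rng=random):
    """Draw a uniform number in [0, 1) and map it through the inverse CDF."""
    u = rng.random()
    return inverse_cdf(u, mean, sd)

throws = [simulate_throw() for _ in range(5)]
print(throws)
```

Most values should land within a few feet of 10. Note that a normal model can, very rarely, produce physically impossible values such as negative distances, so a real simulation might truncate or choose a different distribution.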

Better yet, what do you think of the following? I think I can discard the distribution concept and just model a bell-like frequency histogram in Excel, using two columns of data: a1=3, b1=34, etc. (3, 34), (5, 45), (7, 245), (10, 350), (11, 240), (12, 145), (13, 90), (14, 35), (15, 12), ..., (20, 1). In the third column, I can create a cumulative total. From that, I can do a regression and get a function! Then I just take the inverse of that function and use it as a lookup function using f(x), where x is a random number from 0
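The histogram-and-cumulative-total idea can be tried directly in Python, without the regression step, by looking up a random number in the cumulative column. The sketch below uses only the (distance, count) pairs that are actually listed above; the rows elided by the dots are left out, so the result is illustrative rather than a faithful model. The name `sample_from_histogram` is hypothetical, and a random integer below the grand total is used here, which plays the same role as scaling a uniform random number by that total.

```python
import bisect
import random

# (distance in feet, observed count) pairs copied from the question.
# Rows elided in the question are simply omitted here.
histogram = [(3, 34), (5, 45), (7, 245), (10, 350), (11, 240),
             (12, 145), (13, 90), (14, 35), (15, 12), (20, 1)]

distances = [distance for distance, _ in histogram]

# Third "column": running total of the counts.
cumulative = []
running = 0
for _, count in histogram:
    running += count
    cumulative.append(running)
total = cumulative[-1]

def sample_from_histogram(rng=random):
    """Pick a distance with probability proportional to its count."""
    r = rng.randrange(total)                   # integer in 0 .. total - 1
    idx = bisect.bisect_right(cumulative, r)   # first row whose running total exceeds r
    return distances[idx]

print([sample_from_histogram() for _ in range(10)])
```

Because this samples only the tabulated distances, it never returns values between bins; interpolating between rows would be one way to smooth that out.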

I'm not sure I understand what you mean by "manually" and "not using a canned package like R". Most languages have a built-in random number generator. If you want to generate throwing distances fully manually, I suggest using a ball and a measuring tape. — Pere, Apr 04 '19 at 13:32
See the response below. I want to make my own function, not use a library function. — JackOfAll, Apr 04 '19 at 13:36
What exactly is your question? Is it [how to generate random numbers "manually"](https://stats.stackexchange.com/questions/247094/generating-random-numbers-manually)? Or are you asking about [inverse transform sampling](https://stats.stackexchange.com/questions/184325/how-does-the-inverse-transform-method-work/184337#184337)? Or is it about generating samples from a normal distribution? Or about code in Excel for this? — Tim, Apr 04 '19 at 13:50
There are too many possible ways to answer this question. Indeed, you could generate the numbers $x_1,\ldots, x_n$ literally any way you like and then change them to $y_i = \bar x + \rho(x_i - \bar x)$ where $\bar x$ is the sample mean and $|\rho| \lt 1.$ That would cluster the data $(y_i)$ around their mean. Normally, that's not how one builds a successful model or a good simulation. What is needed is an *understanding* of the phenomenon you would like to model that is sufficiently well developed to identify likely distributions for the data. — whuber, Apr 04 '19 at 14:00
As usual, the question has been closed when it was clear - the question could have been answered with an example of how to generate that sample with code. — Pere, Apr 04 '19 at 14:05
Since I had already written my answer, but then the question was closed and I can't post it, I paste it here in order to explain it when the question gets reopened: — Pere, Apr 04 '19 at 14:08
#Linear congruential generator to get uniform distribution x — Pere, Apr 04 '19 at 14:08
You've got to be kidding me. The question is CLEAR AS DAY. How can you not understand what I am asking? I went out of my way to make the question crystal clear. Great case of OCD overzealous moderation obstructing discussion. Gimme a F'ing break. If YOU don't understand the question, that's not our problem, just get out of the way. — JackOfAll, Apr 04 '19 at 15:09
@Pere As usual, the clarity of the question depends on the experience of the reader. Those with little experience may see only one possible interpretation--but when one's job is to consider how *all* readers might react, it becomes possible to recognize that they all could interpret it differently, resulting in confusion for everyone. Jack, if you aren't sure how to clarify and narrow your question in response to Tim's previous comments, then please feel free to ask for more information and to explore our [help] for information about asking good questions. — whuber, Apr 04 '19 at 19:55
Very often, original posters don't know enough about the subject to formulate a clear question, but the question still shows what they need to understand, and one can even try to write an interesting answer for other readers who arrive at the question guided by its title. I'm used to answering questions from my on-line and off-line students, which often involves guessing what they need to know, even when they make an effort to be clear, but I don't think that gives me a special superpower to understand unclear questions compared to other users of the site. — Pere, Apr 04 '19 at 20:13
Furthermore, it's very frustrating to spend time writing an answer and then not be able to post it because somebody has decided that the question is not clear enough to allow other people to answer. — Pere, Apr 04 '19 at 20:16

score 1 · Answer 1 · answered Apr 04 '19 at 13:22

In R, this call would return n numbers randomly chosen from a Normal distribution with mean 'm' and sd 's':

rnorm(n, mean = m, sd = s)
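For readers who would rather see the mechanics than call a library routine, here is a hand-written Python sketch of one textbook way to produce normal draws from uniform ones, the Box-Muller transform. It is not a port of R's source and is not claimed to be what `rnorm` does internally; the name `rnorm_sketch` and its signature are made up to mirror the call above.

```python
import math
import random

def rnorm_sketch(n, mean=0.0, sd=1.0, rng=random):
    """Return n normal draws built from uniform draws via Box-Muller."""
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
        u2 = rng.random()        # in [0, 1)
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        # Each pair of uniforms yields two independent standard normal values.
        z0 = radius * math.cos(angle)
        z1 = radius * math.sin(angle)
        out.append(mean + sd * z0)
        if len(out) < n:
            out.append(mean + sd * z1)
    return out

# Throwing distances with mean 10 ft and standard deviation 2 ft.
print(rnorm_sketch(5, mean=10.0, sd=2.0))
```

Scaling by `sd` and shifting by `mean` at the end is what turns standard normal values into draws centred on the desired average.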

You did say that your data may not be normally distributed. An alternative would be to randomly choose a distance from the data you already have (sampling with replacement).
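As a rough illustration of sampling with replacement, the Python sketch below draws simulated throws from a small list of measured distances. The numbers in `observed` are invented placeholders, not real data, and `resample` is a hypothetical helper name.

```python
import random

# Hypothetical measured throw distances in feet (placeholder values).
observed = [7.5, 8.9, 9.4, 9.8, 10.0, 10.3, 10.9, 11.6, 12.2, 14.1]

def resample(data, n, rng=random):
    """Draw n values from data, with replacement."""
    return [rng.choice(data) for _ in range(n)]

print(resample(observed, 5))
```

This makes no assumption about the shape of the distribution, but it can only ever return distances that were actually observed.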

answered Apr 04 '19 at 13:22

Jonathan Moore

Is there a way to do it "manually", so I truly understand the process (and how rnorm is implemented)? I'd like to code this by hand. – JackOfAll Apr 04 '19 at 13:35
Here is the source for rnorm: https://github.com/SurajGupta/r-source/blob/master/src/nmath/rnorm.c – Jonathan Moore Apr 04 '19 at 14:45
I think the easiest way to do it manually would be to sample from the empirical data set of known distances. – Jonathan Moore Apr 04 '19 at 14:48
