I'm after a nice method to draw from a range of integers given known prior probabilities.

Say I wanted an 80% chance of drawing 1, a 15% chance of drawing 2, and a 5% chance of drawing 3...

I'm obviously thinking about this wrong, since neither

[~,I] = max([.80 .15 .05] .* rand(1,3))

nor

[~,I] = max([.80 .15 .05] + rand(1,3))

appears to achieve this (rand samples from a uniform distribution).

Does anyone have a suggestion that will extend to any range of integers?

** Additional Comment **

I realise now that for rational probabilities it's trivial to create a vector made up of these options in proportion (80% 1s, 15% 2s, and 5% 3s), then randperm the contents of the vector and choose the first element, over and over...

This method is bad if the probabilities do not behave nicely, for example when they are irrational or only share a very large common denominator...
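The proportional-vector idea from this additional comment could be sketched as below. This is a minimal sketch, assuming the probabilities are exact multiples of 1/scale; the names `scale`, `pool`, and `draws` are made up for illustration. It picks a uniformly random cell of the vector instead of calling randperm for every draw, which gives the same distribution as taking the first element of a fresh shuffle.

```matlab
% Sketch of the "proportional vector" idea for rational probabilities.
p      = [0.80 0.15 0.05];     % target probabilities from the question
scale  = 100;                  % assumed common denominator of p
counts = round(p * scale);     % 80, 15 and 5 copies respectively
assert(sum(counts) == scale, 'p is not a multiple of 1/scale');

% Build the pool: counts(k) copies of the integer k.
pool  = zeros(1, scale);
stop  = cumsum(counts);
start = stop - counts + 1;
for k = 1:numel(p)
    pool(start(k):stop(k)) = k;
end

% A uniformly random cell of the pool has the target distribution.
nDraw = 1e5;
draws = pool(randi(scale, 1, nDraw));

% Empirical frequencies, which should be close to p.
freq = accumarray(draws(:), 1, [numel(p) 1]).' / nDraw;
disp(freq)
```

The pool has to grow with the common denominator, which is why this stops being practical for awkward probabilities.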

NBland
  • In R, there are the `sample` and `sample.int` functions, e.g. `sample(3,100,replace=TRUE,p=c(.8,.15,.05))` will generate 100 values from the distribution in your question. – Glen_b Aug 13 '17 at 00:56
  • Thanks @Glen_b. I need to implement a solution in MATLAB...and so it would be great to understand how this function in R is achieving this... – NBland Aug 13 '17 at 03:39
  • R is open source; you can read the code, but I don't think you'll find it as enlightening in this case as the indicated duplicates. The [table](https://stats.stackexchange.com/a/68041/805) method would probably serve you well. It's quite fast -- you take a longish array and fill it with values in the right proportions so that you are almost sampling with the required probabilities; there are normally a few "leftover" cells at the end, which take you to a second step of generation that brings you up to the exact probabilities you need (and that step can use any other convenient method). – Glen_b Aug 13 '17 at 05:39
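The table approach described in the last comment could be sketched in MATLAB roughly as follows. This is only one reading of that comment, not R's actual implementation; the table size `N`, the variable names, and the inverse-CDF fallback used for the leftover cells are all assumptions chosen for illustration.

```matlab
% Sketch of a lookup table with a second step for the "leftover" cells.
p = [0.80 0.15 0.05];          % target probabilities from the question
N = 64;                        % assumed table size (deliberately not 100)

counts = floor(p * N);         % whole cells each value can claim
nFill  = sum(counts);          % filled cells; the remaining N - nFill are leftovers
resid  = p * N - counts;       % fractional cells still owed to each value

% Fill the table with counts(k) copies of k.
tbl   = zeros(1, nFill);
stop  = cumsum(counts);
start = stop - counts + 1;
for k = 1:numel(p)
    tbl(start(k):stop(k)) = k;
end

% Step 1: pick one of the N cells uniformly; filled cells give the value directly.
nDraw = 1e5;
slot  = randi(N, 1, nDraw);
inTbl = slot <= nFill;
x     = zeros(1, nDraw);
x(inTbl) = tbl(slot(inTbl));

% Step 2: leftover cells are resolved by sampling from the residual weights.
nLeft = sum(~inTbl);
if nLeft > 0
    cq = cumsum(resid) / sum(resid);   % cumulative residual distribution
    cq(end) = 1;                       % guard against round-off
    u  = rand(nLeft, 1);
    x(~inTbl) = (sum(bsxfun(@gt, u, cq), 2) + 1).';
end

% Empirical frequencies, which should be close to p.
freq = accumarray(x(:), 1, [numel(p) 1]).' / nDraw;
disp(freq)
```

Each value k is hit with probability counts(k)/N from the table plus resid(k)/N from the second step, which adds up to p(k), so the table size does not need to match the denominators of the probabilities.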

1 Answer

Why don't you split your interval into three pieces of unequal length and assign the corresponding integer to each piece? Here is the Matlab code:

%% define cumulative percentages (first 80% -> 1, next 15% -> 2, last 5% -> 3)
perc = [80, 95, 100];

%% number of random integers
nMax = 1e3;

%% generate integers
tic
vec = randi([1, 100],1,nMax);
t   = toc;
disp(['Generated ' num2str(nMax) ' random integers in ' num2str(t) 's'])

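%% get indices
tic
% idxVec(numel(perc)).dat = []; % preallocation, only needed if you have many intervals
for i = 1:numel(perc)
    % positions whose value falls into the i-th interval (earlier hits are already NaN)
    idxVec(i).dat      = find(vec <= perc(i));
    % mask these positions so that a later, larger threshold does not match them again
    vec(idxVec(i).dat) = NaN;
end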
t = toc;
disp(['Indexed ' num2str(nMax) ' random integers in ' num2str(t) 's'])

%% plotting:
figure(1);
for i = 1:numel(perc)
    plot(idxVec(i).dat, i*ones(1,numel(idxVec(i).dat)), '.');
    hold on;
end
hold off

I'm not sure how large your data set is, but by my standards the code is fast. The timings below are for 10 000 000 integers, i.e. a larger nMax than the 1e3 in the listing:

Generated 10 000 000 random integers in 0.21746s
Indexed 10 000 000 random integers in 0.35026s

PS: You should remove the timing calls (tic, toc) and the figure. The plotting in particular will take a long time for a large data set.
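The interval-splitting idea is not tied to whole percentages. A minimal sketch, assuming the probabilities are given as a vector `p` that sums to 1 and the integers to return are in `values` (both names are illustrative and not part of the code above): draw uniform numbers on (0,1) and count how many cumulative edges each one exceeds.

```matlab
% Split (0,1) into pieces of length p(k) and return the value of the piece hit.
p      = [0.80 0.15 0.05];     % any non-negative probabilities summing to 1
values = [1 2 3];              % integers to return, one per probability

cp      = cumsum(p);           % right-hand edges of the pieces: 0.80 0.95 1.00
cp(end) = 1;                   % guard against round-off in the last edge

nDraw = 1e5;
u     = rand(nDraw, 1);        % uniform draws on (0,1)

% Index of the piece containing u: number of edges strictly below u, plus one.
idx   = sum(bsxfun(@gt, u, cp), 2) + 1;
draws = values(idx);           % the sampled integers

% Empirical frequencies, which should be close to p.
freq = accumarray(idx, 1, [numel(p) 1]).' / nDraw;
disp(freq)
```

The comparison builds an nDraw-by-numel(p) matrix, so for a long list of values a binning function such as `histc` or `discretize` on the edges `[0 cp]` is likely the better choice.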

Semoi
  • This is reasonable for my given example, but it becomes a bit clunky when I want to choose from a larger range of values. Thanks for your answer. :) – NBland Aug 13 '17 at 03:36
  • I edited the code, so that it is optimized for Matlab. – Semoi Aug 15 '17 at 17:34