
I want to simulate a continuous variable with lower/upper bounds of [1;5], while also ensuring that the resulting distribution is bimodal.

Searching for my problem, I found this source, which helps to simulate a bimodal distribution but does not apply the lower/upper bounds: https://stats.stackexchange.com/search?q=bimodal+truncated+distribution

In contrast, the rtruncnorm function in R (from package truncnorm) helps me to simulate a normal (but not bimodal) distribution with lower/upper bounds.

The question now is: how can I combine both? In theory, I could use the approach from the first link, i.e. generate a bimodal distribution from two underlying normal distributions, and then rescale the drawn data with this approach (https://stats.stackexchange.com/a/25897/66544) to get my bounds.

Or I could generate two truncated normal distributions with the rtruncnorm function and then combine them into a bimodal distribution following the approach from the first link.

But I'm not sure if either of these approaches is mathematically justified.

NOTE: Why do I want a range of [1;5] anyway? The real data would come from a survey where respondents answer on a 5-point scale from 1 to 5 (continuous, not discrete), hence I need to simulate this bounded range.

Stephan Kolassa
deschen

2 Answers


Another way is to use a beta distribution, which is bounded on $[0;1]$.

So you just need to rescale and shift one half of the simulated sample to $[1;3]$ and the other half to $[3;5]$.

Here I use Beta(2,2) and Stephan Kolassa's framework:

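```r
nn <- 1e4
set.seed(1)

# Draw from Beta(2, 2), which lives on [0, 1] and peaks at 0.5
betas <- rbeta(nn, 2, 2)

# Stretch to width 2, then shift the first half to [1, 3] and the second half to [3, 5]
sims <- c(betas[1:(nn/2)] * 2 + 1,
          betas[(nn/2 + 1):nn] * 2 + 3)

hist(sims)
```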

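The rescaling above can be wrapped in a small function. This is only a sketch: `rbeta_bimodal` and its arguments are hypothetical names, not part of any package, and it assigns each draw to a half at random (with weight `w`) instead of splitting the sample exactly in two. Each half only has an interior peak if both shape parameters are greater than 1.

```r
# Hypothetical helper (not from any package): mixture of two rescaled
# Beta(shape1, shape2) variables on [lower, upper], split at the midpoint.
rbeta_bimodal <- function(n, lower = 1, upper = 5,
                          shape1 = 2, shape2 = 2, w = 0.5) {
  mid        <- (lower + upper) / 2
  half_width <- mid - lower
  # TRUE -> lower half, FALSE -> upper half; w is the weight of the lower half
  in_lower <- runif(n) < w
  # Rescale Beta draws from [0, 1] to the width of one half ...
  x <- rbeta(n, shape1, shape2) * half_width
  # ... then shift them to start at `lower` or at the midpoint
  x + ifelse(in_lower, lower, mid)
}

set.seed(1)
sims <- rbeta_bimodal(1e4)
range(sims)              # all values stay inside [1, 5]
hist(sims, breaks = 50)
```

With the defaults, the two peaks sit near 2 and 4, because Beta(2,2) has its mode at 0.5.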

Łukasz Deryło

The easiest approach would be to draw $\frac{n}{2}$ samples from a truncated normal distribution with one mean and another $\frac{n}{2}$ samples from a truncated normal distribution with a different mean. This is a mixture distribution, specifically one with equal weights; you could also use different weights by varying the proportions in which you draw from the two distributions.

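```r
library(truncnorm)

nn <- 1e4
set.seed(1)

# Half the draws come from a normal centred at 2, half from one centred at 4;
# both are truncated to [1, 5]
sims <- c(rtruncnorm(nn/2, a = 1, b = 5, mean = 2, sd = 0.5),
          rtruncnorm(nn/2, a = 1, b = 5, mean = 4, sd = 0.5))

hist(sims)
```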

[Figure: histogram of the simulated values]
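For unequal weights, one possible sketch is to let the number of draws from each component be random. The weight `w = 0.3` and the variable name `n_low` are illustrative assumptions, not part of the original answer; the means and standard deviations are the same as above.

```r
library(truncnorm)

nn <- 1e4
w  <- 0.3   # assumed weight of the lower component (illustrative value)
set.seed(1)

# The number of draws from the lower component is itself binomial
n_low <- rbinom(1, size = nn, prob = w)

sims <- c(rtruncnorm(n_low,      a = 1, b = 5, mean = 2, sd = 0.5),
          rtruncnorm(nn - n_low, a = 1, b = 5, mean = 4, sd = 0.5))

# Shuffle so that position in the vector does not reveal the component
sims <- sample(sims)

hist(sims, breaks = 50)
```

Using fixed counts such as `round(w * nn)` would also work when exact proportions are wanted; the binomial draw just mimics sampling from the mixture one observation at a time.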

Stephan Kolassa
  • Coming back to this one (and since you asked), how would it be different if I want to simulate discrete choices, i.e. only values 1,2,3,4,5 are possible? Basically simulating a Likert-type scale? – deschen Oct 04 '21 at 08:39
  • You can use `sample` with a `prob` parameter that has two peaks. For example: `sample(1:5,1e5,prob=c(1,3,1,3,1),replace=TRUE)` – Stephan Kolassa Oct 04 '21 at 16:00
  • Right, that is simple, true. I didn't think of this straightforward solution. – deschen Oct 04 '21 at 16:25
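The `sample` suggestion from the comments can be checked with a short sketch. The weights are the ones given in the comment; the variable names `probs` and `likert` are only illustrative.

```r
set.seed(1)

# Relative weights with peaks at 2 and 4; sample() rescales them to sum to 1
probs  <- c(1, 3, 1, 3, 1)
likert <- sample(1:5, size = 1e5, replace = TRUE, prob = probs)

# Compare the observed shares with the target probabilities
round(prop.table(table(likert)), 3)
probs / sum(probs)

barplot(table(likert))
```

Because the values are discrete, a bar plot of the counts shows the two peaks more faithfully than `hist`.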