I have data with non-negative values ranging from 0 to 21 (min = 0, 1st Qu. = 0, median = 2, mean = 3.1, 3rd Qu. = 4, max = 21). The distribution (plotted with ggplot2::geom_density()) looks like this:
I know for a fact (based on the scientific literature for my research) that there is a substantial proportion of negative values, but these data cannot be collected.
Since my observed data are constrained to non-negative values, how can I estimate the distribution in a way that allows for negative values?
Could adding a constant to each observation help find the shape of the distribution and then be used to model the negative values? (example data below)
library(tidyverse)
# Example data
a <- rep(0, 59)
b <- rep(1, 31)
c <- rep(2, 23)
d <- rep(3, 20)
e <- rep(4, 10)
f <- rep(5, 9)
g <- rep(6, 6)
h <- rep(7, 6)
i <- 8:21  # one observation at each value from 8 to 21 (rep() has no `by` argument)
df <- data.frame(
  config1 = c(a, b, c, d, e, f, g, h, i),
  config2 = c(a, b, c, d, e, f, g, h, i) + 2
) %>%
  pivot_longer(cols = c(config1, config2), names_to = "config", values_to = "values")
# my actual distribution is "config1"; adding a constant gives "config2"
p1 <- df %>%
  ggplot() +
  aes(x = values, fill = config) +
  geom_density(alpha = 0.4)
p1
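
For what it is worth, a quick base-R check suggests that adding a constant only slides the estimated density along the x-axis without changing its shape. This is a minimal sketch on the same example counts; the shared bandwidth and the grid limits are my own assumptions, chosen so the two curves can be compared point by point. If that holds, the shift on its own would not reveal anything about the unobserved negative part.

```r
# Same counts as the example data above, built without tidyverse
values  <- c(rep(0:7, times = c(59, 31, 23, 20, 10, 9, 6, 6)), 8:21)
shifted <- values + 2  # corresponds to "config2"

# The spread is unchanged; only the location moves by the constant
spread_same <- isTRUE(all.equal(sd(values), sd(shifted)))
mean_shift  <- mean(shifted) - mean(values)

# Kernel densities with a shared bandwidth, evaluated on grids offset by 2
bw <- bw.nrd0(values)
d1 <- density(values,  bw = bw, from = -5, to = 25, n = 512)
d2 <- density(shifted, bw = bw, from = -3, to = 27, n = 512)

# TRUE if the two curves have the same heights, i.e. the same shape
same_curve <- isTRUE(all.equal(d1$y, d2$y, tolerance = 1e-6))
```

Here `spread_same` and `same_curve` compare the two configurations, and `mean_shift` reports how far the location moved.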