Getting weirdly small cdf and pdf values for a set of data of 5 members in R

Question

I am doing a Weibull and normal distribution analysis for a set of my data which are :

336256 620316 958846 1007830 1080401

So to avoid putting the whole code here, I refer you directly to the post I followed :

link

The PDF and CDF plots I get are so small and get this form :

Just as info , I put the Weibull distribution results here , to show the values that are small as well :

update :

as per request I put the code to plot pdf here :

my data are as indicated 336256 620316 958846 1007830 1080401

so to reproduce the code, you need to save it as csv and run this code, on your directory. My problem is that I can change and scale the graphs on

xs <- seq(0, 5, len=500)

by changing to :

xs <- seq(10, 1650000, len=5000)

I get :

The problem is when I change the second and third argument , I mean :

1650000 and len=5000 to for example 9650000, len=5000 , the peaks position also displace and don't remain in the same place so it's not only a re scaling the graph.

#-----------------------------------------------------------------------------
# 5. Bootstrapping the pointwise confidence intervals
#-----------------------------------------------------------------------------
library(MASS) 
library(car)

set.seed(123)

rw.small <- c(336256,620316,958846,1007830,1080401)

xs <- seq(0, 5, len=500)


boot.pdf <- sapply(1:1000, function(i) {
  xi <- sample(rw.small, size=length(rw.small), replace=TRUE)

}
)

boot.cdf <- sapply(1:1000, function(i) {
  xi <- sample(rw.small, size=length(rw.small), replace=TRUE)

}
)   

#-----------------------------------------------------------------------------
# Plot PDF
#-----------------------------------------------------------------------------

par(bg="white", las=1, cex=1.2)
plot(xs, boot.pdf[, 1], type="l", col=rgb(.6, .6, .6, .1), ylim=range(boot.pdf),
     xlab="x", ylab="Probability density")
for(i in 2:ncol(boot.pdf)) lines(xs, boot.pdf[, i], col=rgb(.6, .6, .6, .1))

# Add pointwise confidence bands

quants <- apply(boot.pdf, 1, quantile, c(0.025, 0.5, 0.975))
min.point <- apply(boot.pdf, 1, min, na.rm=TRUE)
max.point <- apply(boot.pdf, 1, max, na.rm=TRUE)
lines(xs, quants[1, ], col="red", lwd=1.5, lty=2)
lines(xs, quants[3, ], col="red", lwd=1.5, lty=2)
lines(xs, quants[2, ], col="darkred", lwd=2)
#lines(xs, min.point, col="purple")
#lines(xs, max.point, col="purple")

*"I still get"* that is because you still didn't scale it properly enough. This still seems to be just a matter of making the graphs. Scale the graphs differently and you will be able to align them with the left graphs in your Weibull results overview (these have more on the x-scale, showing more of the entire curve, and less on the, taller, y-scale making the curve less flat). If after correct scaling you still do not get a good result, then show your (reproducible) code or otherwise it will be difficult to see what else could have gone wrong (it isn't clear how the code in the link is used). — Sextus Empiricus, Aug 05 '18 at 21:58
Note that small values for a pdf are perfectly fine. It is a probability *density* function, which will be lower if the total probability (the integrated total/mass/area will be 1) is spread out over a longer range events. — Sextus Empiricus, Aug 05 '18 at 22:10
@MartijnWeterings Thank you very much, I updated my question please check it — FabioSpaghetti, Aug 06 '18 at 07:10
*"The problem is when I change the second and third argument , I mean : 1650000 and len=5000 to for example 9650000, len=5000 , the peaks position also displace and don't remain in the same place so it's not only a re scaling the graph"* I don't see the problem. Can you explain it more. (also, is this question answered or not?, you have accepted the answer below, but you have made additional questions in the comments afterwards) — Sextus Empiricus, Aug 06 '18 at 11:18
@MartijnWeterings You are right , I did not have to accept yet. Ok the problem is that I thought that I only should re scale the graph to see more of it, but changing the len and maximum xi in the line I mentioned above, not only I re scaled the graph but I was also changing its values. This is my problem. I wish I was clear enough. I had fitdistr in my code, I added it now — FabioSpaghetti, Aug 06 '18 at 11:21
@MartijnWeterings I am very sorry! It had to be replaced with rw.small, cause rw.small was random , but I had to exert it on my data ! now it should work — FabioSpaghetti, Aug 06 '18 at 11:40
Are you able to do a favor to me, and edit the code ? now What I am not sure is which parts of the code are not needed to generate those plot, I believe all is needed — FabioSpaghetti, Aug 06 '18 at 11:42
Thanks a lot for your effective edits,should I accept them ? — FabioSpaghetti, Aug 06 '18 at 11:45
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/81208/discussion-between-fabiospaghetti-and-martijn-weterings). — FabioSpaghetti, Aug 06 '18 at 11:48

score 7 · Answer 1 · answered Aug 05 '18 at 21:20

7

The scale of $x$ on your graphs is nowhere near the five observed values (which have a mean value of 800,729.8, a minimum value of 336,256, and maximum of 1,080,401), and therefore you would expect the very small probabilities on your $y$ axis.

answered Aug 05 '18 at 21:20

Alexis

26,219
5
78
131

Thank you , I modified the code , but as you see it remains the same – FabioSpaghetti Aug 05 '18 at 21:29
1

It remains the same because the graph is the same graph with different axis *labels*. I agree with @MartijnWeterings first comment above. – Alexis Aug 06 '18 at 00:34
Thank you very much, so far, I thought I can only scale x, and the function pdf and cdf are scales as a consequence – FabioSpaghetti Aug 06 '18 at 06:19
@Martijin Weterings – FabioSpaghetti Aug 06 '18 at 07:09
I updated my question, please review – FabioSpaghetti Aug 06 '18 at 07:09

Sextus Empiricus · Accepted Answer · 2018-08-06T12:23:19.207

"the peaks position also displace"

Could you say what the coordinates of the peaks are in the two different cases?

I am not sure what you are all doing. The bootstrapping part is very vague (and I doubt it is correct to do it like that) so I took it out. The code below is creating fine graphs (peaks at the same coordinates every time, but of course when you scale the x-axis the position on the screen/plot shifts). Can you look into this and comment/explain again what your question about the graphs is?

library(MASS) 
library(car)

# settings
set.seed(123)
df <- c(336256,620316,958846,1007830,1080401)
xs <- seq(0, 2*10^6, len=500)

# estimation
MLE.est <- suppressWarnings(fitdistr(df, densfun="weibull", lower = 0))  
boot.pdf <- dweibull(xs, shape=as.numeric(MLE.est[[1]][1]), scale=as.numeric(MLE.est[[1]][2]))
boot.cdf <- pweibull(xs, shape=as.numeric(MLE.est[[1]][1]), scale=as.numeric(MLE.est[[1]][2]))

# plotting
layout(matrix(c(1,2),1))

plot(xs, boot.pdf, type="l", col=1, ylim=range(boot.pdf),
     xlab="x", ylab="")
points(df,dweibull(df, shape=as.numeric(MLE.est[[1]][1]), scale=as.numeric(MLE.est[[1]][2])),pch=21,col=1,bg=2)
title("pdf")

plot(xs, boot.cdf, type="l", col=1, ylim=range(boot.cdf),
     xlab="x", ylab="")
points(df,pweibull(df, shape=as.numeric(MLE.est[[1]][1]), scale=as.numeric(MLE.est[[1]][2])),pch=21,col=1,bg=2)
title("cdf")

legend(0.9*10^6,0.12,c("fit","data points"),col=1,lty=c(1,NA),pch=c(NA,21),pt.bg=c(0,2),cex=0.7)

@FabioSpaghetti it is unclear to me why you have accepted my answer? I have only cleaned up your code and was still trying to figure out what your question really was. Was it about the low values due to your imagine only showing the tail of the distribution (as alexis' answer points out), or was is about the low values of the pdf in general (and which is explained in the other question which Whuber links to, but then it is about high values instead of low values, but the principle is the same)? — Sextus Empiricus, Aug 06 '18 at 13:02
For future reference (helping visitors that may land on this page) we should make this question and answer more clear. Can you still clarify your question, even though you seem to have gotten something of an answer. — Sextus Empiricus, Aug 06 '18 at 13:04

Getting weirdly small cdf and pdf values for a set of data of 5 members in R

2 Answers2