I used R to run a Wilcoxon signed-rank test on two paired samples:

47.96       13.8    
23.44       25.84   
38.98       31.84   
15.45       18.87   
18.91       19.46   
45.21       38.47   
21.16       10.17   
13.14       11.61   

The graphical representation of the result is shown in the image.

I am trying to understand how the range of each box is determined.

I understand that the horizontal lines in the plot are the median values and that the vertical lines show the range of my data, but what does the box itself represent?

In my case, although there is no significant difference between the two medians, the boxes in the plot have very different heights.

[Image: box plot of logAUC by protein structure (Xray_holo vs. NMR_holo)]

This is the code I used:

#my_data <- read.csv(file.choose())
# Data in two numeric vectors
# ++++++++++++++++++++++++++
# logAUC values for the Xray_holo structures
Xray_holo <- c(47.96, 23.44, 38.98, 15.45, 18.91, 45.21, 21.16, 13.14)
# logAUC values for the NMR_holo structures
NMR_holo <- c(13.8, 25.84, 31.84, 18.87, 19.46, 38.47, 10.17, 11.61)
# Create a data frame
my_data <- data.frame(
  P_str = rep(c("Xray_holo", "NMR_holo"), each = 8),
  logAUC = c(Xray_holo,  NMR_holo)
)
print(my_data)
# Install dplyr only if it is not already available
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library("dplyr")
group_by(my_data, P_str) %>%
  summarise(
    count = n(),
    median = median(logAUC, na.rm = TRUE),
    IQR = IQR(logAUC, na.rm = TRUE)
  )
# Install ggpubr from CRAN if it is not already available
if (!requireNamespace("ggpubr", quietly = TRUE)) install.packages("ggpubr")
library("ggpubr")
# Plot logAUC by group and color by group
ggboxplot(my_data, x = "P_str", y = "logAUC",
          color = "P_str", palette = c("#00AFBB", "#E7B800"),
          order = c("Xray_holo", "NMR_holo"),
          ylab = "logAUC", xlab = "Protein structures")


#install.packages("PairedData")

# Subset logAUC values for Xray_holo
#Xray_holo <- subset(my_data,  P_str == "Xray_holo", logAUC,
#                 drop = TRUE)
# Subset logAUC values for NMR_holo
#NMR_holo <- subset(my_data,  P_str == "NMR_holo", logAUC,
#                 drop = TRUE)
# Plot paired data
#library(PairedData)
#pd <- paired(Xray_holo, NMR_holo)
#plot(pd, type = "profile") + theme_bw()


res <- wilcox.test(Xray_holo, NMR_holo, paired = TRUE)
res
sergio
  • Could you please share the code for where you ran the Wilcoxon Signed Rank Test and for producing the numerical output? – Todd Burus Jan 08 '20 at 03:10

1 Answer

What you are looking at are box plots, which are visualizations of the five-number summary of each sample. As you said, the horizontal line in the middle is the median. The vertical lines (whiskers) extend to the minimum and maximum here; note that many plotting functions, including those built on ggplot2, stop the whiskers at the most extreme point within 1.5 × IQR of the box and draw anything beyond that as a separate point. The bottom and top edges of the box are the first and third quartiles (Q1 and Q3): about 25% of the values lie below Q1 and about 75% lie below Q3. The "box range" you are asking about is called the interquartile range (IQR). It is the difference between the third and first quartiles, Q3 - Q1.
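To make the box edges concrete, here is a minimal sketch that computes the five-number summary and the IQR for the two vectors from the question using base R. It assumes R's default quantile definition (type 7); other definitions, and therefore other plotting functions, can give slightly different quartiles. The names `five_xray`, `five_nmr`, `iqr_xray` and `iqr_nmr` are introduced here only for illustration.

```r
# Data copied from the question
Xray_holo <- c(47.96, 23.44, 38.98, 15.45, 18.91, 45.21, 21.16, 13.14)
NMR_holo  <- c(13.8, 25.84, 31.84, 18.87, 19.46, 38.47, 10.17, 11.61)

# Five-number summary: min, Q1, median, Q3, max
# (quantile() uses its default definition, type 7)
five_xray <- quantile(Xray_holo, probs = c(0, 0.25, 0.5, 0.75, 1))
five_nmr  <- quantile(NMR_holo,  probs = c(0, 0.25, 0.5, 0.75, 1))

# Height of each box: IQR = Q3 - Q1
iqr_xray <- IQR(Xray_holo)
iqr_nmr  <- IQR(NMR_holo)

print(five_xray)
print(five_nmr)
print(c(Xray_holo = iqr_xray, NMR_holo = iqr_nmr))
```

Under that assumption, the Xray_holo box spans roughly 18.0 to 40.5 (IQR about 22.5) and the NMR_holo box roughly 13.3 to 27.3 (IQR about 14.1), which is why the two boxes have such different heights even though the medians (22.3 and about 19.2) are close.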

Todd Burus
  • Just to make sure I understood it. By just looking at the IQR of my plot, it seems to me that values are first sorted based on the number (min->max) and then the IQR is represented as values within 25%-75% range of sorted values? – sergio Jan 08 '20 at 03:20
  • @sergio Yes. Order from least to greatest. The five number summary is min, Q1, median, Q3, max. Then the IQR (=Q3-Q1) is the range of values in the middle 50% of the data (i.e. how wide the middle 50% is). It is analogous to standard deviation as a measure of variation in a dataset. – Todd Burus Jan 08 '20 at 03:24
  • @Todd Burus I see, cool, thanks! – sergio Jan 08 '20 at 03:27