I have a very large data set (~300,000 data points) and a subset of it (6,000 data points). Each data point is the difference in travel time (in seconds) of an agent before and after a road closure. I want to show how the two distributions differ.
However, the data set is so large that there are very many outliers, and nothing can be read from the boxplot. A simple table would of course be an option, but I believe that, done right, a graph would make the difference easier to see.
For the analysis, the interesting range is differences between -1000 and 1000 seconds. So I wonder: is it OK to simply truncate the data set, or is it more appropriate to transform the data?
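
To make the two options concrete, here is a minimal Python sketch of what I mean. Everything in it is an assumption for illustration only: the data are synthetic heavy-tailed draws standing in for my real travel-time differences, the helper names `truncate` and `signed_log` are made up, and the signed log is just one possible transform, not necessarily the right one.

```python
import numpy as np

def truncate(diff, lo=-1000.0, hi=1000.0):
    """Keep only values inside [lo, hi] and report the share that was dropped."""
    diff = np.asarray(diff, dtype=float)
    mask = (diff >= lo) & (diff <= hi)
    return diff[mask], 1.0 - mask.mean()

def signed_log(diff):
    """Symmetric log transform: keeps the sign, compresses large magnitudes."""
    diff = np.asarray(diff, dtype=float)
    return np.sign(diff) * np.log1p(np.abs(diff))

# Synthetic stand-in data (NOT my real data): heavy-tailed differences in seconds.
rng = np.random.default_rng(0)
full = rng.standard_t(df=2, size=300_000) * 200
subset = rng.standard_t(df=2, size=6_000) * 200 + 50

for name, d in [("full", full), ("subset", subset)]:
    kept, dropped = truncate(d)
    quartiles = np.percentile(signed_log(d), [25, 50, 75])
    print(f"{name}: dropped by truncation = {dropped:.1%}, "
          f"signed-log quartiles = {np.round(quartiles, 2)}")
```

With truncation, the share of dropped points would have to be reported alongside the plot. With the transform, all points stay in, but the axis is no longer in plain seconds.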