I am interested in assessing the divergence (or similarity/dissimilarity) of two datasets that come from two different lidar instruments. Each dataset has over 90,000 values of a continuous variable, let's say altitude. Each dataset has a bimodal distribution with a long tail on the second mode, and the two are highly correlated (Spearman correlation of 0.8). I was looking into both the Kolmogorov-Smirnov (KS) test and the Kullback-Leibler (KL) divergence. I am very concerned that my distributions are bimodal and my samples are very large. I know that tests that give a p-value are very sensitive to the type of distribution and to the number of measurements. For this reason I think that the KS test is not appropriate, which leaves me with only the KL divergence. But I wonder how KL is influenced by the number of data points and by the shape of the distribution, and how to interpret the result.
For example, I gather that if I have two KL results, with a minimum value of 0.5 in one case and 0.9 in the other, then the two datasets in the first case are less divergent than those in the second case. But how small should the KL value be before I can say that "one dataset is a good approximation of the other"? Granted, even that is debatable, because I can look at divergence itself, that is, how different one dataset is from the other in a statistical sense (this is one exercise), or I can look at the similarity/dissimilarity of the data interpretation, that is, what the data "tell" about the variable measured. So is it actually enough to look only at correlations in this case? Maybe I should classify my data into a number of convenient classes and compare one classification to the other through the kappa statistic? Maybe this will at least tell me whether the interpretation of one dataset is similar enough to the interpretation of the second dataset? Do you have any other ideas? References? I would appreciate any thoughts on the matter.
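To make the question concrete, here is a minimal sketch of what I mean by the KL and kappa options. It is only an illustration under assumptions of mine: KL is estimated from histograms on shared bin edges (the number of bins and the small smoothing constant `eps` are arbitrary choices and will change the result), the classes for kappa are defined by arbitrary thresholds, and the function names `kl_divergence_hist` and `cohen_kappa` are my own, not from any library.

```python
import numpy as np


def kl_divergence_hist(a, b, bins=50, eps=1e-10):
    """Histogram estimate of KL(P_a || P_b) in nats.

    Both samples are binned on the same edges. `bins` and `eps` are
    arbitrary illustrative choices; the estimate depends on both.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)

    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)

    # Convert counts to probabilities; eps keeps empty bins from producing log(0).
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def cohen_kappa(labels_a, labels_b, n_classes):
    """Cohen's kappa for two integer labelings (0..n_classes-1) of the same items.

    Undefined (division by zero) if both labelings put everything in one class.
    """
    labels_a = np.asarray(labels_a, dtype=int)
    labels_b = np.asarray(labels_b, dtype=int)
    confusion = np.zeros((n_classes, n_classes))
    np.add.at(confusion, (labels_a, labels_b), 1)

    n = confusion.sum()
    observed = np.trace(confusion) / n
    expected = (confusion.sum(axis=1) @ confusion.sum(axis=0)) / n**2
    return float((observed - expected) / (1.0 - expected))


if __name__ == "__main__":
    # Synthetic stand-in for the two altitude datasets (NOT my real data).
    rng = np.random.default_rng(0)
    z1 = np.concatenate([rng.normal(100, 5, 45_000), rng.normal(150, 10, 45_000)])
    z2 = z1 + rng.normal(0, 0.15, z1.size)

    print("KL(z1 || z2):", kl_divergence_hist(z1, z2))

    # Arbitrary class thresholds: quartiles of the pooled values -> 4 classes.
    thresholds = np.quantile(np.concatenate([z1, z2]), [0.25, 0.5, 0.75])
    print("kappa:", cohen_kappa(np.digitize(z1, thresholds),
                                np.digitize(z2, thresholds), n_classes=4))
```

Note that KL is not symmetric, so `kl_divergence_hist(z1, z2)` and `kl_divergence_hist(z2, z1)` generally differ, which is part of why I am unsure how to interpret a single number.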
Edit
Thanks for your answers. I now realize I should have included a few more details. Both raw lidar datasets go through a standard processing phase that results in a regular xyz grid of predetermined resolution (in my case 5 m by 5 m). Both datasets have the same xy coordinates and only z varies. I know that one sensor usually has a vertical error of +/- 15 cm, and the other one is probably very similar (I can actually find out). Both sensors were collecting data over the same area. We don't have any ground truth for altitude there. We want to compare these processed data and not the actual xyz lidar point cloud (in other words, the resulting DEMs). I hope this is now a little clearer.
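Since both grids share the same xy coordinates, one thing I could compute directly is the cell-by-cell difference in z. Here is a minimal sketch of that idea, assuming the two DEMs are loaded as NumPy arrays of identical shape with NaN for missing cells. The function name `summarize_dem_difference` is hypothetical, and the default tolerance of 0.15 m simply reuses the +/- 15 cm figure mentioned above; it is not a validated threshold.

```python
import numpy as np


def summarize_dem_difference(z_a, z_b, tolerance=0.15):
    """Summarize cell-by-cell differences (z_b - z_a) between two aligned DEM grids.

    Cells that are NaN in either grid are ignored. `tolerance` is in the same
    units as z (assumed metres here).
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    if z_a.shape != z_b.shape:
        raise ValueError("grids must have the same shape")

    valid = np.isfinite(z_a) & np.isfinite(z_b)
    diff = z_b[valid] - z_a[valid]

    return {
        "n_cells": int(diff.size),
        "bias": float(diff.mean()),                  # mean signed difference
        "rmse": float(np.sqrt(np.mean(diff ** 2))),  # root-mean-square difference
        "mae": float(np.mean(np.abs(diff))),         # mean absolute difference
        "fraction_within_tolerance": float(np.mean(np.abs(diff) <= tolerance)),
    }


if __name__ == "__main__":
    # Tiny made-up grids, only to show the call; not real lidar data.
    dem_a = np.array([[100.0, 101.0], [np.nan, 103.0]])
    dem_b = dem_a + 0.1
    print(summarize_dem_difference(dem_a, dem_b))
```

I am not sure whether such paired summaries answer the "divergence" question better than KL or kappa, but unlike a comparison of the two marginal distributions they use the fact that each cell is measured by both sensors.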
Thanks.