
I have two data sets, both matrices of latitude-longitude pairs.


$A = [(X,Y),(X_2,Y_2)]$
$B = [(X_3,Y_3),(X_4,Y_4),(X_5,Y_5)]$

They are of different sizes.

I want to calculate how strongly each pair of (x,y) coordinates in A correlates with all of those in B.

Then I can order the points in A by how closely they correlate with B.

I imagine it working a bit like this: B is used to create a heat map of the locations, so where more locations are clustered together in B, the heat map has a higher value. Each point in A is then given a value based on the part of the heat map it falls on.

rmaspero
  • Correlation is defined between two random variables, so it is not obvious what you mean by the correlation between a point and a set. Do you mean the distance? – Hossein Mar 30 '17 at 11:58
  • It is better to add the above comments into your question. – Hossein Mar 30 '17 at 12:28
  • What are you asking, exactly? The procedure you describe at the end sounds like you will compute a density on the sphere (aka "heat map") based on the $B$ data and then assign those densities to the points in $A$--but that appears to have little to do with any of the preceding descriptions which refer to "correlate" and "order." – whuber Mar 30 '17 at 14:13

1 Answer


First of all, to compute the distance between two latitude-longitude points, refer to this Stack Overflow question.
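For reference, one common choice for such a distance is the haversine (great-circle) formula. Whether this is exactly what the linked question recommends is an assumption on my part. The symbols are also mine: $\varphi_1, \varphi_2$ are the two latitudes and $\lambda_1, \lambda_2$ the two longitudes, all in radians, and $r$ is the sphere's radius (roughly 6371 km for the Earth, treated as a perfect sphere):

$$d = 2r \arcsin\sqrt{\sin^2\frac{\varphi_2-\varphi_1}{2} + \cos\varphi_1\cos\varphi_2\,\sin^2\frac{\lambda_2-\lambda_1}{2}}$$

For points only a few kilometres apart, the spherical approximation is usually adequate. An ellipsoidal formula may be preferable if high accuracy is needed.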

Here are two ideas for your question:

  • For each point $(X,Y)$ in $A$, find the $k$ points in $B$ nearest to $(X,Y)$. Compute the distance from $(X,Y)$ to each of these points and average the $k$ distances; the smaller the average, the closer $(X,Y)$ is to the points in $B$. This approach is similar to the k-nearest-neighbours (KNN) algorithm in data mining. Here $k$ is a parameter that you should choose according to your application.
  • Cluster the points in $B$ using a density-based approach such as DBSCAN, a grid-based approach such as STING, or a partitioning-based algorithm such as k-means. Then, for each point $(X,Y)$ in $A$, find the cluster this point belongs to and compute the distance between $(X,Y)$ and the center of that cluster. Alternatively, compute the average distance between $(X,Y)$ and all points in that cluster.
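The first idea can be sketched roughly as follows. This is only a minimal illustration, under several assumptions of mine: coordinates are (latitude, longitude) tuples in degrees, the distance is the great-circle (haversine) distance on a sphere of radius 6371 km, and the function names `haversine_km`, `mean_knn_distance` and `rank_by_proximity` are hypothetical, not taken from any library.

```python
import math


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Assumes a spherical Earth; radius_km is an assumed mean radius.
    """
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def mean_knn_distance(point, b_points, k):
    """Average distance from `point` to its k nearest neighbours in b_points."""
    if not b_points:
        raise ValueError("b_points must not be empty")
    dists = sorted(haversine_km(point, b) for b in b_points)
    k = max(1, min(k, len(dists)))  # keep k within [1, len(b_points)]
    return sum(dists[:k]) / k


def rank_by_proximity(a_points, b_points, k=2):
    """Return (score, point) pairs for A, closest to B first."""
    scored = [(mean_knn_distance(a, b_points, k), a) for a in a_points]
    return sorted(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    A = [(50.0, 50.0), (0.0, 0.0)]
    B = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0)]
    for score, point in rank_by_proximity(A, B, k=2):
        print(point, round(score, 1))
```

A smaller score means the point in $A$ lies closer to a dense part of $B$, so sorting in ascending order gives the ordering asked for in the question. This brute-force version computes every pairwise distance. For large data sets, a spatial index would likely be needed.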
Hossein
  • Given that $(X,Y)$ are latitude and longitude on a sphere (or spheroid), please explain why you recommend computing *euclidean* distances and what those might actually mean. – whuber Mar 30 '17 at 14:11
  • Just a misunderstanding. The answer is edited. – Hossein Mar 30 '17 at 14:28