7

Can anyone recommend an easy way to cluster hundreds of GPS trajectories to find out their common paths? The GPS data is coming from different vehicles that have traveled thousands of miles.

Silverfish
  • 20,678
  • 23
  • 92
  • 180
thewaywewere
  • 173
  • 2
  • 8
  • 2
    Can you describe exactly what a GPS trajectory is? Is it a 3 dimension location? Does it have a 3 dimension speed vector or a 3 dimension acceleration vector? – Eric Farng May 17 '15 at 00:38
  • Hi Eric, the GPS trajectory means the waypoints with lan/lon and speed. Speed may not be the key parameter to consider for finding out the common path or route of the vehicles. Hope this clarifies. – thewaywewere May 17 '15 at 16:04
  • You can come up with some salient features describing each trajectory: travel time/distance, number of left/right turns, number of stops etc – Vladislavs Dovgalecs Aug 27 '15 at 22:57

4 Answers4

7

There is no easy way.

  1. there is no universally useful accepted definition of what is a cluster, so how could you do clustering?

    Similarity is not objective. If you use e.g. DTW then you do assume the complete series is relevant, not e.g. only parts of it. It works on the technology part, but that doesn't mean the results are what you are looking for...

  2. Complexity and cost. In particular if you are interested in partial overlaps, cost rises drastically. Segmentation itself is surprisingly hard and needs to be solved first.

First figure out what you want. Don't start with the method, but with the exact objective: what is a good result (and how would you use it, what is its impact?)

Has QUIT--Anony-Mousse
  • 39,639
  • 7
  • 61
  • 96
  • Anony, if there are hundreds sets of GPS waypoints from location A to B., then wanted to find out the common path from A to B. Then convert that common path, or cluster, to a polygon based geofence for detecting if any vehicle goes out of it to trigger an alert. This is what I'm trying to accomplish. Thanks. – thewaywewere May 17 '15 at 16:14
  • Are A and B known beforehand? How dirty is the data? – Has QUIT--Anony-Mousse May 17 '15 at 16:47
  • Locations of A and B are known. As to automate to determine the common path or geofence. The locations could be found by checking the stoppage of the vehicles. Say, speed is lower than or equal to 0.5Km and lasted for 6 hours. With using this approach, there will be many points discovered. To further processing that, adding a buffer to the point and union them together, I think, would be able to find the stoppage location or geofence for extracting waypoints from A to B. – thewaywewere May 18 '15 at 12:39
  • The data could be cleaned up a bit to take out the outliner due to GPS positioning error by checking the distance traveled in a data update interval and possible speed on the roads. – thewaywewere May 18 '15 at 12:48
  • For that, I would just search the closest good trajectory. No need to cluster anything. – Has QUIT--Anony-Mousse May 18 '15 at 14:10
  • From A to B, actually there are many routes or pairs. For each route, could be more than one common path. Thus, this is my thought of using clustering to find out the common paths (points), then convert them to geofences for any derivation detection. May need to consider the density of points. Tried to Google DTW, but not sure if it can help to extract the common paths. – thewaywewere May 18 '15 at 17:04
  • But why do you need "common" paths at all? Aren't all previously taken trajectories good for defining safe regions? The problem is: clustering algorithms work on instances, not subsequences. They don't have a notion of "common paths". – Has QUIT--Anony-Mousse May 18 '15 at 20:37
  • Just thinking by using clustering approach could be able to automate the task. The common paths approach likely needs to separate the paths, or waypoints, manually. The actual GPS data I had, actually consisted of many location pairs of A and B, and for each pair, there is likely more than one 'common' path. – thewaywewere May 19 '15 at 15:50
  • Don't expect too much magic from clustering algorithms. They will need to be carefully guided, they will also need segmentation. And most of the time, they will fail to produce the desired result - don't use them for automatation. Use them to understand your data better; but don't assume they are a magic button. – Has QUIT--Anony-Mousse May 19 '15 at 19:31
5

It seems you just need to estimate the spatial distribution of "good" locations belonging to ordinary paths in order to detect outliers, which is a way nicer problem than path clustering.

The naive but likely sufficient way is to convert the entire path bundle into a density raster with a resolution equal to your intended tolerance (~100m), and use it to rise alert whenever the vehicle detours onto an empty pixel (or below some threshold in case your data already has outliers).

  • You are right if with A and B. As for the density raster, could you elaborate more? Can you share some more specific tools or implementation, such as as Python, R, QGIS or else with some specific pakages/plugins? Thanks! – thewaywewere May 18 '15 at 12:56
  • Possiby gdal_rasterize, but you should ask on http://gis.stackexchange.com/ –  May 18 '15 at 15:43
  • There is a post on gdal_rasterize, http://gis.stackexchange.com/questions/109000/how-to-use-gdal-rasterize-how-to-know-which-attribute-to-burn-with-ogrinfo. Now, I think I got your ideal. One more option for my consideration. Thanks! – thewaywewere May 18 '15 at 16:47
2

Consider phrasing the problem as a graph of locations that make up a path and you want to find commonly occurring subgraphs. Try looking at frequent pattern mining approaches, specifically mining graphs, trees and structures.

I originally ran across this idea in the gSpan algorithm. It finds a hierarchy of subgraphs (from small to large) and does it efficiently by creating a lexigraphical order of nodes to traverse. The authors even have an implementation of gSpan to use.

You may run into problems with graph based approaches since I assume it's very unlikely that two lat/longs are exactly the same and you may need to round things off.

Eric Farng
  • 1,585
  • 10
  • 17
  • The paper looks overwhelmed. The approach seems more suitable if I have all the paths of locations 1 (A, B, C, D,....), 2 (D, C, A,E, ....) and so on. Then, use frequent pattern mining to extract the common trajectory(ies) of the stopped locations. Thanks your advise anyway! – thewaywewere May 18 '15 at 14:53
2

There are many different forms to approach this problem. There are a lot of research works done in this field (I guess since 2000) and a hand full of algorithms are introduced. Recently I read a good comprehensive survey that I recommend you read it: "Trajectory data mining: an overview" by YU ZHENG, Microsoft Research. To find your answer easier, you can directly read "Sequential pattern mining", "Distance/Similarity of Trajectories" and "Trajectory Clustering" sections which are more informative considering your question.