I'm working on an algorithm that takes a vector of the most recent data points from a number of sensor streams and compares its Euclidean distance to previous vectors. The problem is that the streams come from completely different sensors, so a plain Euclidean distance dramatically overemphasizes some of them. Clearly, I need some way to normalize the data. However, since the algorithm is designed to run in real time, I can't use any information about a data stream as a whole in the normalization. So far I've just been tracking the largest value seen for each sensor during a start-up phase (the first 500 data vectors) and then dividing all future readings from that sensor by that value. This is working surprisingly well, but it feels very inelegant.
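For reference, here's roughly what that start-up normalization looks like (a minimal Python sketch; the class and variable names are just illustrative, and the 500-vector warm-up is the value mentioned above):

```python
import numpy as np

class StartupMaxNormalizer:
    """Scale each sensor by the largest magnitude seen during a start-up phase."""

    def __init__(self, n_sensors, warmup=500):
        self.warmup = warmup
        self.count = 0
        self.max_seen = np.zeros(n_sensors)

    def update(self, x):
        x = np.asarray(x, dtype=float)
        if self.count < self.warmup:
            # During start-up, track the largest absolute value per sensor.
            self.max_seen = np.maximum(self.max_seen, np.abs(x))
            self.count += 1
        # After start-up the scale is frozen; guard against division by zero.
        scale = np.where(self.max_seen > 0, self.max_seen, 1.0)
        return x / scale
```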
I haven't had much luck finding a pre-existing algorithm for this, but perhaps I'm just not looking in the right places. Does anyone know of one? Or have any ideas? I saw one suggestion to use a running mean (probably computed with Welford's algorithm), but my worry is that the statistics would keep shifting, so two identical readings taken at different times wouldn't normalize to the same value, which seems like a pretty big problem, unless I'm missing something. Any thoughts are appreciated! Thanks!
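For clarity, this is the kind of running normalization I mean (a minimal sketch of Welford's algorithm used to z-score each sensor; names are illustrative, and it shows exactly the behavior I'm worried about, since the mean and variance keep moving):

```python
import numpy as np

class WelfordNormalizer:
    """Running per-sensor mean/variance via Welford's algorithm, used for z-scoring."""

    def __init__(self, n_sensors):
        self.count = 0
        self.mean = np.zeros(n_sensors)
        self.m2 = np.zeros(n_sensors)  # running sum of squared deviations from the mean

    def update(self, x):
        # Update running statistics with one vector, then return the z-scored vector.
        x = np.asarray(x, dtype=float)
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        std = np.sqrt(self.m2 / max(self.count - 1, 1))
        return (x - self.mean) / np.where(std > 0, std, 1.0)
```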