3

I've come across this paper

https://uta-ir.tdl.org/uta-ir/bitstream/handle/10106/1827/Sukchotrat_uta_2502D_10083.pdf?sequence=1&isAllowed=y

which describes a k-Nearest Neighbors Data Description (kNNDD)-based control chart (p. 45).

First, the author describes the Local Outlier Factor (LOF) method and then the $K^2$ chart, where the control value is defined as the average Euclidean distance from a point to its k nearest neighbors.
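To make that control value concrete, here is a minimal sketch of the statistic as described above. The function name `k2_statistic` and the reference sample are made up for illustration, and how the paper sets the control limit is not covered here.

```python
import math

def k2_statistic(point, reference, k):
    """Average Euclidean distance from `point` to its k nearest
    neighbors in `reference` (hypothetical helper, not from the paper)."""
    distances = sorted(math.dist(point, r) for r in reference)
    return sum(distances[:k]) / k

# Made-up in-control reference sample.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

in_control = k2_statistic((0.5, 0.5), reference, k=2)  # close to the reference cloud
shifted = k2_statistic((4.0, 5.0), reference, k=2)     # far from the reference cloud
```

A point far from the reference sample gets a larger value than one inside it, which is what the chart would then compare against a control limit.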

I can't really find a connection between the LOF algorithm and this control value. Am I wrong?

momomi

1 Answer

3

At the very heart of LOF you will find "k-distance", the distance to the k-nearest neighbor.

The idea of using the k-distance as an outlier score is older than LOF. A k in the range 10 to 50 may be a good choice for LOF, but plain kNN outlier detection often works best with k=1.
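As a rough illustration of plain kNN outlier detection, the sketch below scores every point by its k-distance to the other points in the same data set. The name `k_distance_scores` and the toy data are assumptions for this example only.

```python
import math

def k_distance_scores(data, k=1):
    """For each point, the distance to its k-th nearest neighbor
    among the other points (hypothetical helper)."""
    scores = []
    for i, p in enumerate(data):
        dists = sorted(math.dist(p, q) for j, q in enumerate(data) if j != i)
        scores.append(dists[k - 1])
    return scores

# Made-up data: a unit square plus one isolated point.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = k_distance_scores(data, k=1)

# Index of the point with the largest k-distance.
most_outlying = max(range(len(data)), key=scores.__getitem__)
```

Unlike LOF, this score is an absolute distance and is not normalized by the density around the neighbors.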

Has QUIT--Anony-Mousse
  • Any idea why LOF is included in the paper? As you said, there is a slight connection, but in my opinion it adds nothing to the description of the $K^2$ chart. – momomi Feb 04 '17 at 18:40
  • He probably originally meant to compare against LOF, too. – Has QUIT--Anony-Mousse Feb 05 '17 at 09:06
  • I've tried LOF but it doesn't seem to work. Too many observations have a score >1. – momomi Feb 05 '17 at 09:11
  • 1
    Values *near* 1 are normal. Depending on your data set, outliers may start at 1.1, 2.0, 3.0, or 10... that is why you usually look at the M highest values only. But you should probably use a version adapted for time series rather than ignoring time (or worse, treating time as another attribute) – Has QUIT--Anony-Mousse Feb 05 '17 at 11:18
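The "look at the M highest values only" advice can be sketched with a small pure-Python LOF. This is a simplified version under stated assumptions: each neighborhood holds exactly k points (ties are broken by index rather than widening the neighborhood), the data contain no duplicate points, and the names `lof_scores` and `top_m` as well as the toy data are made up for illustration.

```python
import math

def lof_scores(data, k):
    """Simplified LOF: exactly k neighbors per point, no duplicate points."""
    n = len(data)
    # neighbors[i] holds (distance, index) for the k nearest other points.
    neighbors = []
    for i in range(n):
        dists = sorted((math.dist(data[i], data[j]), j) for j in range(n) if j != i)
        neighbors.append(dists[:k])
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

def top_m(scores, m):
    """Indices of the m highest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:m]

# Made-up data: a 3x3 grid plus one isolated point at index 9.
data = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
scores = lof_scores(data, k=3)
candidates = top_m(scores, 1)
```

In this toy data some grid points score slightly above 1 even though they are not outliers, so ranking the scores and inspecting only the top M is more informative than a fixed cutoff at 1.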