
How reliable is the KS test (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html) for change-point detection in a single vector? Is there a better way to do this in Python?

okuoub
  • Could you please explain how you conceive of applying the KS test to find a changepoint? The reference you give doesn't address this. – whuber Jun 13 '18 at 19:13
  • @whuber I go over each point and apply the KS test to the parts of the time series before and after that point, to test whether their distributions differ. – okuoub Jun 13 '18 at 19:18
  • That seems like it would cause pretty serious multiple testing problems. – Matthew Drury Jun 13 '18 at 19:56
  • @MatthewDrury You are correct, I didn't think of that. Do you have any idea how to solve it? – okuoub Jun 13 '18 at 20:18
  • I'm not a time series expert, so, unfortunately, I don't have much for you there. – Matthew Drury Jun 13 '18 at 20:21
  • Did you see this question: https://stats.stackexchange.com/questions/59895/python-module-for-change-point-analysis?rq=1 – Matthew Drury Jun 13 '18 at 20:22
  • The multiple testing problems can be handled--this approach is, in spirit, much like many other methods to find changes in level, slope, or variance in a series of data. However, it's not clear how effective it would be. It would likely not be a very powerful method because the KS test is looking for *any* kind of change rather than a specific change. The idea is intriguing enough to be worth some investigation, but first you ought to make sure this is the right test for whatever you're trying to learn about these data. – whuber Jun 13 '18 at 22:27
  • @whuber So my current approach is: go over all the points with the KS test and find a cluster of three points that each give p < 0.05. (The data are floats in the range 0.5 to 2.) Now I want to use those points as indices into another vector, to check whether there were more ones before them than after. Does that sound reasonable? What is the best test to check whether one vector has more 1's than another (binary data)? – okuoub Jun 14 '18 at 07:41
  • You need to provide details in your post. If your vector is binary, the KS test is inapplicable and much simpler methods are available. The use of clusters of three does not sound consistent with your stated problem. A test of whether one vector has a greater *proportion* of ones than another is called a *binomial test of proportions*--a search will take you to hundreds of posts about such tests on this site. – whuber Jun 14 '18 at 12:12
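
For readers following the comments, here is a minimal sketch of the scan the asker describes (a KS comparison at every split point), with one possible way to handle the multiple testing concern: calibrate the *maximum* statistic over all splits by permutation instead of reading off a p-value per split. Nothing here comes from the thread itself. The function names `ks_statistic`, `ks_scan` and `ks_changepoint`, the `min_size` and `n_perm` parameters, and the `sqrt(n1*n2/(n1+n2))` scaling are illustrative choices. The KS statistic is computed directly with NumPy to keep the permutation loop cheap; it should agree with the `statistic` returned by `scipy.stats.ks_2samp`. The permutation step assumes the observations are exchangeable when there is no change (no autocorrelation), which may not hold for a real time series.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / len(a)
    cdf_b = np.searchsorted(b, pooled, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_scan(x, min_size=10):
    """Return (split, score) for the split with the largest scaled KS statistic.

    A split k compares x[:k] with x[k:]; each side keeps at least min_size points.
    """
    n = len(x)
    if n < 2 * min_size:
        raise ValueError("need at least 2 * min_size observations")
    best_split, best_score = None, -np.inf
    for k in range(min_size, n - min_size + 1):
        # Assumed scaling: sqrt(n1*n2/(n1+n2)) makes unbalanced splits comparable.
        score = ks_statistic(x[:k], x[k:]) * np.sqrt(k * (n - k) / n)
        if score > best_score:
            best_split, best_score = k, score
    return best_split, best_score


def ks_changepoint(x, min_size=10, n_perm=199, seed=0):
    """Scan all splits, then get one permutation p-value for the maximum score."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    split, observed = ks_scan(x, min_size)
    # Shuffling removes any ordering, so each shuffled copy mimics "no change".
    null_max = np.array(
        [ks_scan(rng.permutation(x), min_size)[1] for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(null_max >= observed)) / (n_perm + 1)
    return split, observed, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic example: the mean shifts after 80 observations.
    series = np.concatenate([rng.normal(1.0, 0.2, 80), rng.normal(1.4, 0.2, 80)])
    print(ks_changepoint(series))
```

Because only the single largest score is tested, the returned p-value already accounts for having looked at every split. The cost is roughly `n_perm` full scans, and, as noted in the comments, a test that targets a specific kind of change (in level or variance, say) would likely be more powerful than this omnibus one.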

0 Answers