I have assembled binary vectors (every element is 0 or 1, all elements are weighted equally, and the elements are arranged in time order). The vectors are separated into cohorts, with each cohort defined by a unique event of interest occurring. From every vector I have removed the element for the event of interest itself, along with the elements from the 3 months before it. Now I take a new test vector and calculate the average pairwise Jaccard similarity between it and the vectors of each cohort, one cohort at a time.
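To make the calculation concrete, here is a minimal sketch of what I am computing. The cohort names, the toy vectors, and the helper names `jaccard` and `mean_jaccard_to_cohort` are made up for illustration, and I am assuming all vectors have the same length and that two all-zero vectors score 0.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity |A and B| / |A or B| for two equal-length binary vectors."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # assumption: two all-zero vectors are scored as 0
    return np.logical_and(a, b).sum() / union

def mean_jaccard_to_cohort(test_vector, cohort):
    """Average of the pairwise Jaccard similarities between one vector and a cohort."""
    return float(np.mean([jaccard(test_vector, v) for v in cohort]))

# Hypothetical toy data: two cohorts, each a list of binary vectors.
cohorts = {
    "event_A": [[1, 1, 0, 0], [1, 0, 0, 0]],
    "event_B": [[0, 0, 1, 1], [0, 1, 1, 1]],
}
test_vector = [1, 1, 0, 1]

# One averaged score per cohort.
scores = {name: mean_jaccard_to_cohort(test_vector, vectors)
          for name, vectors in cohorts.items()}
print(scores)
```

Each cohort ends up with a single score between 0 and 1, and those per-cohort scores are what my questions below are about.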
My questions center on interpretation:
What is the statistical interpretation of an average pairwise Jaccard similarity score in this example? Can this be seen as a probability or not?
If the number of samples in these cohorts increases, can I expect this to improve the prediction?
If this approach is valid, what would be the best performance metrics for evaluating it (precision/recall, F-score, cross-validation)?
Any advice would be sincerely appreciated. I'm curious whether this idea might be useful as an alternative to traditional survival analysis / time-to-event modelling in my use case.