I have a CSV file, file1.csv, in the following format:
serialNo, timestamp, visits, confirm, timeSec
1, 1:55:40, 3, 0, 198
2, 7:42:56, 2, 1, 102
3, 13:20:32, 3, 0, 181
4, 15:26:56, 0, 1, 101
5, 10:36:46, 1, 0, 198
Here timestamp is the timestamp, visits is the number of visits to a website, timeSec is the time spent in seconds, and confirm is a binary variable that takes the value 0 or 1.
I have imported this into a pandas DataFrame.
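For reference, a minimal sketch of how the import might look. It assumes the file is laid out exactly like the sample above (a space after each comma) and reads the sample rows from an in-memory string so that it runs on its own; the variable name csv_text is only for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; with the real file this would be
# pd.read_csv("file1.csv", skipinitialspace=True) instead.
csv_text = """serialNo, timestamp, visits, confirm, timeSec
1, 1:55:40, 3, 0, 198
2, 7:42:56, 2, 1, 102
3, 13:20:32, 3, 0, 181
4, 15:26:56, 0, 1, 101
5, 10:36:46, 1, 0, 198
"""

# skipinitialspace=True drops the space after each comma, so the column
# names come out as "timestamp" rather than " timestamp".
data = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(data.dtypes)
```

Without skipinitialspace the column names would keep their leading space, and data['confirm'] would raise a KeyError.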
I wish to see if there is any connection between:
a) confirm and visits
b) confirm and timeSec
c) confirm and timestamp, e.g. whether a confirm=1 value is more likely in one time interval than in another.
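To make case c) concrete, here is a rough sketch of the kind of comparison I have in mind: bucket the timestamps into intervals and look at the share of confirm=1 in each. The 6-hour bin edges are an arbitrary assumption for illustration, and the frame holds only the five sample rows.

```python
import pandas as pd

# Five sample rows from the question (only the columns needed here).
data = pd.DataFrame({
    "timestamp": ["1:55:40", "7:42:56", "13:20:32", "15:26:56", "10:36:46"],
    "confirm": [0, 1, 0, 1, 0],
})

# Interpret H:MM:SS as time elapsed since midnight, expressed in hours.
hours = pd.to_timedelta(data["timestamp"]).dt.total_seconds() / 3600

# Hypothetical 6-hour intervals: [0, 6), [6, 12), [12, 18), [18, 24).
interval = pd.cut(hours, bins=[0, 6, 12, 18, 24], right=False).rename("interval")

# Share of confirm == 1 and number of rows within each interval.
summary = data.groupby(interval, observed=False)["confirm"].agg(["mean", "count"])

print(summary)
```

With only five rows the per-interval shares mean very little; the point is just the shape of the comparison.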
I realize that there is a method in pandas to find a correlation:
data['confirm'].corr(sessionData['visits'])
It uses the Pearson correlation by default and evaluates to -0.04981167717341486,
and data['confirm'].corr(sessionData['timeSec']) evaluates to 0.010440316272189443.
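A self-contained sketch of that call follows. It keeps everything in one frame named data, which is an assumption made for the example, and it uses only the five sample rows, so the numbers it prints will not match the values quoted above. It also checks that the default matches method="pearson" and NumPy's np.corrcoef.

```python
import numpy as np
import pandas as pd

# Five sample rows from the question (only the numeric columns).
data = pd.DataFrame({
    "visits": [3, 2, 3, 0, 1],
    "confirm": [0, 1, 0, 1, 0],
    "timeSec": [198, 102, 181, 101, 198],
})

# Series.corr defaults to method="pearson".
r_default = data["confirm"].corr(data["visits"])
r_pearson = data["confirm"].corr(data["visits"], method="pearson")

# Same quantity computed directly with NumPy as a cross-check.
r_numpy = np.corrcoef(data["confirm"], data["visits"])[0, 1]

# Correlation between confirm and timeSec, as in case b).
r_time = data["confirm"].corr(data["timeSec"])

print(r_default, r_pearson, r_numpy, r_time)
```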
My question is:
Is the Pearson correlation the correct inferential statistics tool to use in all three cases a, b and c? Also, what different strategies can I use to find a connection in cases a, b and c?