
Let's say I have a geo-tagged dataset of all payment transactions for businesses in a city. I know whether each payment was made by cash or card, and I have made a heatmap showing where in the city the highest rates of cash payments occur. My hypothesis is that businesses closest to ATMs have higher rates of cash payments, since people leave ATMs with cash in their pockets. Given this heatmap of the cash rate and the ATMs as nodes on the map, how can I test my hypothesis? Is there a name for this type of problem, or a typical approach to it? Note that I'm looking to solve this problem with Python, so programming-based solutions and package recommendations would be appreciated.

NeonBlueHair

1 Answer


In this case you can start in a simple way:

Discretise the heatmap into regions, and label each region either "close to ATM" or "not close to ATM".

Then you can test the hypothesis that the cash payment rate is higher in the "close to ATM" regions, e.g. by using ANOVA with "cash payment rate" as the response and "close to ATM" as the explanatory variable. With only two groups, this is equivalent to a two-sample t-test.
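The two steps above could be sketched roughly as follows. This is only an illustration on synthetic data: the cell coordinates, the ATM locations, the 300 m cutoff for "close to ATM", and the made-up cash rates are all assumptions, so replace them with your own grid cells and ATM positions.

```python
import numpy as np
from scipy import stats
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Hypothetical inputs: centre of each heatmap cell and each ATM, in metres.
cell_xy = rng.uniform(0, 5000, size=(400, 2))
atm_xy = rng.uniform(0, 5000, size=(15, 2))

# Distance from every cell to its nearest ATM.
dist, _ = cKDTree(atm_xy).query(cell_xy)

# Made-up cash rates for demonstration only: they fall off with distance.
noise = rng.normal(0, 0.05, size=len(dist))
cash_rate = np.clip(0.6 - 0.0002 * dist + noise, 0, 1)

# Label each region; the 300 m threshold is an arbitrary assumption.
near = dist <= 300.0

# One-way ANOVA comparing the cash rate of the two groups.
f_stat, p_value = stats.f_oneway(cash_rate[near], cash_rate[~near])

print(f"near ATM: n={near.sum()}, mean cash rate={cash_rate[near].mean():.3f}")
print(f"far from ATM: n={(~near).sum()}, mean cash rate={cash_rate[~near].mean():.3f}")
print(f"F={f_stat:.2f}, p={p_value:.4g}")
```

The result depends on the threshold you pick, so it is worth checking that the conclusion holds for a few different cutoffs.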

If you have enough data, this rough discretisation should be enough to detect the effect if it exists. If not, a model that uses a continuous explanatory variable, such as "distance to nearest ATM", might require less data to give significant results.
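A minimal sketch of the continuous variant, again on synthetic data: the coordinates and cash rates below are invented, and a simple linear regression is just one possible choice of model for the relationship with distance.

```python
import numpy as np
from scipy import stats
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)

# Hypothetical inputs: business and ATM coordinates in metres.
business_xy = rng.uniform(0, 5000, size=(400, 2))
atm_xy = rng.uniform(0, 5000, size=(15, 2))

# Continuous explanatory variable: distance to the nearest ATM.
dist, _ = cKDTree(atm_xy).query(business_xy)

# Made-up cash rates for demonstration only.
noise = rng.normal(0, 0.05, size=len(dist))
cash_rate = np.clip(0.6 - 0.0002 * dist + noise, 0, 1)

# Linear regression of cash rate on distance; a negative slope with a
# small p-value would support the hypothesis.
fit = stats.linregress(dist, cash_rate)
print(f"slope={fit.slope:.6f} per metre, r={fit.rvalue:.3f}, p={fit.pvalue:.4g}")

# Rank-based alternative that does not assume a linear relationship.
rho, p_rank = stats.spearmanr(dist, cash_rate)
print(f"Spearman rho={rho:.3f}, p={p_rank:.4g}")
```

Neither test accounts for spatial autocorrelation between neighbouring businesses, so treat the p-values as a first indication rather than a final answer.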