I have two populations of donations (segment 2 people were subject to a different form):
             seg 1        seg 2
count  1772.000000  1645.000000
mean     93.576185   108.259574
std     301.248967   241.837942
min       3.000000     3.000000
25%      20.000000    20.000000
50%      40.000000    50.000000
75%      72.750000   100.000000
max    9000.000000  6200.000000
They don't look very normal. This is the log distribution:
But the bootstrapped difference of their means does:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

np.random.seed(42)  # seed before resampling so the bootstrap is reproducible

# Bootstrap the difference in mean donation (segment 1 minus segment 2)
means = []
for _ in range(10000):
    means.append(df_segment_1.sample(1772, replace=True).mean() - df_segment_2.sample(1645, replace=True).mean())

plt.hist(means, density=True, bins=30)  # density=False would make counts
plt.ylabel('Density')
plt.xlabel('Mean difference of donations (segment 1 - segment 2)');
So I would like to know: how can I run a t-test on these two populations to find out whether the mean donation in segment 2 is significantly higher than in segment 1?
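
To make the question concrete, here is a rough sketch of what I think such a test would look like, using only the summary statistics from the table above. It assumes Welch's unequal-variance t-test is the appropriate variant (the two standard deviations differ) and, because the degrees of freedom come out in the thousands, it approximates the t distribution with a standard normal for the p-value. The variable names are my own, and I am not sure this is the right approach for such skewed data.

```python
import math
from statistics import NormalDist

# Summary statistics copied from the describe() output above
n1, mean1, std1 = 1772, 93.576185, 301.248967   # segment 1
n2, mean2, std2 = 1645, 108.259574, 241.837942  # segment 2

# Welch's t statistic: difference in means divided by its standard error
var_term1 = std1 ** 2 / n1
var_term2 = std2 ** 2 / n2
std_error = math.sqrt(var_term1 + var_term2)
t_stat = (mean2 - mean1) / std_error

# Welch-Satterthwaite approximation to the degrees of freedom
dof = (var_term1 + var_term2) ** 2 / (
    var_term1 ** 2 / (n1 - 1) + var_term2 ** 2 / (n2 - 1)
)

# One-sided p-value for H1: mean of segment 2 > mean of segment 1.
# Assumption: with dof this large, the normal approximation to t is used.
p_one_sided = 1.0 - NormalDist().cdf(t_stat)

print(f"t = {t_stat:.3f}, dof = {dof:.0f}, one-sided p = {p_one_sided:.4f}")
```

With the raw data, I believe `scipy.stats.ttest_ind(df_segment_2, df_segment_1, equal_var=False, alternative='greater')` would perform the same Welch test directly, assuming a SciPy version recent enough to support the `alternative` argument.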