import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'input': [3009861162, 548584145, 950178496, 984257236, 447403092, 447403094,
              445305942, 445306198, 2592658903, 2592921015, 2592920999],
    'output': [2917869018, 622408909, 621393093, 749384917, 782939349, 2930425029,
               2930425313, 2930425249, 2393546529, 2527846001, 3601419873],
})

plt.scatter(df['input'], df['output'], s=0.5)
plt.show()

enter image description here

It doesn't look like all the points are plotted. I believe that is because some numbers are numerically close to one another and are being overplotted. What are good ways to visualize such bivariate data in a scatterplot-like manner?

whuber
hlkstuv_23900
  • Are you sure they're not plotted? They are more likely overlapping. Try plotting each point in a different colour, then reverse the order of the vectors and see whether the colour on top changes. – ReneBt Feb 16 '19 at 08:13
  • You can try adding "plt.title('There should be ' + str(len(df['input'])) + ' dots')" and counting the dots manually as a quick check when there are only a few data points. – James Phillips Feb 16 '19 at 21:30

1 Answer


In R, you can add some random noise to each plotted point (and make points "hollow") to make them a bit more visible while plotting. For example:

plot(input, output, col="red")

yields:

enter image description here

but

plot(jitter(input, factor = 50, amount = NULL), jitter(output, factor = 50, amount = NULL))

yields:

enter image description here

You can adjust the parameters of jitter to add more or less noise as you see fit. I'd imagine there is a similar function in Python, or you could simply add some random noise to each point yourself. See here for some examples in Python:

https://scientificallysound.org/2017/08/17/jitter-to-figures-python-r/
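For the "add some random noise yourself" route, here is a minimal Python sketch using NumPy and matplotlib. The helper names add_jitter and scatter_jittered and the fraction parameter are my own, chosen for illustration. The noise is uniform within a fraction of the data range, which is an assumption and not a reimplementation of R's jitter, so fraction is not equivalent to factor.

```python
import numpy as np

# Data from the question.
inputs = np.array([3009861162, 548584145, 950178496, 984257236, 447403092,
                   447403094, 445305942, 445306198, 2592658903, 2592921015,
                   2592920999], dtype=float)
outputs = np.array([2917869018, 622408909, 621393093, 749384917, 782939349,
                    2930425029, 2930425313, 2930425249, 2393546529, 2527846001,
                    3601419873], dtype=float)


def add_jitter(values, fraction=0.02, rng=None):
    """Return values plus uniform noise in +/- fraction * (max - min).

    Hypothetical helper: 'fraction' is an illustrative knob, not R's 'factor'.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    half_width = fraction * (values.max() - values.min())
    return values + rng.uniform(-half_width, half_width, size=values.shape)


def scatter_jittered(x, y, fraction=0.02, seed=0):
    """Draw a scatterplot of jittered x and y with hollow markers."""
    import matplotlib.pyplot as plt  # imported here so add_jitter works without it

    rng = np.random.default_rng(seed)
    plt.scatter(add_jitter(x, fraction, rng), add_jitter(y, fraction, rng),
                s=40, facecolors="none", edgecolors="red")
    plt.xlabel("input (jittered)")
    plt.ylabel("output (jittered)")
    plt.show()
```

Calling scatter_jittered(inputs, outputs) draws the plot. Hollow markers (facecolors="none") and a larger marker size than the question's s=0.5 should make near-coincident points easier to tell apart; raise or lower fraction to taste.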

Switching to a log scale with jitter might help too. For example:

loginput <- log(input)
logoutput <- log(output)

plot(jitter(loginput, factor = 50, amount = NULL),
     jitter(logoutput, factor = 50, amount = NULL),
     col="blue", xlab="log input", ylab="log output")

results in:

enter image description here
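A rough Python counterpart of the log-then-jitter idea might look like the sketch below. As before, log_jitter, scatter_log_jittered and fraction are hypothetical names of mine, and the uniform noise scaled to the range of the logged values is an assumption, not what R's jitter computes.

```python
import numpy as np

# Data from the question.
inputs = np.array([3009861162, 548584145, 950178496, 984257236, 447403092,
                   447403094, 445305942, 445306198, 2592658903, 2592921015,
                   2592920999], dtype=float)
outputs = np.array([2917869018, 622408909, 621393093, 749384917, 782939349,
                    2930425029, 2930425313, 2930425249, 2393546529, 2527846001,
                    3601419873], dtype=float)


def log_jitter(values, fraction=0.02, rng=None):
    """Take natural logs, then add uniform noise scaled to the log range.

    Hypothetical helper; requires strictly positive values.
    """
    rng = np.random.default_rng() if rng is None else rng
    logged = np.log(np.asarray(values, dtype=float))
    half_width = fraction * (logged.max() - logged.min())
    return logged + rng.uniform(-half_width, half_width, size=logged.shape)


def scatter_log_jittered(x, y, fraction=0.02, seed=0):
    """Scatterplot of jittered log(x) against jittered log(y)."""
    import matplotlib.pyplot as plt  # imported here so log_jitter works without it

    rng = np.random.default_rng(seed)
    plt.scatter(log_jitter(x, fraction, rng), log_jitter(y, fraction, rng),
                s=40, facecolors="none", edgecolors="blue")
    plt.xlabel("log input")
    plt.ylabel("log output")
    plt.show()
```

Calling scatter_log_jittered(inputs, outputs) draws the plot. Jittering after taking logs keeps the noise on the same scale as the plotted axes.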

StatsStudent