Plotting 0 in a log scaled axis

Question

I have a very large, sparse dataset of spam Twitter accounts. To visualise the distribution (histogram, KDE, etc.) and the CDF of the various variables (tweets_count, number of followers/following, etc.), I need to put the x axis on a log scale.

    > describe(spammers_class1$tweets_count)
      var       n   mean      sd median trimmed mad min    max  range  skew kurtosis   se
    1   1 1076817 443.47 3729.05     35   57.29  43   0 669873 669873 53.23  5974.73 3.59

In this dataset, the value 0 is very important (in fact, 0 should have the highest density). However, on a logarithmic scale these values are dropped, because log(0) is undefined. I thought of replacing 0 with, say, 0.1, but it would make no sense to show spam accounts with 10^-1 followers.

P.S. I mainly use Python for the analysis, but I could use MATLAB or R if there are easy workarounds in those languages.

@amaatouq: I would have a look at the [IHS](http://stats.stackexchange.com/a/1630/603) transformation (instead of the log). — user603, May 04 '13 at 19:06
Andre, for example, suppose I wanted to plot a histogram or a CDF of the dataset to show that 50% of spammers have between 0 and 10 tweets, 20% have between 11 and 100, fewer still have between 101 and 1000, and so on up to 10^5, since my max value is 669873. Dividing by 100 or 1000 wouldn't let me convey this observation. — Alhayer, May 04 '13 at 19:06
@user603: Thank you very much for the reference, but it seems to me that these transformations are applied to the values of x and not used to scale the measurement that displays the value of x using intervals corresponding to orders of magnitude. I am not sure of what I am saying, so please do correct me if I am wrong. What I am trying to convey will require me to use the actual values of x — Alhayer, May 04 '13 at 19:20
@amaatouq: you can apply this transformation to the axis (i.e. not to the data points themselves). If this is what you want (transform the axis, not the data), I can write a simple R example to do that. — user603, May 04 '13 at 19:24
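
For reference, the IHS (inverse hyperbolic sine) transformation mentioned in the comments is asinh(x) = log(x + sqrt(x^2 + 1)). Unlike the plain log it is defined at 0, and for large x it behaves like log(2x). The following is only a minimal Python sketch using the standard library; the helper name `ihs` and the sample values are made up for illustration and are not taken from the thread.

```python
import math


def ihs(x):
    """Inverse hyperbolic sine: log(x + sqrt(x**2 + 1)); defined at x = 0."""
    return math.asinh(x)


# Illustrative counts only (the largest one matches the max in describe() above).
counts = [0, 1, 10, 1000, 669873]
transformed = [ihs(c) for c in counts]

# Zero stays at zero, so it keeps a position on the transformed axis.
# For large x, asinh(x) is approximately log(2 * x).
large_x_gap = abs(ihs(669873) - math.log(2 * 669873))
```

Because the transformation is monotonic, the ordering of the values is preserved; only the spacing between them changes.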

Peter Flom · Accepted Answer · 2013-05-04T19:38:06.880

5

If you are just trying to visualize the distribution (and are not using it in modeling), you can add 1 to all values, take logs, and then label the axis to reflect this. For example, in R, for a density plot (the same idea works for other plots):

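    # Toy data dominated by zeros
    x <- c(rep(0, 100), rep(1, 30), rep(2, 20), rep(3, 10), rep(100, 2))
    # Shift by 1 so that 0 maps to log10(1) = 0
    xt <- log10(x + 1)
    plot(density(xt), xaxt = 'n')
    # Label the axis in original units, at the transformed positions
    ticks <- c(0, 10, 100)
    axis(1, at = log10(ticks + 1), labels = ticks)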
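
Since the asker mentions working mainly in Python, the same idea can be sketched there. This is an illustration rather than part of the original answer: the helper names `log10p1`, `decade_ticks` and `plot_shifted_log_hist` are hypothetical, the plotting step assumes matplotlib is installed, and a histogram stands in for the density plot.

```python
import math


def log10p1(values):
    """Shift by 1 and take log10, so that 0 maps to 0 instead of -inf."""
    return [math.log10(v + 1) for v in values]


def decade_ticks(max_value):
    """Return (positions, labels) for ticks at 0, 10, 100, ... up to max_value.

    Positions are on the transformed scale; labels show the original values.
    """
    labels = [0]
    tick = 10
    while tick <= max_value:
        labels.append(tick)
        tick *= 10
    positions = [math.log10(label + 1) for label in labels]
    return positions, labels


def plot_shifted_log_hist(values, bins=30):
    """Histogram of log10(x + 1) with the x axis labelled in original units."""
    import matplotlib.pyplot as plt  # assumed to be installed; imported lazily

    fig, ax = plt.subplots()
    ax.hist(log10p1(values), bins=bins)
    positions, labels = decade_ticks(max(values))
    ax.set_xticks(positions)
    ax.set_xticklabels([str(label) for label in labels])
    ax.set_xlabel("x (plotted as log10(x + 1))")
    return fig, ax


# Same toy data as the R example.
x = [0] * 100 + [1] * 30 + [2] * 20 + [3] * 10 + [100] * 2
xt = log10p1(x)
positions, labels = decade_ticks(max(x))
```

Calling `plot_shifted_log_hist(x)` would draw the figure. The ticks are placed at log10(label + 1), so a label such as 10 sits at the position where the value 10 actually falls after the shift.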

edited May 04 '13 at 19:38

answered May 04 '13 at 19:22


1

Do you know what a modified log scale like this is called? A log(1+x) scale? – Pertinax Mar 19 '19 at 12:49
