What is the name of this plot that has rows with two connected dots?

Question

I've been reading an EIA report, and this plot caught my attention. I'd like to be able to create the same type of plot.

It shows how energy productivity changed between two years (1990 and 2015) and labels each row with the change between these two periods.

What is the name of this type of plot? How can I create the same plot (with different countries) in Excel?

Is [this pdf](https://www.eia.gov/outlooks/ieo/pdf/0484%282016%29.pdf) the source? I don't see that figure in it. — gung - Reinstate Monica, Aug 26 '19 at 15:03
Another name is [lollipop plot](https://www.huduser.gov/portal/periodicals/cityscpe/vol21num2/ch9.pdf), particularly when paired observations are being compared. — adin, Aug 26 '19 at 18:57

gung - Reinstate Monica · Answer 1 · 2019-08-26T17:02:58.367

That's a dot plot. It is sometimes called a "Cleveland dot plot" because there is a variant of a histogram made with dots that people sometimes call a dot plot as well. This particular version plots two dots per country (one for each of the two years) and draws a thicker line between them. The countries are sorted by the later (2015) value. The primary reference would be Cleveland's book Visualizing Data. Googling leads me to this Excel tutorial.

I scraped the data, in case anyone wants to play with them.

                       Country  1990  2015
                        Russia  71.5 101.4
                        Canada  74.4 102.9
 Other non-OECD Europe/Eurasia  60.9 135.2
                   South Korea 127.0 136.2
                         China  58.5 137.1
                   Middle East 170.9 158.8
                 United States 106.8 169.0
         Australia/New Zealand 123.6 170.9
                        Brazil 208.5 199.8
                         Japan 181.0 216.7
                        Africa 185.4 222.0
           Other non-OECD Asia 202.7 236.0
                   OECD Europe 173.8 239.9
       Other non-OECD Americas 193.1 242.3
                         India 173.8 260.6
                  Mexico/Chile 221.1 269.8
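
For anyone taking the Excel route, the values above first have to be in a form Excel can chart. Here is a minimal sketch in Python with pandas, assuming the scraped (estimated) values from the table. The function name `prepare_dot_plot_table` and the column name `change_pct` are hypothetical, made up for this illustration. It sorts the rows by the 2015 value, adds the percent change, and produces CSV text that can be saved and opened in Excel:

```python
import io

import pandas as pd

# Values estimated from the figure, copied from the table above.
RAW = """Country,1990,2015
Russia,71.5,101.4
Canada,74.4,102.9
Other non-OECD Europe/Eurasia,60.9,135.2
South Korea,127.0,136.2
China,58.5,137.1
Middle East,170.9,158.8
United States,106.8,169.0
Australia/New Zealand,123.6,170.9
Brazil,208.5,199.8
Japan,181.0,216.7
Africa,185.4,222.0
Other non-OECD Asia,202.7,236.0
OECD Europe,173.8,239.9
Other non-OECD Americas,193.1,242.3
India,173.8,260.6
Mexico/Chile,221.1,269.8
"""


def prepare_dot_plot_table(csv_text):
    """Sort rows by the later year and add a percent-change column."""
    df = pd.read_csv(io.StringIO(csv_text))  # header names become the strings "1990" and "2015"
    df = df.sort_values("2015").reset_index(drop=True)
    df["change_pct"] = ((df["2015"] / df["1990"] - 1) * 100).round(1)
    return df


table = prepare_dot_plot_table(RAW)

# Rows where the 2015 value is below the 1990 value (candidates for a highlight colour).
decreasing = table.loc[table["2015"] < table["1990"], "Country"].tolist()

# CSV text with one row per country; write it to a file to open it in Excel.
csv_for_excel = table.to_csv(index=False)
print(csv_for_excel)
```

Each row then carries everything one line of the chart needs: the two dot positions, the label text, and (via `decreasing`) whether the connecting line should be highlighted.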

BTW, "scrape" means estimate the values that the dots in the plot represent. FWIW, I used [Web Plot Digitizer](https://automeris.io/WebPlotDigitizer/). — gung - Reinstate Monica, Aug 27 '19 at 13:55
Or, trivially, dot chart. Precursors seem thin on the ground but do exist. See e.g. Snedecor, G.W. 1937. _Statistical Methods Applied to Experiments in Agriculture and Biology_. Ames, IA: Collegiate Press. This graph was dropped at some later point in the revision of this well-known text; it doesn't appear in editions with co-author W.G. Cochran. — Nick Cox, Aug 27 '19 at 14:12

Michael MacAskill · Answer 2 · 2019-08-27T06:44:47.913

The answer by @gung is correct in identifying the chart type and providing a link to how to implement it in Excel, as requested by the OP. But for others wanting to know how to do this in R/tidyverse/ggplot, below is complete code:

library(dplyr)   # for data manipulation
library(tidyr)   # for reshaping the data frame
library(stringr) # string manipulation
library(ggplot2) # graphing

# create the data frame 
# (in wide format, as needed for the line segments):
dat_wide = tibble::tribble(
  ~Country,   ~Y1990,   ~Y2015,
  'Russia',  71.5, 101.4,
  'Canada',  74.4, 102.9,
  'Other non-OECD Europe/Eurasia',  60.9, 135.2,
  'South Korea',   127, 136.2,
  'China',  58.5, 137.1,
  'Middle East', 170.9, 158.8,
  'United States', 106.8,   169,
  'Australia/New Zealand', 123.6, 170.9,
  'Brazil', 208.5, 199.8,
  'Japan',   181, 216.7,
  'Africa', 185.4,   222,
  'Other non-OECD Asia', 202.7,   236,
  'OECD Europe', 173.8, 239.9,
  'Other non-OECD Americas', 193.1, 242.3,
  'India', 173.8, 260.6,
  'Mexico/Chile', 221.1, 269.8
)

# a version reshaped to long format (for the points):
dat_long = dat_wide %>% 
  gather(key = 'Year', value = 'Energy_productivity', Y1990:Y2015) %>% 
  mutate(Year = str_replace(Year, 'Y', ''))

# create the graph:
ggplot() +
  geom_segment(data = dat_wide, 
               aes(x    = Y1990, 
                   xend = Y2015, 
                   y    = reorder(Country, Y2015), 
                   yend = reorder(Country, Y2015)),
               size = 3, colour = '#D0D0D0') +
  geom_point(data = dat_long,
             aes(x      = Energy_productivity, 
                 y      = Country, 
                 colour = Year),
             size = 4) +
  labs(title = 'Energy productivity in selected countries \nand regions',
       subtitle = 'Billion dollars GDP per quadrillion BTU',
       caption = 'Source: EIA, 2016',
       x = NULL, y = NULL) +
  scale_colour_manual(values = c('#1082CD', '#042B41')) +
  theme_bw() +
  theme(legend.position = c(0.92, 0.20),
        legend.title = element_blank(),
        legend.box.background = element_rect(colour = 'black'),
        panel.border = element_blank(),
        axis.ticks = element_line(colour = '#E6E6E6'))

ggsave('energy.png', width = 20, height = 10, units = 'cm')

This could be extended to add value labels and to highlight the colour of the one case where the values swap order, as in the original.

Also `geom_lollipop` is available in ggalt and in SciencesPo R packages. — G. Grothendieck, Apr 05 '20 at 22:13

score 25 · Accepted Answer · edited Apr 05 '20 at 17:20

Some call it a (horizontal) lollipop plot with two groups.

Here is how to make this plot in Python using matplotlib and seaborn (only used for the style), adapted from https://python-graph-gallery.com/184-lollipop-plot-with-2-groups/ and as requested by the OP in the comments.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import io
sns.set(style="whitegrid")  # set style

data = io.StringIO(""""Country"  1990  2015
"Russia" 71.5 101.4
"Canada" 74.4 102.9
"Other non-OECD Europe/Eurasia" 60.9 135.2
"South Korea" 127.0 136.2
"China" 58.5 137.1
"Middle East" 170.9 158.8
"United States" 106.8 169.0
"Australia/New Zealand" 123.6 170.9
"Brazil" 208.5 199.8
"Japan" 181.0 216.7
"Africa" 185.4 222.0
"Other non-OECD Asia" 202.7 236.0
"OECD Europe" 173.8 239.9
"Other non-OECD Americas" 193.1 242.3
"India" 173.8 260.6
"Mexico/Chile" 221.1 269.8""")

# raw string so that \s is passed to the regex engine and not treated as a string escape
df = pd.read_csv(data, sep=r"\s+", quotechar='"')
df = df.set_index("Country").sort_values("2015")
df["change"] = df["2015"] / df["1990"] - 1


plt.figure(figsize=(12,6))
y_range = np.arange(1, len(df.index) + 1)
colors = np.where(df['2015'] > df['1990'], '#d9d9d9', '#d57883')
plt.hlines(y=y_range, xmin=df['1990'], xmax=df['2015'],
           color=colors, lw=10)
plt.scatter(df['1990'], y_range, color='#0096d7', s=200, label='1990', zorder=3)
plt.scatter(df['2015'], y_range, color='#003953', s=200 , label='2015', zorder=3)
for (_, row), y in zip(df.iterrows(), y_range):
    plt.annotate(f"{row['change']:+.0%}", (max(row["1990"], row["2015"]) + 4, y - 0.25))
plt.legend(ncol=2, bbox_to_anchor=(1., 1.01), loc="lower right", frameon=False)

plt.yticks(y_range, df.index)
plt.title("Energy productivity in selected countries and regions, 1990 and 2015\nBillion dollars GDP per quadrillion BTU", loc='left')
plt.xlim(50, 300)
plt.gcf().subplots_adjust(left=0.35)
plt.tight_layout()
plt.show()

Thank you. However, it gives me an error when I use your data and Python code (TypeError: 'int' object is not callable). Any help? Regards, Mazin — Mazin Almurrani, Mar 10 '20 at 02:22
