Questions tagged [data-visualization]

Constructing meaningful and useful graphical representations of data. (If your question is only about how to get particular software to produce a specific effect, then it is likely not on topic here.)

Overview

Data visualization refers to techniques for presenting results in graphical form, such as histograms, scatterplots, or boxplots. Data visualization is a special challenge for data with high dimensionality.

Programming questions (for example, in Python, or in R with ggplot) for which you can supply a reproducible example are usually welcome on Stack Overflow instead.

References

The following question contains references to data visualization resources:

2831 questions
215 votes
4 answers

How to interpret a QQ plot

I am working with a small dataset (21 observations) and have the following normal QQ plot in R: Seeing that the plot does not support normality, what could I infer about the underlying distribution? It seems to me that a distribution more skewed…
JohnK
116 votes
4 answers

Assessing approximate distribution of data based on a histogram

Suppose I want to see whether my data is exponential based on a histogram (i.e. skewed to the right). Depending on how I group or bin the data, I can get wildly different histograms. One set of histograms will make it seem that the data is…
guestoeijreor
106 votes
11 answers

"Best" series of colors to use for differentiating series in publication-quality plots

Has any study been done on what are the best set of colors to use for showing multiple series on the same plot? I've just been using the defaults in matplotlib, and they look a little childish since they're all bright, primary colors.
Daisy Sophia Hollman
99 votes
1 answer

Interpreting plot.lm()

I had a question about interpreting the graphs generated by plot(lm) in R. I was wondering if you guys could tell me how to interpret the scale-location and leverage-residual plots? Any comments would be appreciated. Assume basic knowledge of…
Guest
89 votes
4 answers

How to produce a pretty plot of the results of k-means cluster analysis?

I'm using R to do K-means clustering. I'm using 14 variables to run K-means. What is a pretty way to plot the results of K-means? Are there any existing implementations? Does having 14 variables complicate plotting the results? I found something…
83 votes
4 answers

How to visualize what canonical correlation analysis does (in comparison to what principal component analysis does)?

Canonical correlation analysis (CCA) is a technique related to principal component analysis (PCA). While it is easy to teach PCA or linear regression using a scatter plot (see a few thousand examples on google image search), I have not seen a…
78 votes
12 answers

Famous statistical wins and horror stories for teaching purposes

I am designing a one-year program in data analysis with a local community college. The program aims to prepare students to handle basic tasks in data analysis, visualization and summarization, advanced Excel skills and R programming. I would like…
72 votes
3 answers

How to actually plot a sample tree from randomForest::getTree()?

Anyone got library or code suggestions on how to actually plot a couple of sample trees from: getTree(rfobj, k, labelVar=TRUE) (Yes I know you're not supposed to do this operationally, RF is a blackbox, etc etc. I want to visually sanity-check a…
smci
70 votes
3 answers

When are Log scales appropriate?

I've read that using log scales when charting/graphing is appropriate in certain circumstances, like the y-axis in a time series chart. However, I've not been able to find a definitive explanation as to why that's the case, or when else it would be…
dav
66 votes
9 answers

How to visualize what ANOVA does?

What ways are there to visually explain what ANOVA is? Any references, links, or R packages will be welcomed.
Tal Galili
62 votes
12 answers

Software needed to scrape data from graph

Anybody have any experience with software (preferably free, preferably open source) that will take an image of data plotted on cartesian coordinates (a standard, everyday plot) and extract the coordinates of the points plotted on the…
Alex Holcombe
59 votes
2 answers

How can I change the title of a legend in ggplot2?

I have a plot I'm making in ggplot2 to summarize data that are from a 2 x 4 x 3 celled dataset. I have been able to make panels for the 2-leveled variable using facet_grid(. ~ Age) and to set the x and y axes using aes(x=4leveledVariable, y=DV). I…
russellpierce
56 votes
7 answers

Graph for relationship between two ordinal variables

What is an appropriate graph to illustrate the relationship between two ordinal variables? A few options I can think of: Scatter plot with added random jitter to stop points hiding each other. Apparently a standard graphic - Minitab calls this an…
Silverfish
55 votes
4 answers

How to visualize a fitted multiple regression model?

I am currently writing a paper with several multiple regression analyses. While visualizing univariate linear regression is easy via scatter plots, I was wondering whether there is any good way to visualize multiple linear regressions? I am…
Shawn Wang
55 votes
6 answers

How to determine best cutoff point and its confidence interval using ROC curve in R?

I have the data of a test that could be used to distinguish normal and tumor cells. According to ROC curve it looks good for this purpose (area under curve is 0.9): My questions are: How to determine cutoff point for this test and its confidence…
Yuriy Petrovskiy