
I have a large dataset containing many different weather variables. I'm interested in comparing two temperature columns (call them V1 and V2), where V2 is the actual temperature and V1 is the temperature recorded by some air quality machines. Each column has 35 values (the temperature recorded every hour for 35 hours). I need some method to tell how accurately these machines report the temperature by comparing their readings with the actual temperature. I tried computing the standard error using V1 and V2, but it doesn't seem to tell me anything. I need some numbers or explanations to show someone else how good their product is, or what its problems are. Should I try fitting a regression line for V2 and scatter-plotting V1 around it? I need to use R for this.

I don't know if this will help, but here is a graph of V1 and V2 (really not very accurate...):

V1 = blue, V2 = red

  • Try [root mean squared error](http://www.inside-r.org/packages/cran/Metrics/docs/rmse) (see the R sketch after these comments) – Pierre L May 26 '16 at 15:42
  • Although the OP mentions R, this seems like its really more of a statistical question to me. We could consider migrating it to [stats.SE]. – gung - Reinstate Monica May 26 '16 at 15:46
  • @PierreLafortune And this number will tell me how big the error is? But it is still just a number; is there any standard for it, like if it is below a certain value then it's good? –  May 26 '16 at 15:47
  • My first thought would be to calculate the correlation coefficient. Not sure how similar this is to @gung's suggestion to use Lin's concordance coefficient. – R Greg Stacey May 26 '16 at 17:02
  • @Qroid, correlations are not measures of agreement. To understand that better, it may help to read my answer here: [Does Spearman's r=0.38 indicate agreement?](http://stats.stackexchange.com/a/199714/7290) – gung - Reinstate Monica May 26 '16 at 18:18
  • @gung If she's not worried about temperature magnitude, e.g. one thermometer is inside a hot air quality machine, and she just wants to quantify whether the thermometer is accurately measuring increases/decreases in temperature, would Pearson/Spearman correlation be useful then? I realize that might not be her question. p.s. Thanks for the link. That was a great answer. – R Greg Stacey May 26 '16 at 18:36
  • @Qroid, this is probably better asked as a new question, so the information doesn't end up buried in comments. However, correlations don't only mess up the magnitude, they ignore differences in unit size (a 1-unit increase may actually be a 10-unit increase). Calibration & prediction are also related possibilities, but distinct goals. – gung - Reinstate Monica May 26 '16 at 18:44
  • Use the intraclass correlation coefficient (ICC). – Jeffrey Girard May 28 '16 at 04:31
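
Following up on the RMSE and ICC suggestions in the comments, here is a minimal R sketch. The vectors V1 and V2 are hypothetical stand-ins for the 35 hourly readings; the Metrics and irr packages are assumed to be installed.

```r
## Hypothetical data standing in for the 35 hourly readings
set.seed(1)
V2 <- 20 + 5 * sin(seq_len(35) / 6)        # "actual" temperature
V1 <- V2 + rnorm(35, mean = 0.5, sd = 1)   # machine readings with some bias and noise

## Root mean squared error: typical size of the machine's error, in degrees
library(Metrics)
rmse(actual = V2, predicted = V1)

## Intraclass correlation coefficient (absolute agreement between the two columns)
library(irr)
icc(cbind(V1, V2), model = "twoway", type = "agreement", unit = "single")
```

Note that RMSE is in the same units as the temperature, so whether it is "good" depends on the precision you need for your application rather than on a universal cutoff.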

1 Answer


From a statistical point of view, you are looking for measures of agreement. Fitting a regression model is a common and intuitive idea, but it doesn't measure agreement. To assess agreement, you can use Lin's concordance coefficient or the methods of Bland and Altman.

  • In R, there is a function for the concordance coefficient in the epiR package (?epi.ccc); see the sketch after this list.
  • In R, you can make Bland-Altman plots using the BlandAltmanLeh package. The vignette has some introductory information.
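
As a rough illustration (assuming V1 and V2 are the machine and reference temperature vectors, and that the epiR and BlandAltmanLeh packages are installed), the two approaches might be called like this:

```r
library(epiR)
library(BlandAltmanLeh)

## Lin's concordance correlation coefficient with a 95% confidence interval
ccc <- epi.ccc(V1, V2)
ccc$rho.c

## Bland-Altman plot: differences (V1 - V2) against the pairwise means
bland.altman.plot(V1, V2, xlab = "Mean of V1 and V2", ylab = "V1 - V2")
```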

On the other hand, if you want to do calibration, or want to determine a prediction equation to convert the machine's readings to the best guess of what the temperature would be by the other method, then regression may be helpful.
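
For the calibration route, a minimal sketch (again with hypothetical V1 and V2) is an ordinary regression of the reference temperature on the machine reading:

```r
## Calibration: predict the reference temperature from the machine reading
fit <- lm(V2 ~ V1)
summary(fit)   # intercept and slope indicate systematic offset and scale error

## Convert a new machine reading (e.g., 22 degrees) into a calibrated estimate
predict(fit, newdata = data.frame(V1 = 22), interval = "prediction")
```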

gung - Reinstate Monica
  • Will a t-test work? See if the p-value is less than 0.05? –  May 26 '16 at 15:54
  • Work for what? If you want to measure agreement, then you need to use methods for agreement. If you want calibration, you need to use those methods. Etc. – gung - Reinstate Monica May 26 '16 at 15:55
  • That's the problem: I'm not sure which results will help more. I guess I need the agreement methods to tell whether the machines are accurate, but I probably also need calibration methods to make those machines better. –  May 26 '16 at 16:11
  • If you aren't sure what you're trying to accomplish, it will be difficult for us to help you. You may want to read up on agreement [here](https://en.wikipedia.org/wiki/Inter-rater_reliability) & [here](http://www.john-uebersax.com/stat/agree.htm), and on calibration [here ("in regression", & "see also")](https://en.wikipedia.org/wiki/Calibration_%28statistics%29). – gung - Reinstate Monica May 26 '16 at 16:23