17

I hesitated over whether to ask this here on the stats StackExchange or on the linguistics/English one, but I reckon there might be more language-nitpicking users here than stats-savvy users on the other forum ;)

I often read reports that mention correlation as a verb in the active voice, as in "We then correlated A with B and found...". To me, this verb only makes sense in the passive voice, as when saying for instance that "We found that A and B were significantly correlated". I might be wrong that this really constitutes active vs passive voice grammatically, but what I describe is the difference between doing something to A and B such that they each end up changed, versus computing a third variable (e.g. an R coeff) from them.

One can, of course, actively DEcorrelate two variables, but it seems to me that to "correlate" them, rather than referring to something active, is simply shorthand for checking whether a significant correlation between them exists!

Am I wrong? Does it make any other sense statistically to say that you [actively] correlated A with B?
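To make the contrast concrete, here is a minimal Python sketch of the two readings. It rests on assumptions that are not in the question: simulated Gaussian data, the Pearson coefficient as the "R coeff", and decorrelation done by keeping the residuals of a least-squares regression of B on A (only one of several possible decorrelation methods). The helper names `pearson_r` and `decorrelate` are purely illustrative.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def decorrelate(x, y):
    """Return a NEW version of y with its linear dependence on x removed
    (the residuals of a least-squares regression of y on x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


# Simulated data (an assumption for illustration only).
random.seed(0)
A = [random.gauss(0, 1) for _ in range(200)]
B = [a + random.gauss(0, 1) for a in A]
A_before, B_before = list(A), list(B)

# "Correlating" in the shorthand sense: a third quantity is computed.
r = pearson_r(A, B)
unchanged = (A == A_before and B == B_before)

# "Decorrelating": the data themselves are transformed.
B_decorr = decorrelate(A, B)
r_after = pearson_r(A, B_decorr)

print(f"r(A, B) = {r:.3f}, inputs unchanged: {unchanged}")
print(f"r(A, B_decorr) = {r_after:.2e}")
```

Computing `r` leaves `A` and `B` exactly as they were, whereas `decorrelate` returns different values for B. That asymmetry is the one the question is about.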

Nick Cox
z8080
  • 3
    From my experience your first paragraph is correct. – Nick Cox Mar 26 '18 at 12:00
  • 1
    I share your preference, but this largely seems to be cultural, with different fields using it differently. My impression is that the usage you dislike is more common in the social sciences, but I'm unsure. And for what it's worth, there's probably bigger issues for us to work on, terminology-wise: https://stats.stackexchange.com/questions/202879/what-are-the-most-misused-statistics-terms-that-we-should-care-to-correct – mkt Mar 26 '18 at 12:03
  • 1
    you're definitely right that there are more important misnomers to worry about. Thanks anyway for your response! – z8080 Mar 26 '18 at 13:04
  • 1
    That is not an active/passive distinction. That is using "correlate" as a verb, and "correlate" as an adjective. An example of passive is, "A and B were correlated" which is *completely* ambiguous. – AdamO Mar 26 '18 at 14:59
  • 2
    Note that "A correlates with B" is also in the active voice: the verb is ergative. – Scortchi - Reinstate Monica Mar 26 '18 at 15:19
  • 1
    @AdamO: I agree that calling it passive/active (voice) is not correct, but verb vs. adjective is not where the distinction lies either. In the (I think) correct usage, it can well be a verb too, e.g. "we found that A correlated with B" – z8080 Mar 26 '18 at 15:41
  • @z8080 I think you primarily see correlate as a verb, whereas I do not. As a note, if you say "A and B were correlated" it will be universally interpreted that you *already* performed the "correlating" and are reporting a finding: reject $H_0$ that $\rho = 0$. I see correlation as a noun and correlated as an adjective, and "correlating" is not in my vocabulary. The verb usage conveys a goal, as an analyst, to *induce* a correlation through some means. That would be questionable research at best. As a reviewer I would ask for a lot of clarification if you used the word in that sense. – AdamO Mar 26 '18 at 16:01
  • Do you find this phrase "A correlates to B" as acceptable as "A is correlated to B?" – Aksakal Mar 26 '18 at 17:45

6 Answers

15

I see where you are coming from with this – if you say something like "we correlated A with B", you might risk giving the impression that you introduced correlation between A and B where perhaps none existed before.

In my view, there are better ways to say this, such as: "we investigated whether A and B were correlated" or "we studied the (linear?) association/relationship between A and B".

Can you get away with using "we correlated A and B" from a grammatical and/or statistical viewpoint? The answer is yes. Is that the best way you can get your point across? My own answer to this last question would be No.

Alexis
Isabella Ghement
  • 1
    Nice answer. I'd add that describing statistical methods is a great time to use the passive voice. The implied subject in most sentences is the analyst. To use the active voice, many sentences begin with "we" meaning the statistician and their tapeworm. The passive voice lends itself well to omitting the subject from the sentence. "A was regressed upon B" is an example or "Pearson correlation was found/estimated for A and B". or even "A and B were summarized using means and correlation." – AdamO Mar 26 '18 at 17:26
  • 1
    @AdamO, this is not about passive vs. active. Consider these two phrases: "A is correlated to B" and "A correlates to B." I bet that OP would be Ok with both of these despite one being active. OP's objection is to intransitive use of the verb: "A correlates to B", as opposed to transitive "We correlate A to B" – Aksakal Mar 26 '18 at 17:42
  • @Aksakal I see, herein lies the confusion. Both statements can be interpreted as a "results" statement about the condition of A and B. The problem is using "correlate" to describe a statistical analysis. In that sense, one could write [past active], "We correlated A and B. We found A and B were not correlated (p >0.05)." [past progressive] "A and B were correlated. A and B were not found to be correlated (p > 0.05)". Merging these sentences together, you get a sense of how bad the writing can be. The analyst must be clear enough to distinguish a finding from a method. – AdamO Mar 26 '18 at 17:48
  • @Aksakal I would contend it is about passive/active because it seems to me the OP wants to reserve some tenses to be less ambiguous about the problems I'm mentioning. That's what the title of the post says, after all. You're also correct that it's not about passive/active because a good writer must avoid ambiguity in all cases, regardless of phrasing. – AdamO Mar 26 '18 at 17:49
  • @AdamO, the "problem" is that both "A correlates to B" (intransitive) and "we correlate A to B" (transitive) are accepted and in use; one can't simply declare either of them wrong – Aksakal Mar 26 '18 at 17:50
  • @Aksakal I would say your first example is transitive: the subject (A) and object (B) are clearly delineated by a verb. The use of "correlated to" is just another phrase, again trying to avoid the ambiguity of a method and a finding. An intransitive use is "A and B correlate". [edited the last example] – AdamO Mar 26 '18 at 17:51
  • @AdamO, some people would disagree with you see [this discussion](https://forum.wordreference.com/threads/correlate-intransitive-verb.2424018/) – Aksakal Mar 26 '18 at 17:53
  • @Aksakal Hmm, the first response is a good one. He or she uses a different phrasing, "Correlates *with*". I feel that prepositions imply a sense of ordering. This is underscored by the statistical fact, not a grammatical one, that if A correlates with B, then B correlates with A. That fact is also only true of bivariate analyses. As I'm trying to underscore, avoid ambiguity. – AdamO Mar 26 '18 at 17:56
  • @AdamO, the difference between "correlates with" and "correlate to" could be just me :) I'm not a native speaker, so I mess up prepositions left and right – Aksakal Mar 26 '18 at 17:58
10

I don't think this is a grammatical issue, just a question of how words are used, or should be best used, in practice.

A meta-lesson I have learned over several years is that a claim that something is ungrammatical is fragile. There is always another grammarian who can be found who will dispute the assertion. (I am of a generation firmly told never to split infinitives because, supposedly, the practice is totally ungrammatical; that was rebutted as bogus logic (a misconceived analogy with Latin) long before I was taught this in the 1960s; my teachers were, I guess now, just passing on what they had been told in their youth, and so forth. Nevertheless I still can't split an infinitive willingly.)

I would understand "we correlated $X$ and $Y$" easily as "we calculated the correlation between $X$ and $Y$". It's fairly common usage, I think. Even if it isn't common usage, I don't see what is ungrammatical about it. There is an associated question of how far the correlation exists as an inevitable consequence of the data, as a mathematical or even real fact, before its value is calculated, or indeed regardless of whether that is done. I can't say I have ever worried about that.
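For concreteness, the quantity calculated under that reading is presumably the sample Pearson coefficient (an assumption; the question mentions only "an R coeff"). For paired observations $(x_i, y_i)$, $i = 1, \dots, n$, with sample means $\bar{x}$ and $\bar{y}$:

$$r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Nothing in this expression alters $X$ or $Y$; it only summarises them. It is also symmetric in its arguments, so $r_{XY} = r_{YX}$.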

But I wouldn't want to write that in a paper or catch myself saying it in a presentation. That is mostly a question of personal style, and as always agreement and disagreement about style are both to be expected.

I can't imagine saying "We plotted $Y$ against $X$", because I would just say "Here is a plot..." or "Figure 2 is a plot ...". Similarly, at most, I would just say "The correlation is ...".

It's worth remembering that Francis Galton hijacked correlation, which was a fairly unusual but long-existing word, for the present statistical purpose. Now I guess that the statistical sense of correlation (or a more diluted or generalised sense of it) is primary usage.

Notes:

  1. You want nit-pickers to comment, so in that vein I will say that $A$ and $B$ are not congenial notation for variables, even in complete abstraction.

  2. Never heard of "decorrelated"!

Nick Cox
  • 1
    Interesting and entertaining answer, thank you. Although I don't see "We plotted Y against X" being misleading/uncongenial in the same way as "We correlated X and Y", since you *are* actually plotting Y (as a function of X), whereas the person saying the latter likely does not mean he/she has forcibly made the variables be correlated when they weren't previously. And you are right about X&Y as opposed to A&B, haha! – z8080 Mar 26 '18 at 13:13
  • Also, I was using "decorrelated" in the sense of https://en.wikipedia.org/wiki/Decorrelation - am I wrong? – z8080 Mar 26 '18 at 13:13
  • Not wrong at all about _decorrelated_; that's just a case of what I have read and still remember. – Nick Cox Mar 26 '18 at 13:27
  • Galton's use of *correlation* is not as bad as his use of *regression*, which had decidedly eugenic overtones – Henry Mar 26 '18 at 15:44
  • 2
    A great deal of what was written on heredity by scientists from say 1850 to 1950 looks misguided if not obnoxious to many from a current perspective; Galton is among good company as well as bad. I don't think the fact that (e.g.) taller parents have on average children taller than average but shorter than themselves led anyone to advocate culling babies according to height (or length). Regress here means retreat or withdraw, not that anyone is backwards in a pejorative sense. – Nick Cox Mar 26 '18 at 15:54
  • 1
    @NickCox: my understanding is that Galton originally believed that the offspring of taller/cleverer/better parents would on average be closer to average than their parents and applied the pejorative *regress* to this, which stuck before he spotted that parents of taller/cleverer/better children would on average be closer to average than their children – Henry Mar 26 '18 at 16:10
  • 1
    I don't think Galton intended _regress_ to be pejorative, just factual about what happens. But I wouldn't object if somehow we could all start using different terms much more often. I've spent more time explaining the term to students than I would choose. – Nick Cox Mar 26 '18 at 16:18
  • I think there has to be an optimal height in terms of intelligence of its bearer. Taller guys have bigger skulls, and bigger brains. As we know from AI, more neurons means more power in the brain. The reason why whales are not smarter than us is because the bigger body needs more neurons to control it, and also bigger brains needs more energy. So there has to be an optimal height to maximize intelligence. I think it's around 6'1-6'3" – Aksakal Mar 26 '18 at 17:28
  • 1
    @Aksakal Not sure how serious that is, but I don't think that any such view, facetious or factitious, can be blamed on Galton. – Nick Cox Mar 26 '18 at 17:31
  • @NickCox the height-intelligence correlation is in the [0.1-0.2 range](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3044837/), but it's lame to do it in a linear fashion. It doesn't account for the bounding factors that I mentioned. That's why there has to be an optimum; it's not a monotonic relationship – Aksakal Mar 26 '18 at 17:35
  • 3
    @Aksakal, I believe the mechanism behind that is that people who are stunted physically, say due to malnutrition in childhood, do not reach their full potential height, & also don't reach their full potential IQ. This creates a marginal correlation, but the connection isn't causal. Re the scaling b/t body mass & brain mass, see the work of [Geoff West](https://www.chemistryworld.com/review/scale-the-universal-laws-of-life-and-death-in-organisms-cities-and-companies/3008350.article). – gung - Reinstate Monica Mar 26 '18 at 18:12
  • "I would understand "we correlated X and Y" easily as "we calculated the correlation between X and Y"." The potential ambiguity could be whether the situation is a) we set about to find the correlation because it was of interest and would have reported it no matter what, or b) we were studying a bunch of data, and in the process we came across this correlation. In terms of achieving a superior communication, I would think that "we correlated X and Y" should be advised against, regardless of whether on the basis of grammar per se. – nathanwww Mar 27 '18 at 20:15
  • 2
    @nathanwww: But how do "X & Y are correlated" or "X correlates with Y" help readers tell apart situations (a) & (b)? – Scortchi - Reinstate Monica Mar 29 '18 at 13:05
10

Correlate is now commonly used as a verb. You pointed to the use of this word as transitive vs. intransitive, and stated that the latter is right and the former is, perhaps, wrong.

Note that, unlike you, I'm not framing this as the difference between active and passive forms, because that distinction is just a red herring in this case. Consider this: the form that you find more comfortable to use, "A is correlated to B," is passive. However, it's not the fact that it's passive that makes it more natural to you. It's that it's intransitive, as in the active form "A correlates to B," as opposed to the transitive form "we correlate A to B," that makes it sound right to you.

I must agree that the intransitive form sounds more natural, both in passive and active forms. Moreover, when Galton first introduced the term, he used it only as an intransitive verb, in passive form, e.g. "the length of the arm is said to be correlated with that of the leg." According to Pearson, it was Galton who first defined the term as a statistical concept in "Co-relations and their Measurement, chiefly from Anthropometric Data" in 1888, although the word itself had been used before in other contexts. Pearson's paper "Notes on the History of Correlation" is here.

Now, I have to break some bad news: both forms have been in use for quite some time. Here's an example from The Standard American Encyclopedia of Arts... published in 1898! (The entry was shown in two screenshots.)

-- verb intransitive – correlate, correlating. To have reciprocal relation, to be reciprocally related, as father and son. -- verb transitive. To place in reciprocal relation: to determine the relations between, as between several objects or phenomena which bear a resemblance to one another

As you can see both intransitive and transitive forms are described, i.e. "A correlates to B" and "we correlate A to B" are both fine. See also this discussion.

The verb "correlate" was created by back-formation from the noun. For instance, the verb "translate" was apparently created in the same way from the noun "translation".

@kjetilbhalvorsen brought up an example "to google", but it's a different mechanism of word formation called verbing, and a special case of it too. Normally, verbing is making verbs from nouns like "medal" $\to$ "to medal." In this case we take an eponym "Google" and make a verb "to google." It's similar to "Xerox" $\to$ "to xerox", and even an older example of a guy named Charles Boycott $\to$ "to boycott."

What's even more interesting about the Google case is that the name itself comes from a fairly recently made-up word, "googol."

Alexis
Aksakal
  • Verb-creation in this way is very easy in english, a modern example "to google" – kjetil b halvorsen Mar 26 '18 at 15:04
  • @kjetilbhalvorsen, that's not back-formation though – Aksakal Mar 26 '18 at 15:38
  • 2
    I think this answer misses the key aspect of (in OP's words) *the difference between doing something to A and B such that they each end up changed, versus computing a third variable*. (I don't know the grammatical term for this.) – Richard Hardy Mar 26 '18 at 15:54
  • @RichardHardy thank you for comment, I updated my answer – Aksakal Mar 26 '18 at 16:02
  • 1
    (I don't understand medal: mdeal. There is a trivial typo in there, but more than that: Are you saying that giving a person a medal is to medal them?) – Nick Cox Mar 26 '18 at 16:34
  • @NickCox, to medal is to get a medal, it's *intransitive*, not *transitive*. It sounds awful, but it's used a lot at least in US media. Anyhow, it's better than "webinar" so I don't complain – Aksakal Mar 26 '18 at 16:55
  • 1
    Not heard of that in English.... – Nick Cox Mar 26 '18 at 16:56
  • 1
    Ah, *transitive* and *intransitive* – that was what I could not name. Thanks! – Richard Hardy Mar 26 '18 at 16:57
  • @NickCox, here's an example "women have out-medaled their male counterparts" from https://www.thrillist.com/news/nation/usa-medal-count-women-winning-more-than-men it's so common you wouldn't believe it – Aksakal Mar 26 '18 at 16:57
  • You're expecting me to know about _sport_? – Nick Cox Mar 26 '18 at 16:58
  • 1
    @NickCox, in US if you don't follow the sports you'll be always lonely at a water cooler at work – Aksakal Mar 26 '18 at 16:59
  • 7
    How could I be lonely with SE? – Nick Cox Mar 26 '18 at 17:00
  • 1
    @NickCox, "to medal" (as intransitive) is very common in coverage of the Olympics, for example. – gung - Reinstate Monica Mar 26 '18 at 18:00
  • 2
    @gung I always believe you. – Nick Cox Mar 26 '18 at 18:01
  • 4
    As well you should, @Nick. – gung - Reinstate Monica Mar 26 '18 at 18:14
  • very hard to pick AN answer for this question, as so much interesting discussion was generated by it - but yours is clearly a very nice one :) – z8080 Mar 27 '18 at 07:37
  • 1
    Technically, "A is correlated to B" can be seen as passive transitive, as in "A is correlated to B [by someone]". (Although I do agree that most of the time this isn't the desired interpretation) – Pedro A Mar 27 '18 at 12:26
  • @Hamsterrific, no it's an intransitive form of the verb, see the screenshot in my post or the link to the linguistic discussion – Aksakal Mar 27 '18 at 13:18
  • 1
    @Hamsterrific is correct. Intransitive verbs have no objects & therefore can't be used in the passive voice. There are two interpretations of "A is correlated to B": (1) the true/dynamic passive, with *is* as auxiliary verb & *correlated* as past participle of a transitive verb (e.g. "in the second step of the procedure A is correlated to B"); (2) the false/stative passive, with *is* as copula & *correlated* as adjective (e.g. "A is correlated to B, resulting in high variance inflation factors"). – Scortchi - Reinstate Monica Mar 28 '18 at 10:34
  • @Scortchi, see [this discussion](https://forum.wordreference.com/threads/correlate-intransitive-verb.2424018/), do you disagree with what they're saying? "A correlates with B" is intransitive as is "A is correlated with B." I'd like to hear this from a linguist though. The dictionary clearly asserts that this use is intransitive too, so you two are against the reputable sources, which doesn't necessarily mean that you're wrong, but methinks you are in this case – Aksakal Mar 29 '18 at 21:12
  • That the verb in "A correlates with B" is intransitive I don't dispute. That *correlated* in "A is correlated with B" is the participle of an intransitive verb is asserted nowhere in that discussion or the dictionary you quote. In fact "Dancing is correlated with mathematics" is listed under the heading of 'transitive'. – Scortchi - Reinstate Monica Mar 29 '18 at 22:37
3

"Correlate" is a back-formation from "correlation", which comes from "co" (with) and "relation". I suppose that is a bit redundant, as a relation is always with something else. It would be acceptable to say "We related X to Y", so I think that from a "lay" perspective, it makes sense to say "We correlated X to Y". One could argue that in a math context, "correlate" has a specific meaning that precludes this use, but that raises questions such as "What is that meaning?", "How was it established?", and "In what circumstances is it reasonable to call for math-specific usage?". For instance, there was a Jeopardy! clue along the lines of "It's the set of points within a fixed distance of a central point." The "correct" response was "What is a sphere?", but mathematically the correct response was "What is a ball?" Even though they were discussing math, this is a program directed at the general populace, so not making the distinction was reasonable.

So I would say that it is reasonable to make the distinction yourself, and even reasonable to expect someone speaking to a math audience to make the distinction, but it's acceptable in more lay contexts to not do so.

> I might be wrong that this really constitutes active vs passive voice grammatically

I think you are. Generally speaking, if something is in the passive voice, then you can add a "by ..." at the end, e.g. "The passive voice is frequently used [by writers]".

> but what I describe is the difference between doing something to A and B such that they each end up changed

I don't think that's an accurate description. If someone were to say "We compared A and B", would they be implying that A and B were changed? Just because something is grammatically the object of a verb, doesn't mean that anything was actually done to it.

Acccumulation
  • Are you sure it's "co" + "relation"? I think it came from Latin "cor" + "relatio", then it got into English as a whole word "correlation", it wasn't constructed from the parts you mention, but it's a topic for linguistics forum, not here – Aksakal Mar 26 '18 at 15:46
  • thanks! the distinction between the two usages would of course only be of interest for people sufficiently interested in the details of statistics to be on this forum and read my post :) – z8080 Mar 26 '18 at 15:51
  • 1
    @Aksakal that's almost correct. The Latin substantive "cor" means "heart". "Correlation" actually stems from the Latin prefix "co[n]-" (meaning "together") and the perfectum "relatum" (meaning "related"). – Jim Mar 26 '18 at 18:47
  • @Jim, in Latin "cor-" is a form of the prefix "com-", which means "together," "with" etc., e.g. compare compile and correct, see [here](http://wordquests.info/cgi/ice2-for.cgi?file=/hsphere/local/home/scribejo/wordquests.info/htm/d0000526.htm&HIGHLIGHT=co). The form "co-" is used in "cohesive", i.e. before the letter "h"; and "con-" is in "conceive", i.e. before "c." "Cor" is a noun, a completely different word, of course. It's a weird language – Aksakal Mar 26 '18 at 19:25
  • @Jim, "com-" converts into "cor-" before letter "r", that's why it is "correlation" and not "corelation" or "comrelation" – Aksakal Mar 26 '18 at 19:32
2

I don't think this is nitpicking at all.

The first time I heard someone say "We correlated A with B", the speaker had the ability to influence A. I took their saying to mean "A and B were first uncorrelated, but we then altered A so as to have it strongly correlated with B". I spent a lot of time trying to figure out why they had done this. Eventually I realized that they meant "we found a correlation between A and B", and their motivation became much more clear at that point.

Cliff AB
0

This usage of the verb correlate may be uncommon but it is grammatically correct since it can be used as a transitive verb.

correlate: to present or set forth so as to show relationship. "He correlates the findings of the scientists, the psychologists, and the mystics."

See this Definition for reference.