1

I would like to get some advice on running a customized contrast analysis. I have looked at a few vignettes and I think I get the basic idea, but I'm not sure how to adapt them for my purpose.

To make things simple, let's just say that in this survey, people rated on a scale how much they want to visit cities in the east coast and west coast of the US. So a response like "New York, 10" means "I would very much love to visit New York" and "Seattle, 4" means "I don't really want to go to Seattle".

This dataset was collected in order to test the following three (competing) hypotheses.

I) Ratings for east coast cities > Ratings for west coast cities.

II) Ratings for east coast cities are comparable to each other. (New York = Boston = DC and so on. No claim about west coast cities.)

III) New York > all other cities.

I have two main questions.

1) I understand that hypotheses I and III can be coded as contrasts (non-orthogonal ones) and run in a single regression. But how do I include Hypothesis II in the same contrast analysis? (I'm assuming I should test all of them at once.)

If a contrast analysis isn't appropriate, what should I use? Some kind of ordinal ANOVA for the relevant cities? Edit: I did some Googling and found that a nested ANOVA might be suitable for Hypothesis II, but I'm not sure.

Are there any packages or vignettes I should use? I am hoping to run an ordinal analysis if that's helpful.

2) In the vignettes I saw, the output of a regression with contrasts indicate whether it is significant (t and p values), but not the direction of the contrast. But Hypotheses I and III are basically one-tailed hypotheses. How can I check for that?

intasker
  • 35
  • 5
  • Can I ask what software are you using? – Sal Mangiafico Jan 18 '20 at 19:58
  • I'm using R and the "ordinal" package, because the data is ordinal – intasker Jan 18 '20 at 20:16
  • Okay. The `ordinal` package is supported by the `emmeans` package. It should have no problems handling the contrasts for this type of model. (I'm assuming the word "nested" in the title isn't doing any work, but I suspect it would handle the contrasts in a nested model as well). It should be able to address all the hypotheses you are interested in. Give me a little bit, and I'll try to mock up a toy example in an answer. – Sal Mangiafico Jan 18 '20 at 20:59

1 Answers1

0

At the time of writing, this is a very software-heavy answer. The following is R code, relying on the emmeans package to analysis custom contrasts for a model fit with the ordinal package. I suspect that emmeans will properly handle custom contrasts for supported models even if the design is more complex.

One thing to note here is that the classic way to analyze custom contrasts in R is with aov that limits the number of contrasts to the degrees of freedom in the e.g. treatment effect. However, emmeans and multcomp allow for any number of custom contrasts, and often a p value correction is applied to account for the multiple hypothesis tests.

Sources, with the caveat that I am the author of these webpages:

http://rcompanion.org/rcompanion/h_01.html

http://rcompanion.org/handbook/G_07.html

### Install packages ###
if(!require(ordinal)){install.packages("ordinal")}
if(!require(car)){install.packages("car")}
if(!require(RVAideMemoire)){install.packages("RVAideMemoire")}
if(!require(emmeans)){install.packages("emmeans")}
###                  ###

Input = ("
City           Response
 NY             4
 NY             5
 NY             6
 Boston         4
 Boston         5
 Boston         6
 DC             7
 DC             9
 DC            10

 Seattle        1
 Seattle        2
 Seattle        3
 SantaCruz      1
 SantaCruz      2
 SantaCruz      2
 Oxnard         1
 Oxnard         2
 Oxnard         4
")

Data = read.table(textConnection(Input),header=TRUE)


### Specify the order of factor levels. Otherwise R will alphabetize them.

Data$City = factor(Data$City, levels=unique(Data$City))


Data$Response = factor(Data$Response, ordered=TRUE)


boxplot(Response ~ City,
        data = Data,
        ylab="Response",
        xlab="City")

Box plot

###  You need to look at order of factor levels to determine the contrasts

levels(Data$City)

   ### "NY"        "Boston"    "DC"        "Seattle"   "SantaCruz" "Oxnard"

library(ordinal)

model = clm(Response ~ City, data = Data, threshold="equidistant")

   ### Normally you don't need threshold="equidistant"

library(car)

library(RVAideMemoire)

Anova(model,
      type = "II")

  ### Note that you shouldn't use car::Anova with `ordinal` models,
  ###  unless you use the modification in the `RVAideMemoire` package. 


library(emmeans)

em = emmeans(model, ~ City)


### Hypothesis I

Contrasts = list(East_vs_West = c(1,  1,  1,  -1,  -1,  -1))

   ### The column names match the order of levels of the treatment variable
   ### The coefficients of each row sum to 0

contrast(em, Contrasts, adjust="sidak")

   ### contrast     estimate   SE  df z.ratio p.value
   ### East_vs_West     23.4 6.03 Inf 3.878   0.0001 

   ### estimate is sided.  That is, it can be negative or positive.
   ### Here, obviously, East has higher values than West, and the estimate
   ###  is positive.

   ### For a one sided test, I guess you should just divide the p-value by two.

### Hypothesis II

Contrasts = list(NY_vs_Boston = c( 1, -1,  0,  0,  0,  0),
                 Boston_vs_DC = c( 0,  1, -1,  0,  0,  0),
                 DC_vs_NY     = c(-1,  0,  1,  0,  0,  0))

### Do omnibus test for differences among East Coast cities

Test = contrast(em, Contrasts)

test(Test, joint=TRUE)

   ### df1 df2 F.ratio p.value note
   ###   2 Inf   4.809 0.0082   d  
   ###
   ### d: df1 reduced due to linear dependence

### Look at pairwise comparisons of East Coast cities

contrast(em, Contrasts, adjust="sidak")

   ### contrast     estimate   SE  df z.ratio p.value
   ### NY_vs_Boston     0.00 1.49 Inf  0.000  1.0000 
   ### Boston_vs_DC    -5.78 2.01 Inf -2.881  0.0118 
   ### DC_vs_NY         5.78 2.01 Inf  2.881  0.0118 
   ###
   ### P value adjustment: sidak method for 3 tests 

## Hypothesis III

Contrasts = list(NY_vs_AllElse = c( 1, -1/5, -1/5, -1/5, -1/5, -1/5))

contrast(em, Contrasts, adjust="sidak")

   ###  contrast      estimate   SE  df z.ratio p.value
   ###  NY_vs_AllElse     2.36 1.28 Inf 1.841   0.0657
Sal Mangiafico
  • 7,128
  • 2
  • 10
  • 24
  • Thanks for all the detail and comments! A follow-up question: shouldn't these contrasts be analyzed as part of a single list, as in "contrast(em, list(NY_vs_AllElse, East_vs_West, NY_vs_Boston, Boston_vs_DC, ...) "? Would the p-values be too low, if the three hypotheses were tested separately? – intasker Jan 19 '20 at 16:27
  • For Hypothesis Test II, I added the code to test the omnibus hypothesis: That there is no difference among East Coast cities. – Sal Mangiafico Jan 19 '20 at 17:38
  • Yes, good point. If you have several single-degree-of-freedom contrasts (that is, ones that take only one line), you want to put them all together in one `list` so that *p* value adjust applies to all the tests in that list. If you have multiple-degree-of-freedom contrasts (ones that take more than one line), like the omnibus test in Hypothesis II, you'll have to do these separate so you can conduct the joint test. – Sal Mangiafico Jan 19 '20 at 17:47
  • Right, So if I understand correctly, Hypotheses I and III can be combined in a single `emmeans::contrast` command, but Hypothesis II will need to be run separately. I'm wondering about how to control for the multiple comparisons problem. Maybe I can run Hypothesis I and II with `emmeans::contrast(em ...)`, and then test Hypothesis II with a post-hoc pairwise analysis - running `emmeans::pairs(em)`, checking the differences between east coast cities? – intasker Jan 19 '20 at 22:17
  • I can think of a few options. a) Choose no adjustment of *p* values from `emmeans`, and then feed the *p* values manually through `p.adjust`, choosing the adjustment you want. Or manually perform the Šidák adjustment. b) You may decide to make no *p* value adjustments at all (except in the case of the multiple comparisons of the east coast cities). Consider if you had a factorial anova with factors A, B, and AxB. If you perform an anova on this model, would you adjust the *p* values for the test of each factor? I would tend not to. Is this case analogous? – Sal Mangiafico Jan 20 '20 at 17:04
  • These are interesting analogies. This problem feels more like scenario a than b. If this was scenario b, we would be modeling ratings as a function of these contrasts. (Also, thank you for working through this problem with me!) – intasker Jan 20 '20 at 19:10
  • Please note that a construct like `if(!require(car)){install.packages("car")}` does not work if the package is not installed, because it does not attach the package after it is installed. I have seen this in a number of your answers, not just this one. I think you should just put `require("...")` and leave it to users to install the packages they don't have. – Russ Lenth Jan 24 '20 at 19:26
  • Yes, that code installs packages only. I use `library(xyz)` later in the code. I like using the install code because it makes the example truly reproducible, though it could be a problem if people are using a metered internet connection. I use it for my own scripts as well. I'll update my example a bit to be explicit about what that code does. – Sal Mangiafico Jan 25 '20 at 20:43