library(tidyverse)
library(sciplot)
library(cowplot)
set.seed(1)
n <- 90 # number of subjects
# simulate accuracy for 3 conditions of n/3 subjects each; the Spaced and
# Random conditions draw from the same distribution (each value repeated twice)
d <- data.frame(subID = paste("p", seq(n), sep = ""),
                accuracy = c(pmin(1, rnorm(n/3, mean = 0.8, sd = 0.15)),
                             rep(pmin(1, rnorm(n/3, mean = 0.65, sd = 0.15)), each = 2)),
                condition = rep(c("Massed", "Spaced", "Random"), each = n/3))
adults <- d # keep a copy of the adult data for the interaction example below
sumCond <- d %>%
  group_by(condition) %>%
  summarize(meanAcc = mean(accuracy), seAcc = se(accuracy)) # se() comes from sciplot
ggplot(sumCond, aes(condition, meanAcc, fill = condition, color = condition)) +
  geom_bar(stat = "identity", color = "black", alpha = 0.5) +
  geom_jitter(data = d, aes(y = accuracy, color = condition), width = 0.1) +
  geom_errorbar(aes(ymin = meanAcc - seAcc, ymax = meanAcc + seAcc), color = "black", width = 0) +
  scale_color_brewer(type = "qual", palette = "Set1") +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ylab("accuracy")
Hypothesis: Massed > Random == Spaced
For example, assume that your hypothesis is that accuracy will be higher in the Massed condition than in the Random or Spaced conditions, and that the Random and Spaced conditions will be roughly equivalent. You might then encode this hypothesis as the focal contrast code: (Massed, Random, Spaced) = (0.67, -0.33, -0.33).
The values of the contrast code should:
sum to (roughly) 0
be unit-weighted if possible, i.e. the difference between the highest and lowest weight is 1 (here, 0.67 - (-0.33) = 1)
d$c1 <- ifelse(d$condition=="Massed",0.67,-0.33)
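A quick sanity check of these properties (a minimal sketch; the sum is only roughly zero because 0.67 and -0.33 are rounded versions of 2/3 and 1/3):
#check the contrast-code properties
w1 <- c(Massed = 0.67, Random = -0.33, Spaced = -0.33)
sum(w1)         # 0.01, i.e. roughly 0
diff(range(w1)) # 1, i.e. unit-weighted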
The basic approach that you might use to test this hypothesis is the Abelson & Prentice approach (contrast + residual test). A (more controversial) alternative is the single-contrast approach.
Create a second contrast code that is orthogonal to c1
c2 = (Massed, Random, Spaced) = (0, 0.5, -0.5)
c2 is orthogonal to c1 because the dot product of the weights is zero: 0.67 x 0 + (-0.33) x 0.5 + (-0.33) x (-0.5) = 0.
If there were more than 3 levels, say m total levels, we would create additional orthogonal contrasts (for a total of m - 1 orthogonal contrasts).
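As an aside, base R can generate a full orthogonal set automatically; for instance, contr.helmert() applied to a hypothetical four-level factor returns three mutually orthogonal columns (note these are not unit-weighted, so you would rescale the focal one to encode your hypothesis):
#a full orthogonal set for a (hypothetical) 4-level factor
contr.helmert(4)
##   [,1] [,2] [,3]
## 1   -1   -1   -1
## 2    1   -1   -1
## 3    0    2   -1
## 4    0    0    3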
d$c2 <- ifelse(d$condition == "Massed", 0,
               ifelse(d$condition == "Random", 0.5, -0.5))
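You can also confirm the orthogonality numerically on the coded data (a sketch; with equal cell sizes, the dot product of the two code columns is zero):
#check orthogonality of the coded columns
sum(d$c1 * d$c2) # 0 (up to floating-point error)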
Now fit the model with c1 and c2
m <- lm(accuracy~c1+c2, data=d)
summary(m)
##
## Call:
## lm(formula = accuracy ~ c1 + c2, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34237 -0.07579 -0.00579 0.08899 0.27077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.71620 0.01315 54.481 < 2e-16 ***
## c1 0.14025 0.02789 5.029 2.62e-06 ***
## c2 0.01275 0.03220 0.396 0.693
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1247 on 87 degrees of freedom
## Multiple R-squared: 0.2263, Adjusted R-squared: 0.2086
## F-statistic: 12.73 on 2 and 87 DF, p-value: 1.419e-05
If c1 is significant and c2 is non-significant, you have found evidence for your focal hypothesis (c1).
The reason to test c2 is to account for residual variance in the model, essentially to be sure that our focal contrast explains the variance adequately. If there were more than three levels, we would test all of the residual contrasts (c2, c3, …) as a group to see if they are (jointly) significant. Since we only have one residual contrast, we only need to check whether c2 is non-significant, i.e. does not explain a meaningful amount of variance.
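With more levels, that joint test of the residual contrasts can be run as a model comparison; here is a sketch with our single residual contrast (with additional codes c3, c4, … you would simply add them to the full model):
#joint test of the residual contrast(s) via model comparison
mFocal <- lm(accuracy ~ c1, data = d)
mFull <- lm(accuracy ~ c1 + c2, data = d)
anova(mFocal, mFull) # F test of everything beyond the focal contrast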
The main controversial aspect of the Abelson & Prentice approach is that one is left interpreting a null effect (a non-significant c2) as evidence of absence, i.e. as evidence that there is no effect. This goes against the basic tenets of null hypothesis testing. The single-contrast approach (Richter, 2015) asks: why not do away with this step and simply test the hypothesis you are interested in? In other words, on this (more controversial) approach, you simply test c1.
m <- lm(accuracy~c1, data=d)
summary(m)
##
## Call:
## lm(formula = accuracy ~ c1, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34237 -0.07906 -0.00579 0.08463 0.27714
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.71620 0.01308 54.744 < 2e-16 ***
## c1 0.14025 0.02775 5.054 2.33e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1241 on 88 degrees of freedom
## Multiple R-squared: 0.2249, Adjusted R-squared: 0.2161
## F-statistic: 25.54 on 1 and 88 DF, p-value: 2.332e-06
You might have a different hypothesis, e.g. a linear hypothesis such as Massed > Random > Spaced. You would test this hypothesis in an analogous way.
#Massed > Random > Spaced
d$c1 <- ifelse(d$condition == "Massed", 0.5,
               ifelse(d$condition == "Random", 0, -0.5))     # linear trend: (0.5, 0, -0.5)
d$c2 <- ifelse(d$condition == "Massed", -0.33,
               ifelse(d$condition == "Random", 0.67, -0.33)) # orthogonal residual contrast
#fit model
#Abelson & Prentice
m <- lm(accuracy~c1+c2, data=d)
summary(m)
##
## Call:
## lm(formula = accuracy ~ c1 + c2, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34237 -0.07579 -0.00579 0.08899 0.27077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.71687 0.01315 54.532 < 2e-16 ***
## c1 0.14662 0.03220 4.554 1.71e-05 ***
## c2 -0.06057 0.02789 -2.172 0.0326 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1247 on 87 degrees of freedom
## Multiple R-squared: 0.2263, Adjusted R-squared: 0.2086
## F-statistic: 12.73 on 2 and 87 DF, p-value: 1.419e-05
#single-contrast
m <- lm(accuracy~c1, data=d)
summary(m)
##
## Call:
## lm(formula = accuracy ~ c1, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32218 -0.08352 -0.00225 0.09483 0.23039
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.71667 0.01342 53.40 < 2e-16 ***
## c1 0.14662 0.03287 4.46 2.41e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1273 on 88 degrees of freedom
## Multiple R-squared: 0.1844, Adjusted R-squared: 0.1751
## F-statistic: 19.89 on 1 and 88 DF, p-value: 2.408e-05
In the Abelson & Prentice case, c2 is also significant, which weakens the evidence for c1: the residual contrast picks up systematic variance that the linear pattern alone does not capture, so Massed > Random > Spaced does not adequately describe the condition means.
In the agnostic approach, we don't have a specific hypothesis about the levels of the categorical variable; we simply want to know whether there are any differences across our condition levels.
If we plan to test 2 or 3 (pairwise) comparisons, the best approach is probably Fisher LSD protected testing.
In this method, we first test the multi-df "overall" (omnibus) effect of our categorical variable. If this test is non-significant, we stop (there is no significant effect of condition). If it is significant, we can then move on to test the (non-orthogonal) pairwise contrasts with dummy codes.
#agnostic approach
#test overall effect of condition
m1 <- lm(accuracy~condition, data=d)
m2 <- lm(accuracy~1, data=d)
anova(m1,m2)
## Analysis of Variance Table
##
## Model 1: accuracy ~ condition
## Model 2: accuracy ~ 1
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 87 1.3531
## 2 89 1.7489 -2 -0.39584 12.726 1.419e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#2 df test significant.
Since the overall effect of condition is significant, we can now move on to testing the pairwise comparisons between different levels with dummy codes.
#now test pairwise comparisons w/ dummy codes
#Massed vs. Random
#Random is reference level
d$Massed_vs_Random <- ifelse(d$condition=="Massed",1,0)
d$Spaced_vs_Random <- ifelse(d$condition=="Spaced",1,0)
m <- lm(accuracy~Massed_vs_Random+Spaced_vs_Random,data=d)
summary(m)
##
## Call:
## lm(formula = accuracy ~ Massed_vs_Random + Spaced_vs_Random,
## data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34237 -0.07579 -0.00579 0.08899 0.27077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.67629 0.02277 29.703 < 2e-16 ***
## Massed_vs_Random 0.13388 0.03220 4.158 7.51e-05 ***
## Spaced_vs_Random -0.01275 0.03220 -0.396 0.693
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1247 on 87 degrees of freedom
## Multiple R-squared: 0.2263, Adjusted R-squared: 0.2086
## F-statistic: 12.73 on 2 and 87 DF, p-value: 1.419e-05
#Massed vs. Spaced
#Spaced is the reference level
d$Massed_vs_Spaced <- ifelse(d$condition=="Massed",1,0)
d$Random_vs_Spaced <- ifelse(d$condition=="Random",1,0)
m <- lm(accuracy~Massed_vs_Spaced+Random_vs_Spaced,data=d)
summary(m)
##
## Call:
## lm(formula = accuracy ~ Massed_vs_Spaced + Random_vs_Spaced,
## data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34237 -0.07579 -0.00579 0.08899 0.27077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.66354 0.02277 29.143 < 2e-16 ***
## Massed_vs_Spaced 0.14662 0.03220 4.554 1.71e-05 ***
## Random_vs_Spaced 0.01275 0.03220 0.396 0.693
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1247 on 87 degrees of freedom
## Multiple R-squared: 0.2263, Adjusted R-squared: 0.2086
## F-statistic: 12.73 on 2 and 87 DF, p-value: 1.419e-05
The Massed condition is significantly different from both the Spaced condition and the Random condition.
Note that Random_vs_Spaced and Spaced_vs_Random yield the same effect, just in opposite directions.
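As a cross-check (a sketch, not part of the protected-testing workflow above), base R's pairwise.t.test() with its default pooled SD reports all unadjusted pairwise p-values in one call; these should match the dummy-coded t tests:
#all unadjusted pairwise comparisons at once (pooled SD, as in lm)
pairwise.t.test(d$accuracy, d$condition, p.adjust.method = "none")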
For testing interactions with categorical variables that have more than two levels, the same basic approaches and logic apply. For instance, if we want to see whether there is an interaction with age group (adults vs. kids), we can test this in a hypothesis-driven manner, as discussed above.
#add interaction with group
adults$group <- "adults"
kids <- data.frame(subID = paste("p", seq(n), sep = ""),
                   accuracy = c(pmin(1, rnorm(n/3, mean = 0.6, sd = 0.2)),
                                rep(pmin(1, rnorm(n/3, mean = 0.6, sd = 0.2)), each = 2)),
                   condition = rep(c("Massed", "Spaced", "Random"), each = n/3),
                   group = "kids")
d <- rbind(adults,kids)
sumCond <- d %>%
  group_by(group, condition) %>%
  summarize(meanAcc = mean(accuracy), seAcc = se(accuracy))
ggplot(sumCond, aes(condition, meanAcc, fill = condition, color = condition)) +
  geom_bar(stat = "identity", color = "black", alpha = 0.5) +
  geom_jitter(data = d, aes(y = accuracy, color = condition), width = 0.1) +
  geom_errorbar(aes(ymin = meanAcc - seAcc, ymax = meanAcc + seAcc), color = "black", width = 0) +
  scale_color_brewer(type = "qual", palette = "Set1") +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ylab("accuracy") +
  facet_wrap(~group)
Fit the model with two orthogonal contrast codes, where contrast code c1 represents the hypothesis of interest. To test whether our Massed > Random == Spaced hypothesis (contrast code c1) differs between age groups, we would ask:
is the c1 * group effect significant?
is the c2 * group effect non-significant?
#Massed > Random == Spaced
d$c1 <- ifelse(d$condition == "Massed", 0.67, -0.33)
d$c2 <- ifelse(d$condition == "Massed", 0,
               ifelse(d$condition == "Random", 0.5, -0.5))
#fit model
#Abelson & Prentice
m <- lm(accuracy~(c1+c2)*group, data=d)
summary(m)
##
## Call:
## lm(formula = accuracy ~ (c1 + c2) * group, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34237 -0.09073 0.00349 0.10067 0.34507
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.71620 0.01561 45.867 < 2e-16 ***
## c1 0.14025 0.03312 4.234 3.71e-05 ***
## c2 0.01275 0.03825 0.333 0.7393
## groupkids -0.11679 0.02208 -5.289 3.66e-07 ***
## c1:groupkids -0.13287 0.04684 -2.836 0.0051 **
## c2:groupkids -0.02551 0.05409 -0.472 0.6377
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1481 on 174 degrees of freedom
## Multiple R-squared: 0.2105, Adjusted R-squared: 0.1878
## F-statistic: 9.278 on 5 and 174 DF, p-value: 7.545e-08
#c1*group is significant
#c2*group is not
#--> strong evidence for c1*group interaction
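As in the main-effect case, the residual interaction term could also be tested jointly via model comparison (a sketch; with more levels, all residual-by-group terms would drop out of the reduced model together):
#joint test of the residual interaction via model comparison
mFull <- lm(accuracy ~ (c1 + c2) * group, data = d)
mRed <- lm(accuracy ~ c1 * group + c2, data = d)
anova(mRed, mFull) # tests the c2:group term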
For the single-contrast approach, one would simply test the interaction between contrast code c1 and group.
#single-contrast approach
#simply fit the model with c1*group, rather than interpreting the null effect of c2:group as evidence of absence
m <- lm(accuracy~c1*group, data=d)
summary(m)
##
## Call:
## lm(formula = accuracy ~ c1 * group, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34237 -0.09711 0.00640 0.09536 0.33869
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.71620 0.01554 46.101 < 2e-16 ***
## c1 0.14025 0.03296 4.256 3.38e-05 ***
## groupkids -0.11679 0.02197 -5.316 3.19e-07 ***
## c1:groupkids -0.13287 0.04661 -2.851 0.00488 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1474 on 176 degrees of freedom
## Multiple R-squared: 0.2095, Adjusted R-squared: 0.196
## F-statistic: 15.55 on 3 and 176 DF, p-value: 5.161e-09