How to adjust chance level for logistic regression

First, let’s create a little dataset

Let’s say that we have

10 participants
who completed 10 trials each.
Each trial was an 8 alternative-forced-choice trial
Participants were assigned randomly to an experimental condition.

Let’s also assume that the true mean for participants in the first condition is 1/8, and the true mean for participants in the second condition is 1/2.

#load necessary packages
library(lme4)
library(car)
library(lmSupport)

####Create the dataset####

#create subject codes
subj=rep(c("subj1","subj2", "subj3", "subj4", "subj5", "subj6", "subj7", "subj8", "subj9", "subj10"),10)
#create a between-subjects condition variable 
cond=rep(rep(c("cond1", "cond2"),5),10)
#create a random sample of responses, with .5 probability of being correct overall
respCond1=sample(c(0,1), size=50, replace=T, prob=c(7/8,1/8))
respCond2=sample(c(0,1), size=50, replace=T, prob=c(0.5,0.5))
#combine responses so they correspond to the correct condition column
#note we are not building in dependencies between responses in subjects
resp=as.vector(rbind(respCond1, respCond2))

#put it all together
d=data.frame(subjects=subj, condition=cond, isRight=resp)

Let’s just check to make sure our dataframe has the structure we envision:

head(d,20)

##    subjects condition isRight
## 1     subj1     cond1       0
## 2     subj2     cond2       1
## 3     subj3     cond1       0
## 4     subj4     cond2       0
## 5     subj5     cond1       0
## 6     subj6     cond2       1
## 7     subj7     cond1       1
## 8     subj8     cond2       0
## 9     subj9     cond1       0
## 10   subj10     cond2       0
## 11    subj1     cond1       1
## 12    subj2     cond2       1
## 13    subj3     cond1       1
## 14    subj4     cond2       1
## 15    subj5     cond1       1
## 16    subj6     cond2       1
## 17    subj7     cond1       0
## 18    subj8     cond2       1
## 19    subj9     cond1       0
## 20   subj10     cond2       0

Testing performance against chance

Step 1: Add an offset column

To test performance against chance in logistic mixed-effects regression, we can simply add an offset column that captures the chance level on any given trial. Assuming we have an 8 AFC task, that chance level would be 1/8 for every trial. Note that we could also create a column with chance levels varying from trial to trial.

#create offset column
d$chance=1/8

head(d,20)

##    subjects condition isRight chance
## 1     subj1     cond1       0  0.125
## 2     subj2     cond2       1  0.125
## 3     subj3     cond1       0  0.125
## 4     subj4     cond2       0  0.125
## 5     subj5     cond1       0  0.125
## 6     subj6     cond2       1  0.125
## 7     subj7     cond1       1  0.125
## 8     subj8     cond2       0  0.125
## 9     subj9     cond1       0  0.125
## 10   subj10     cond2       0  0.125
## 11    subj1     cond1       1  0.125
## 12    subj2     cond2       1  0.125
## 13    subj3     cond1       1  0.125
## 14    subj4     cond2       1  0.125
## 15    subj5     cond1       1  0.125
## 16    subj6     cond2       1  0.125
## 17    subj7     cond1       0  0.125
## 18    subj8     cond2       1  0.125
## 19    subj9     cond1       0  0.125
## 20   subj10     cond2       0  0.125

Build a logistic mixed-effects model

Here, we will simply test whether people are above chance level overall in a logistic mixed-effects model, simply by adjusting our intercept to chance level through the offset() functionality. We need to also adjust the chance level values by the logit() function, since we are doing logistic regression.

#build the model, adjusting the intercept and including a by-subject random intercept
m=glmer(isRight~offset(logit(chance))+(1|subjects), data=d, family=binomial)
#output a summary of model fit
summary(m)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: isRight ~ offset(logit(chance)) + (1 | subjects)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    130.3    135.5    -63.2    126.3       98 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.0056 -0.6783 -0.5475  1.0950  1.8264 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subjects (Intercept) 0.3709   0.609   
## Number of obs: 100, groups:  subjects, 10
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.2258     0.2964   4.136 3.54e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall, people are performing above chance, z=4.1358066, p=3.537110^{-5}.

You can recreate the estimated mean response or probability of a correct response as estimated by the model like so:

#calculate estimated logit value by adding back the offset value we subtracted
interceptLogit=coefficients(summary(m))[1]+logit(1/8)
#transform into the odds value
odds=exp(interceptLogit)
#transform the odds into the corresponding probability
prob=odds/(1+odds)

The model estimates an overall mean of 0.3273686.

More complex models

If you want to test chance levels in different conditions or just generally fit more complex models, you can do this in the typical way. For instance, what if I want to know whether participants are performing above chance in condition 1 and/or in cond2?

Condition 1 Performance above chance?

#center the condition predictor on condition 1
d$condition1=varRecode(as.numeric(d$condition), c(1,2), c(0,1))
#Fit model with condition variable centered on condition 1
m=glmer(isRight~offset(logit(chance))+condition1+(1|subjects), data=d, family=binomial)
summary(m)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: isRight ~ offset(logit(chance)) + condition1 + (1 | subjects)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    122.5    130.3    -58.2    116.5       97 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.0000 -0.6014 -0.4685  1.0000  2.1344 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subjects (Intercept) 0        0       
## Number of obs: 100, groups:  subjects, 10
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   0.4296     0.3681   1.167  0.24323   
## condition1    1.5163     0.4642   3.266  0.00109 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr)
## condition1 -0.793

Participants in condition 1 are not performing above chance, p=0.2432284. But there is a significant condition difference, with people performing higher in cond2 than in cond1, p=0.0010891

Condition 2 Performance above chance?

#center the condition predictor on condition 2
d$condition2=varRecode(as.numeric(d$condition), c(1,2), c(-1,0))
#Fit model with condition variable centered on condition 1
m=glmer(isRight~offset(logit(chance))+condition2+(1|subjects), data=d, family=binomial)
summary(m)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: isRight ~ offset(logit(chance)) + condition2 + (1 | subjects)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    122.5    130.3    -58.2    116.5       97 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.0000 -0.6014 -0.4685  1.0000  2.1344 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subjects (Intercept) 0        0       
## Number of obs: 100, groups:  subjects, 10
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.9459     0.2828   6.880 5.99e-12 ***
## condition2    1.5163     0.4642   3.266  0.00109 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr)
## condition2 0.609

Participants in condition 2 are performing above chance, p=5.992242810^{-12}. The condition effect remains unchanged, of course.