Qualitative Coding

The previous lecture introduced the idea of qualitative coding, which is central to the analysis of open-ended text and other qualitative data.

Coding: The process of translating written or visual material into standardized categories. Codes are the labels/tags for chunks of text, not the programming/coding we’ve been doing so far.

  1. First, determine the codebook. This is the set of category labels for the data.
  2. Then, “code” the data by applying codes (i.e., labels) to chunks of text, as in the sketch below.
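
To make the vocabulary concrete, here is a minimal sketch in Python. The codebook, survey responses, and assigned codes are all made up for illustration; a real codebook would also include definitions and example passages for each code.

```python
# Hypothetical codebook: each code is a short label with a definition.
codebook = {
    "POS": "Respondent expresses a positive experience",
    "NEG": "Respondent expresses a negative experience",
    "FEAT": "Respondent requests a new feature",
}

# Hypothetical free-text survey responses.
responses = [
    "I loved the new dashboard, it saved me so much time.",
    "The app crashes whenever I upload a photo.",
    "It would be great if I could export my data to CSV.",
]

# Codes assigned by a (human) rater, one list of labels per response.
codes = [
    ["POS"],
    ["NEG"],
    ["FEAT"],
]

for text, labels in zip(responses, codes):
    print(labels, "-", text)
```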

Inter-rater agreement

Historically, though, the process of coding has needed to be completed (at least in part) by humans (i.e., raters).

If codes for a very large dataset were generated by a single rater, there are risks to validity: that rater’s preconceptions could be encoded into the labels. If codes for a very large dataset were generated by multiple human raters, there are risks to reliability, because different raters may code the same datapoint differently.

In practice, multiple researchers collaborate to assign codes that are both reliable and valid:

  1. Co-design the codebook.
  2. Then, each rater separately labels the same subset of the data, and the raters come together to discuss agreement. This assesses the reliability of the labels (see the sketch after this list).
  3. If they don’t agree, they revisit the codebook and variable definitions, then try coding again. This also revisits the validity of the coding process, potentially redefining how the concept is operationalized by the new codes (i.e., categories) in the variable codebook.
  4. Repeat this process until reasonable levels of agreement are reached.
  5. Then, independently code the rest of the dataset.
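
As a rough sketch of step 2, suppose two raters have each coded the same subset of responses (the labels below are hypothetical). The simplest check of agreement is the fraction of datapoints that received the same code from both raters:

```python
# Hypothetical codes assigned by two raters to the same 8 responses.
rater_a = ["POS", "NEG", "FEAT", "NEG", "POS", "FEAT", "NEG", "POS"]
rater_b = ["POS", "NEG", "FEAT", "POS", "POS", "FEAT", "NEG", "NEG"]

# Raw (percent) agreement: how often the two raters assigned the same label.
matches = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = matches / len(rater_a)

print(f"Raw agreement: {percent_agreement:.0%}")  # 6 of 8 labels match -> 75%
```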

How do researchers measure whether they have reached “reasonable levels of agreement”? We define this idea as inter-rater agreement, also known as inter-rater reliability. One common quantitative measure is Cohen’s Kappa. Let’s read on.
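
As a preview, here is a minimal sketch of computing Cohen’s Kappa with scikit-learn’s cohen_kappa_score (assuming scikit-learn is installed), reusing the hypothetical rater labels from above. Unlike raw percent agreement, Kappa adjusts for the agreement two raters would be expected to reach by chance alone.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes assigned by two raters to the same 8 responses.
rater_a = ["POS", "NEG", "FEAT", "NEG", "POS", "FEAT", "NEG", "POS"]
rater_b = ["POS", "NEG", "FEAT", "POS", "POS", "FEAT", "NEG", "NEG"]

# Kappa compares observed agreement to chance agreement;
# for this toy data it comes out to roughly 0.62.
kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's Kappa: {kappa:.2f}")
```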