Grouping

Aggregating values across rows.

Read Inferential Thinking

Read Ch 8.2, which describes grouping in detail.

Before continuing, make sure that:

  • You know that group with one argument produces counts per unique value in the grouped column.
  • You know that group with two arguments produces an aggregated value per unique value in the grouped column. The result can contain several aggregated columns, one for each other column whose values the provided aggregator function can aggregate.
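
As a rough illustration of the two behaviors above, here is a plain-Python sketch on a tiny made-up table. It does not use the table library from the reading. The helper names `group_counts` and `group_aggregate`, the column names, and the sample rows are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

# Made-up rows for illustration only.
rows = [
    {"flavor": "chocolate", "scoops": 1, "price_cents": 475},
    {"flavor": "strawberry", "scoops": 2, "price_cents": 355},
    {"flavor": "chocolate", "scoops": 2, "price_cents": 525},
    {"flavor": "strawberry", "scoops": 1, "price_cents": 525},
    {"flavor": "vanilla", "scoops": 1, "price_cents": 400},
]


def group_counts(rows, key):
    """Mimics group with one argument: count rows per unique value of `key`."""
    return dict(Counter(row[key] for row in rows))


def group_aggregate(rows, key, aggregator):
    """Mimics group with two arguments: aggregate every other column per unique value of `key`."""
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for column, value in row.items():
            if column != key:
                buckets[row[key]][column].append(value)
    return {
        group: {column: aggregator(values) for column, values in columns.items()}
        for group, columns in buckets.items()
    }


counts = group_counts(rows, "flavor")
totals = group_aggregate(rows, "flavor", sum)
print(counts)
print(totals)
```

Here `sum` plays the role of the aggregator function, and both non-key columns are aggregated, so each flavor ends up with two aggregated values.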

You can also group data on multiple columns.

Read Inferential Thinking

Read Ch 8.3.1-8.3.2, which describe grouping by multiple columns.
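
To give a feel for grouping on more than one column, the sketch below counts rows per unique combination of values. It is plain Python rather than the table library from the reading, and the helper `group_counts_multi` and the sample rows are hypothetical.

```python
from collections import Counter

# Made-up rows for illustration: (flavor, color)
rows = [
    ("chocolate", "dark brown"),
    ("chocolate", "light brown"),
    ("chocolate", "dark brown"),
    ("strawberry", "pink"),
    ("strawberry", "pink"),
]


def group_counts_multi(rows, key_indices):
    """Count rows per unique combination of the columns at `key_indices`."""
    return dict(Counter(tuple(row[i] for i in key_indices) for row in rows))


pair_counts = group_counts_multi(rows, [0, 1])
for pair, count in pair_counts.items():
    print(pair, count)
```

The only change from single-column grouping is that each group is identified by a tuple of values instead of one value.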

Aggregating and Disaggregating

Grouping is a table operation that is most useful for translating between units of analysis. In particular, it lets us disaggregate data: instead of reporting only an average across all data points, we can report averages broken down by subgroup.
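
A small sketch of this contrast, assuming a made-up table of scores by section (the data and variable names are hypothetical, and the code is plain Python rather than the table library from the reading):

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration: (section, score)
rows = [("A", 80), ("A", 90), ("B", 60), ("B", 70), ("B", 80)]

# Aggregated view: a single average across all data points.
overall_average = mean([score for _, score in rows])

# Disaggregated view: one average per subgroup (here, per section).
scores_by_section = defaultdict(list)
for section, score in rows:
    scores_by_section[section].append(score)
average_by_section = {
    section: mean(scores) for section, scores in scores_by_section.items()
}

print(overall_average)
print(average_by_section)
```

The overall average hides the gap between the two sections, which only becomes visible after grouping.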

Case Study: 1973 UC Berkeley Graduate Admissions

The Fall 1973 admitted graduate student population had a peculiar characteristic: overall, women were admitted to graduate school at lower rates than men, which seemed to suggest that the admissions process was biased. However, when admission rates were examined by department (i.e., by field of study), women were often admitted at higher rates than men. How is this possible?

This case study is an example of Simpson’s Paradox: the trends seen within subgroups are reversed, or disappear, in the overall dataset. We will explore this in detail in the reading, in section, and in the project. Stay tuned!
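
One way to see how such a reversal can happen is with a toy example. The counts below are invented for illustration and are NOT the 1973 Berkeley data. The department labels "X" and "Y" and the helper `admission_rates` are hypothetical as well. The sketch groups by gender alone, then by department and gender, and compares the resulting rates.

```python
from fractions import Fraction

# HYPOTHETICAL counts, invented for illustration; NOT the 1973 Berkeley data.
# Each record: (department, gender, applicants, admitted)
applications = [
    ("X", "men", 80, 48),
    ("X", "women", 20, 14),
    ("Y", "men", 20, 4),
    ("Y", "women", 80, 24),
]


def admission_rates(records, key_indices):
    """Sum applicants and admits per group, then return admitted / applicants."""
    totals = {}
    for record in records:
        key = tuple(record[i] for i in key_indices)
        applied, admitted = totals.get(key, (0, 0))
        totals[key] = (applied + record[2], admitted + record[3])
    return {
        key: Fraction(admitted, applied)
        for key, (applied, admitted) in totals.items()
    }


overall = admission_rates(applications, [1])           # grouped by gender
by_department = admission_rates(applications, [0, 1])  # grouped by department and gender

for key, rate in overall.items():
    print(key, float(rate))
for key, rate in by_department.items():
    print(key, float(rate))
```

In this toy data, women have the higher admission rate within each department but the lower rate overall, because most women applied to the department with the lower admission rate. Whether the same mechanism explains the real data is a question for the reading and the project.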

External Reading

  • (mentioned in notes) Computational and Inferential Thinking, Ch 8.3
  • P. J. Bickel et al., Sex Bias in Graduate Admissions: Data from Berkeley. Science 187, 398-404 (1975). DOI:10.1126/science.187.4175.398