Lecture 19 – Exploratory Data Analysis: Voter Targeting in Pennsylvania

Data 6, Summer 2021

Run the next cell to import all of our datasets, obtained from https://www.electionreturns.pa.gov/ and https://www.truckads.com/.

Step 1: Data Cleaning

To make this easier, let's cut the data down to just the columns we want. (For this section, don't re-run cells! You'll get an error.)

Step 2: Exploratory Data Analysis

Our questions for this analysis:

Let's learn more about where voters voted and how they voted. We're going to use group and pivot for this.

Recall: tbl.group("col", func) groups the rows of tbl by the unique values in "col". If func is not specified, it counts the rows for each unique value. Otherwise, it applies func to the grouped values in every other column.
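To make the recap concrete, here is a minimal plain-Python sketch of what a group computes. It does not use the datascience library: the helper `group` below is a hypothetical stand-in written for illustration, and the rows are made-up numbers, not real election results.

```python
from collections import defaultdict

# Made-up rows for illustration only; these are NOT real election results.
rows = [
    {"Party": "DEM", "Votes": 120},
    {"Party": "REP", "Votes": 80},
    {"Party": "DEM", "Votes": 70},
    {"Party": "REP", "Votes": 90},
    {"Party": "REP", "Votes": 40},
]

def group(rows, col, func=None):
    """Hypothetical sketch of the idea behind tbl.group(col, func)."""
    # Bucket the rows by their value in `col`.
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[col]].append(row)

    result = {}
    for key, members in buckets.items():
        if func is None:
            # No function given: count the rows in each bucket.
            result[key] = len(members)
        else:
            # Function given: apply it to every other column's grouped values.
            result[key] = {
                other: func([m[other] for m in members])
                for other in members[0]
                if other != col
            }
    return result

party_counts = group(rows, "Party")        # like tbl.group("Party")
party_totals = group(rows, "Party", sum)   # like tbl.group("Party", sum)
print(party_counts)
print(party_totals)
```

The first call only counts rows per party, while the second adds up the "Votes" values within each party, which is the distinction the recap draws between omitting and supplying func.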

tbl.pivot("col", "row", "vals", func) cross-classifies a dataset: the unique values in "col" become the new column labels, and the unique values in "row" become the new rows. Each cell then holds the result of applying func to the "vals" values of the rows that fall in that (row, column) combination.
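The same idea can be sketched for pivot in plain Python. Again, `pivot` here is a hypothetical helper written for illustration rather than the datascience implementation, and the county rows are invented.

```python
# Made-up county-level rows for illustration only; NOT real election results.
rows = [
    {"County": "A", "Market": "Philadelphia", "Party": "DEM", "Votes": 120},
    {"County": "A", "Market": "Philadelphia", "Party": "REP", "Votes": 80},
    {"County": "B", "Market": "Philadelphia", "Party": "DEM", "Votes": 60},
    {"County": "B", "Market": "Philadelphia", "Party": "REP", "Votes": 50},
    {"County": "C", "Market": "Pittsburgh", "Party": "DEM", "Votes": 70},
    {"County": "C", "Market": "Pittsburgh", "Party": "REP", "Votes": 90},
    {"County": "D", "Market": "Johnstown Altoona", "Party": "REP", "Votes": 40},
]

def pivot(rows, col, row, vals=None, func=None):
    """Hypothetical sketch of the idea behind tbl.pivot(col, row, vals, func)."""
    col_labels = sorted({r[col] for r in rows})   # become the column labels
    row_labels = sorted({r[row] for r in rows})   # become the rows
    # Start every (row, column) cell with an empty list of values.
    cells = {rl: {cl: [] for cl in col_labels} for rl in row_labels}
    for r in rows:
        # Without `vals`, record a placeholder so that len() counts rows.
        cells[r[row]][r[col]].append(r[vals] if vals is not None else 1)
    collect = func if func is not None else len
    return {
        rl: {cl: collect(found) for cl, found in by_col.items()}
        for rl, by_col in cells.items()
    }

counts = pivot(rows, "Party", "Market")                # rows per combination
totals = pivot(rows, "Party", "Market", "Votes", sum)  # summed votes per combination
for market, by_party in totals.items():
    print(market, by_party)
```

In this sketch, a combination with no matching rows (here, DEM in the Johnstown Altoona market) ends up as 0, because both len and sum of an empty list are 0.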

For example: http://data8.org/interactive_table_functions/

Now let's try "cross-classifying". This is similar to a two-column group, but let's focus on a specific question:

How were the votes broken down by Media Market and party?

Or, in other words, which media markets provided most of the raw votes for each party? This is useful information because electoral college votes are based on state totals (i.e., it doesn't matter if Johnstown Altoona is very Republican if there aren't many voters there).
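The point about raw votes versus partisan lean can be illustrated with a small sketch. The market names are Pennsylvania media markets, but every vote count below is invented for illustration, and the variable names are hypothetical.

```python
from collections import defaultdict

# (market, party, votes) with invented counts; NOT real election results.
rows = [
    ("Philadelphia", "DEM", 180),
    ("Philadelphia", "REP", 130),
    ("Pittsburgh", "DEM", 70),
    ("Pittsburgh", "REP", 90),
    ("Johnstown Altoona", "DEM", 10),
    ("Johnstown Altoona", "REP", 40),
]

votes_by_market_party = defaultdict(int)  # votes in each (market, party) cell
party_totals = defaultdict(int)           # statewide votes for each party
market_totals = defaultdict(int)          # all votes cast in each market
for market, party, votes in rows:
    votes_by_market_party[(market, party)] += votes
    party_totals[party] += votes
    market_totals[market] += votes

# Share of each party's statewide vote that came from each market.
shares = {
    (market, party): votes / party_totals[party]
    for (market, party), votes in votes_by_market_party.items()
}

# The market that supplied the most raw votes for each party.
top_market = {}
for (market, party), votes in votes_by_market_party.items():
    best = top_market.get(party)
    if best is None or votes > votes_by_market_party[(best, party)]:
        top_market[party] = market

# How Republican a market is (lean) vs. how much it contributes statewide.
ja_rep_lean = votes_by_market_party[("Johnstown Altoona", "REP")] / market_totals["Johnstown Altoona"]
ja_rep_share = shares[("Johnstown Altoona", "REP")]
print(top_market)
print(round(ja_rep_lean, 2), round(ja_rep_share, 2))
```

With these invented numbers, the Johnstown Altoona market leans strongly Republican, yet it supplies only a small fraction of the Republican statewide total, which is why the analysis looks at raw votes per market rather than at lean alone.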

Additional Analysis: Targeting by Market and Understanding What Happened in 2020

If you're interested, check out this notebook by Ian that gives us more insight into the state by looking at more elections data.

You'll notice that the linked notebook uses a different syntax -- that's because it uses pandas instead of the datascience library that we're used to!

The datascience library is great because it's simple and easy to learn, but it has a lot of limitations and takes a lot of work to do even this level of cleaning and EDA. In later data science classes you can learn about pandas, which is a lot more powerful and concise and lets us do even more interesting analysis with data tables.