Lecture 26 – Visualizing Two Numerical Variables

Data 6, Summer 2021

Our first dataset today comes from Basketball Reference. It contains per-game averages of players in the 2019-2020 NBA season.

Run the cell below to load it in, select the relevant columns, and do some data cleaning.

Note: Most of the interesting data comes from the "better" players in the league, so we will only look at players who averaged at least 10 points per game during the season. This filter isn't perfect, since plenty of good players averaged fewer than 10 points per game.
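Here is a minimal sketch of the "at least 10 points per game" filter in plain Python. The course notebooks use the `datascience` library, where this would typically be a `where` call on the table. Plain lists are used here so the sketch runs on its own. The player names and point values are made up for illustration and are not Basketball Reference data.

```python
# Hypothetical per-game rows: (player, points per game). Values are invented.
players = [
    ("Player A", 25.1),
    ("Player B", 9.8),
    ("Player C", 10.0),
    ("Player D", 4.3),
]

# Keep only players who averaged at least 10 points per game (10.0 itself stays in).
min_pts = 10
filtered = [(name, pts) for name, pts in players if pts >= min_pts]
print(filtered)
```

The comparison is `>=` rather than `>`, so a player averaging exactly 10.0 points is kept, matching "at least 10".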

A description of each column:

Review – bar charts and histograms

Bar charts

Histograms

Scatter plots

Example 1

Observation: On average, as the number of points a player averages increases, the number of assists they average also increases.

Example 2

Observation: On average, as the number of rebounds a player averages per game increases, the number of three-point attempts they average per game decreases.

Quick Check 1

Observation: On average, as the number of points per game a player averages increases, their three-point percentage neither increases nor decreases. In other words, PTS and 3P% appear to be uncorrelated.
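The three observations so far differ in the direction of the association:

* positive for points and assists
* negative for rebounds and three-point attempts
* roughly none for PTS and 3P%

One common way to put a number on this is the correlation coefficient r. The minimal sketch below computes r as the average product of the two variables in standard units. The short lists are hypothetical toy values chosen to show each case, not NBA data.

```python
import math

def correlation(xs, ys):
    """Pearson correlation: mean of the products of xs and ys in standard units."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population standard deviations (divide by n).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)) / n

# Hypothetical toy data illustrating each kind of observation above.
xs = [1, 2, 3, 4, 5]
up = [2, 4, 5, 4, 6]      # tends to rise with xs
down = [6, 4, 5, 4, 2]    # tends to fall with xs
flat = [3, 1, 4, 1, 3]    # no linear trend

r_up = correlation(xs, up)
r_down = correlation(xs, down)
r_flat = correlation(xs, flat)
print(r_up, r_down, r_flat)
```

An r near 1 or -1 means the points cluster tightly around a line, and an r near 0 means there is no linear trend. Because r only measures linear association, the scatter plot is still worth looking at.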

More customization

Point size

Point color by grouping

Observation: Guards tend to average fewer rebounds and more three-point attempts per game than forwards.
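Conceptually, coloring points by group means splitting the rows on a categorical column, such as position, and drawing each group in its own color. The minimal sketch below does that split in plain Python. It also computes each group's averages, so the pattern in the observation can be checked numerically. The position labels ("G", "F"), the column layout, and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (position, rebounds per game, three-point attempts per game).
rows = [
    ("G", 4.0, 7.0),
    ("G", 5.0, 6.0),
    ("F", 9.0, 2.0),
    ("F", 7.0, 4.0),
]

# Accumulate [total rebounds, total 3PA, count] for each position group.
totals = defaultdict(lambda: [0.0, 0.0, 0])
for pos, trb, tpa in rows:
    totals[pos][0] += trb
    totals[pos][1] += tpa
    totals[pos][2] += 1

# Average rebounds and three-point attempts within each group.
means = {pos: (trb / n, tpa / n) for pos, (trb, tpa, n) in totals.items()}
print(means)  # {'G': (4.5, 6.5), 'F': (8.0, 3.0)}
```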

Labels

Line plots

Our second dataset also comes from Basketball Reference. It contains average team statistics for each year.

A little bit about our new dataset:

Example 1

Observation: The league slowed down in the late 1990s and early 2000s, but it has been speeding back up since then.
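A line plot draws segments between consecutive points, so the rows should be in order of the x-axis variable (here, year) before they are connected. The sketch below sorts hypothetical (year, value) pairs and computes the change between consecutive points. A negative change corresponds to a downward segment, like the slowdown described above, and a positive change corresponds to an upward one. The years and values are invented for illustration.

```python
# Hypothetical (year, value) pairs, deliberately out of order; not real NBA values.
pace_by_year = [(2003, 90.0), (1999, 92.0), (2019, 100.0), (2011, 92.5)]

# Sort by year so consecutive points are adjacent in time.
points = sorted(pace_by_year)

# Change from each point to the next: negative = line goes down, positive = goes up.
changes = [(y2, v2 - v1) for (y1, v1), (y2, v2) in zip(points, points[1:])]
print(changes)  # [(2003, -2.0), (2011, 2.5), (2019, 7.5)]
```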

Example 2

Observation: The three-point shot has rapidly increased in popularity over the past decade.

Example 3

Quick Check 2