Lecture 27 – Maps

Data 6, Summer 2021

Review: scatter plots and line plots

Maps with circles

Modifying circle appearance

labels

color_scale

The map above confirms the claims of this LA Times article from 1990, which says:

The company plans to open 10 stores in California in 1990 and 1991, with most to be located in the interior sections of the state. This year, it will open stores in Lancaster, Victorville, El Centro, Madera, Modesto, Ridgecrest and Stockton. In 1991, it plans stores in Elk Grove, Hanford and Bakersfield.

colors

It seems like most Walmarts in California are standard locations and only a few are Supercenters.

What about in the rest of the country?

In many large metro areas there is a concentration of standard Walmarts (blue). Supercenters are more common in the eastern part of the country.

Remember this data is from 2006; things have changed since then.

Quick Check 1

Maps with markers (pins)

marker_icon

Most icon names at this site work, but make sure to drop the "glyphicon" part of the name and use only what follows it.

clustered_marker

Example: COVID cases

This data was pulled from the Johns Hopkins University Center for Systems Science and Engineering (CSSE) on April 6, 2021.

It records the cumulative number of cases in each county for every day since January 22, 2020.

Let's aim to draw a map illustrating the average number of cases per day over the last 7 days in each county.

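To do this, we take the cumulative number of cases on April 5, subtract the cumulative number of cases on March 29, and divide the result by 7. Because the counts are cumulative, that difference is the number of new cases reported over those 7 days (March 30 through April 5).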
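As a minimal sketch of this arithmetic, the cell below uses made-up cumulative counts for three hypothetical counties. The array names `cases_mar_29` and `cases_apr_5` are assumptions for illustration, not columns from the actual table.

```python
import numpy as np

# Made-up cumulative case counts for three hypothetical counties.
cases_mar_29 = np.array([1000, 250, 4000])
cases_apr_5 = np.array([1070, 264, 4350])

# The difference of two cumulative counts is the number of new cases
# reported in between; dividing by 7 gives the average per day.
seven_day_avg = (cases_apr_5 - cases_mar_29) / 7
print(seven_day_avg)
```

Because the subtraction and division are applied element-wise, every county is handled in one step.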

We need to relabel our columns in order to prepare our table for Circle.map_table.

There's something weird: a few counties have a negative 7-day average. Cumulative counts should never go down, so this is almost certainly due to data logging issues. We need to drop these rows before continuing, since they'll mess up our color scale.
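One way to picture the filtering step is with a boolean mask on plain arrays. This is only a sketch with made-up values and hypothetical names (`counties`, `seven_day_avg`); on a datascience table, the same idea would presumably be expressed with the table's `where` method.

```python
import numpy as np

# Made-up county labels and 7-day averages; one of them is negative.
counties = np.array(['A', 'B', 'C', 'D'])
seven_day_avg = np.array([12.0, -3.0, 0.0, 45.5])

# True wherever the average is not negative.
keep = seven_day_avg >= 0

# Apply the same mask to both arrays so they stay aligned.
clean_counties = counties[keep]
clean_avg = seven_day_avg[keep]
print(clean_counties, clean_avg)
```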

Time to call Circle.map_table.

We can take things a step further by creating more informative labels.

Now each circle tells you the county name and the average number of COVID cases over the past 7 days in that county.
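As a rough sketch of how such labels could be built, the cell below combines a county name with its rounded average using string formatting. The values are made up, and the label wording is an assumption, not necessarily the format used in lecture.

```python
# Made-up 7-day averages paired with county names, for illustration only.
counties = ['Alameda', 'Fresno']
seven_day_avg = [98.4286, 120.0]

# Build one label per county, rounding the average to one decimal place.
labels = [
    f'{name}: {avg:.1f} cases per day'
    for name, avg in zip(counties, seven_day_avg)
]
print(labels)
```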

Extra: cumulative cases in Alameda county

Note: The exploration here won't be covered in lecture, and includes programming that is slightly more involved than you'll be responsible for. Nevertheless, you may find it interesting, so take a look!

The dataset has one column per date; we want one row per date instead, because that's what plot expects.

That's not a problem:
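The idea behind the reshaping can be sketched without any table library: take the column labels as one list and the values in a county's row as another. The dictionary `alameda_wide` below is a hypothetical stand-in with made-up counts, and the month/day/year label format is an assumption.

```python
# Hypothetical single row of the wide table: one entry per date column.
alameda_wide = {'1/22/20': 0, '1/23/20': 0, '1/24/20': 1}

# The column labels become the values of a 'Date' column,
# and the row's entries become the values of a 'Cases' column.
dates = list(alameda_wide.keys())
cases = list(alameda_wide.values())

# One (date, cases) row per date.
rows = list(zip(dates, cases))
print(rows)
```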

What is a problem is that the dates are not in a format that datascience recognizes as numbers. There's a solution; run the following cell to implement it.
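One possible approach, not necessarily the one used in lecture, is to parse each date string and convert it to a number of days since the first date. The sketch below assumes the dates are strings in month/day/two-digit-year form; `date_strings` is a hypothetical list.

```python
from datetime import datetime

# Assumed format: month/day/two-digit year, without zero padding.
date_strings = ['1/22/20', '1/23/20', '2/1/20']

# Parse each string into a datetime object.
parsed = [datetime.strptime(s, '%m/%d/%y') for s in date_strings]

# Express each date as the number of days since the first one,
# which gives plain numbers that can go on an x-axis.
days_since_start = [(d - parsed[0]).days for d in parsed]
print(days_since_start)
```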

Great, now run the following cell to draw the line plot:

Awesome. But what if we want the number of new cases per day? We can compute that too, using np.diff. np.diff takes the difference between consecutive elements in an array: each element minus the one before it. (Notice that when we call np.diff on an array of length n, the result is an array of length n-1.)
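Here is a small illustration of np.diff on a made-up array of cumulative counts (the name `cumulative` is hypothetical):

```python
import numpy as np

# Made-up cumulative counts over five days.
cumulative = np.array([3, 5, 5, 9, 14])

# Each result is an element minus the one before it:
# 5 - 3, 5 - 5, 9 - 5, 14 - 9.
new_cases = np.diff(cumulative)
print(new_cases)       # [2 0 4 5]
print(len(new_cases))  # 4, one fewer than the 5 inputs
```

The first day has no earlier day to subtract, which is why the result is one element shorter than the input.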

We can use it on the 'Cases' column of alameda_rotated.

Hmm – there are a few jumps that don't quite seem right. What do you think happened? 🤔

(Hint: hover over the values for February 5, February 6, and February 7. What happens when you add the values for February 5 and February 6?)