Lecture 25 – Visualizing Numerical Variables

Data 6, Summer 2021

Review: bar charts

Histograms

Aside: we can confirm the counts in a histogram using where and are.between
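As a rough sketch of that check, the cell below uses plain NumPy on made-up values (not the lecture's data) to count how many values fall in one bin and compares that with the histogram's count. The array `values` and the bin edges are assumptions for illustration. With a datascience table, the same count would come from something like `tbl.where('col', are.between(10, 20)).num_rows`, where `tbl` and `'col'` are hypothetical names; `are.between(10, 20)` includes 10 and excludes 20, which matches how the bin [10, 20) is counted.

```python
import numpy as np

# Hypothetical values, made up for illustration (not the lecture's data).
values = np.array([3, 7, 10, 12, 15, 19, 20, 24, 29, 30])
bins = np.arange(0, 40, 10)  # bin edges 0, 10, 20, 30

# Counts per bin, i.e. the bar heights of a histogram of counts.
counts, edges = np.histogram(values, bins=bins)

# Manual check of the bin [10, 20): include 10, exclude 20.
in_bin = np.count_nonzero((values >= 10) & (values < 20))

print(counts[1], in_bin)
```

Both numbers should agree. Note that the last bin is the one exception to the "exclude the right edge" rule: NumPy includes the right endpoint of the final bin.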

Why do we need density = False?

Look at the histogram that results if we don't set density = False.

This is a perfectly valid histogram too, but it's not one that we will study in this class.
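To make the difference concrete, here is a small sketch in plain NumPy, assuming the default behaves like NumPy's `density=True`. With counts, each bar's height is the number of values in the bin. With density scaling, heights are rescaled so that the areas of the bars add up to 1. The data and bin edges are made up for illustration.

```python
import numpy as np

# Hypothetical values, made up for illustration.
values = np.array([1, 2, 2, 3, 5, 6, 8, 9, 9, 9])
bins = np.arange(0, 15, 5)  # bin edges 0, 5, 10

# Heights as counts (what density = False gives us).
counts, _ = np.histogram(values, bins=bins)

# Heights as densities: count / (total count * bin width).
density, _ = np.histogram(values, bins=bins, density=True)

widths = np.diff(bins)
total_area = np.sum(density * widths)  # areas of all bars added together

print(counts, density, total_area)
```

The shapes of the two histograms match when all bins have the same width; only the vertical scale changes.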

Quick Check 1

Customization

We can use the same customization arguments with hist as we did with barh.

Choosing bins

np.arange, revisited
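As a quick refresher, the sketch below shows the three common ways to call `np.arange`. The key point for bins is that the stop value is never included, so the stop has to go past the last edge we want.

```python
import numpy as np

a = np.arange(5)          # 0, 1, 2, 3, 4
b = np.arange(0, 50, 10)  # 0, 10, 20, 30, 40 (50 is excluded)
c = np.arange(0, 51, 10)  # 0, 10, 20, 30, 40, 50 (stop is past 50)

print(a, b, c)
```

If `b` were used as bins, any value between 40 and 50 would fall outside the last bin, which is why `c` is usually what we want.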

Let's look at another column.

Before setting bins, it's a good idea to look at the smallest and largest values in the column.
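One possible way to turn that check into bins is sketched below, using plain NumPy on made-up values. The array `values` and the bin width of 10 are assumptions for illustration; the idea is to round the smallest value down and the largest value up to a multiple of the width, so that no value is left out.

```python
import numpy as np

# Hypothetical column values, made up for illustration.
values = np.array([3.07, 10.34, 16.99, 24.59, 48.27, 50.81])

smallest, largest = values.min(), values.max()

width = 10
start = np.floor(smallest / width) * width  # round down to a multiple of width
stop = np.ceil(largest / width) * width     # round up to a multiple of width

# Add one more width to stop, since np.arange excludes the stop value.
bins = np.arange(start, stop + width, width)

print(smallest, largest, bins)
```

Every value now sits between the first and last bin edge, so the histogram accounts for the whole column.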

Overlaid and side-by-side histograms

One category is 'time' – we can make separate histograms for every unique value in 'time'. As a reminder, there are two unique times, 'Lunch' and 'Dinner', so we should expect to see two histograms.
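To see what grouping does under the hood, the sketch below splits made-up bill amounts by 'time' and counts each group against the same bins. This is plain NumPy, not the datascience implementation; the arrays `bills` and `times` are hypothetical stand-ins for two columns of the table. Sharing one set of bins is what makes the two histograms comparable.

```python
import numpy as np

# Hypothetical data, made up for illustration.
bills = np.array([12.5, 18.0, 22.4, 9.9, 31.0, 27.3, 15.2, 40.1])
times = np.array(['Lunch', 'Dinner', 'Dinner', 'Lunch',
                  'Dinner', 'Dinner', 'Lunch', 'Dinner'])

bins = np.arange(0, 60, 10)  # bin edges 0, 10, ..., 50

# One array of counts per unique time, all using the same bins.
counts_by_time = {
    t: np.histogram(bills[times == t], bins=bins)[0]
    for t in np.unique(times)
}

for t, counts in counts_by_time.items():
    print(t, counts)
```

There are two unique times, so we get two sets of counts, one per histogram.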

If we want these on separate axes:

Note that, for reasons that aren't clear, using group, overlay, and an array for bins all in the same call to hist doesn't work. (I've raised the issue with the folks who maintain the datascience module.)

We could separate by other columns, like 'day'.

There's too much going on there – but you can click the legend to hide certain days.

Quick Check 2

Documentation

Run the following cell.