Histograms and Ranges
Distributions and the area principle
An existing chapter of the textbook, Inferential Thinking, fully describes histograms. We highly recommend that you read the textbook first, then review the Data 6 lecture notebook.
Ranges
np.arange is a NumPy function for producing sequences of equally spaced numbers. Read the chapter for more details.
Read Inferential Thinking
Read Ch 5.2, which describes the np.arange function.
Before continuing, make sure that you:
- Know that the full function signature is np.arange(start, end, step).
- Know the default values of start and/or step when you pass in only one or two arguments.
- Remember that a range always includes its start value but does not include its end value. It counts up by step and stops before it reaches end.
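A quick sketch of the default-argument rules and the excluded end value, using only np.arange itself (the variable names are illustrative):

```python
import numpy as np

# One argument: start defaults to 0 and step defaults to 1.
one_arg = np.arange(5)              # 0, 1, 2, 3, 4

# Two arguments: start and end; step still defaults to 1.
two_args = np.arange(3, 7)          # 3, 4, 5, 6

# Three arguments: count up from start by step, stopping before end.
# end (10) is excluded even though 0 + 5 * 2 lands exactly on it.
three_args = np.arange(0, 10, 2)    # 0, 2, 4, 6, 8

# step does not have to be a whole number.
fractional = np.arange(1, 2, 0.25)  # 1.0, 1.25, 1.5, 1.75
```

In every case the first element is start and no element equals end.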
Histograms
Read Inferential Thinking
Read Ch 7.2, which describes histograms in detail.
Before continuing, make sure that you:
- Understand terminology related to histograms:
  - Bins (lower bound, upper bound)
  - Density, area, proportion
- Can use the area principle to explain histogram shape and bar density, area, and dimensions.
- Can compute area and proportion from bar dimensions.
- Can decide when to use a bar chart instead of a histogram, and vice versa.
- Can use the hist method and set the optional parameter density to True or False.
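As a sketch of computing area and proportion from bar dimensions, the snippet below starts from a hypothetical histogram (the bin edges and bar heights are made up for illustration) and applies the area principle: each bar's area, height times width, is the percent of the data in that bin.

```python
import numpy as np

# Hypothetical histogram: bin edges and bar heights in percent per unit.
edges = np.array([0, 10, 15, 30])
heights = np.array([3.0, 8.0, 2.0])

# Width of each bin = upper bound minus lower bound.
widths = np.diff(edges)        # 10, 5, 15

# Area principle: height * width = percent of the data in that bin.
areas = heights * widths       # 30, 40, 30 (percent)

# Proportion is the same quantity expressed as a fraction.
proportions = areas / 100      # 0.3, 0.4, 0.3

# Together, the bars account for all (100%) of the data.
total = areas.sum()            # 100
```

The first and third bars hold the same 30% of the data even though the first is taller. With unequal bin widths, height alone does not tell you how much data a bar contains; area does.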
Lecture Notebook
This notebook mostly covers the hist Table method in the datascience package. See the Data 6 Python Reference for full information.
tbl.hist(column): This table method has many optional arguments; the most important ones are:
- bins: Specifies the bin bounds, as an array. All but the last element of the array are bin lower bounds; the last element is the upper bound of the rightmost bin. If bins is not specified, the default produces 10 equally spaced bins.
- density: Boolean value (True or False). If True (the default), bar height is calculated as percent per unit. If False, bar height is the count in each bin.
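To see what the two density settings mean numerically, this sketch reproduces the bar heights by hand with NumPy rather than drawing a plot. The ages data is hypothetical, and np.histogram is used here only as a convenient way to count values per bin.

```python
import numpy as np

# Hypothetical ages, for illustration only.
ages = np.array([18, 19, 21, 22, 25, 30, 34, 41])

# A bins array like one you might pass as bins=... to tbl.hist.
# Every element but the last is a lower bound; the last (50) is the
# upper bound of the rightmost bin, giving bins 10-20, 20-30, 30-40, 40-50.
bins = np.arange(10, 60, 10)      # 10, 20, 30, 40, 50

# Count how many values fall in each bin.
counts, _ = np.histogram(ages, bins=bins)   # 2, 3, 2, 1

# density=False: bar height is the count in each bin.
heights_count = counts

# density=True: bar height is percent per unit,
# i.e. the percent of the data in the bin divided by the bin width.
percents = counts / counts.sum() * 100      # 25, 37.5, 25, 12.5
heights_density = percents / np.diff(bins)  # 2.5, 3.75, 2.5, 1.25
```

Assuming the datascience package is installed, the same bins array could be used as Table().with_column('Age', ages).hist('Age', bins=bins); that plot should show the heights_density values, and adding density=False should show heights_count instead. Note that np.histogram(..., density=True) on its own returns proportion per unit (a fraction), which is heights_density divided by 100.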