NumPy and Array Functions

Accessing Packages

This note has the following goals:

Array Functions

As data scientists, we may find it useful to perform operations on arrays beyond simple element-wise arithmetic. We can do so with functions from two sources: functions built into Python and functions from the NumPy library.

Here are some starter arrays to get us going. What arrays are created?

from datascience import *

int_arr = make_array(3, -4, 0, 5, 2)
str_arr = make_array("cm", "m", "in", "ft", "yd")
empty_arr = make_array() # challenge: what does this create?
int_arr
array([ 3, -4,  0,  5,  2])
str_arr
array(['cm', 'm', 'in', 'ft', 'yd'],
      dtype='<U2')

An empty array is one with no elements:

empty_arr
array([], dtype=float64)
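One way to answer "What arrays are created?" is to ask Python directly, because every array has a length and a data type (dtype). The sketch below assumes that make_array behaves like NumPy's np.array for these inputs, and it uses np.array directly so that it runs without the datascience package. The exact integer dtype (for example, int64 versus int32) may vary by platform.

```python
import numpy as np

# Assumption: make_array(a, b, ...) behaves like np.array([a, b, ...]) for these inputs
int_arr = np.array([3, -4, 0, 5, 2])
str_arr = np.array(["cm", "m", "in", "ft", "yd"])
empty_arr = np.array([])

for name, arr in [("int_arr", int_arr), ("str_arr", str_arr), ("empty_arr", empty_arr)]:
    # len gives the number of elements; dtype describes the kind of values stored
    print(name, len(arr), arr.dtype)
```

An empty array has no values from which NumPy can infer a type, so NumPy falls back to its default type, float64. This matches the output above.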

We will describe the syntax and terminology of built-in and NumPy functions below, then provide a table of operations for each. Here’s how we suggest working through this section:

  1. Don’t memorize these functions!
  2. Instead, remember there are two types of functions for arrays: built-in functions and NumPy functions.
  3. Get familiar with the two tables below, and practice reading their documentation to understand how each function works.
  4. When writing your programs, look through these functions and see which function(s) can compose your solution.

Built-in functions

Some built-in functions (i.e., included with Python) can take in arrays as arguments.

Here is a table of example built-in functions for arrays. Again, don’t memorize the functions. Rather, get familiar with reading and predicting their outputs.

Built-in Python functions that take in array arguments.

Expression   Return value                                  Example(s)
----------   -------------------------------------------   ---------------------------
len(arr)     Length of an array, i.e., the number of       len(str_arr)    # 5
             elements it contains. Useful for              len(empty_arr)  # 0
             determining the size of an array
             dynamically.
max(arr)     The largest value within an array.            max(int_arr)    # 5
                                                           max(str_arr)    # 'yd'
min(arr)     The smallest value within an array.           min(int_arr)    # -4
sum(arr)     Sum of all values in an array.                sum(int_arr)    # 6
                                                           sum(str_arr)    # TypeError
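We can check the table's example column directly. This sketch again uses np.array as a stand-in for make_array (an assumption that lets the code run without the datascience package). The built-in sum starts from 0 and adds each element in turn, so summing an array of strings fails with a TypeError. As an extra edge case that is not in the table, max on an empty array raises a ValueError because an empty array has no largest element.

```python
import numpy as np

# Assumption: np.array stands in for make_array
int_arr = np.array([3, -4, 0, 5, 2])
str_arr = np.array(["cm", "m", "in", "ft", "yd"])
empty_arr = np.array([])

print(len(str_arr), len(empty_arr))   # 5 0
print(max(int_arr), max(str_arr))     # 5 yd
print(min(int_arr))                   # -4
print(sum(int_arr))                   # 6

# sum computes 0 + "cm" + ..., and adding a number to a string is not allowed
try:
    sum(str_arr)
except TypeError as err:
    print("TypeError:", err)

# Edge case (not in the table): an empty array has no largest value
try:
    max(empty_arr)
except ValueError as err:
    print("ValueError:", err)
```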

Remember that even though function names can be identical, call expressions can evaluate differently depending on the argument data type. len(arg), for example, returns an integer value indicating the “length” of the argument arg. If arg is a string, it returns the number of characters; if arg is an array, it returns the number of elements.

len(str_arr)
5
len(str_arr.item(0)) # what is the argument here?
2

As you may have noticed above, calling these functions on arrays of strings returns some seemingly bizarre values. After all, what does it mean to get the maximum value of an array of strings?

Instead of raising an error, Python orders strings alphabetically (more precisely, by comparing them character by character). That is why max(str_arr) is "yd": it comes alphabetically after "cm", "ft", "in", and "m".
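To see this ordering for yourself, compare the strings directly or sort them. Python compares strings character by character using each character's Unicode code point. For lowercase letters, this matches alphabetical order. One consequence, shown in the last line, is that uppercase letters sort before lowercase ones. The sketch uses a plain Python list, so no packages are needed.

```python
units = ["cm", "m", "in", "ft", "yd"]

print("yd" > "cm")             # True: 'y' comes after 'c'
print(sorted(units))           # ['cm', 'ft', 'in', 'm', 'yd']
print(max(units), min(units))  # yd cm

# Comparison uses code points, so uppercase 'Z' (90) sorts before lowercase 'a' (97)
print("Z" < "a")               # True
```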

We will not cover string comparisons in detail in this course. If you are curious about these algorithms, we encourage you to take a Data Structures course!

Let’s use these built-in functions to compute the average (mean) value of the below array. We will discuss measures of average much later in this course.

arr = make_array(30, -40, -4.5, 0, 35)
avg = sum(arr)/len(arr)
avg
4.0999999999999996

Computers store floats in binary with limited precision. As a result, the computed average is not exactly 4.1 (the true numeric average) but the closest number to 4.1 that Python can represent. Depending on your NumPy version, that number may display as 4.0999999999999996 or simply as 4.1. Take a computer systems or computer architecture course for more information!
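The following sketch illustrates this with plain Python floats. The computed average is equal to the float literal 4.1, because both are the closest representable binary number to 4.1. Printing that number with 17 significant digits reveals the same 4.0999999999999996 shown above. A common way to sidestep tiny rounding differences is to compare floats with a tolerance, here using math.isclose.

```python
import math

values = [30, -40, -4.5, 0, 35]
avg = sum(values) / len(values)

print(avg == 4.1)                    # True: both are the closest float to 4.1
print(f"{avg:.17g}")                 # 4.0999999999999996
print(0.1 + 0.2 == 0.3)              # False: 0.1 + 0.2 is 0.30000000000000004
print(math.isclose(0.1 + 0.2, 0.3))  # True: equal within a small tolerance
```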

NumPy functions

NumPy (pronounced “NUM-pie”) is a Python library with convenient and powerful modules and functions for manipulating arrays. Any time we want to use NumPy, we must write an import statement:

import numpy as np

arr = make_array(30, -40, -4.5, 0, 35)
arr
array([ 30. , -40. ,  -4.5,   0. ,  35. ])

After putting this statement at the top of our notebook, we can call a NumPy function by writing np. before its name. The NumPy call below computes averages much more conveniently than our clever (but verbose) expression with built-in function calls, though it still suffers from floating-point approximations:

np.average(arr)
4.0999999999999996
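The NumPy table lists np.mean alongside np.average. For a plain array without weights, the two should agree with each other and with the built-in sum/len expression. This sketch uses np.array in place of make_array (an assumption, so it runs without the datascience package). In keeping with the floating-point caveat above, it compares the results with np.isclose rather than ==.

```python
import numpy as np

# Assumption: np.array stands in for make_array
arr = np.array([30, -40, -4.5, 0, 35])

builtin_avg = sum(arr) / len(arr)
print(np.average(arr), np.mean(arr), builtin_avg)
print(np.isclose(np.average(arr), builtin_avg))   # True
print(np.isclose(np.mean(arr), np.average(arr)))  # True
```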

NumPy Array Function Table

There are many, many types of NumPy array functions; the below table only scratches the surface of what is possible. Again, instead of memorizing functions, we encourage you to learn how to read documentation by considering the following:

  1. What is the function name? How does this name inform the function description?
  2. What is the return value of this function?
    • If the data type of the return value is an array, is it the same length as the original array? Is the function therefore operating element-wise on the original array?
    • If the data type of the return value is a single value, how is this value computed from the different elements of the original array?
  3. Is the function changing the contents of the original array?
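You can also answer questions 2 and 3 by experiment. Call the function, then check the type and length of the result and whether the original array changed. The following is a minimal sketch that again uses np.array as a stand-in for make_array (an assumption). It applies these checks to three functions from the table.

```python
import numpy as np

arr = np.array([3, -4, 0, 5, 2])
original = arr.copy()   # keep a snapshot so we can detect changes

for func in [np.sum, np.diff, np.sort]:
    result = func(arr)
    if isinstance(result, np.ndarray):
        # Array result: compare its length to the original's
        print(func.__name__, "-> array of length", len(result), "vs", len(arr))
    else:
        # Single value computed from all the elements
        print(func.__name__, "-> single value", result)
    # Question 3: did the call modify arr?
    print("  arr unchanged:", np.array_equal(arr, original))
```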

A subset of NumPy array functions. A full reference is in the Data 6 reference sheet.

NumPy function          Description
---------------------   ------------------------------------------------------
np.average(arr)         The average (i.e., mean) value of arr
np.mean(arr)
np.sum(arr)             The sum of all elements in arr
np.prod(arr)            The product of all elements in arr
np.count_nonzero(arr)   The number of elements in arr that are not equal to 0
np.diff(arr)            The difference between each element of arr and the
                        element before it. Returns an array with one fewer
                        element than the original.
np.cumsum(arr)          The cumulative (running) sums of the elements in arr.
                        Returns an array of the same length as the original.
np.sqrt(arr)            The square root of each element in arr
np.log(arr)             The natural logarithm of each element in arr
np.log10(arr)           The base-10 logarithm of each element in arr
np.sort(arr)            A sorted copy of arr, in increasing order. The
                        original arr is not changed.

Refer to the lecture notebook for example call expressions involving these NumPy functions. Refer to the Data 6 reference sheet for all functions we will expect you to be familiar with (not memorize!) in this course.
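For a quick self-check alongside the lecture notebook, here is a sketch that calls each function in the table on a small array. The array values are hypothetical and were chosen so the results are easy to verify by hand. np.array stands in for make_array. The logarithm examples use positive values only, because the logarithm of 0 or a negative number is not a finite real number.

```python
import numpy as np

# Hypothetical example values; np.array stands in for make_array
arr = np.array([5, 2, 0, 8, 1])

print(np.average(arr), np.mean(arr))  # both 3.2
print(np.sum(arr))                    # 16
print(np.prod(arr))                   # 0, because one element is 0
print(np.count_nonzero(arr))          # 4
print(np.diff(arr).tolist())          # [-3, -2, 8, -7] (one element shorter)
print(np.cumsum(arr).tolist())        # [5, 7, 7, 15, 16] (running totals)
print(np.sort(arr).tolist())          # [0, 1, 2, 5, 8]
print(arr.tolist())                   # [5, 2, 0, 8, 1] (np.sort returned a copy)

# Element-wise math functions, on positive values so every result is defined
powers = np.array([1, 100, 10000])
print(np.sqrt(powers).tolist())       # [1.0, 10.0, 100.0]
print(np.log10(powers))               # approximately [0, 2, 4]
print(np.log(powers))                 # approximately [0, 4.605, 9.210]
```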

External Reading

  • (mentioned in notes) Computational and Inferential Thinking, Ch 5.1
  • (optional) Tomas Beuzen. Python Programming for Data Science Ch 1.2.
