We work with the schools
table from before.
from datascience import *
import numpy as np
schools = Table.read_table('data/cal_unis.csv' )
schools
University of California, Berkeley
UC
Berkeley
Alameda
42519
1869
University of California, Davis
UC
Davis
Yolo
39152
1905
University of California, Irvine
UC
Irvine
Orange
35220
1965
University of California, Los Angeles
UC
Los Angeles
Los Angeles
45428
1882
University of California, Merced
UC
Merced
Merced
8544
2005
University of California, Riverside
UC
Riverside
Riverside
23278
1954
University of California, San Diego
UC
San Diego
San Diego
38798
1960
University of California, Santa Barbara
UC
Santa Barbara
Santa Barbara
24346
1891
University of California, Santa Cruz
UC
Santa Cruz
Santa Cruz
19700
1965
California State University Maritime Academy
CSU
Vallejo
Solano
1017
1929
... (22 rows omitted)
Pull up the Python Reference and try these exercises yourself! They are roughly in order of increasing difficulty.
Exercise 1: Variable names
How do we get all the column labels of schools
?
('Name', 'Institution', 'City', 'County', 'Enrollment', 'Founded')
Exercise 2: Reorder columns
How do we reorder the columns to get the following table?
Hint: use one of select
, drop
, or with_columns
.
schools.select(
"Name" , "Founded" ,
"Institution" , "City" ,
"County" , "Enrollment"
)
University of California, Berkeley
1869
UC
Berkeley
Alameda
42519
University of California, Davis
1905
UC
Davis
Yolo
39152
University of California, Irvine
1965
UC
Irvine
Orange
35220
University of California, Los Angeles
1882
UC
Los Angeles
Los Angeles
45428
University of California, Merced
2005
UC
Merced
Merced
8544
University of California, Riverside
1954
UC
Riverside
Riverside
23278
University of California, San Diego
1960
UC
San Diego
San Diego
38798
University of California, Santa Barbara
1891
UC
Santa Barbara
Santa Barbara
24346
University of California, Santa Cruz
1965
UC
Santa Cruz
Santa Cruz
19700
California State University Maritime Academy
1929
CSU
Vallejo
Solano
1017
... (22 rows omitted)
Exercise 3: Filtering
Let’s reset the schools
table to our original dataset before continuing.
schools = Table.read_table('data/cal_unis.csv' )
University of California, Berkeley
UC
Berkeley
Alameda
42519
1869
University of California, Davis
UC
Davis
Yolo
39152
1905
University of California, Irvine
UC
Irvine
Orange
35220
1965
University of California, Los Angeles
UC
Los Angeles
Los Angeles
45428
1882
University of California, Merced
UC
Merced
Merced
8544
2005
University of California, Riverside
UC
Riverside
Riverside
23278
1954
University of California, San Diego
UC
San Diego
San Diego
38798
1960
University of California, Santa Barbara
UC
Santa Barbara
Santa Barbara
24346
1891
University of California, Santa Cruz
UC
Santa Cruz
Santa Cruz
19700
1965
California State University Maritime Academy
CSU
Vallejo
Solano
1017
1929
... (22 rows omitted)
How do we get a table with all of the UC schools?
schools.where("Institution" , "UC" )
University of California, Berkeley
UC
Berkeley
Alameda
42519
1869
University of California, Davis
UC
Davis
Yolo
39152
1905
University of California, Irvine
UC
Irvine
Orange
35220
1965
University of California, Los Angeles
UC
Los Angeles
Los Angeles
45428
1882
University of California, Merced
UC
Merced
Merced
8544
2005
University of California, Riverside
UC
Riverside
Riverside
23278
1954
University of California, San Diego
UC
San Diego
San Diego
38798
1960
University of California, Santa Barbara
UC
Santa Barbara
Santa Barbara
24346
1891
University of California, Santa Cruz
UC
Santa Cruz
Santa Cruz
19700
1965
How do we get a table with all the schools in Los Angeles?
schools.where("City" , "Los Angeles" )
University of California, Los Angeles
UC
Los Angeles
Los Angeles
45428
1882
California State University, Los Angeles
CSU
Los Angeles
Los Angeles
27685
1947
Exercise 4: Rename Columns
How do we update schools
such that the column Name is renamed University? Hint : Check out the method relabeled
.
schools = schools.relabeled("Name" , "University" )
schools
University of California, Berkeley
UC
Berkeley
Alameda
42519
1869
University of California, Davis
UC
Davis
Yolo
39152
1905
University of California, Irvine
UC
Irvine
Orange
35220
1965
University of California, Los Angeles
UC
Los Angeles
Los Angeles
45428
1882
University of California, Merced
UC
Merced
Merced
8544
2005
University of California, Riverside
UC
Riverside
Riverside
23278
1954
University of California, San Diego
UC
San Diego
San Diego
38798
1960
University of California, Santa Barbara
UC
Santa Barbara
Santa Barbara
24346
1891
University of California, Santa Cruz
UC
Santa Cruz
Santa Cruz
19700
1965
California State University Maritime Academy
CSU
Vallejo
Solano
1017
1929
... (22 rows omitted)
There are many ways to approach a problem. Suppose you didn’t know the method relabeled
existed:
schools = schools.with_column("University" , schools.column("Name" )).drop("Name" )
schools
UC
Berkeley
Alameda
42519
1869
University of California, Berkeley
UC
Davis
Yolo
39152
1905
University of California, Davis
UC
Irvine
Orange
35220
1965
University of California, Irvine
UC
Los Angeles
Los Angeles
45428
1882
University of California, Los Angeles
UC
Merced
Merced
8544
2005
University of California, Merced
UC
Riverside
Riverside
23278
1954
University of California, Riverside
UC
San Diego
San Diego
38798
1960
University of California, San Diego
UC
Santa Barbara
Santa Barbara
24346
1891
University of California, Santa Barbara
UC
Santa Cruz
Santa Cruz
19700
1965
University of California, Santa Cruz
CSU
Vallejo
Solano
1017
1929
California State University Maritime Academy
... (22 rows omitted)
Questions for you as you read the above code:
What methods are being chained together here? Which is executed first, with_column
or drop
?
When is the column
method being called?
What would happen if you switched the two methods? (try it out)
Why is University now the last column?
Exercise 5: Debugging show
The following cells exhibit a tricky, but potentially common bug. First, check out the Python Reference to understand what show
does.
schools = Table.read_table('data/cal_unis.csv' )
schools = schools.show(3 )
University of California, Berkeley
UC
Berkeley
Alameda
42519
1869
University of California, Davis
UC
Davis
Yolo
39152
1905
University of California, Irvine
UC
Irvine
Orange
35220
1965
... (29 rows omitted)
schools = schools.show(4 )
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[13] , line 1
----> 1 schools = schools . show (4 )
AttributeError : 'NoneType' object has no attribute 'show'
Like print
, show is for display purposes. It does not return anything—i.e., it returns None. The second cell above therefore assigns schools to None!
Despite this, show
—just like print
—is useful when you want to display intermediate output for debugging purposes.
schools = Table.read_table('data/cal_unis.csv' )
schools.show(3 ) # just the first three rows
schools = schools.relabeled("Name" , "University" )
schools.show(3 ) # and again
schools = schools.with_columns("Name" , "City" )
schools # the last evaluated value in a cell
# is displayed by default
University of California, Berkeley
UC
Berkeley
Alameda
42519
1869
University of California, Davis
UC
Davis
Yolo
39152
1905
University of California, Irvine
UC
Irvine
Orange
35220
1965
... (29 rows omitted)
University of California, Berkeley
UC
Berkeley
Alameda
42519
1869
University of California, Davis
UC
Davis
Yolo
39152
1905
University of California, Irvine
UC
Irvine
Orange
35220
1965
... (29 rows omitted)
University of California, Berkeley
UC
Berkeley
Alameda
42519
1869
City
University of California, Davis
UC
Davis
Yolo
39152
1905
City
University of California, Irvine
UC
Irvine
Orange
35220
1965
City
University of California, Los Angeles
UC
Los Angeles
Los Angeles
45428
1882
City
University of California, Merced
UC
Merced
Merced
8544
2005
City
University of California, Riverside
UC
Riverside
Riverside
23278
1954
City
University of California, San Diego
UC
San Diego
San Diego
38798
1960
City
University of California, Santa Barbara
UC
Santa Barbara
Santa Barbara
24346
1891
City
University of California, Santa Cruz
UC
Santa Cruz
Santa Cruz
19700
1965
City
California State University Maritime Academy
CSU
Vallejo
Solano
1017
1929
City
... (22 rows omitted)