We work with the schools
table from before.
from datascience import *
import numpy as np
schools = Table.read_table('data/cal_unis.csv' )
schools.show(3 )
California State Polytechnic University, Humboldt
CSU
Arcata
Humboldt
6025
1913
California State University, Bakersfield
CSU
Bakersfield
Kern
9613
1965
University of California, Berkeley
UC
Berkeley
Alameda
45307
1869
... (29 rows omitted)
Pull up the Python Reference and try these exercises yourself! They are roughly in order of increasing difficulty.
Exercise 1: Variable names
How do we get all the column labels of schools
?
('Name', 'Institution', 'City', 'County', 'Enrollment', 'Founded')
Exercise 2: Reorder columns
How do we reorder the columns, as below?
Hint: use one of select
, drop
, or with_columns
.
schools.select(
"Name" , "Founded" ,
"Institution" , "City" ,
"County" , "Enrollment"
)
California State Polytechnic University, Humboldt
1913
CSU
Arcata
Humboldt
6025
California State University, Bakersfield
1965
CSU
Bakersfield
Kern
9613
University of California, Berkeley
1869
UC
Berkeley
Alameda
45307
California State University Channel Islands
2002
CSU
Camarillo
Ventura
6128
California State University, Dominguez Hills
1960
CSU
Carson
Los Angeles
16426
California State University, Chico
1887
CSU
Chico
Butte
14183
University of California, Davis
1905
UC
Davis
Yolo
39679
California State University, Fresno
1911
CSU
Fresno
Fresno
23999
California State University, Fullerton
1957
CSU
Fullerton
Orange
40386
California State University, East Bay
1957
CSU
Hayward
Alameda
13673
... (22 rows omitted)
Exercise 3: Filtering
California State Polytechnic University, Humboldt
CSU
Arcata
Humboldt
6025
1913
California State University, Bakersfield
CSU
Bakersfield
Kern
9613
1965
University of California, Berkeley
UC
Berkeley
Alameda
45307
1869
... (29 rows omitted)
How do we get a table with only UC schools?
schools.where("Institution" , "UC" )
University of California, Berkeley
UC
Berkeley
Alameda
45307
1869
University of California, Davis
UC
Davis
Yolo
39679
1905
University of California, Irvine
UC
Irvine
Orange
35937
1965
University of California, Los Angeles
UC
Los Angeles
Los Angeles
46430
1882
University of California, Merced
UC
Merced
Merced
9110
2005
University of California, Riverside
UC
Riverside
Riverside
26809
1954
University of California, San Diego
UC
San Diego
San Diego
42006
1960
University of California, Santa Barbara
UC
Santa Barbara
Santa Barbara
26420
1891
University of California, Santa Cruz
UC
Santa Cruz
Santa Cruz
19478
1965
How do we get a table with all the schools in Los Angeles ?
schools.where("City" , "Los Angeles" )
University of California, Los Angeles
UC
Los Angeles
Los Angeles
46430
1882
California State University, Los Angeles
CSU
Los Angeles
Los Angeles
26460
1947
Exercise 4: Rename Columns
How do we update schools
such that the column Name is renamed University? Hint : Check out the method relabeled
.
schools = schools.relabeled("Name" , "University" )
schools.show(3 )
California State Polytechnic University, Humboldt
CSU
Arcata
Humboldt
6025
1913
California State University, Bakersfield
CSU
Bakersfield
Kern
9613
1965
University of California, Berkeley
UC
Berkeley
Alameda
45307
1869
... (29 rows omitted)
There are many ways to approach a problem. Suppose you didn’t know the method relabeled
existed:
schools = schools.with_column("University" , schools.column("Name" )).drop("Name" )
schools.show(3 )
CSU
Arcata
Humboldt
6025
1913
California State Polytechnic University, Humboldt
CSU
Bakersfield
Kern
9613
1965
California State University, Bakersfield
UC
Berkeley
Alameda
45307
1869
University of California, Berkeley
... (29 rows omitted)
Questions for you as you read the above code:
What methods are being chained together here? Which is executed first, with_column
or drop
?
When is the column
method being called?
What would happen if you switched the two methods? (try it out)
Why is University now the last column?
Exercise 5: Debugging show
The following cells exhibit a tricky, but potentially common bug. First, check out the Python Reference to understand what show
does.
schools = Table.read_table('data/cal_unis.csv' )
schools = schools.show(3 )
California State Polytechnic University, Humboldt
CSU
Arcata
Humboldt
6025
1913
California State University, Bakersfield
CSU
Bakersfield
Kern
9613
1965
University of California, Berkeley
UC
Berkeley
Alameda
45307
1869
... (29 rows omitted)
schools = schools.show(4 )
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[12] , line 1
----> 1 schools = schools . show (4 )
AttributeError : 'NoneType' object has no attribute 'show'
Like print
, show is for display purposes. It does not return anything—i.e., it returns None. The second cell above therefore assigns schools to None!
Despite this, show
—just like print
—is useful when you want to display intermediate output for debugging purposes.
schools = Table.read_table('data/cal_unis.csv' )
schools.show(3 ) # just the first three rows
schools = schools.relabeled("Name" , "University" )
schools.show(3 ) # and again
schools = schools.with_columns("Name" , "City" )
schools # the last evaluated value in a cell
# is displayed by default
California State Polytechnic University, Humboldt
CSU
Arcata
Humboldt
6025
1913
California State University, Bakersfield
CSU
Bakersfield
Kern
9613
1965
University of California, Berkeley
UC
Berkeley
Alameda
45307
1869
... (29 rows omitted)
California State Polytechnic University, Humboldt
CSU
Arcata
Humboldt
6025
1913
California State University, Bakersfield
CSU
Bakersfield
Kern
9613
1965
University of California, Berkeley
UC
Berkeley
Alameda
45307
1869
... (29 rows omitted)
California State Polytechnic University, Humboldt
CSU
Arcata
Humboldt
6025
1913
City
California State University, Bakersfield
CSU
Bakersfield
Kern
9613
1965
City
University of California, Berkeley
UC
Berkeley
Alameda
45307
1869
City
California State University Channel Islands
CSU
Camarillo
Ventura
6128
2002
City
California State University, Dominguez Hills
CSU
Carson
Los Angeles
16426
1960
City
California State University, Chico
CSU
Chico
Butte
14183
1887
City
University of California, Davis
UC
Davis
Yolo
39679
1905
City
California State University, Fresno
CSU
Fresno
Fresno
23999
1911
City
California State University, Fullerton
CSU
Fullerton
Orange
40386
1957
City
California State University, East Bay
CSU
Hayward
Alameda
13673
1957
City
... (22 rows omitted)