Data 6 Python Cheatsheet

This cheat sheet has been modified from the Data 6 Python Reference and includes all of the functions and table methods that you will need for the exams.

Built-In Python Functions

Function	Description	Input	Output
`str(val)`	Converts `val` to a string	A value of any type (int, float, NoneType, etc.)	The value as a string
`int(num)`	Converts `num` to an int	A numerical value (represented as a string or float)	The value as an int
`float(num)`	Converts `num` to a float	A numerical value (represented as a string or int)	The value as a float
`len(arr)`	Returns the length of `arr`	array or list	int: the length of the array or list
`max(arr)`	Returns the maximum value in `arr`	array or list	The maximum value the array (usually an int)
`min(arr)`	Returns the minimum value in `arr`	array or list	The minimum value the array (usually an int)
`sum(arr)`	Returns the sum of the values in `arr`	array or list	int or float: the sum of the values in the array
`abs(num)`	Returns the absolute value of `num`	int or float	int or float
`print(input, ...)`	Prints the `input`. Multiple inputs can be passed, and they will be separated by spaces by default.	input: any inputs to print	None
`type(object)`	Returns the type of `object`.	object: the object whose type is to be determined	type: the type of the object

NumPy Array Functions

Function	Description	Input	Output
`make_array(val1, val2, ...)`	Makes a NumPy array with the inputted values	A sequence of values	An array with those values
`np.mean(arr)` or `np.average(arr)`	Calculates the average value of `arr`	An array of numbers	float: The average of the array
`np.sum(arr)`	Returns the sum of the values in `arr`	array	int or float: the sum of the values in the array
`np.prod(arr)`	Returns the product of the values in `arr`	array	int or float: the product of the values in the array
`np.sqrt(num)`	Calculates the square root of `num`	int or float	float : the square root of the number
`np.arange(stop)`, `np.arange(start, stop)`, or `np.arange(start, stop, step)`	Creates an array of sequential numbers starting at `start`, going up in increments of `step`, and going up to but excluding `stop`. Default `start` is 0, default `step` is 1	int or float	array
`np.count_nonzero(arr)`	Returns the number of non-zero (or `True`) elements in an array	An array of values	int: the number of non-zero values in `arr`
`np.append(arr, item)`	Appends `item` to the end of `arr`. Does not modify the original array.	1. array to append to 2. item to append (any type)	array: a new array with the appended item
`np.cumsum(arr)`	Returns the cumulative sum of the elements in `arr`, where each element is the sum of all preceding elements including itself	array	array: the cumulative sum of the values in the array
`np.diff(arr)`	Computes the difference between consecutive elements in `arr`.	array	array: the differences between consecutive elements in the array containing `len(arr) - 1` elements
`np.sort(arr)`	Sorts an array in ascending order.	array	`None` (`arr` is sorted; no copy is made)

String Methods

Function	Description	Input	Output
`str.split(separator, maxsplit)`	Splits `str` into a list of substrings using the specified `separator`. If `separator` is not provided, splits at any whitespace. You can also use the optional argument `maxsplit` to limit the number of splits.	1. (Optional) separator: the delimiter used to split `str` 2. (Optional) maxsplit: maximum number of splits	list of substrings
`str.join(iterable)`	Concatenates the elements in `iterable` (usually a list or array) into a single string, with each element separated by `str`.	iterable: an iterable of strings to join (can be an array or list of strings)	string: a single string formed by joining the elements of `iterable` with the separator `str`
`str.replace(old, new)`	Returns a copy of the string with all occurrences of the substring `old` replaced by `new`.	`old`: the substring to be replaced. `new`: the substring to replace `old` with.	string: a new string where occurrences of `old` have been replaced by `new`.

Tables and Table Methods

Function	Description	Input	Output
`Table()`	Creates an empty table, usually to extend with data	None	An empty Table
`Table().read_table(filename)`	Create a table from a data file	string: the name of the file
`tbl.with_column(name, values)` or `tbl.with_columns(n1, v1, n2, v2, ...)`	Adds an extra column onto `tbl` with the label `name` and `values` as the column values	1. string: name of the new column 2. array: values in the column	Table: a copy of the original table with the new column(s)
`tbl.column(col)`	Returns the values in a column in `tbl`	string or int: the column name or index	array: the values in that column
`tbl.num_rows`	Compute the number of rows in `tbl`	None	int: the number of rows in the table
`tbl.num_columns`	Compute the number of columns in `tbl`	None	int: the number of columns in the table
`tbl.labels`	Returns the labels in `tbl`	None	array: the names of each column as strings
`tbl.select(col1, col2, ...)`	Creates a copy of `tbl` only with the selected columns	string or int: the column name(s) or index(es) to be included in the table	Table with the selected columns
`tbl.drop(col1, col2, ...)`	Creates a copy of `tbl` without the selected columns	string or int: the column name(s) or index(es) to be dropped from the table	Table without the selected columns
`tbl.relabeled(old_label, new_label)`	Creates a new table, changing the column name specified by `old_label` to `new_label`, and leaves the original table unchanged.	1. string: the old column name 2. string the new column name	Table: a copy of the original table with the changed column name
`tbl.show(n)`	Displays the first `n` rows of `tbl`. If no argument is specified, the function defaults to showing the entire table	(Optional) int: number of rows to be displayed	None (table is displayed)
`tbl.sort(column_name)`	Sorts the rows of `tbl` by the values in the `column_name` column. Defaults to ascending order unless the optional argument `descending=True` is included.	1. string or int: name or index of the column to sort 2. (Optional) `descending=True`	Table: a copy of the original table with the column sorted
`tbl.where(column, predicate)`	Creates a copy of `tbl` containing only the rows where the value of `column` matches the `predicate`. See `Table.where` predicates below.	1. string or int: column name or index 2. `are.(...)` predicate or exact-match value.	Table: a copy of the original table with only the rows that match the predicate
`tbl.take(row_indices)`	Creates a table with only the rows at the given indices. `row_indices` is either an array of indices or an integer corresponding to one index.	int or array: indices of rows to be included in the table	Table: a copy of the original table with only the rows at the given indices
`tbl.apply(function)` or `tbl.apply(function, col1, col2, ...)`	Returns an array of values resulting from applying a function to each item in a column.	1. Function: function to apply to column 2. (Optional) string or int: the column name(s) or index(es) to apply the function to	array containing an element for each value in the original column after applying the function to it
`tbl.group(column_or_columns, function)`	Groups rows in `tbl` by unique values or combinations of values in a column(s). Multiple columns must be entered as an array of strings. Values in the other columns are aggregated by count (by default) or the optional argument `function`. You can visualize the `group` function here.	1. string or array of strings: column(s) on which to group 2. (Optional) Function: function to aggregate values in cells (defaults to counting rows)	Table a new groupped table
`tbl.pivot(col1, col2)` or `tbl.pivot(col1, col2, values, collect)`	Creates a pivot table where each unique value in `col1` has its own column and each unique value in `col2` has its own row. Counts or aggregates values from a third column, collected with some function. If the `values` and `collect` arguments are not included, `pivot` defaults to returning counts in the cells. You can visualize the `pivot` function here.	1. string: name of the column in `tbl` whose unique values will make up the columns of the pivot table 2. string: name of column in `tbl` whose unique values will make up the rows of the pivot table 3. (Optional) string: name of the column in `tbl` that describes the values of cells in the pivot table 4. (Optional) Function: how the values are collected (e.g. `sum` or `np.mean`)	Table: a new pivot table
`tblA.join(colA, tblB)` or `tblA.join(colA, tblB, colB)`	Generate a table with the columns of `tblA` and `tblB`, containing rows for all values in `colA` and `colB` that appear in `tblA` and `tblB`, respectively. By default, `colB` is the same value as `colA`. `colA` and `colB` must be strings specifying column names.	1. string: name of column in `tblA` with values to join on 2. Table: the other table 3. (Optional) string: the name of the shared column in `tblB`, if column names are different between the tables	Table: a new combined table
`tbl.with_row(values)`	Adds a new row with the specified `values` to `tbl`	1. list or array: values to add as a new row	Table: a copy of the original table with the new row
`tbl.with_rows(list_of_rows)`	Adds multiple rows to `tbl` using a list of rows	1. list of lists or arrays: each list/array represents a new row	Table: a copy of the original table with the new rows

Visualization Functions

Function	Description	Input	Output
`tbl.barh(categories)` or `tbl.barh(categories, values)`	Displays a horizontal bar chart with bars for each category in the column `categories`. `values` specifies the column corresponding to the size of each bar, but is unnecessary if the table only has two columns. Optional argument `overlay` (default is `True`) specifies whether grouped bar charts should be overlaid or on separate plots.	1. string: name of the column with categories 2. (Optional) string: name of the column with values corresponding to the categories	None: draws a bar chart
`tbl.hist(column)`	Generates a histogram of the numerical values in `column`. Optional arguments `group` (to specify categorical column to group on), `bins` (to specify custom bins), and `overlay` to specify overlaid or separate histograms.	string: name of the column	None: draws a histogram
`tbl.plot(x_column, y_column)` or `tbl.plot(x_column)`	Draws a line plot consisting of one point for each row in `tbl`. If only `x_column` is specified, `plot` will plot the rest of the columns on the y-axis with different colored lines. Optional argument `overlay` (default is `True`) specifies whether multiple lines should be overlaid or on separate plots.	1. string: name of the column on the x-axis 2. string: name of the column on the y-axis	None: draws a line graph
`tbl.scatter(x_column, y_column)`	Draws a scatter plot consisting of one point for each row in `tbl`. The optional argument `fit_line=True` can be included to draw a line of best fit through the scatter plot. The optional arguments `group` (to specify categorical column to group on) and `sizes` (to specify a numerical column for bubble sizes) can also be used to encode additional variables.	1. string: name of the column on the x-axis 2. string: name of the column on the y-axis 3. (Optional) `fit_line=True`	None: draws a scatter plot

Table.where Predicates

These functions can be passed in as the second argument to tbl.where(..) and act as a condition by which to select rows from tbl.

Predicate	Description
`are.equal_to(Z)`	Equal to `Z` (can be an int, float or string)
`are.not_equal_to(Z)`	Not equal to ‘Z’ can be a number (int or float) or a string)
`are.above(x)`	Greater than `x`
`are.above_or_equal_to(x)`	Greater than or equal to `x`
`are.below(x)`	Less than `x`
`are.below_or_equal_to(x)`	Less than or equal to `x`
`are.between(x,y)`	Greater than or equal to `x` and less than `y`
`are.between_or_equal_to(x,y)`	Greater than or equal to `x`, and less than or equal to `y`
`are.strictly_between(x,y)`	Greater than `x` and less than `y`
`are.contained_in(A)`	True if it is a substring of `A (if` A `is a string) or an element of` A `(if` A` is an array)
`are.containing(S)`	Contains the string `S`