# Lecture 5 – Functions and Control

## Motivation

We've seen a few built-in Python functions so far.

We don't currently have a good way to prevent our code from getting repetitive. For example, if we want to determine whether or not different students are ready to graduate:
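As a rough sketch of what that repetition might look like, assume a made-up rule that a student is ready to graduate with at least 180 units and a GPA of at least 2.0. The rule, the variable names, and the numbers below are all hypothetical and only serve to illustrate the problem.

```python
# Hypothetical rule: ready to graduate with >= 180 units and a GPA >= 2.0.
# The same logic is written out again for every student.
student_1_units = 185
student_1_gpa = 3.1
student_1_ready = student_1_units >= 180 and student_1_gpa >= 2.0

student_2_units = 170
student_2_gpa = 3.8
student_2_ready = student_2_units >= 180 and student_2_gpa >= 2.0

student_3_units = 200
student_3_gpa = 1.9
student_3_ready = student_3_units >= 180 and student_3_gpa >= 2.0

print(student_1_ready, student_2_ready, student_3_ready)
```

If the rule ever changed, every copy of the comparison would need to be edited by hand.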

## Functions

Here's a better solution:
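A minimal sketch of such a function is shown here, assuming a made-up rule that a student needs at least 180 units and a GPA of at least 2.0. The name `is_ready_to_graduate` and both thresholds are hypothetical.

```python
def is_ready_to_graduate(units, gpa):
    """Return True if the (hypothetical) graduation rule is satisfied."""
    return units >= 180 and gpa >= 2.0

# The logic lives in one place; each student is a single call.
print(is_ready_to_graduate(185, 3.1))
print(is_ready_to_graduate(170, 3.8))
print(is_ready_to_graduate(200, 1.9))
```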

By using a function, we only had to write out the logic once, and could easily call it any number of times.

Other function examples:
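The functions below are illustrative stand-ins rather than the lecture's own examples. Their names are hypothetical, and they are only meant to show that functions can take one or several parameters and work with different types.

```python
def triple(x):
    """Return three times x (works for numbers and for strings)."""
    return x * 3

def area_of_triangle(base, height):
    """Return the area of a triangle with the given base and height."""
    return 0.5 * base * height

def greet(name):
    """Return a greeting for the given name."""
    return 'Hello, ' + name + '!'

print(triple(4))
print(triple('ha'))
print(area_of_triangle(10, 5))
print(greet('Ada'))
```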

## Parameters and return values

### Returning

As soon as a return statement runs, the function exits; none of the lines after it in the function body are run.
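A small sketch of this behavior follows; the function name `first_return_wins` is made up for illustration.

```python
def first_return_wins(x):
    print('This line runs.')
    return x + 1
    # Everything below is unreachable: the function has already exited.
    print('This line never runs.')
    return x + 2

result = first_return_wins(10)
print(result)
```

Here `result` is 11, not 12, because the function exits at the first `return` it reaches.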

## Demo

Let's load in the same Wikipedia countries data from this week's earlier lectures. But this time, we will write some of the data cleaning functions ourselves.

Let's look at the 'Population' column.

We want these numbers to be integers, so that we can do arithmetic with them or plot them. However, right now they are strings that contain commas.

Let's write a function that takes in a string with that format and returns the corresponding integer. But first, here's proof that the int function doesn't work here (it doesn't like the commas):
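The sketch below shows one way this could look. The sample value `'1,412,600,000'` is a hypothetical string in the column's format, and the body of `clean_population_string` is an assumption about how such a function might be written, not necessarily the lecture's exact code.

```python
population_string = '1,412,600,000'  # hypothetical value in the column's format

# int() raises a ValueError on strings that contain commas.
try:
    int(population_string)
except ValueError as error:
    print('int failed:', error)

def clean_population_string(pop_string):
    """Convert a string like '1,412,600,000' to the integer 1412600000."""
    # Remove every comma, then convert what is left to an int.
    return int(pop_string.replace(',', ''))

print(clean_population_string(population_string))
```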

Cool!

Using techniques we haven't yet learned, we can apply this function to every element of the 'Population' column, so that the column holds integers and visualizations of it work as expected.
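To give a feel for the idea without relying on any table library, here is a sketch that uses a plain Python list as a stand-in for the column. The list contents are hypothetical, and the lecture's actual table code may look quite different.

```python
def clean_population_string(pop_string):
    """Convert a string like '1,412,600,000' to an integer."""
    return int(pop_string.replace(',', ''))

# A plain list standing in for the 'Population' column (made-up values).
population_strings = ['1,412,600,000', '333,287,557', '5,000']

# Call the function once per element and collect the results.
cleaned = [clean_population_string(s) for s in population_strings]
print(cleaned)
```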

The '%' column is also a little fishy.

Percentages should be floats, but here they're strings.

Let's suppose we want to have the proportion of the total global population that lives in a given country as a column in our table. Proportions are decimals/fractions between 0 and 1. We can do this two ways:

• write a function, similar to clean_population_string, that correctly extracts the proportion we need
• calculate this by hand using all of the values in 'Population'

Let's do... both!
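For the first approach, a possible sketch is given below. It assumes the entries in the '%' column look like `'17.7%'`; that format, the sample value assigned to `china_pct`, and the function body are all assumptions made for illustration.

```python
def clean_pct_string(pct_string):
    """Convert a percentage string like '17.7%' to a proportion near 0.177."""
    # Drop the percent sign, convert to a float, then rescale to 0-1.
    return float(pct_string.replace('%', '')) / 100

china_pct = '17.7%'  # hypothetical value in the assumed format
print(clean_pct_string(china_pct))
```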

Nice! The other way requires adding together all of the values in the 'Population' column. We haven't covered how to do that just yet, so ignore the code for it and assume it does what it should.

Assume this is the total population of the world. How would you calculate the proportion of people living in one country?
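One way to answer that question is to divide the country's population by the total. In the sketch below both numbers are hypothetical placeholders, not values taken from the table.

```python
china_population = 1412600000  # hypothetical cleaned 'Population' value
total_population = 8000000000  # hypothetical stand-in for the column's sum

# A proportion is the part divided by the whole.
china_proportion = china_population / total_population
print(china_proportion)
```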

This is pretty close to the result of clean_pct_string(china_pct). The difference is likely due to some countries not being included in one column or the other.

Hopefully this gives you a glimpse of the power of functions!