Syllabus 📖

Table of contents

  1. Communication 💬
  2. Logistics 📆
  3. About 🧐
  4. Course Structure 🍎
    1. Lecture
    2. Lab
    3. Discussion
    4. Homeworks
    5. Office Hours and Ed
    6. Quizzes and Exams
  5. Technology 🖥
  6. Policies ✏️
    1. Grading
    2. Late Policy and Extensions
    3. Academic Honesty
    4. A note on letter grades
  7. Acknowledgements 🙏

Note: This page is still being finalized, and the information here is tentative. Also, see here for information about enrollment.


Communication 💬

For communication, we’ll be using Ed, a new communication tool. Ed is where you will see all announcements and get help from staff and other students on assignments and concepts. You will be added to Ed automatically; email Ian (castro.ian@berkeley.edu) or Isaac (isaacmerritt@berkeley.edu) if you’re not sure how to access it.

We will not be using bCourses at all in this class; this website and Ed serve as replacements.


Logistics 📆

Lecture: Mondays & Wednesdays, 10-11AM; Tuesdays & Thursdays, 10AM-12PM

Lab: Mondays & Wednesdays, 11AM-12PM

Discussion: Fridays, 10AM-12PM

Office Hours: TBD

Lectures, labs, and office hours will be hosted on Zoom. See this post on Ed for the link. (We won’t make the Zoom link public, so that we don’t get Zoom-bombed 💣.)


About 🧐

From the course catalog: This course is an introduction to computational thinking and quantitative reasoning, designed to prepare students for further coursework in data science, computer science, and statistics (in particular, Foundations of Data Science, Data C8). This course emphasizes the use of computation to gain insight about quantitative problems with real data from the social sciences.

Data 6 uses the Python language to teach computation. It also uses the Jupyter Notebook environment, which makes it easy to get started with programming without needing to use a text editor and terminal and is very popular in data science applications.

This class serves a different purpose than several other classes that may sound similar. Specifically:

  • Data 8: Data 6 does not cover nearly as much statistics and inference as Data 8. Instead, it dives deeper into Python and its applications in data science. After taking this class, you will be well-equipped to take Data 8 and focus on the inference.
  • CS 10: While CS 10 is also an introductory computing class, it focuses less on Python and data science, and more on abstract ideas in computing. It is a fantastic alternative to Data 6.
  • CS 61A and CS 88: While these courses also teach Python, they serve a slightly different purpose - namely, they are designed to introduce students to computer science, not to computing in data science. They cover the Python language in far greater detail than we will, but they do not cover how to work with real-world data. They are also substantially more fast-paced than this course.

The rough topic breakdown is as follows:

  • Weeks 1-3: Python basics in the Jupyter notebook environment.
  • Week 4: Working with real-world tabular data using datascience (the library used in Data 8).
  • Week 5: Data visualization.
  • Week 6: Probability and simulation. Special topics, as time permits.

Slides and code will be posted after each lecture, and they will cover everything you are required to know for the course. There is no one textbook that covers the content of this course the way we intend to cover it, though we will link supplementary readings.

Also, note that the course will emphasize the use of real-world data. Some possible datasets include:

  • Data from media markets in Pennsylvania and data on Congress members’ ages
  • California housing prices data
  • COVID cases
  • Bay Area bike sharing usage data
  • Vehicle fuel efficiency data
  • Sports data

You will leave the course being able to independently apply the skills you’ve learned to datasets of your own choosing.


Course Structure 🍎

Lecture

There will be four lectures a week. In lecture, we’ll introduce you to new ideas and concepts in programming and data science. Lectures will be recorded and posted after class for you to review in the future. All lecture resources (slides, code, supplemental readings) will be linked on the course website. We will begin on Berkeley Time, and attendance is mandatory.

During each lecture, there will be a few points at which we stop and ask you to answer a short question. We call these questions Quick Checks. They serve two purposes:

  1. For us to get a gauge of how well the class understands the material we’re currently covering
  2. For you to get a gauge of how well you understand the material we’re currently covering

Quick Checks are hosted on Ed using its “Lessons” feature, and will also be linked on the course website under each lecture. Quick Checks are graded on completion, not correctness. It’s not important to get these questions right on your first try – but it’s important to try them. You will be given time in lecture to answer them. If you have to miss a lecture for whatever reason, just answer that lecture’s Quick Check whenever you catch up on lecture.

Additionally, in some lecture notebooks, we will post optional practice problems. These are not required, but we recommend that you complete them.

Lab

There are 2 lab sections a week that follow immediately after the Monday and Wednesday lectures. In lab, we’ll spend the first ~15 minutes going over some demos that are relevant to that week’s material. While there may be a notebook accompanying this demo (that we will post on the course website), there is no lab assignment. You’ll spend the remaining ~35 minutes working on practice problems pertinent to that week’s homework with the help of your peers and course staff. The hope is that by participating and collaborating, you will be able to better understand the concepts and finish your homework more quickly.

Discussion

Each Friday, there will be a discussion section. In these sections, we will discuss ethical and social issues in computing and data, such as privacy and algorithmic bias. To prepare for these discussions, you will need to complete some short prep work assigned each Monday, which usually consists of a few readings. Other activities may include guest speakers and content review. Participation is a part of your grade. Following our group discussions, you will be given time to work on homework with your peers and ask staff members questions.

Sometimes, these topics may be difficult to discuss. We all come from different backgrounds and experiences, which shape our views. That being said, our classroom is a judgment-free space; we ask that everyone keep an open mind, be aware of the space you take, and make space for your peers.

Homeworks

You learn data science by doing data science, not by listening to or reading about it. As such, homework assignments will be your primary source of learning in this class.

Homeworks primarily consist of programming problems. You will apply the skills you learned in recent lectures to accomplish tasks involving real data. Autograder tests in your notebook will tell you if you’re on the right track or not. Most homeworks will also include a few “written” problems, where you have to type your answer in text. These problems will be manually graded by a human.

Homeworks, like all course material, can be accessed by clicking the correct link on the course website. Clicking on the “Homework 3” link, for example, will bring you to a copy of the Homework 3 notebook in your own DataHub. This is where you will work on the assignment. Once you’re done, you will run the very last cell in the assignment to generate a .zip file, which you will then upload to Gradescope so that we can grade it. This process will be walked through in lecture and in the first assignment.

There will be 5 homework assignments, which corresponds to roughly one per week. In general, homework assignments will be released on Wednesday evening, and will be due the following Monday at 11 PM. See the Policies section for our extensions and late submissions policy, as well as our homework drop policy.

Homework assignments are meant to be completed individually, but we encourage you to discuss approaches with others; see our Academic Honesty policy below.

Office Hours and Ed

In addition to lecture and lab, we will host office hours each week. In office hours, you’ll get a chance to ask questions about and (hopefully) work with your peers on assignments. You’ll also be able to ask conceptual questions about lecture material.

While office hours are not mandatory, we highly recommend attending them regularly as they’ll very likely cut down on the time you’ll need to spend on homeworks.

Furthermore, you’re encouraged to ask and answer questions about assignments and concepts on Ed.

Quizzes and Exams

In lieu of a midterm, we will have two small quizzes, each worth 10% of your grade. Each quiz will focus on the material that was not assessed on the previous quiz. The scheduling for these is on the course homepage; the tentative dates are:

  • Quiz 1: Tuesday, July 20
  • Quiz 2: Monday, August 2

We will have a final exam during the campus-assigned slot: Friday, August 13, 10AM-12PM. Unlike the quizzes, the final exam will be cumulative.

More relevant logistics for quizzes and exams will be announced on Ed.


Technology 🖥

We will be using several websites this semester. Here’s what they’re all used for:

  • Course Website: where all content will be posted.
  • Ed: discussion forum where all announcements will be sent, and where all student-staff and student-student communication will occur. Also where Quick Checks are hosted and submitted.
  • DataHub: where all assignments will be hosted. (You will not usually have to navigate here manually; assignment links on the course homepage bring you to the right place automatically.)
  • Gradescope: where all homeworks are submitted and all grades live. (Not bCourses! 🙅)

Policies ✏️

Grading

Here’s how we will compute your grade.

Component        Weight   Notes
Participation    10%
Quick Checks     5%       no drops
Weekly Surveys   5%       no drops
Homeworks        40%      5 assignments, with 1 drop (10% each)
Quizzes          20%      2 quizzes, 10% each
Final Exam       20%
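
To make the weights above concrete, here is a minimal Python sketch of how a weighted overall percentage could be computed from them. The names `WEIGHTS` and `overall_percentage` and the example scores are hypothetical, made up for illustration. This is not the staff's actual grading code, and it says nothing about letter grades, which are not tied to a fixed percentage (see the note on letter grades).

```python
# Component weights from the grading table, as whole-number percentages.
WEIGHTS = {
    "Participation": 10,
    "Quick Checks": 5,
    "Weekly Surveys": 5,
    "Homeworks": 40,
    "Quizzes": 20,
    "Final Exam": 20,
}


def overall_percentage(scores):
    """Return the weighted overall percentage.

    `scores` maps each component name to a score between 0 and 100.
    """
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / 100


# Hypothetical scores for one student (not real data).
example_scores = {
    "Participation": 100,
    "Quick Checks": 100,
    "Weekly Surveys": 100,
    "Homeworks": 90,
    "Quizzes": 80,
    "Final Exam": 85,
}

print(overall_percentage(example_scores))  # 89.0
```

The weights are stored as whole numbers and divided by 100 at the end, which avoids small floating-point rounding surprises when adding up fractions like 0.05.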

Participation

In labs and discussion, you are expected to participate as part of your grade. Participation takes a variety of forms, including asking questions, working with peers on problems, volunteering answers, and completing practice problems.

Even though you will not be assessed on the readings for discussion, you are also expected to complete them to be able to participate. Due to the virtual format, we will be relatively lenient for this portion of the grade. However, learning happens best when you actively participate, so please speak up in class!

Weekly Surveys

Given the state of our universe right now, we want to check in with you each week to hear how you’re doing, both academically and personally. Furthermore, since this is a new class, we’re very interested in receiving your feedback as to how it’s going and how we can improve.

As such, we will have feedback surveys for you to fill out roughly each week. These will coincide with homework assignments, e.g. Survey 2 and Homework 2 will come out and be due at the same time. These will be hosted on Google Forms, and will be posted on both the course homepage and on Ed. They will generally not be anonymous, so that we can reach out to you if we feel the need to based on your responses. However, there will be a few points in the semester for you to provide us with anonymous feedback about the course.

There are no drops for these (so you need to do them all for full credit), but we will be lenient with their deadlines.

Homeworks

There will be 5 homework assignments. We will drop your lowest homework score, so only your top 4 homework scores count toward your grade. This means each counted homework is worth 10% of your overall grade in this class.
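
As a rough illustration of the drop policy, the sketch below averages a list of homework scores after removing the single lowest one. The function name `homework_component` and the example scores are hypothetical, and the sketch assumes each score is already a percentage between 0 and 100. It ignores the early-submission bonus.

```python
def homework_component(homework_scores):
    """Average the homework scores after dropping the single lowest one."""
    if len(homework_scores) < 2:
        raise ValueError("need at least two scores to drop one")
    # Sort ascending, then skip the first (lowest) score.
    kept = sorted(homework_scores)[1:]
    return sum(kept) / len(kept)


# Hypothetical scores on the 5 homeworks (not real data).
scores = [95, 80, 100, 60, 90]
print(homework_component(scores))  # 91.25
```

Here the 60 is dropped, and the remaining four scores (80, 90, 95, 100) are averaged.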

There is a bonus point available for students who submit their assignments early. If you submit your homework at least 24 hours before the posted deadline (so usually by Sunday at 11 PM), you will receive 1 extra credit point on your homework grade.

Late Policy and Extensions

Homework assignments must be submitted to Gradescope by 11PM on their due date, which will typically be a Monday. We will have a small, undisclosed grace period to account for any technical difficulties; if you face any issues while submitting, please post on Ed ASAP (ideally before the deadline).

If you submit your homework late and do not have an extension (see below), we will still accept your submission up to one day after the deadline, but you will lose 50% of the credit you earned. For example, if you scored 90% on a homework and submitted it a day late, your score would drop to 45%. We will not accept homeworks more than one day after the submission deadline.
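
The late penalty can be sketched in a few lines of Python. The function name `late_adjusted_score` is hypothetical, and the sketch makes three assumptions that are ours, not official policy: lateness is counted in whole days past your deadline (including any extension you were granted), the undisclosed grace period is ignored, and a submission that is not accepted is treated as a score of 0.

```python
def late_adjusted_score(earned, days_late):
    """Return the homework score after applying the late penalty.

    `earned` is the score before any penalty (0-100).
    `days_late` is the number of whole days past the deadline,
    counted after any granted extension (an assumption of this sketch).
    """
    if days_late <= 0:
        return earned          # on time: no penalty
    if days_late == 1:
        return earned * 0.5    # one day late: lose 50% of earned credit
    return 0.0                 # more than one day late: not accepted (assumed to count as 0)


print(late_adjusted_score(90, 1))  # 45.0
```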

Extensions: We know this is a stressful time, and we don’t want to penalize you because of circumstances that are out of your control. To request an extension on a homework, please email both Ian and Isaac with the reason for your request and number of days you’re requesting an extension for (1 or 2). As long as your request is within reason, there’s a good chance of it being granted. Students with DSP accommodations that allow for late assignment submissions will still need to email us for extensions, but not with a reason.

Academic Honesty

This class does not satisfy any requirements for any program (other than that it counts towards the 120 unit minimum needed to graduate). As such, you’re not taking it to get a good grade – you’re taking it to learn!

Data science is a collaborative activity. As such, we encourage you to discuss homework assignments at a high level with others, and we even give you class time to do this in lab. With that said, we ask that you write your solutions individually in your own words. Rather than copying someone else’s work, ask for help. You are not alone in this course! We’re here to help you succeed. If you invest the time to learn the material and complete the assignments, you won’t need to copy any answers (wording taken from CS 61A). If you use code you found online, please cite it in a comment.

A note on letter grades

The following is adapted from CSE 160 at the University of Washington.

Grading for this class is not curved in the sense that the average is set at (say) a B+ and half of the class must receive a grade lower than that. If everyone does well and shows mastery of the material, everyone can receive an A (this would be awesome!). If no one does well (this is unlikely), then everyone can receive a C.

Grading for this class is curved in the sense that we do not have a pre-defined mapping from homework and exam scores to a final GPA. There is no pre-determined score (e.g., 90% of all possible points) that earns an A or a B or a C or any other grade. To determine the final grade, we will ask questions like “Did this student master the material?”.

Try your best not to worry about letter grades, and we’ll reciprocate by being fair and lenient. We’re in this together 😎.


Acknowledgements 🙏

This class is based on Data 94, taught by Suraj Rampure in Spring 2021 at UC Berkeley. That class was loosely based on Data C6, taught by Ian Castro in Summer 2020 at UC Berkeley, which in turn was based on Data 8R, taught by Henry Milner in Summer 2017, also at UC Berkeley. These classes were based on Data 8 at UC Berkeley.

When creating Data 6, we’ve referred to the materials of several other courses:

The website uses Just the Class.