Research, Pedagogy, and Discovery Group Project
- Overview
- [Project 1] Reading Data Science and Social Science Literature
- [Project 2] Exploratory Data Analysis
- [Project 3] Exploratory Data Analysis, Python
- [Presentation] Final Presentation
Overview
This summer seminar provides scholars an opportunity to explore independent projects early in their data science careers. Scholars will build pedagogically and socially motivated curricula for a new introductory computing course centered on data and science.
Unlike in a traditional open-ended research project, students will understand, explore, and reproduce the existing contexts and findings of particular datasets. Through reproduction, students will build research skills and bridge interdisciplinary fields of study.
Components
There are several components to this seminar:
Component | Description |
---|---|
Scholars | Lecture series on how to read semi-technical data science articles, consider ethical and social implications when studying a dataset, and do exploratory data analysis. |
Group | Lab/group work series in which students build small pieces of curricula, including how to write a discussion question and how to write a lab question. |
Project | Project series exploring parts 1 and 2 in teams of 3-4 Tuskegee students plus one Berkeley Data 6/Data 8 UGSI. Each group will focus on one social context. |
Scholars | Lecture series: Thriving in STEM (run by the SEED Scholars Summer Rising Program). |
Learning Goals and Workload
After this seminar, scholars will have exposure to the following:
- How to read data science articles
- How to consider ethical and social implications when studying a dataset
- How to do Exploratory Data Analysis
- How to write a discussion question
- How to write a lab question
We expect little commitment outside of the scheduled 4 group work hours and 4-6 scholars-only hours.
[Project 1] Reading Data Science and Social Science Literature
Due: Friday July 8, end of seminar (tentatively)
Expected Work Time: (Updated 7/1) During seminar on Tuesday and Friday.
- Tuesday 6/28: Get through half of topic readings and pedagogy readings.
- Friday 7/1: Build Google Slides (up through lesson plan) and share on Slack.
- Tuesday 7/5: Work through lesson plan and finish slides.
- Friday 7/8: Deliver presentations and submit Google Slides.
Topic Assignments: Group Assignments
Required Readings
Topic-based: https://tinyurl.com/rpd-project-list
Pedagogy-based:
- Tools for Teaching (pp. 97-101; PDF pp. 115-119). Read up to and including “Starting a Discussion.”
- How to Write Learning Outcomes https://teaching.berkeley.edu/resources/design/course-level-learning-goalsoutcomes
- (Optional) Deborah Nolan and Sara Stoudt. Communicating with Data: The Art of Writing for Data Science. 2021. (Log in with the UC Berkeley library proxy.)
Deliverable: Google Slides and Presentation
Your Google Slide deck should have at least the following (max 15 slides):
- Slide: History of the topic
- Slide: Current public opinion
- Slide: Research example
- Slide: Bio of a prominent researcher in the field at the intersection of society and data science
- Slide: Lesson plan for discussion activity
Presentation (updated Fri 7/1): You will present these slides to the other groups on Friday, July 8. The presentation should be 10 minutes long, and every group member should present. These slides will not be presented outside the seminar; the slide format is just a way to organize your thoughts. Feel free to add slides as necessary.
Lesson Plan for Discussion Activity
Outline a 15-minute introductory discussion that encourages students to discuss what they learned from the above readings. In particular, write a lesson plan for an instructor that includes the main takeaway point for students from this exercise. The full discussion section will be 50 minutes long, so this activity covers the opening portion of class. The lesson should have several parts:
- Reading list for students
- Lesson plan (see below):
  - How will you introduce the discussion section?
  - How will you structure the discussion of the question(s)? In groups, pairs, or as a class? Will you have time to review as a class?
  - Share the main takeaway you will repeat for the students at the end of the activity.
To answer the above, you should present a slide addressing the following table:
Category | Description |
---|---|
Student Required Readings | Pick at most 3 readings that students should read prior to discussion that will illuminate a particular dataset in the context of society. This may be a subset of the ones we provided, or you can pick your own from reputable sources. The readings should describe the topic itself, the current public opinion, any historical context, and a recent research study using the dataset. Readings should take about 1 hour, maximum. |
Introduction (5 minutes) | How will you introduce the discussion section? |
Recall Activity (10 minutes) | Pose a question to get students positioned and warmed up to discuss. It often involves students “recalling” what was in their required readings. |
Activity structure | How will you structure the discussion of this question(s)? In small groups, pairs, as a class? Will you have time to review as a class before or after the activity? |
Learning Goal/Outcome | Write a sentence for the instructor-facing lesson plan, e.g., “By the end of this activity, students will be able to…” (see Action Verbs) |
Main Takeaway (2 minutes) | Share the main student-facing takeaway that you will repeat for the students at the end of the activity. |
Submission
Send the Google Slides through Slack. Also deliver a presentation to the seminar as above.
[Project 2] Exploratory Data Analysis
Due: Friday July 15, end of seminar (tentatively)
Expected Work Time:
- Friday 7/8: Look through data and get through half of questions.
- Tuesday 7/12: Explore the rest of the questions and begin exploring your own questions (at least two). Start compiling Google Slides with your findings by Friday.
- Friday 7/15: Get slides checked off.
Questions: link
Data: Google Drive link
[Project 3] Exploratory Data Analysis, Python
Due: Friday July 29, middle of seminar (tentatively)
Expected Work Time:
- Tuesday 7/19: Guest speaker; finish Colab setup; get slides checked off
- Friday 7/22: Reproduce (see Note) 2 figures from Google Sheets. UC Berkeley UGSIs switch to a teaching role so that Tuskegee Scholars do the bulk of the programming.
- Tuesday 7/26: Guest speaker; reproduce all of the figures from Google Sheets, and start exploring 2 new figures or tables in Python. UC Berkeley UGSIs continue to teach.
- Friday 7/29: Get slides and code checked off (Deb and Lisa to sit down and review code with each Tuskegee Scholar)
Note: The datascience library has different plotting styles from Google Sheets. When “reproducing” a figure or plot, we expect that you will spend considerable time getting the right tables and columns for plotting, and then choosing the right arguments for datascience library functions. Here are the function reference sheets for Data 8 and Data 6.
It is less important to reproduce the formatting of the plot; in fact, doing so requires advanced plotting knowledge beyond the scope of Data 6/Data 8.
[Presentation] Final Presentation
Build and edit your slides from Project 1 to include EDA findings and social context discussion questions. The final presentation should be a standalone slide deck that can be shared with future discussion instructors.
Expected Work Time:
- Friday 7/29: Get slides and code checked off (Deb and Lisa to sit down and review code with each Tuskegee Scholar)
- Tuesday 8/2: Start discussion question work (refining and expanding the discussion to fill 50 minutes)
- Friday 8/5: Make final presentation slides
- Monday 8/8: Extra presentation time (Scholars only)
- Tuesday 8/9: Final Presentations. 3-4:30pm (are UC Berkeley GSIs available?)
Final Presentation: 15 minutes per group. Your presentations should be no longer than 15 slides, plus extra reference slides as needed.
- Introduction
- Discussion activity
  - Required readings for students. It would be useful to give a one-sentence verbal explanation of the purpose of each reading in building student knowledge (e.g., it is the original research study, explains the dataset, is an opinion article, etc.).
  - Outline of a 50-minute discussion (the Project 1 activity plus the social context activities below)
  - Instructor-facing lesson plans
- EDA
  - Python figures only
  - Note which functions or methods you used from the datascience library or from Python
- Thoughts and reflections
  - How was your experience exploring this dataset and context this summer?
  - What did you like, and what did you learn?
- Reference slides (not presented, but included in the slide deck)
  - Project 1 Slide: Research example
  - Project 1 Slide: Bio of a prominent researcher in the field at the intersection of society and data science
  - Required readings for students and instructors (e.g., anything you read that you think an instructor would find useful, but that may be too in-depth for a student)
Lesson Plan for Social Context Discussion Activities
Construct at least two 15-minute activities where students engage critically with the social context of the dataset and the institutions/collectors of the dataset. Your activities should address the following:
- How were the data collectors/researchers connected to the population of interest?
- How is the dataset shaped by the data collectors’ backgrounds and interests? For example, how variables are defined/measured, what concepts are not measured, how data are organized/aggregated for public use, what visualizations are shared, etc.
- How may the context of the study have impacted the sample collection, reports, and policy decisions?
Notes:
- Depending on how you organize your discussion, you may instead create one larger activity. Adjust the lesson plan table below (as well as its timings) to reflect any changes you make.
- If you learned a social science/ethical concept from a guest speaker (e.g., normative ethical approaches) and would like to introduce it as part of discussion, please make sure to include a slide describing that part of the lesson as well.
To answer the above, you should present a slide (or multiple) addressing the following table, which also includes the recall activity from Project 1:
Activity | Category | Description |
---|---|---|
All | Student Required Readings | Pick at most 3 readings that students should read prior to discussion that will illuminate a particular dataset in the context of society. This may be a subset of the ones we provided, or you can pick your own from reputable sources. The readings should describe the topic itself, the current public opinion, any historical context, and a recent research study using the dataset. Readings should take about 1 hour, maximum. |
All | Introduction (5 minutes) | How will you introduce the discussion section? |
1 | Recall Activity (10 minutes) | Pose a question to get students positioned and warmed up to discuss. It often involves students “recalling” what was in their required readings. |
1 | Activity structure | How will you structure the discussion of this question(s)? In small groups, pairs, as a class? Will you have time to review as a class before or after the activity? |
1 | Learning Goal/Outcome | Write a sentence for the instructor-facing lesson plan, e.g., “By the end of this activity, students will be able to…” (see Action Verbs) |
1 | Main Takeaway (2 minutes) | Share the main student-facing takeaway that you will repeat for the students at the end of the activity. |
2 | Activity 2 (N minutes) | Pose a question/activity where students engage critically with the social context of the dataset and/or the institutions/collectors of the dataset. You should adjust N (the number of minutes) as needed. |
2 | Activity structure | |
2 | Learning Goal/Outcome | |
2 | Main Takeaway (2 minutes) | |
3 | Activity 3 (N minutes) | Pose a question/activity where students engage critically with the social context of the dataset and/or the institutions/collectors of the dataset. You should adjust N (the number of minutes) as needed. |
3 | Activity structure | |
3 | Learning Goal/Outcome | |
3 | Main Takeaway (2 minutes) | |
Note on timing: You should aim for a 40-50-minute discussion lesson plan.