Data Science Academic Resource Kit

For UC Berkeley Instructors

The Division of Data Sciences' Academic Resource Kit (ARK) for Instructors aims to give instructors the ability to create and deploy educational materials in a variety of classes. The Division supports instructors in learning and incorporating data science approaches and teaching tools. We offer periodic workshops, summer trainings, and resources for course development.

Program Overview

Building from the freshman level, the Division of Data Sciences supports an open, interdisciplinary curriculum that stretches across campus and provides a foundation for Berkeley undergraduates of all majors to engage capably and critically with data. The Foundations of Data Science class (Data 8) currently serves over 4,000 students per year across over 70 majors and fulfills statistics requirements in 90 percent of the majors that have one. More than 30 entry-level “connector” courses have been created by instructors in disciplines from humanities to engineering. Over 40 classes have deployed Data Science Modules adapted to enrich existing classes. Instructors in many departments are now developing advanced courses to serve their existing majors and to integrate with the major offered through the Division of Data Sciences.

Workshops & Resources

The Division offers a variety of opportunities to support instructors in learning and incorporating data science approaches, including workshops and programs to help develop course resources.

More information about workshops offered: https://data.berkeley.edu/news/data-science-education-opportunities

Continuing Opportunities

Data Science Modules - Request Development Support

Data science modules are short explorations that give students the opportunity to work hands-on with a dataset relevant to your course and receive instruction on the principles of data analysis, statistics, and computing. Modules vary widely and are customized based on each instructor’s objectives and the type of course, ranging in length from one to two lectures to multiple-session workshops culminating in a data-centered project. The Data Science Education Program provides assistance to instructors interested in adding data science modules to their existing classes, through student developer teams that partner closely with you to develop materials.  A recent article and video explain how modules are currently being deployed.

For more information visit: https://data.berkeley.edu/education/courses/modules

Module Support Request Form: https://goo.gl/zRxVm7

Proposals for Data Science Connector Courses (Fall or Spring)

Students in Data 8 learn computational and statistical concepts from a variety of examples spanning a broad range of disciplines. Connector courses (2, 3, or 4 units, often numbered 88) are designed to build on students’ analytical knowledge from Data 8 in connection to their own interests in a specific field. Connectors can be housed in a department, cross-listed in multiple programs, or piloted as an L&S course. The Data Science Education Program (DSEP) seeks to broaden the suite of connector offerings available to students and increase the potential to integrate data science into existing curricula. DSEP is able to provide a range of technical, pedagogical, financial, and community-building support to you in piloting a course, as well as computer lab space and lab assistants.

For more information visit: https://data.berkeley.edu/education/connectors

Connector interest form: https://goo.gl/s9EkN4

Data 8: Foundations of Data Science

Data 8 (data8.org) is the flagship introductory data science class at UC Berkeley and is currently fully enrolled at more than 1,200 students every semester. The vision for the course is to combine the teaching of computational thinking and statistics, to modernize statistical instruction to center on statistical inference, and to be accessible to a broad range of students, including by eliminating prerequisites such as calculus, linear algebra, and introductory programming. Both programming and statistics are taught through Jupyter notebooks running Python.

Data 8x: A MOOC Implementation of Data 8 Offered through edX

Outside of Berkeley, this class is available as Data 8x, a popular online course at edX. Several institutions have deployed Data 8 in a flipped classroom model, using the videos from edX in combination with in-person lab help. Data 8x is available at edx.org/professional-certificate/berkeleyx-foundations-of-data-science.

Textbook

The Data 8 textbook is at inferentialthinking.com. The textbook was written by John DeNero and Ani Adhikari and is licensed under a Creative Commons License. One of the key innovations of the textbook is that it has a set of Jupyter notebooks that provide programming illustrations of key concepts, which can be accessed at a local JupyterHub or more generally at mybinder.org. You can access the Jupyter Book repository at github.com/data-8/textbook; the README includes instructions for how to host or adapt the textbook at other sites and how to change the interact links.

Course Webpage

The Data 8 homepage is at data8.org, where you can find the course sites for each iteration of Data 8 since fall 2015. Within the sites for past semester offerings, you can find slide decks and videos for lectures and Jupyter notebooks for in-class demos, as well as labs, homeworks, and projects. Almost all of the resources (e.g., textbook, webpage, teaching materials) used for Data 8 are available in the GitHub organization, github.com/data-8.

Contact us

If you have questions about the Data Science Education Program in the Division of Data Sciences, feel free to contact us:
Cathryn Carson (clcarson@berkeley.edu), DSEP Faculty Lead

Karen Chapple (chapple@berkeley.edu), DSEP Senior Faculty Advisor

Eric Van Dusen (ericvd@berkeley.edu), DSEP Curriculum Coordinator

David Culler (culler@berkeley.edu), Interim Dean, Division of Data Sciences

For Instructors in Partner Institutions

UC Berkeley welcomes inquiries about how to design and implement a broad-based data science program. Below please find resources for further exploration into our undergraduate data science curriculum. 

Please fill out this interest form so that we can direct you to the resources you need.

Guide to Adapting Data 8 to Other Institutions

The Zero To Data 8 Guide is a guide to setting up and running Data 8 at other institutions. This guide, at berkeleydsep.gitbook.io/zero-to-data-8/, was created by Chris Holdgraf and documents step by step how to manage aspects such as the technical infrastructure, cloud resources, and technical support. The NSF-funded Regional Data Hubs will be moving to expand support for cloud infrastructure: bigdatahubs.org/

Tables Datascience Package for Python

Data 8 uses a course-specific Python package, datascience, designed for teaching tabular data manipulation and visualization in introductory data science courses. It was written by Berkeley professors John DeNero and David Culler, and students Sam Lau and Alvin Wan. Documentation for the package can be found at data8.org/datascience/. Teaching with this package throughout the Data 8 materials allows for a pedagogically clean dataframe concept, without the added complexity of pandas or R.

Jupyter and JupyterHub

Project Jupyter (jupyter.org) exists to develop open-source software, open standards, and services for interactive computing across dozens of programming languages. JupyterHub is a tool that allows Berkeley’s data science program to quickly use cloud computing infrastructure to deploy a scalable hub that enables users to interact remotely with a standardized, common computing environment. JupyterHubs create on-demand cloud-based Jupyter notebook servers and are the basis of the technical infrastructure for Data 8 and related classes. Compared to local environments that run Jupyter, a cloud-based JupyterHub provides many conveniences, including pre-installed software, quicker access to course content, and computing flexibility that enables even users on Chromebooks or iPads to run Jupyter notebooks. The Data 8 JupyterHub deployment is at datahub.berkeley.edu. In addition, the course textbook uses mybinder.org for public interact links. Binder uses a JupyterHub to create temporary user sessions that are open to the public, in this case serving the Data 8 textbook and its environment. The JupyterHub team has written up a guide on deployment and maintenance of a JupyterHub, Zero To JupyterHub, at zero-to-jupyterhub.readthedocs.io/. For a small audience or class of 0-50 people, there is a new project to run JupyterHub on a single server, called The Littlest JupyterHub, at the-littlest-jupyterhub.readthedocs.io/. An overview of the work at the Berkeley Institute for Data Science (BIDS) related to Project Jupyter is available at bids.berkeley.edu/research/project-jupyter

Open Source Textbook Publishing with Jupyter Book

A guide to Jupyter-based open source textbook publishing is available at jupyter.org/jupyter-book/. Content and pages in the textbook are written using Jupyter notebooks, generated with Jekyll, and hosted on GitHub.

Grading

Data 8, like many other large technical classes at UC Berkeley, uses Ok.py for grading and other class management purposes. Ok.py offers a rich suite of tools in addition to autograding, including office hour management, backups of student submissions, plagiarism detection, and more. A simpler solution is Gofer Grader, a more basic Python library created by the JupyterHub team solely to autograde Jupyter notebooks and Python files. The documentation is available at okgrade.readthedocs.io/. The project is currently in development and is being used for the Data 8x course. The Data 8 course also uses the package GSExport (https://github.com/dibyaghosh/gsExport) to export a customized PDF for grading and submission to Gradescope (https://www.gradescope.com/).

Data Science Modules

The infrastructure developed for Data 8 has also been used to develop teaching modules, often a set of 1-3 notebooks to deploy into an existing class. These can range from GIS mapping to neuroscience to text analysis in a humanities course. A showcase of some of the interesting deployments is available at ds-modules.github.io/modules-textbook. The full set of Jupyter notebooks developed for all classes is available at github.com/ds-modules. You can run any of these notebooks using the Binder links in each repository’s README.
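For instructors who want to add a similar launch link to their own module repository, the following is a minimal Python sketch of how a Binder link can be assembled. It assumes the public mybinder.org URL pattern for GitHub repositories (organization, repository, and git ref in the path, with an optional filepath query parameter). The helper name binder_url, the repository name example-module, and the notebook path are hypothetical and used only for illustration; they are not taken from the ds-modules repositories.

```python
from urllib.parse import quote, urlencode

# Assumed base of the public Binder service for GitHub-hosted repositories.
BINDER_BASE = "https://mybinder.org/v2/gh"


def binder_url(org, repo, ref="HEAD", notebook_path=None):
    """Build a mybinder.org launch URL for a GitHub repository.

    org, repo      GitHub organization and repository names
    ref            branch, tag, or commit to launch (default: HEAD)
    notebook_path  optional path of a notebook to open on launch
    """
    # Encode each path segment; a ref containing "/" must be fully escaped.
    url = f"{BINDER_BASE}/{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"
    if notebook_path:
        # Keep "/" readable inside the notebook path.
        url += "?" + urlencode({"filepath": notebook_path}, safe="/")
    return url


# Hypothetical repository and notebook names, for illustration only.
print(binder_url("ds-modules", "example-module",
                 notebook_path="notebooks/intro.ipynb"))
```

A link built this way can be pasted into a repository README so that students open the notebook in a temporary Binder session without installing anything locally.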

Connector Courses

Connector courses are a set of classes complementary to Data 8 that have been developed to expose students to data science applications within a subject area. Connectors are often lighter-workload courses that allow students to apply theoretical concepts from Data 8 to a particular area of interest. Students take connectors concurrently with or after Data 8. A full list of connectors can be found at data.berkeley.edu/education/connectors.

Resources for Instructors - Curriculum Guide

The Data Science Education Program at Berkeley has created a Curriculum Guide at ds-modules.github.io/modules-textbook to help guide instructors with setup, workflow, and pedagogy in teaching data science courses connected to Data 8 and using the same infrastructure. Much of the content in this guide is useful for instructors teaching with Jupyter notebooks and JupyterHub deployments. Faculty across many departments have been trained to use the data science pedagogy platform in a short summer Data 8 bootcamp intended to get faculty ready to adapt data science teaching tools to their own subject area: sites.google.com/berkeley.edu/ucb-dse-workshop/home.