National Data Science Education Workshop Shares Best Practices on Teaching the Next Generation of Data Scientists

July 7, 2020

The demand for data scientists continues to surge as advances in technology and medicine require the skills to analyze, store, and protect data. Data scientists have topped LinkedIn’s “Emerging Jobs Report” for the last three years, with the 2020 report showing 37% annual growth. Demand even exceeds the pool of trained data scientists, according to a 2019 Dataconomy report.

UC Berkeley pioneered an innovative undergraduate “Foundations of Data Science” (also known as Data 8) curriculum that takes an integrated approach to introductory computer science and statistics, allowing students to use data-driven methods to think critically about the world, draw conclusions from data, and effectively communicate results. The curriculum is further developed in domain-area “connector” courses that complement Data 8 concepts and in “modules” that introduce data science into existing courses across campus. Several universities, including Cornell, Yale, and the University of Washington, have incorporated aspects of this novel curriculum into their data science programs.

The 2020 National Data Science Education Workshop Goes Virtual 

UC Berkeley’s National Workshop for Data Science Education was created to support the increasing demand for data scientists. It shares best practices for teaching Data Science along with Berkeley’s open-source curriculum, and it builds a community around Data Science Education among higher-education instructors interested in developing Data Science classes and programs at their institutions to educate the next generation of data scientists.

The third annual workshop was held June 22-26, 2020, and was the first to be held as a virtual event. Over 500 attendees had the opportunity to learn about:

  • Foundations of Data Science: Designing a curriculum built on computational thinking with Python, inferential thinking by resampling, prediction, and machine learning. 

  • Data Science Modules: How to infuse data science lectures, labs, assignments, and student projects into any domain area.

  • Human Contexts and Ethics: Exploring how human, social, and institutional structures and practices shape technical work around computing and data, as well as how data and computing permeate and shape our lives.

  • Technology Infrastructure: The technology underlying the pedagogy platform (JupyterHub, Kubernetes) and how to replicate it.

"The vision for this year's workshop was to continue to expand the voices beyond UC Berkeley to the wider range of stakeholders teaching Data Science. At first, we were worried about taking the workshop online, but we got a lot of interest and great engagement from across the country and felt that the hybrid online workshop was a success," shared Eric Van Dusen, Interim Director, Data Science Education Program.  

Attendees also had the opportunity to hear insights from Boise State University, the University of Illinois at Urbana-Champaign, University of Maryland, University of Virginia, and Wentworth Institute of Technology on their experiences adopting Berkeley’s Foundations of Data Science curriculum within their institutions. 

Growing Data Science Education Beyond Berkeley 

The demand to educate students to become skilled data scientists could not be more evident than it is today, as data science has been linked to the rapid research and development around COVID-19. The Division of Computing, Data Science, and Society and the Data Science Education Program are committed to educating the rising generation of data scientists.

This year’s conference featured a panel of community colleges and California universities that came together to strengthen Data Science education for their students and to grow the field of data science. Panelists discussed the obstacles that community college and university professors might face, how to coordinate across institutions in teaching Data Science, and strategies for UC Berkeley to support campuses implementing data science curricula.

Cathryn Carson, who developed the blueprint for Berkeley’s organizational realignment around data science, said it best in her talk on Institutional Transformation: “data science is a social movement.” Data Science raises fundamental issues of justice and participation in the ways it engages with human beings as sources of data, as analysts, and as people affected by its products. This conference continues to be the first step in taking data science education beyond Berkeley to create a diverse new generation of data scientists who will impact the world.