Interested in setting up Datahub for your class?
Datahub creates on-demand, cloud-based Jupyter notebook and RStudio servers, which are the basis of the technical infrastructure for Data 8 and related courses.
The main Datahub deployment is at datahub.berkeley.edu. In addition, there are several other hub archetypes serving the diverse instructional needs of Berkeley instructors.
Jupyter Notebook Examples
Introductory notebook in R exploring the question of whether politicians racially discriminate against their constituents, from the Introduction to Empirical Analysis and Quantitative Methods course taught by David Broockman.
Notebook in Python analyzing incarceration trends and the impacts of prison realignment in California, as part of the Ethnic Studies modules taught by Victoria E Robinson.
Datahub provides standard computing infrastructure to many foundational courses across diverse disciplines. Instructors who want to run Jupyter-based workflows use Datahub. It also provides Python package management and storage solutions that cater to the instructional requirements of many introductory data science courses.
R Hub provides standard computing infrastructure to instructors using R-based tools (RStudio IDE, Jupyter R). R Hub is widely used by instructors teaching quantitative social science courses. Fun fact: the infrastructure team at Berkeley made an immense contribution to the JupyterHub ecosystem by adding RStudio to the standard offering, which improved access for instructors who teach with R.
Biology Hub is a compute-intensive infrastructure tailored to the needs of instructors in Biology and Genomics. The hub provides additional compute to support complex data science use cases that require large datasets. For example, it supports compute-intensive workflows for analyzing large genome sequencing datasets.
Stat 159 Hub is an innovation hub tailored to the needs of the Stat 159 course taught by Fernando Perez. One of the objectives for this hub is to make it a "home away from home" for students enrolled in the course. Students use the hub like their local setup and make use of some of the advanced Datahub use cases, including a remote Linux desktop environment, secure access to GitHub, Dropbox-like functionality for sharing files, real-time collaboration, and real-time file sharing.
Datahub is built with the principle of inclusion in mind. Any instructor, irrespective of their domain, can expose their students to data science workflows using Datahub.
Datahub completely removes the dependency on a student's local desktop configuration for running data science workflows. Datahub provides the required infrastructure, including storage and compute, in an equitable manner for all students.
Datahub is built with an open-source ethos in mind. Datahub is completely free of cost, and no licensing is required for instructors or students to access the infrastructure. In addition, the team behind Datahub has a strong connection with the open-source ecosystem, including the Jupyter ecosystem.
Datahub was initially piloted in Spring 2017 in a small Data 8 class of 50+ students. At the start of the Spring 2022 semester, Datahub supported about 1,500 students enrolled in Data 8. The Datahub infrastructure’s ability to handle the growth in Data 8 is a testament to its scalability.
daily active users
monthly active users
number of student years spent using Datahub since 2019
cloud costs per student per semester
The Jupyter environment on Datahub has greatly improved the overall outcome of the assignments, where the more experienced students continue to do well, but the less computer-savvy students are also doing well in the assignments. Valuable time is therefore spent on the seismological lessons rather than figuring out how to do something and whether Excel or Matlab should be used. The process of defining assignments on bCourses and then linking to the assignment on Datahub works very well. In addition, when demonstrating or teaching how to approach a problem, it is very useful for the instructor to do so on exactly the same platform the students will be using for the assignment.
My advice to other faculty would be that it does take a bit of learning the tools up front, but that the team does a great job of teaching you how to use the tools and supporting you. Then, once you get it, it's really seamless and an amazing stack of tools. I can't imagine teaching my class any other way.
At first, I had classes in Matlab but later shifted to Jupyter notebooks. Notebooks are very easy to teach from a pedagogical standpoint. The power of Datahub is that students can connect remotely to use Jupyter notebooks. Key advantages are that the packages are pre-installed and students need not have the latest environments or powerful computers. It makes such environments accessible. Through Datahub, we can do magic things with data.