Berkeley Data Stack

The Berkeley Data Stack is a collection of open-source tools that support large-scale data science research and education across UC Berkeley. These tools include:

Jupyter - Interactive computing notebooks
Online Textbooks - Open-source textbooks used in classroom instruction
Interact Links - One-click access to notebook content
Autograding Tools - Ok.py, Nbgrader, and Otter Grader

All of these elements of the Berkeley Data Stack can be found on Berkeley's JupyterHub, known as DataHub. UC Berkeley currently has one of the largest collections of JupyterHub deployments in the world, capable of supporting tens of thousands of users across campus and beyond.

CloudBank

Berkeley is partnering with UC San Diego and the University of Washington on a grant from the National Science Foundation to develop CloudBank, a suite of managed services to remove barriers to public cloud access for data science and computer science research and education. Details and updates here.

Education Hubs

JupyterHubs designed for classroom use in relevant courses.

Research JupyterHubs

Research-oriented JupyterHubs, which are also used in course settings. Most deployments use SLURM.

JupyterHub Working Group Steering Committee

Shawna Dark - Chief Academic Technology Officer & Executive Director of Research, Teaching and Learning

Eric Fraser - Assistant Dean and Director of IT, College of Engineering

Yuvi Panda - Dev Ops Architect, Data Science Undergraduate Studies

Anthony Suen - Director of Programs, Data Science Undergraduate Studies

Education Hubs

DataHub

Data 100 Hub

Stat 140 Hub

Haas Hub

Research JupyterHubs

Research Computing

Economics JupyterHub

Statistics JupyterHub

NERSC JupyterHub
