Data Science Discovery Program

Data Sciences Discovery Showcase

The Data Science Discovery Program connects undergraduates with hands-on, team-based opportunities to contribute to cutting-edge data research projects with graduate students and postdoctoral researchers, community impact groups, entrepreneurial ventures, and educational initiatives across UC Berkeley.

Highlighted Discovery Projects for Spring 2019

Humanitarian Data Exchange

UN OCHA (Office for the Coordination of Humanitarian Affairs) runs the Humanitarian Data Exchange, a platform based in The Hague where humanitarian agencies can upload their data sets for collaboration and sharing. Microsoft Research is working with the UN to guide a team in creating a process or resource for applying HXL (Humanitarian Exchange Language) data tags to all 5,000 data sets to improve data analysis.

Bay Area Trip Choice Model

Driving has many externalities: time lost in traffic, climate emissions, local air pollution, and more. This project aims to aid policymakers working to limit these ills and improve travel in the Bay Area, particularly through pricing policies such as adjustments to tolls and parking rates. We aim to build a simplified commute model by simulating how people get to work. Travel options such as driving and taking transit, along with the monetary and time cost of each, can be constructed using a mapping API.

Fall Armyworm

Currently, over 300 million Sub-Saharan Africans depend on maize as their primary staple food crop. In 2016, a pest indigenous to the Americas called the Fall Armyworm (FAW) arrived in West Africa and spread at breathtaking speed, attacking maize in 44 African countries in just one year. Students will help Mercy Corps create a tool that indicates the location and intensity of FAW outbreaks and identifies the risk of future outbreaks, enabling humanitarian organizations to target resources effectively.


Economics of Disaster Response

This project works with the West Big Data Innovation Hub and the State of California to improve how we respond to natural disasters. Goals include finding ways to better match the demand for and supply of physical donations, and understanding the economics of disaster prevention and relief.


BEACO2N

BEACO2N is a network of over 100 air quality and climate sensors. BEACO2N is undertaking projects on everything from understanding instrument calibration, to describing the composition of local plumes, to characterizing connections between weather and the observations, to visualizing the observations.

Water Data Collaborative

The California State Water Resources Control Board seeks to create comprehensive, high-quality data on water rates for agencies across the state. This will make it easy to join water rate data with water use, income, and other demographic data sets.


Work At Home Vintage Experts

WAHVE pairs companies looking for specific skills with the veteran talent who have them. Businesses get the quality and knowledge they need, while “vintage” professionals (people over 50) get to phase into retirement working from their home offices. We have analyzed 52,692 unique applicants and are looking for students to participate in NLP analysis of the data.

Recent Discovery projects have included:

  • Working with SimpleWater to identify and prevent potential water quality issues by tracking and analyzing social media posts, broadcast transcripts, news articles, and other web sources.
  • Collaborating with UC Berkeley Central Human Resources to examine hiring, transfers, promotions, and separations to explore trends, successes, and opportunities to make progress toward affirmative action goals for non-academic staff.
  • Analyzing DNA sequences of 20 species of fruit flies with UC Berkeley’s Eisen Laboratory to enhance understanding of and better predict evolutionary patterns.

Why Discovery?

Data science is an intrinsically interdisciplinary field with broad reach, fast scaling capacity, and a large pool of interested students and projects. The Data Science Discovery Program, a joint effort of the Berkeley Institute for Data Science, the Division of Data Sciences, and the Undergraduate Research Apprenticeship Program (URAP), was created in 2015 to offer undergraduates the opportunity to build and apply data science skills while providing collaborators with skilled students to help address their data challenges.

Earn Academic Units

Students can earn units by applying through the Undergraduate Research Apprenticeship Program (URAP) page under the Division of Data Sciences. As long as you apply through that URAP page, you will be considered for all projects in the Discovery Program.

Research Partners

The Data Science Discovery Experiences model can identify, connect with, and scale access for significantly more students in the data science space. It does so by building a sustainable and diverse pipeline of projects: improving the matching and database system, refining the training and consulting services that graduate students, postdocs, and undergraduate research leads need, and expanding internal and public communication.

Research projects with faculty, postdoctoral researchers, and graduate students span all domain areas. Social impact efforts with community nonprofit groups offer the opportunity to help address critical issues.

Email questions to:

Discovery Projects by Semester

Four students gathered around a laptop, laughing, at the Spring 2018 Data Scholars project showcase.

Spring 2018 Projects

Highlights from the 85 undergraduate researchers in over 30 projects.

Fall 2017 Projects

Highlighted projects from the Fall 2017 semester.


Spring 2017 Projects

Presentations from Spring 2017 Discovery Projects

Discovery Projects by Theme