Spring 2019 Discovery Projects

Signature Partner Projects

Humanitarian Data Exchange

UN OCHA & Microsoft 

UN OCHA (Office for the Coordination of Humanitarian Affairs) runs the Humanitarian Data Exchange, a sponsored center based in The Hague where humanitarian agencies can upload their datasets for collaboration and sharing. Datasets are tagged with the Humanitarian Exchange Language (HXL) to make them easier to analyze. A major issue is that, of the roughly 6,000 datasets uploaded so far, only about 1,000 have HXL tags applied. The UN is looking for a team from the Berkeley Data Science Division to help create a process or resource for applying HXL tags to the remaining 5,000 datasets.
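One possible starting point for such a resource is a rule-based tagger that inserts HXL's hashtag row directly beneath a dataset's header row. The sketch below assumes a small, hypothetical mapping from common column names to HXL hashtags (#country, #adm1, #org and so on). It only illustrates the idea and is not an HDX tool; a real process would need a much richer mapping.

```python
import csv
import io

# Hypothetical mapping from common column headers to HXL hashtags.
# A real tagging process would need far broader coverage than this.
HEADER_TO_HXL = {
    "country": "#country",
    "province": "#adm1",
    "district": "#adm2",
    "organisation": "#org",
    "organization": "#org",
    "sector": "#sector",
    "people affected": "#affected",
    "date": "#date",
}


def add_hxl_row(csv_text):
    """Return CSV text with an HXL hashtag row inserted beneath the header row.

    Columns whose header has no known mapping get an empty tag cell.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    tags = [HEADER_TO_HXL.get(h.strip().lower(), "") for h in header]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)   # original header row
    writer.writerow(tags)     # HXL hashtag row goes directly below it
    writer.writerows(data)    # data rows are copied unchanged
    return out.getvalue()
```

For example, a file with the header "Country,Province,Organization,Notes" gains the hashtag row "#country,#adm1,#org,", leaving the unmapped Notes column untagged for a human (or a better model) to fill in.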

Water Data Collaborative Tools 

West Big Data Innovation Hub & California State Water Resources Control Board

The goal is to create a comprehensive, high-quality dataset on water rates for agencies across California, and to support easy interoperability by ensuring that keys are available for joining it with other datasets and by developing open source tools to perform these joins when needed. Example datasets mentioned so far include "Agency Boundary" polygons, which are needed for mapping and for joining with other spatial datasets such as the American Community Survey (ACS); water use data from either the conservation reporting or the EAR; and income and demographics from the ACS. Because ACS data are not available at the granularity of water agencies, an open source tool for estimating agency-level income (and similar variables) would be valuable, as would an ecosystem of tools and analyses built on top of these datasets.
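As a minimal sketch of the agency-level estimation tool, assume a hypothetical crosswalk listing how many of each census tract's households fall inside each agency's service-area boundary (derived, say, from the Agency Boundary polygons). A household-weighted mean of tract incomes then gives a rough agency-level figure. Averaging tract medians only approximates the true agency median, and the names and data layout here are illustrative rather than an existing tool.

```python
from collections import defaultdict


def estimate_agency_income(crosswalk, tract_income):
    """Estimate a household-weighted mean income per water agency.

    crosswalk: iterable of (agency_id, tract_id, households) tuples, where
        households is the (hypothetical) number of the tract's households
        that fall inside the agency's service-area boundary.
    tract_income: dict mapping tract_id -> median household income.
    Returns a dict mapping agency_id -> weighted mean of tract incomes.
    """
    weighted_sum = defaultdict(float)
    total_households = defaultdict(float)
    for agency_id, tract_id, households in crosswalk:
        income = tract_income.get(tract_id)
        if income is None or households <= 0:
            continue  # skip tracts with missing income or no overlap
        weighted_sum[agency_id] += income * households
        total_households[agency_id] += households
    # Only agencies with at least one valid tract appear in weighted_sum,
    # so the denominator below is always positive.
    return {
        agency: weighted_sum[agency] / total_households[agency]
        for agency in weighted_sum
    }
```

For example, an agency covering 100 households in a tract with income 40,000 and 300 households in a tract with income 80,000 gets an estimate of 70,000.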

Campus Partner Projects

Building damage detection from Satellite Images

Berkeley Seismology Lab

Satellite images and other remote sensing data offer a way to evaluate damage quickly after hazards such as earthquakes, and more and more of this data is being made publicly available. The motivation behind this project is to identify damaged buildings and other infrastructure from satellite images or other remote sensing data collected after an earthquake. Students will work with a researcher at the Berkeley Seismology Lab to develop machine learning models for object detection or classification that identify damaged buildings or other critical information after earthquakes.

Text analysis of Campaign Messages in French Elections

Department of Economics

This project investigates the extent to which politicians adapt their discourse to electoral competition. We combine a new dataset of 30,000 manifestos issued by individual candidates in French legislative elections between 1958 and 1993 with computational text analysis to measure changes in discourse over the campaign, in particular between election rounds.
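One simple way to quantify how a candidate's discourse changes between rounds is to compare word-frequency vectors of the two manifestos. The sketch below uses cosine distance on naive bag-of-words counts; it is an assumed, illustrative measure rather than the project's method, and real analysis of French manifestos would need proper tokenization and stop-word handling at the very least.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts using a deliberately naive tokenizer.

    In Python 3, \\w matches Unicode letters, so accented French words survive.
    """
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def discourse_shift(first_round, second_round):
    """1 - cosine similarity: about 0 for identical word use, 1 for no overlap."""
    return 1.0 - cosine_similarity(bag_of_words(first_round),
                                   bag_of_words(second_round))
```

Computing this shift for each candidate who ran in both rounds would give one crude, per-candidate measure of how much a manifesto's vocabulary moved between rounds.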


Analysis of the Electric Field in Near-Earth Space

Space Sciences Laboratory

The electric field plays a fundamental role in space, yet it remains poorly understood. The objective of this project is to analyze a large database of electric field measurements recently provided by the two spacecraft of the Van Allen Probes mission, to identify trends in the data, and ultimately to formulate a simple analytical model.


Neural Networks for Irregular Time Series

Lawrence Berkeley National Laboratory

As part of an effort to evaluate the potential of neural networks for scientific applications, we are exploring how effectively neural networks can predict strongly irregular time series. The goal is to understand the limitations of current neural network designs and to find the best ways to train neural networks on streaming data. Initially, this exploration will be performed with CPU/GPU software; eventually, we anticipate porting the best-performing neural networks to an ML hardware system.


Chemistry/EPS (Data Collaborative)

BEACO2N is a network of over 100 air quality and climate sensors. We have projects related to understanding instrument calibration, describing the composition of local plumes, characterizing connections between weather and the observations, and visualizing the observations. For more information, check out:


Active learning on Chemical and Material Systems

Lawrence Berkeley National Laboratory

Conventional machine learning algorithms operate on fixed-length real-valued vectors, whereas real-world objects can often be modeled more efficiently as discrete objects containing non-linear structures and categorical attributes. For example, a molecule is often visualized as a collection of atoms and bonds, which fits naturally into a graph-based data structure. In this project, we seek to apply and advance recently introduced learning techniques such as graph kernels, message-passing neural networks, and graph convolutions to construct models that learn from chemical and material datasets encoded as graphs. The project will advance on multiple fronts, including algorithm design, software implementation and optimization, and the application of existing algorithms to real-world scientific problems.
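To make the graph view concrete, the sketch below encodes a molecule as a list of atoms (nodes) and bonds (edges) and runs one round of a heavily simplified message-passing update: each atom's one-hot element feature is added to the sum of its neighbors' features, and a sum readout turns the variable-size graph into a fixed-length vector. The element list and featurization are hypothetical, and real message-passing neural networks use learned weight matrices and nonlinearities that are omitted here.

```python
# Hypothetical toy featurization: one-hot encoding over a tiny element set.
ELEMENTS = ["C", "O", "H"]


def one_hot(symbol):
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def message_passing_step(atoms, bonds):
    """One round of sum-aggregation message passing (no learned weights).

    atoms: list of element symbols (node labels).
    bonds: list of (i, j) atom-index pairs (undirected edges).
    Each atom's new feature = its own feature + sum of its neighbors' features.
    """
    h = [one_hot(a) for a in atoms]
    neighbors = [[] for _ in atoms]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    new_h = []
    for i, feat in enumerate(h):
        msg = [0.0] * len(ELEMENTS)
        for j in neighbors[i]:
            msg = [m + x for m, x in zip(msg, h[j])]  # aggregate neighbor messages
        new_h.append([f + m for f, m in zip(feat, msg)])  # update node state
    return new_h


def readout(node_features):
    """Sum node features into a fixed-length, graph-level vector."""
    return [sum(col) for col in zip(*node_features)]
```

For water (one O bonded to two H atoms), readout(message_passing_step(["O", "H", "H"], [(0, 1), (0, 2)])) returns [0.0, 3.0, 4.0], a fixed-length vector that a conventional model could consume regardless of molecule size.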

Machine Learning for Particle Reconstruction and Identification

LBNL and UC Berkeley

The ATLAS experiment at the Large Hadron Collider collects an unprecedented amount of data, which could enable the discovery of new physics beyond the established Standard Model of particle physics. This huge data sample serves as an ideal test ground for machine learning algorithms for pattern recognition, sequence analysis, classification, regression, and related tasks. Students will work with researchers at Berkeley Lab to develop machine learning techniques that improve the identification and reconstruction of particles produced in high-energy collisions, in order to enhance the experiment's sensitivity for discovery. Scientific publications may result from this project.


Work At Home Vintage Experts


WAHVE pairs companies looking for specific skills with the veteran talent who have them. Businesses get the quality and knowledge they need, while "vintage" professionals (our word for those over 50) get to phase into retirement working from their home offices. We have been disrupting the outdated cliché that all work must be done in the office, proving that older workers with critical business skills, knowledge, and expertise can deliver a high level of productivity from their home offices. We have analyzed 52,692 unique applicants, and students will participate in the NLP analysis of this data.