Spring 2019 Discovery Projects

Highlighted Discovery Projects for Spring 2019

Humanitarian Data Exchange

UN OCHA (Office for the Coordination of Humanitarian Affairs) runs the Humanitarian Data Exchange, based in The Hague, where humanitarian agencies can upload their data sets for collaboration and sharing. Microsoft Research is working with the UN to guide a team that will help create a process or resource for applying HXL data tags to all 5,000 data sets to improve data analysis.

Bay Area Trip Choice Model

Driving carries many externalities: time lost in traffic, climate emissions, local air pollution, and more. This project aims to aid policymakers working to limit these ills and improve travel in the Bay Area, particularly through pricing policies such as adjustments in tolls and parking rates. We aim to build a simplified commute model by simulating how people get to work. Options such as driving or taking transit, along with the monetary and time cost of each, can be constructed through a mapping API.

Fall Armyworm

Currently, over 300 million Sub-Saharan Africans depend on maize as their primary staple food crop. In 2016, a pest indigenous to the Americas called the Fall Armyworm (FAW) arrived in West Africa and has been spreading at breathtaking speed, attacking maize in 44 African countries in just one year. Students will help MercyCorps create a tool that indicates the location and intensity of FAW outbreaks and identifies the risks of future outbreaks, enabling humanitarian organizations to target resources effectively.

Economics of Disaster Response

This project works with the West Big Data Innovation Hub and the State of California to address how we can better respond to natural disasters. Goals include finding ways to better match the demand and supply of physical donations and understanding the economics of disaster prevention and relief.


BEACO2N is a network of over 100 air quality and climate sensors. BEACO2N is undertaking projects on everything from understanding instrument calibration, to describing the composition of local plumes, to characterizing connections between weather and the observations, to visualizing the observations.

Water Data Collaborative

The California State Water Resources Control Board seeks to create comprehensive, high-quality data on water rates for agencies across the state. This will allow easy interoperability when joining water use, income, and other demographic data sets.


Work At Home Vintage Experts

WAHVE pairs companies looking for specific skills with the veteran talent who have them. Businesses get the quality and knowledge they need, while “vintage” professionals (people over 50) get to phase into retirement working from their home office. We have analyzed over 52,692 unique applicants and are looking for students to participate in NLP analysis of the data.

Campus Partner Projects

Building Damage Detection from Satellite Images

Berkeley Seismology Lab

Satellite images and other remote sensing data offer a way to evaluate damage quickly after hazards such as earthquakes, and more and more of this data is being made publicly available. The motivation behind this project is to identify damaged buildings and other types of infrastructure in satellite images or other remote sensing data after an earthquake. Students will work with a researcher at the Berkeley Seismology Lab to develop machine learning models for object detection or classification that identify damaged buildings or other critical information after earthquakes.

Text Analysis of Campaign Messages in French Elections

Department of Economics

This project investigates the extent to which politicians adapt their discourse to electoral competition. We combine a new dataset of 30,000 manifestos issued by individual candidates in French legislative elections between 1958 and 1993 with computational text analysis to measure changes in discourse over the campaign, in particular between election rounds.


Analysis of the Electric Field in Near-Earth Space

Space Sciences Laboratory

The electric field plays a fundamental role in space. Yet, it remains poorly understood. The objective of the project is to analyze a large database of electric field measurements recently provided by the two spacecraft of the Van Allen Probes mission. The project aims to identify trends in the data, and ultimately to formulate a simple analytical model.


Neural Networks for Irregular Time Series

Lawrence Berkeley National Laboratory

As part of the effort to evaluate the potential of neural networks for scientific applications, we are exploring the effectiveness of neural networks for predicting strongly irregular time series. The goal is to understand the limitations of current neural network designs and the best ways to train neural networks on streaming data. Initially, this exploration will be performed with CPU/GPU software. Eventually, we anticipate porting the best neural networks to an ML hardware system.

Active Learning on Chemical and Material Systems

Lawrence Berkeley National Laboratory

Conventional machine learning algorithms operate on fixed-length real-valued vectors, while real-world objects can often be more efficiently modeled as discrete objects containing non-linear structures and categorical attributes. As an example, a molecule is often visualized as a collection of atoms and bonds that fits naturally into a graph-based data structure. In this project, we seek to apply and advance recently introduced learning techniques, such as graph kernels, message-passing neural networks, and graph convolutions, to construct models that can learn from chemical and material datasets encoded as graphs. The project will expand on multiple fronts, including algorithm design, software implementation and optimization, and application of existing algorithms to real-world scientific problems.

Machine Learning for Particle Reconstruction and Identification

LBNL and UC Berkeley

The ATLAS experiment at the Large Hadron Collider collects an unprecedented amount of data, which enables the discovery of new physics beyond the established Standard Model of particle physics. The huge data sample serves as a perfect test ground for machine learning algorithms that deal with pattern recognition, sequence analysis, classification, and regression. Students will work with researchers at the Berkeley Lab to develop machine-learning-based techniques that improve the identification and reconstruction of particles produced in high-energy collisions, in order to enhance the sensitivity of the experiment for discovery. Scientific publications may be produced as a result of this project.