Whether confronting a category-five hurricane in the Bahamas or an Ebola outbreak in the Democratic Republic of Congo, response workers rarely have time to meticulously collect and tag data. But properly coded data can provide valuable insight into the crises that humanitarian workers face.
So last spring, the United Nations Office for the Coordination of Humanitarian Affairs (UN OCHA) and Microsoft AI partnered with UC Berkeley Discovery students to develop a machine learning algorithm that makes tagging data faster and more efficient.
“Adding HXL tags is a time-consuming and repetitive task,” said UN OCHA project lead Simon Johnson. “The tagging of new datasets through an automated process has the potential to save hours of work. It also makes the standard and the associated tools accessible to a less technical audience.”
Microsoft Philanthropy and UN OCHA have had a longstanding partnership. In 2018, the UN OCHA Centre for Humanitarian Data approached the Microsoft team with a large, unstructured set of humanitarian data, which was part of the Humanitarian Data Exchange (HDX). HDX is an open data platform where recognized humanitarian organizations can upload their datasets to be shared publicly. The platform serves as a central source of information so that organizations can use data for the analysis of humanitarian crises. Having the datasets collected in one place is hugely valuable, and structuring them according to a common standard makes them even more useful for analysis.
The goal for the collaboration between the Discovery Program, Microsoft, and UN OCHA was to build a predictive model to apply the Humanitarian Exchange Language (HXL) Standard to more than 5,000 untagged datasets and create a real-time classifier for future datasets. The HXL Standard is a data standard that aims to improve information sharing during humanitarian crises without adding extra reporting burdens. HXL uses hashtags, such as #country or #contact, to categorize data in humanitarian datasets.
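To illustrate the idea, HXL adds a row of hashtags directly beneath a dataset's column headers. The sketch below uses a toy keyword heuristic to suggest tags from plain header names; it is only a minimal illustration of the tagging task, not the team's trained classifier, and the rule table is a hypothetical example.

```python
# Minimal sketch of HXL-style tagging: map each column header to an
# HXL hashtag. This keyword lookup stands in for the project's
# machine-learning classifier and covers only a few core tags.

# Hypothetical header-keyword-to-tag rules, for illustration only.
TAG_RULES = {
    "country": "#country",
    "organisation": "#org",
    "organization": "#org",
    "email": "#contact",
    "affected": "#affected",
}

def suggest_hxl_tags(headers):
    """Return a row of HXL hashtags predicted from plain column headers."""
    tags = []
    for header in headers:
        key = header.strip().lower()
        # Emit an empty tag when no rule matches, so columns stay aligned.
        tags.append(next((tag for kw, tag in TAG_RULES.items() if kw in key), ""))
    return tags

headers = ["Country", "Organisation Name", "People Affected"]
print(suggest_hxl_tags(headers))  # → ['#country', '#org', '#affected']
```

In a real HXL file the suggested row would be inserted as the second line of the CSV, between the human-readable headers and the data, which is what lets downstream tools merge datasets from different organizations by tag rather than by column name.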
“Let’s take the example of addressing the Ebola outbreak as part of the United Nations and Red Cross relief efforts,” said Microsoft project lead and UC Berkeley computer science graduate Vinitra Swamy. “Usually, these viruses and crises don’t follow the boundaries of countries, states, and regions. Different organizations and different institutions have different ways of representing their data. In dealing with crises of scale, current methods to identify a common language to represent important information are labor intensive – we hope to address that with our applied research.”
Compared to previous work, the classifier provided a 14% accuracy increase and serves as a novel case study of using machine learning to enhance humanitarian data. The project members presented their work at the Knowledge Discovery and Data Mining (KDD) conference in Anchorage in August.
“I learned a lot from dealing with the technical challenges of implementing a machine-learning model and an application programming interface (API) with the other members of our team,” said project member Anish Vankayalapati (Computer Science, Class of 2021). “From my experience working on this project, I believe that we can harness machine learning to make the lives of humanitarian workers and researchers easier and more effective by automating time-consuming mechanical processes so that they can focus on the more impactful aspects of their work.”
The Data Science Discovery Program connects undergraduates with hands-on, team-based opportunities to contribute to cutting-edge data research projects with graduate and postdoctoral students, community impact groups, entrepreneurial ventures, and educational initiatives across the university. This semester, 240 students are engaged in 40 projects with more than a dozen non-profits and governmental agencies.