Humanitarian Data Exchange
UN Office for the Coordination of Humanitarian Affairs
Bay Area Trip Choice Model
Currently, over 300 million Sub-Saharan Africans depend on maize as their primary staple food crop. In 2016, a pest indigenous to the Americas called the Fall Armyworm (FAW) arrived in West Africa and has been spreading at breathtaking speed, attacking maize in 44 African countries in just one year. Students will help MercyCorps create a tool to help indicate the location and intensity of FAW outbreaks and help identify the risks of future outbreaks to enable humanitarian organizations to effectively target resources.
Economics of Disaster Response
West Big Data Innovation Hub
Water Data Collaborative
California State Water Resources Control Board
Work at Home Vintage Experts
Building damage detection from Satellite Images
Berkeley Seismology Lab
Text analysis of Campaign Messages in French Elections
Department of Economics
This project investigates the extent to which politicians adapt their discourse to electoral competition. We combine a new dataset of 30,000 manifestos issued by individual candidates in French legislative elections between 1958 and 1993 with computational text analysis to measure changes in discourse over the campaign, in particular between election rounds.
Charter Schools and the Business Age
UC Berkeley Sociology Department
How does the push to run schools like businesses--complete with performance targets, incentives, and centralization in culture and governance--shape the growing charter school sector? Which charters survive and thrive in this political climate: those that stress standards-based rigor and college-readiness (traditional model), or those that prioritize independent thinking and socio-emotional development (progressive model)? And how does this differentiation affect charter school segregation--that is, do progressive schools serve white students in affluent, liberal communities while traditional schools serve students of color in poor or conservative communities?
Vice Chancellor of Finance
We want to build an interactive ad-hoc BI reporting solution where users can get high-level data with a simple search query against the data source (Oracle Database, flat file). The main aim of this application is to simplify the way users build ad-hoc queries by using keywords (similar to Twitter hashtags) such as Year, Month, Department, Employee Count, Average Years of Service, Enrollment, Revenue, Budget, etc. On the application's search/query page, we want to display some example keywords and canned queries.
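As a rough illustration of the keyword-driven search described above, the sketch below translates hashtag-style keywords into a SQL aggregate query. It is only one possible design, not the project's implementation: the table name `hr_facts`, every column name, and the `build_query` helper are hypothetical.

```python
import re

# Hypothetical mapping from user-facing keywords to grouping columns.
DIMENSIONS = {
    "year": "fiscal_year",
    "month": "fiscal_month",
    "department": "department",
}

# Hypothetical mapping from user-facing keywords to aggregate expressions.
MEASURES = {
    "employeecount": "COUNT(DISTINCT employee_id)",
    "averageyearsofservice": "AVG(years_of_service)",
    "revenue": "SUM(revenue)",
    "budget": "SUM(budget)",
}


def build_query(search: str, table: str = "hr_facts") -> str:
    """Translate hashtag-style keywords into a GROUP BY query string."""
    keywords = [k.lower() for k in re.findall(r"#(\w+)", search)]
    dims = [DIMENSIONS[k] for k in keywords if k in DIMENSIONS]
    measures = [MEASURES[k] for k in keywords if k in MEASURES]
    if not measures:
        raise ValueError("at least one measure keyword is required")
    sql = f"SELECT {', '.join(dims + measures)} FROM {table}"
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql


query = build_query("#Department #EmployeeCount")
```

Because the keywords are looked up in fixed dictionaries rather than pasted into the SQL text, unknown or malicious input is simply ignored, which is one reason to prefer a whitelist design for this kind of search box.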
Ancient World Citation Analysis
Digital Humanities / D-Lab / Near Eastern Studies
The goal is to build a citation network with a 2TB collection of documents from the fields of ancient Near Eastern Studies, Classics, Archaeology, and Middle Eastern Languages. The result of this project will make this collection more internationally accessible for research by the scholars in these fields. First we will run the OCR in batches using Savio, a cluster computer through Research IT. Next we will data-mine the results (using NER methods) for both citation analysis and bibliographic analysis. The result will be a multi-modal network of authors-to-authors (i.e. who cites whom?) and authors-to-primary sources (i.e. who cites what?), with links to the OCRed text in 'bags of words' (to avoid copyright issues). We will also introduce tools/methods for textual analysis (e.g. Gensim).
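The multi-modal network and the bag-of-words representation mentioned above can be pictured with a few lines of Python. This is a minimal sketch under stated assumptions: the citation records are invented placeholders standing in for the output of an NER step, and the `build_network` and `bag_of_words` helpers are hypothetical names, not part of the project's tooling.

```python
from collections import Counter, defaultdict

# Placeholder records of the form (citing author, kind of cited entity, cited entity).
# In the project these would come from NER over the OCRed text.
records = [
    ("Author A", "author", "Author B"),
    ("Author A", "author", "Author B"),
    ("Author A", "primary_source", "Source X"),
    ("Author B", "primary_source", "Source X"),
]


def build_network(records):
    """Group weighted citation edges by the kind of entity being cited."""
    edges = defaultdict(Counter)
    for citing, kind, cited in records:
        edges[kind][(citing, cited)] += 1
    return edges


def bag_of_words(text):
    """Reduce a text to unordered word counts, discarding word order."""
    return Counter(text.lower().split())


network = build_network(records)
bag = bag_of_words("the king built the temple")
```

Keeping author-to-author and author-to-source edges in separate layers lets each layer be analyzed on its own (who cites whom, who cites what) while still sharing the same author nodes.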
The Citizen Science Solution to Media Misinformation
Students on this project will help develop and test a collaborative web app guiding thousands of internet volunteers to read through the most shared news articles and find evidence of misinformation in the content. Working with a national coalition of social science researchers and journalists, a Nobel Laureate, cognitive scientists, and software designers/developers, students will see firsthand how the social good technologies of the future get built. And students’ individual efforts will be essential to the creation of a world where the public has confidence in its capacity to discern truth from fiction and fact from opinion.
Using the Government Publishing Office’s Congressional Hearing Transcripts as an example, the project leads will guide participants through their own data liberation project, in which they will: scrape the web for document files while retaining document metadata; programmatically find and extract meaningful data objects within the documents; link those objects to external databases; prepare all this compiled textual data for computational analysis in R and Python; and host their newly formed database so that the public and other researchers can launch their own studies of the data.
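The first step listed above, collecting document files while retaining their metadata, might look roughly like the following. This is a sketch using only the Python standard library; the sample HTML, the `example.org` URL, and the `DocumentLinkParser` class are hypothetical, and a real project would fetch listing pages over HTTP rather than parse a hard-coded string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DocumentLinkParser(HTMLParser):
    """Collect links to document files, keeping the link text as metadata."""

    def __init__(self, base_url, extensions=(".pdf", ".txt")):
        super().__init__()
        self.base_url = base_url
        self.extensions = extensions
        self.documents = []   # one {"url": ..., "title": ...} dict per document
        self._current = None  # the document link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(self.extensions):
                self._current = {"url": urljoin(self.base_url, href), "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.documents.append(self._current)
            self._current = None


# Hypothetical listing page used in place of a live request.
sample_html = (
    '<ul><li><a href="/docs/hearing-001.pdf">Hearing 001</a></li>'
    '<li><a href="/about">About</a></li></ul>'
)
parser = DocumentLinkParser("https://example.org/hearings/")
parser.feed(sample_html)
documents = parser.documents
```

Storing the title alongside the resolved URL at scrape time means the metadata travels with each file into the later extraction and linking steps.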
Discovering Patterns of Peace and Violence between Police and Protesters
Students on this project will use a collaborative web app to process the information in over 8,000 news articles describing all the interactions between police and protesters during the Occupy movement. Data processed by students will be used to find patterns of peace and violence, which can be used to scaffold broad public conversations and shift the behavior of police and protest strategists. Students' work will be used to create artificial intelligence able to understand dynamics between cities and movements, and to recommend policies more likely to result in peaceful and effective political expression.
East Bay Ophthalmology
Glaucoma is estimated to affect 3 million people in the United States alone, and is the leading cause of preventable, irreversible blindness worldwide. Although glaucoma is a largely treatable disease, poor medication compliance occurs in an estimated 30% to 60% of patients. Ocuelar uses an innovative smartphone application to increase patient medication compliance through a Bluetooth smart cap that is able to automatically record when a patient takes their glaucoma eye drops. This unique opportunity at the intersection of health and data science will allow students to analyze the patient datasets from our preliminary study conducted at East Bay Ophthalmology to find creative trends that could directly impact patient care.
East Bay Ophthalmology
iCare is a humanitarian organization with the mission of providing ophthalmic and oculoplastic care to those who have the least access to it. Since 2006, iCare has partnered with local ophthalmologists to advance ophthalmic and oculoplastic surgical care in over 20 countries. For this project, students will conduct a retrospective analysis of patient health outcome data from Macedonia and China to quantify the efficacy of the program.
Analyzing Big Data from the Centers for Medicare & Medicaid Services
East Bay Ophthalmology
In the last three decades, big data has been applied to diverse fields, such as government, international development, and education. It is only now that the US healthcare system has begun to explore its under-utilized data. Under the supervision of Dr. Scott Lee, students will analyze the data available in the Centers for Medicare & Medicaid Services database to find innovative ways to understand clinician decision making in today’s healthcare system.
Traffic, Mobility & Sustainability in Megalopolis
California Institute for Energy and Environment
As cities are expected to continue growing, so are the challenges and complexities of moving people around and the environmental impact of combustion-based vehicle fleets. In this project, you will work with large datasets from cities and their traffic patterns. We expect to implement learning algorithms to better understand urban environments, how people move, and approaches to reducing the environmental impact.
Analysis of the Electric Field in Near-Earth Space
Space Sciences Laboratory
The electric field plays a fundamental role in space. Yet, it remains poorly understood. The objective of the project is to analyze a large database of electric field measurements recently provided by the two spacecraft of the Van Allen Probes mission. The project aims to identify trends in the data, and ultimately to formulate a simple analytical model.
Neural Networks for Irregular Time Series
Lawrence Berkeley National Laboratory
As part of the effort to evaluate the potential of neural networks for scientific applications, we are exploring the effectiveness of neural networks for predicting strongly irregular time series. The goal is to understand the limitations of current neural network designs and the best ways to train neural networks on streaming data. Initially, this exploration will be performed with CPU/GPU software. Eventually, we anticipate moving the best neural networks onto an ML hardware system.
Active learning on Chemical and Material Systems
Lawrence Berkeley National Laboratory
Conventional machine learning algorithms operate on fixed-length real-valued vectors, while real-world objects can often be more efficiently modeled as discrete objects containing non-linear structures and categorical attributes. As an example, a molecule is often visualized as a collection of atoms and bonds that fits naturally into a graph-based data structure. In this project, we seek to apply and advance recently introduced learning techniques, such as graph kernels, message-passing neural networks, and graph convolutions, to construct models that can learn from chemical and material datasets that are encoded as graphs. The project will be expanded on multiple fronts, including algorithm design, software implementation and optimization, and application of existing algorithms to solve real-world scientific problems.
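To make the graph encoding above concrete, here is a toy sketch that stores a water molecule as atoms (nodes with categorical labels) and bonds (edges), then runs one round of neighbor aggregation. It is a deliberately simplified stand-in for message passing: a real message-passing neural network would apply learned weights at each step, and the element vocabulary and helper names here are assumptions made for illustration.

```python
# A molecule as a graph: atoms are nodes, bonds are undirected edges.
# Water (H2O) serves as the toy example.
atoms = {0: "O", 1: "H", 2: "H"}
bonds = [(0, 1), (0, 2)]

ELEMENTS = ["H", "O"]  # assumed vocabulary for one-hot encoding


def one_hot(symbol):
    """Encode a categorical atom label as a fixed-length vector."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def neighbors(bonds):
    """Build an adjacency list from the undirected bond list."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def message_passing_step(features, adj):
    """One round: each atom adds the sum of its neighbors' features to its own."""
    updated = {}
    for node, feat in features.items():
        agg = list(feat)
        for nb in adj.get(node, []):
            agg = [x + y for x, y in zip(agg, features[nb])]
        updated[node] = agg
    return updated


features = {i: one_hot(s) for i, s in atoms.items()}
updated = message_passing_step(features, neighbors(bonds))
```

After one round, the oxygen atom's vector records that it is bonded to two hydrogens, which is the kind of local structural information a fixed-length vector of raw atom counts would not capture per atom.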
Machine Learning for Particle Reconstruction and Identification
LBNL and UC Berkeley
An unprecedented amount of data is collected by the ATLAS experiment at the Large Hadron Collider, enabling the discovery of new physics beyond the established Standard Model of particle physics. The huge data sample serves as a perfect test ground for machine learning algorithms that deal with pattern recognition, sequence analysis, classification, and regression. Students will work with researchers at Berkeley Lab to develop machine-learning-based techniques to improve the identification and reconstruction of particles produced in high-energy collisions, in order to enhance the sensitivity of the experiment for discovery. Scientific publications may be produced as a result of this project.