Fall 2020 Discovery Projects



Wordnik - Hyphenation Project

Wordnik provides a hyphenation API, with data licensed from traditional dictionaries.

Using AI and real-time data to power an early warning system for public safety and the recognition of emotions

We will build an AI system to detect negative emotion from vocalizations recorded by body cameras worn by police (building upon several published papers in my lab) and develop a warning system for police officers that informs them of when t

Berkeley Police Department - Simulating Alternative Responses to Calls for Service

The Berkeley City Council recently voted to audit the "calls for service" (CFS) received by the Berkeley Police Department (BPD) to determine the feasibility of transferring the response to certain types of calls to alternative emergency response

SF Chronicle - Disaster Maps

The San Francisco Chronicle has produced a well-known Fire Tracker for several years, using data from NOAA as well as NASA's MODIS and VIIRS-I satellites. We are also working on a Flood Tracker for the Houston Chronicle.

NLP for Cannabis Text Data

For this project, research apprentices will use Python to write code for scraping, munging, and classifying product data to better understand the dynamics of the United States cannabis industry.

Lawrence Berkeley National Lab - Modeling Access Pattern of Large Scientific Data Repository with Machine Learning

Large amounts of data are archived on the HPSS tape archive, especially from experiments that produce large volumes of data every year.

Creative Commons - Linked Commons Graph Analysis

Creative Commons has collected graph data linking different web properties that use Creative Commons licenses. Nodes are domains (rather than, e.g., individual pages).

East Bay Community Energy - Evaluating Alameda County CO2 Emissions and Optimizing Customer Programs Using Marginal Emissions Data

East Bay Community Energy’s goal is to provide cleaner and cheaper energy to East Bay cities and customers. The goal of this project is to answer the following questions: 1.

Girl Effect - Speaking her Language: Using NLU to build conversational products for girls in developing countries

Girl Effect builds products for girls in developing markets. We work in Rwanda, India, Ethiopia, South Africa and Tanzania.

WITI@UC, CITRIS and COE - Visualizing Women in Tech Data at UC

As part of continuing research into the state of women in tech, WITI@UC would like to enlist students to visualize data from CalAnswers and UCOP to show percentages and changes over time in the parti

Goodly Labs - Public Editor

Students on this project will analyze data from a collaborative web app guiding thousands of internet volunteers to read through the most shared news articles of the day and label evidence of misinformation in the content.

UC Berkeley School of Law - Indigenous Brands and Social Movements

Have you noticed how many brands in the marketplace use Indigenous/Native American-oriented terms and imagery?

Creative Commons - Image Popularity and Authority

CC Search (https://search.creativecommons.org/) is a media search engine maintained by Creative Commons which currently indexes metadata for around 500 million images.

Creative Commons - Image Clustering and Metric Inference

CC Search (https://search.creativecommons.org/) is a media search engine maintained by Creative Commons which currently indexes metadata for around 500 million images.

Innovations for Youth - Exploring spaces and places of violence against young people experiencing homelessness

The study interviewed young people experiencing homelessness and accessed administrative data, and those data have been used to construct geospatial analyses of sites and pathways of violence.

Exploring the Achievement Gap in Berkeley Public Schools

This project works with state and district data to examine achievement gaps in Berkeley Public Schools.

School of Information - Evaluating Accessibility on Congressional Websites

Voters and constituents increasingly look to the websites of elected officials to answer their questions about voting or constituent services.

California Partners for Advanced Transportation Technology (PATH) - Erroneous High Occupancy Vehicle (HOV) Degradation

In a recent review of Performance Measurement System (PeMS) data quality for the Connected Corridors project along the I-210 corridor, it was discovered that almost 10% of HOV loops along 30 miles of freeway were actually located in the mainline.

NHERI SimCenter - Enhancing regional scale natural hazard simulations with artificial intelligence

The NHERI Computational Modeling and Simulation Center (SimCenter) provides next-generation computational modeling and simulation software tools, user support, and educational materials to the natural hazards engineering research community with th

Berkeley School of Law - Empirical Examination of Corporate Rebranding and Trademarks

Companies frequently register new trademarks or modify existing ones as part of their rebranding efforts. Several factors, such as a change in a company's management team, may explain why rebranding decisions are made.

Goodly Labs - Demo Watch

The Demo Watch project has collected and is curating over 8,000 news articles describing all the interactions between police and protesters during the Occupy movement.

UCSF - Clinical text classification/information extraction to understand real-world treatment effects at a large, academic medical center

Every time you visit the doctor and watch her document your complete medical history, your data is being captured by huge electronic health records (EHR) systems on the backend.

Cal Alumni Association - CAA Data Analysis Project

This project seeks to use data science to offer insights that could improve CAA’s operational efficiency.

Voice of Specially Abled People - Business Case for Investment into MMR vaccination by Developing Nations

The MMR vaccine helps reduce birth defects. The case is similar to that of the polio vaccine: most of the world is now polio-free because of serious global vaccination efforts.

Building Integrated Solar Photovoltaics Assessment via ML

With the advent of novel image classification algorithms, the opportunities to apply them to the energy space are increasing, too.

Lawrence Hall of Science - Building Data Science Apps to teach Data Science

The project goal will be to develop a number of digital applications designed to help people of all ages learn about different data science concepts, like statistics and ML algorithms, by providing learners with intuitive examples and interactions

Berkeley Law Center for Law, Energy & the Environment - Analyzing Research on the Environmental & Energy Impacts of the Digital Economy

This project will center around analyzing a database of sources for an emerging multidisciplinary field of practice in hopes of gaining helpful insight into the field’s major trends, focus areas, and trajectory.

Berkeley School of Law - What's Important to the Supreme Court

The Supreme Court of the United States has essentially unconstrained discretion to set its own docket.

ESPM - Addressing Structural Inequality in U.S. Agricultural Higher Education: An Assessment of Pedagogical Practices and Food Systems Coursework at Land-grant Institutions

In the last decade, many U.S. “land-grant” agricultural universities (including UC Berkeley) have turned a reflexive lens on the fact that their institutions marginalize certain community members.

UCSF - A Precision Medicine Recommender System for Inflammatory Bowel Disease: a pilot study using real electronic health records data from UCSF

The idea of Precision Medicine is to move beyond our one-size-fits-all healthcare system and towards one where we make data-driven treatment decisions that take into account individual factors, like patient demographics, genetic background, and me

DaanMatch - A decision-making tool/method to facilitate the correlation of NGOs' and corporations' datasets for efficient fund/aid allocation

DaanMatch is a project to address social inequalities in fund distribution in India. Recent data shows that around 60% of organizing funding is allocated to projects in urban areas and that the main beneficiaries are a few multinational NGOs.

United Ways of California

We would like to be able to build a mechanism that can collect data from photos of various documents and use that data to automatically fill out applications for COVID-19 relief and ongoing safety-net programs.

Goodly Labs - Research Ready Government Archives

The Research Ready project seeks two students to help improve and maintain archives of government activity that researchers, journalists, and the public can easily query.

Bay Area Rapid Transit - Reduce service outages of BART

Due to the drop in ridership from the COVID-19 pandemic, BART's operating budget has been drastically reduced.

City of Paterson - Mapping Eviction Trends in the City of Paterson

With unemployment at historic highs and eviction moratoriums expiring across the United States, tens of thousands of renters are expected to face homelessness.

Girl Effect - Using NLU to derive evaluation results from qualitative in-product feedback

Girl Effect builds products for girls in developing markets. We work in Rwanda, India, Ethiopia, South Africa and Tanzania.

UC Berkeley, Biophysics - Identification and Classification of Intrinsically Disordered Regions in Proteins

Regions within proteins can be broadly classified into two types: ordered and disordered. Ordered regions assume a defined three-dimensional structure and are identified by their unique sequence of amino acids.

BioXplor Inc - Data Visualisation for Covid-19 Free online Platform

Design and build a modern dashboard summarizing various COVID-19 ontologies powered by biomedical literature (genes, drugs, diseases, sources, authors, applications, gene mutations, etc.).

Mapping for Environmental Justice

Mapping for Environmental Justice (MEJ) creates easy-to-use, publicly-available maps that paint a holistic picture of intersecting environmental, social, and health impacts experienced by communities across the US.

Sumerian Networks

The goal of the Sumerian Network project has been to build reproducible socio-economic networks from the Ur III textual archives.

The Tempest Media - AI Content Management System for The Tempest Media

The Tempest is a digital publishing platform that has published hundreds of articles over the past 5 years.

Wordnik - Etymology Search

In this project, we hope to build an etymology search tool and API for Wordnik users. We'll digitize an out-of-copyright etymological dictionary and pull data from Wiktionary, and create an appropriate datastore.

Group Dynamics on Reddit

I downloaded 10 years of Reddit data (metadata and content). I am looking to clean the data and run some statistical models to examine what predicts group commitment (i.e., commitment to a subreddit).

UCSF - Clinical Natural Language Understanding using transformer models and extensions incorporating tabular data

The BERT language model has achieved state-of-the-art performance on a variety of Natural Language Understanding (NLU) tasks, such as question answering.

Powerside - System Telemetry Analysis

Powerside wishes to develop condition monitoring algorithms for high-value systems whose failures generally carry high consequential costs, for example in the industrial, medical, transportation, government, and telecom sectors.