Discovery Offers Cloud Computing Credits
Click here to request Microsoft Azure credits!
Discovery Projects by Semester
Spring 2021 Projects
Fall 2020 Projects
Spring 2020 Projects
Fall 2019 Projects
Summer 2019 Projects
Spring 2019 Projects
Fall 2018 Projects
Spring 2018 Projects
Fall 2017 Projects
Wordnik - Hyphenation Project
Wordnik provides a hyphenation API, with data licensed from traditional dictionaries.
Using AI and real-time data to power an early warning system for public safety and the recognition of emotions
We will build an AI system to detect negative emotion from vocalizations recorded by body cameras worn by police (building upon several published papers in my lab) and develop a warning system for police officers that informs them of when t
Berkeley Police Department - Simulating Alternative Responses to Calls for Service
The Berkeley City Council recently voted to audit the "calls for service" (CFS) received by the Berkeley Police Department (BPD) to determine the feasibility of transferring the response to certain types of calls to alternative emergency response
SF Chronicle - Disaster Maps
The San Francisco Chronicle has produced a well-known Fire Tracker for several years, using data from NOAA as well as NASA's MODIS and VIIRS-I satellites. We are also working on a Flood Tracker for the Houston Chronicle.
NLP for Cannabis Text Data
For this project, research apprentices will write Python code to scrape, munge, and classify product data to better understand the dynamics of the United States cannabis industry.
Lawrence Berkeley National Lab - Modeling Access Pattern of Large Scientific Data Repository with Machine Learning
Large amounts of data are archived on the tape archive (HPSS), especially for experiments that produce large volumes of data every year.
Creative Commons - Linked Commons Graph Analysis
Creative Commons has collected graph data linking different web properties that use Creative Commons licenses. Nodes are domains (rather than, e.g., individual pages).
East Bay Community Energy - Evaluating Alameda County CO2 Emissions and Optimizing Customer Programs Using Marginal Emissions Data
East Bay Community Energy’s goal is to provide cleaner and cheaper energy to East Bay Cities and customers. The goal of this project is to answer the following questions: 1.
Girl Effect - Speaking her Language: Using NLU to build conversational products for girls in developing countries
Girl Effect builds products for girls in developing markets. We work in Rwanda, India, Ethiopia, South Africa and Tanzania.
WITI@UC, CITRIS and COE - Visualizing Women in Tech Data at UC
As part of continuing research into the state of women in tech, WITI@UC would like to enlist students to visualize data from CalAnswers and UCOP to show percentages and changes over time in the parti
Goodly Labs - Public Editor
Students on this project will analyze data from a collaborative web app guiding thousands of internet volunteers to read through the most shared news articles of the day and label evidence of misinformation in the content.
UC Berkeley School of Law - Indigenous Brands and Social Movements
Have you noticed how many brands in the marketplace use Indigenous/Native American-oriented terms and imagery?
Creative Commons - Image Popularity and Authority
CC Search (https://search.creativecommons.org/) is a media search engine maintained by Creative Commons which currently indexes metadata for around 500 million images.
Creative Commons - Image Clustering and Metric Inference
CC Search (https://search.creativecommons.org/) is a media search engine maintained by Creative Commons which currently indexes metadata for around 500 million images.
Innovations for Youth - Exploring spaces and places of violence against young people experiencing homelessness
The study interviewed young people experiencing homelessness and accessed administrative data, and those data have been used to construct geospatial analyses of sites and pathways of violence.
Exploring the Achievement Gap in Berkeley Public Schools
This project works with state and district data to examine achievement gaps in Berkeley Public Schools. See our work here.
School of Information - Evaluating Accessibility on Congressional Websites
Voters and constituents increasingly look to the websites of elected officials to answer their questions about voting or constituent services.
California Partners for Advanced Transportation Technology (PATH) - Erroneous High Occupancy Vehicle (HOV) Degradation
In a recent review of Performance Measurement System (PeMS) data quality for the Connected Corridors project along the I-210 corridor, it was discovered that almost 10% of HOV loops along 30 miles of freeway were actually located in the mainline.
NHERI SimCenter - Enhancing regional scale natural hazard simulations with artificial intelligence
The NHERI Computational Modeling and Simulation Center (SimCenter) provides next-generation computational modeling and simulation software tools, user support, and educational materials to the natural hazards engineering research community with th
Berkeley School of Law - Empirical Examination of Corporate Rebranding and Trademarks
Companies are frequently found registering new trademarks or modifying existing trademarks as part of their rebranding efforts. Several reasons, such as a change in company management teams, may explain why rebranding decisions are made.
Goodly Labs - Demo Watch
The Demo Watch project has collected and is curating over 8,000 news articles describing all the interactions between police and protesters during the Occupy movement.
UCSF - Clinical text classification/information extraction to understand real-world treatment effects at a large, academic medical center
Every time you visit the doctor and watch her document your complete medical history, your data is being captured by huge electronic health records (EHR) systems on the backend.
Cal Alumni Association - CAA Data Analysis Project
This project seeks to use data science to offer insights that could improve CAA’s operational efficiency.
View our work here.
Voice of Specially Abled People - Business Case for Investment into MMR vaccination by Developing Nations
The MMR vaccine helps reduce birth defects, much as the polio vaccine has left most of the world polio-free thanks to serious global vaccination efforts.
Building Integrated Solar Photovoltaics Assessment via ML
With the advent of novel image classification algorithms, the opportunities to apply them to the energy space are increasing, too.
Lawrence Hall of Science - Building Data Science Apps to teach Data Science
The project goal will be to develop a number of digital applications designed to help people of all ages learn about different data science concepts, like statistics and ML algorithms, by providing learners with intuitive examples and interactions
Berkeley Law Center for Law, Energy & the Environment - Analyzing Research on the Environmental & Energy Impacts of the Digital Economy
This project will center around analyzing a database of sources for an emerging multidisciplinary field of practice in hopes of gaining helpful insight into the field’s major trends, focus areas, and trajectory.
Berkeley School of Law - What's Important to the Supreme Court
The Supreme Court of the United States has essentially unconstrained discretion to set its own docket.
ESPM - Addressing Structural Inequality in U.S. Agricultural Higher Education: An Assessment of Pedagogical Practices and Food Systems Coursework at Land-grant Institutions
In the last decade, many U.S. “land-grant” agricultural universities (including UC Berkeley) have turned a reflexive lens on the fact that their institutions marginalize certain community members.
UCSF - A Precision Medicine Recommender System for Inflammatory Bowel Disease: a pilot study using real electronic health records data from UCSF
The idea of Precision Medicine is to move beyond our one-size-fits-all healthcare system and towards one where we make data-driven treatment decisions that take into account individual factors, like patient demographics, genetic background, and me
DaanMatch - A decision-making tool/method to facilitate the correlation of NGOs' and Corporations' datasets for efficient fund/aid allocation
DaanMatch is a project to address social inequalities in fund distribution in India. Recent data shows that around 60% of organizing funding is allocated to projects in urban areas and that the main beneficiaries are a few multinational NGOs.
United Ways of California
We would like to be able to build a mechanism that can collect data from photos of various documents and use that data to automatically fill out applications for COVID-19 relief and ongoing safety-net programs.
Goodly Labs - Research Ready Government Archives
The Research Ready project seeks two students to help improve and maintain archives of government activity that researchers, journalists, and the public can easily query.
Bay Area Rapid Transit - Reduce service outages of BART
Due to the drop in ridership from the COVID-19 pandemic, BART's operating budget has been drastically reduced.
City of Paterson - Mapping Eviction Trends in the City of Paterson
With the unemployment rate at historic highs and eviction moratoriums expiring across the United States, tens of thousands of renters are expected to face homelessness.
Girl Effect - Using NLU to derive evaluation results from qualitative in-product feedback
Girl Effect builds products for girls in developing markets. We work in Rwanda, India, Ethiopia, South Africa and Tanzania.
UC Berkeley, Biophysics - Identification and Classification of Intrinsically Disordered Regions in Proteins
Regions within proteins can be broadly classified into two types: ordered and disordered. Ordered regions assume a defined three-dimensional structure and are identified by their unique sequence of amino acids.
BioXplor Inc - Data Visualisation for Covid-19 Free online Platform
Design and build a modern dashboard that summarizes various Covid-19 ontologies powered by biomedical literature (gene, drug, disease, source, author, applications, gene mutations, etc.).
Mapping for Environmental Justice
Mapping for Environmental Justice (MEJ) creates easy-to-use, publicly-available maps that paint a holistic picture of intersecting environmental, social, and health impacts experienced by communities across the US.
Sumerian Networks
The goal of the Sumerian Network project has been to build reproducible socio-economic networks from the Ur III textual archives.
The Tempest Media - AI Content Management System for The Tempest Media
The Tempest is a digital publishing platform that has published hundreds of articles over the past 5 years.
Wordnik - Etymology Search
In this project, we hope to build an etymology search tool and API for Wordnik users. We'll digitize an out-of-copyright etymological dictionary and pull data from Wiktionary, and create an appropriate datastore.
Group Dynamics on Reddit
I downloaded 10 years of Reddit data (metadata and content). I am looking to clean the data and run some statistical models to examine what predicts group commitment (i.e., commitment to subreddit).
UCSF - Clinical Natural Language Understanding using transformer models and extensions incorporating tabular data
The BERT language model has achieved state-of-the-art performance on a variety of Natural Language Understanding (NLU) tasks such as question-answering.
Powerside - System Telemetry Analysis
Powerside wishes to develop value condition monitoring algorithms for high-value systems with generally high consequential costs of failure, for example in industrial, medical, transportation, government, and telecoms settings.