Application Deadlines

Priority Deadline: January 23

Final Deadline: January 30

Team Placement Deadline: February 5

(Image: Spring 2023 timeline)

Application Instructions

Use this application portal to browse this semester’s projects and submit your application.

  1.  Browse the projects and click on the name of any project that interests you. You will be redirected to an application page with more information on the chosen project.
  2.  Submit an application for up to 10 projects that you are interested in. Some projects may close their applications early due to high levels of interest, so students are advised to apply as soon as they can.

Program Requirements

The program offers variable units on a P/NP basis.

To receive a passing grade:

  1.  Students may participate in only one Discovery project per semester.
  2.  Students are expected to commit 6 to 12 hours per week to their research project (the exact time commitment should be confirmed with the project partner). One unit of academic credit is available for every 3 hours of work per week. Students will be enrolled in a DATA 198 course for units when they join a project.
  3.  All student teams are required to submit a Mid-Term Progress Report (this will be sent to you as a Google Form to fill out) and a Final Presentation during RRR week.
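
As a rough illustration of the unit rule above (one unit for every 3 hours of work per week), the sketch below converts a weekly time commitment into units. The function name is hypothetical, and rounding down partial blocks of 3 hours is an assumption that the requirements do not state; confirm actual unit counts with the program.

```python
def units_for_hours(hours_per_week: int) -> int:
    """Estimate units from weekly hours at 1 unit per 3 hours.

    Assumption: partial blocks of 3 hours are rounded down.
    """
    # The stated expectation is 6 to 12 hours per week.
    if not 6 <= hours_per_week <= 12:
        raise ValueError("expected commitment is 6 to 12 hours per week")
    return hours_per_week // 3


for hours in (6, 9, 12):
    print(hours, "hours/week ->", units_for_hours(hours), "units")
```

Under these assumptions, the expected 6 to 12 hours per week corresponds to 2 to 4 units.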

Have any questions? Email us at ds-discovery@berkeley.edu.

Recruiting Projects

These projects are still recruiting:

Longitudinal data harmonization of child development in Latin America | University of California, Berkeley School of Public Health

The Power of Health in Africa: Remote sensing of power quality and reliability in Congolese health facilities | Renewable and Appropriate Energy Lab, UC Berkeley

Photogrammetric model volume studies for forest fire fuel load analysis | Dept. Anthropology

Global Poverty and Practice Minor Organizations (GPP Orgs) Database | UC Berkeley Global Poverty and Practice Minor

Impacts of localized artificial enhancement of sea ice albedo in Arctic for the Fire-weather Worldwide | Climformatics Inc.

Anomalies detection on variable frequency drive | Powerside

FHA: Faculty Hiring Analysis | Moore Accuracy Lab

Optimizing the climate benefits of seaweed farming | Environmental Defense Fund

Collective Intelligence for Continuous Improvement | UC Berkeley CalHOPE

The Child and Adolescent Needs and Strengths (CANS) and Youth Outcomes | Aspiranet

AIrish | UC Berkeley Department of English, Irish Studies

PhiloBiblon: From siloed databases to linked open data via Wikibase | PhiloBiblon

Linking Scientific Articles to Media Mentions | UC Berkeley

Supreme Court Opinion Draft Viewer | UC Berkeley School of Law

Valuation of Nature Capital | Zulu Forest Sciences

Reconstructing the evolution of river systems across the Cretaceous-Paleogene boundary 66 million years ago | Earth and Planetary Science (EPS), Berkeley Geochronology Center

Ten Strands/CAELI - How Data can Accelerate Environmental Literacy for All CA TK-12 Students | Ten Strands and California Environmental Literacy Initiative (CAELI)

Idiographic Dynamics Lab | Idiographic Dynamics Lab

The Eviction Research Network | The Eviction Research Network

Istanpolis: Visualizing local Christian communities of Ottoman Istanbul | UC Berkeley, Department of History

Natural Language Processing for raw naturalistic audio data to learn about parent-child interaction in low-income households | Center for Effective Global Action (CEGA), UC Berkeley

Fuel management in pebble bed reactors | Nuclear Engineering Department

Continued Research in D10 and San Francisco Districts | Wu Yee Children's Services

New approaches to reconstructing ancient continental motions | Swanson-Hysell Group

Ensuring the Longevity of Just As Special's Foster Care Resource Database | Just As Special

We Have to Talk: Resolving Conflict Through Spoken Conversation | Social and Moral Judgment Lab (SOMO)

Creating patient summary using NLG | Innovaccer

Toward Computational Literature Reviews: Analyzing Organizational Theories through Citations and Concepts | Dartmouth College and UC Berkeley

Simulating Police Responses to Calls for Service | Berkeley Police Department

The Causes and Consequences of Physician Misconduct | Haas School of Business

Neuroprotective targets for treating Glaucoma | UC Berkeley School of Optometry & Vision Science

NGO Insights Dashboard | DaanMatch

Visualizing Trends in Student Experience at Research Universities (Spring 2023) | Student Experience in the Research University (SERU) Consortium, CSHE, UC Berkeley

Cleaning criminal justice data | EPIC Data Lab at Berkeley

LMC: Language Model Calibration | Moore Accuracy Lab

 

Academia / Education

FHA: Faculty Hiring Analysis | Moore Accuracy Lab

Collective Intelligence for Continuous Improvement | UC Berkeley CalHOPE

Linking Scientific Articles to Media Mentions | UC Berkeley

Ten Strands/CAELI - How Data can Accelerate Environmental Literacy for All CA TK-12 Students | Ten Strands and California Environmental Literacy Initiative (CAELI)

HERE to Promote Student Well-being in K-12 Education | UC Berkeley Greater Good Science Center

OUSD Budget Analysis | Oakland Education Association

Identifying trends and insights from conference abstracts | Regeneron Pharmaceuticals

Predicting Insights from Conference Abstracts | Regeneron Pharmaceuticals

Visualizing Trends in Student Experience at Research Universities (Spring 2023) | Student Experience in the Research University (SERU) Consortium, CSHE, UC Berkeley

The impact of personal networks on advancing environmental education in schools | The Hebrew University of Jerusalem

Toward Computational Literature Reviews: Analyzing Organizational Theories through Citations and Concepts | Dartmouth College and UC Berkeley

Better Learning Through Better Learning Spaces | University of California, Berkeley (Full)

Access patterns of large distributed data system | LBNL

Data Visualization: Equity in College and University Athletics | Accelerate Equity (affiliated with non-profit organization Athlete Ally) (Full)

LENS: Learning the Educational Needs of Students | NWEA (Full)

 

Business / Economics

Diversity, Dominance, & Discrimination: "Dominance Terms" in Venture Capital Contracts | University of California, Berkeley, Culture, Diversity, and Intergroup Relations Lab (CDIRL)

Seven Million Demand Elasticities | Arizona State University, the San Francisco Fed, and the University of Chicago

Workforce Analytics - Employee Engagement | Petaluma Health Center

Workforce Analytics - Benefit Utilization | Petaluma Health Center (Full)

Workforce Analytics - Time in Meetings vs. Productivity | Petaluma Health Center (Full, no longer taking applications)

Separating Financial Fact from Opinion | Berkeley Law (Full)

A SkyDeck backed startup empowering young stock market investors to have access to the same information as institutional investors | Wisdm (Full)

Understanding the food retail landscape: Are corporate businesses killing small businesses in Mexico? | UC Berkeley (Full)

 

Environment / Sustainability 

Impacts of localized artificial enhancement of sea ice albedo in Arctic for the Fire-weather Worldwide | Climformatics Inc.

Optimizing the climate benefits of seaweed farming | Environmental Defense Fund

Valuation of Nature Capital | Zulu Forest Sciences

High spatial resolution mapping of emissions and air | UC Berkeley Department of Chemistry

Role of climate change in infectious disease pandemic | Climformatics Inc.

Geospatial Analysis for Wildland-Urban Interface fire | University of California, Berkeley

Building localized climate prediction product | Climformatics Inc.

Analysis of Snowfall Storms Dynamics in the Western US | UCB, ESPM (Full)

Impact of Climate Change on Mangrove Forests | IBM (Full)

ChatBot for Drinking Water Regulations | California State Water Resources Control Board (Full)

Can planting trees help fight climate change? - Building a data-driven framework for evaluating natural climate solutions | UC Berkeley (Full)

Climate Synapse Carbon Mitigation Platform | Tusher Initiative, Haas School of Business (Full)

How much CO2 does our ecosystem breathe in and breathe out? | UC Berkeley (Full)

Fighting Climate Super-Pollutants | Project Climate at Berkeley Law (Full)

 

Humanities

Photogrammetric model volume studies for forest fire fuel load analysis | Dept. Anthropology

AIrish | UC Berkeley Department of English, Irish Studies

PhiloBiblon: From siloed databases to linked open data via Wikibase | PhiloBiblon

FactGrid Cuneiform | FactGrid Cuneiform

Text Analysis of Securities and Exchange Commission Comments and Rules | UC Berkeley School of Law

LMC: Language Model Calibration | Moore Accuracy Lab

The Project on Arms Trade History | Berkeley History Department

Improving the Sumerian Language Linguistic Annotation Pipeline & lexicalizing the CDLI | Cuneiform Digital Library Initiative (CDLI)

Investigating the Ethics of AI Art | Denova Labs (Full)

 

Industry

PAYGo Lab: scalable data analytics of off-grid solar receivables payments in Africa | Catalyst Energy Advisors

Catalyzing a global movement of food systems leaders, powered by plants and data | Plant Futures Initiative

Project membership | YMCA of the East Bay

Project AEI | Koer A.I., Inc

DSG Interns - Various Projects | Data for Social Good Foundation (Full)

Technovation's data dashboard | Technovation (Full)

Creating employee-rated company review dataset | University of Colorado Denver (Full)

ZeeMee Year End Analytics and Visualizations | ZeeMee (Full)

Square's Partner-Referred Sellers | Square (Full, no longer taking applications)

Analyzing API Call Activity | Block / Square (Full, no longer taking applications)

Telematics Data Dashboard | Honda Development & Manufacturing of America LLC (Full, no longer taking applications)

Applying Data to Advance Augmented Reality | Geopogo AR+ (Full)

Community App | Baby2Baby (Full)

ImpactMapping | Impact Circles (Full)

Determining success | Callisto (Full)

CAA Members Discovery | Cal Alumni Association (Full)

Self-service data ingestion framework | Regeneron Pharmaceuticals (Full)

Fathers' UpLift Data Dashboard | Fathers' UpLift (Full)

Automated Quality Control and Analysis of Histopathology Images | Merck (Full)

 

Natural Sciences

Reconstructing the evolution of river systems across the Cretaceous-Paleogene boundary 66 million years ago | Earth and Planetary Science (EPS), Berkeley Geochronology Center

Ensuring authenticity of biological imaging data | University of California, Berkeley

Un-supervised behavioural classification of laboratory mice | Dan Lab at UC Berkeley

Creating a data product using Intelligent Semantic Search | Regeneron Pharmaceuticals

Network modeling to infer sparse gene co-expression estimates from single-cell sequencing data | Merck Research Laboratory; Data and Genome Sciences Department

Building allele analyzer from human genomes for CRISPR gene therapy | Clelland lab, UCSF Dept of Neurology; Weill Neurosciences Institute

Benchmark dataset generation for discovery biologics | Merck

Deep learning for pharmacokinetics and pharmacodynamics | Merck

IB 32 gecko notebook | IB 32 Bioinspired Design

Pupil Power - Developing Predictive Models of Pupil Size | DEVCOM Army Research Laboratory (Full)

Building computational tools to analyze life at single-molecule resolution | Tjian-Darzacq Lab, University of California, Berkeley (Full)

Automated Genome Manipulation Pipeline | Regeneron Pharmaceuticals (Full)

Automated Next Generation Sequencing Workflow | Regeneron Pharmaceuticals (Full)

Host Pathogen interaction in aging | University of California, Berkeley (Full)

Estimating human-technology team consensus and creativity from physiological signals | US DEVCOM Army Research Laboratory (Full)

 

Physical Sciences / Engineering

Anomalies detection on variable frequency drive | Powerside

New approaches to reconstructing ancient continental motions | Swanson-Hysell Group

Space Weather Drivers: Understanding Ionospheric Variability using Satellite Data | Space Sciences Laboratory (SSL)

ML for micro robots and solar sails | Berkeley Autonomous Microsystems Lab

Machine Learning-based design and modeling of Analog/Mixed-Signal Circuits | Berkeley Wireless Research Center

Real-Time Damage Assessment for Aerial Drone Imagery and Videos using AI | SpatialGIS

Using Machine Learning for 2D Seismic Facies Classification Along the Pacific Outer Continental Shelf | Bureau of Ocean Energy Management (BOEM), Pacific Region

Reconstructing 2D CT dicom images into 3D volumes and perform registration-based quality check | Merck

Creating Machine Readable Well Test Data to Support Reservoir Evaluation and CCUS Efforts | Bureau of Ocean Energy Management (BOEM), Pacific Region

Automated Optical Structure Recognition and Activity Extraction | Merck Research Laboratories

Machine Learning for Optimised Magnetic Field Computation | UC Berkeley

Fuel management in pebble bed reactors | Nuclear Engineering Department

Using vision transformer for retinal video object detection and instance-level semantic segmentations | C. Light Technologies

Machine learn with time series data to identify abnormality of air conditioner | BART (Full)

ADAM: Asteroid Discovery, Analysis, and Mapping Platform - Data Inconsistencies in the MPC | B612 Foundation (Full)

Categorization of journal entries of Operation Control Center (OCC) | BART (Full)

 

Public Health / Medicine

SFFD EMS and Community Paramedicine | San Francisco Fire Department

The Power of Health in Africa: Remote sensing of power quality and reliability in Congolese health facilities | Renewable and Appropriate Energy Lab, UC Berkeley

Analyzing Hospital Prices and Hospital Market Areas | UC Berkeley Petris Center

Creating patient summary using NLG | Innovaccer

Social Listening on Vaccine Confidence in Southeast Asia | UC Berkeley School of Public Health

Neuroprotective targets for treating Glaucoma | UC Berkeley School of Optometry & Vision Science

Generating 3D models of the human heart from biomedical images using deep learning | Shadden Lab

Using Data Science and Text Mining to Improve Classification and Analysis of Healthy Start Grantee Progress Reports | Health Resources & Services Administration, Maternal & Child Health Bureau, Division of Healthy Start & Perinatal Services

The Causes and Consequences of Physician Misconduct | Haas School of Business

Spirometry AI | Fisher Center for Business Analytics, Haas School of Business

Meta-analysis methods for publicly available datasets for Inflammatory bowel disease | Merck

Healthcare Pricing | UC Berkeley

Developing analysis pipeline for functional calcium imaging dataset | Kaufer Laboratory, Dept of Integrative Biology, UC Berkeley

Machine learning from brain scans for predicting TBI outcomes | Neural Systems and Data Science Lab (Full)

AHA Data Science - Analysis of > 13 million patient records | American Heart Association (Full)

Hospital Affiliations in California: Trends and Impacts | Petris Center for Health Care Market and Consumer Welfare (Full)

Deep learning brain data for detecting Alzheimer Disease | Center for human sleep science (Full)

Deriving eye gaze metrics from functional MRI data | UC Berkeley Dept of Psychology (Full)

Denoising to improve drug discovery assay models | Merck & Co (Full)

 

Social Sciences

Multispecies Cities Studio Datathon | UC Berkeley

Current Status of the Arctic-related research cooperation between U.S., Russian, and Chinese | US Arctic Research Commission

Mining and Mapping Racial Attitudes in National Surveys | Goldman School of Public Policy

Ensuring the Longevity of Just As Special's Foster Care Resource Database | Just As Special

Longitudinal data harmonization of child development in Latin America | University of California, Berkeley School of Public Health

Global Poverty and Practice Minor Organizations (GPP Orgs) Database | UC Berkeley Global Poverty and Practice Minor

The Child and Adolescent Needs and Strengths (CANS) and Youth Outcomes | Aspiranet

Supreme Court Opinion Draft Viewer | UC Berkeley School of Law

Idiographic Dynamics Lab | Idiographic Dynamics Lab

The Eviction Research Network | The Eviction Research Network

Istanpolis: Visualizing local Christian communities of Ottoman Istanbul | UC Berkeley, Department of History

Natural Language Processing for raw naturalistic audio data to learn about parent-child interaction in low-income households | Center for Effective Global Action (CEGA), UC Berkeley

Understanding Racial Disparities in Behavioral Health Emergency Response | Risk Resilience Research

Diversity Tagging and Scoring for Films and TV | Media Metadata Research Lab (mmrl)

Filling Gaps in Social Demographic Data using Machine Learning, Energy Consumption, and Alternative Data | East Bay Community Energy

Analyzing internet broadband availability to close the digital divide. | EducationSuperHighway

UN Common Country Analysis (CCA) | United Nations Development Coordination Office

Cooperation Frameworks (CF) Good Practices Database | United Nations Development Coordination Office

NGO Insights Dashboard | DaanMatch

We Have to Talk: Resolving Conflict Through Spoken Conversation | Social and Moral Judgment Lab (SOMO)

Continued Research in D10 and San Francisco Districts | Wu Yee Children's Services

Cleaning criminal justice data | EPIC Data Lab at Berkeley

Simulating Police Responses to Calls for Service | Berkeley Police Department

Diversity, Equity, Inclusion, and Belonging Megastudy | Berkeley Culture Center

Bay Area Arrest Trends: Perceptions vs Reality | Impact Justice (Full)

A risk model for Social-Political Conflict in US States | School of Information and Breakwater Strategy (Full)

UNSDG Information Management System | United Nations Development Coordination Office (Full)

United Nations Country Team Report | United Nations Development Coordination Office (Full)