UC Berkeley and MIT awarded $12.5 million to study critical issues in data science

September 1, 2020

Multidisciplinary institute to include five other universities

Berkeley, Calif.--The National Science Foundation has awarded $12.5 million to establish a multidisciplinary institute—a collaboration between UC Berkeley and MIT—to improve our understanding of critical issues in data science, including modeling, statistical inference, computational efficiency, and societal impacts. The award was announced Monday, Aug. 31, 2020.

The director of the new institute, called the Foundations of Data Science Institute (FODSI), will be UC Berkeley Prof. Peter Bartlett, who has appointments in the university’s Departments of Statistics and Electrical Engineering and Computer Sciences (EECS). The co-director will be professor Piotr Indyk, a principal investigator (PI) at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL).

UC Berkeley co-PIs are Josh Hug, EECS; and Michael Jordan, Martin Wainwright, and Bin Yu, all of Statistics and EECS. Other participating institutions are Boston, Northeastern, Harvard and Howard Universities; and Bryn Mawr College.

“Data science has emerged as the central science for the 21st century, a widespread approach to science and technology that permits empirical investigations at unprecedented scale and scope,” the team wrote in their winning proposal. “The explosion in the availability of data and growing awareness of the central role it can play in diverse domains from science, commerce and industry have added considerable urgency.”

The basis of data science started to form in the first half of the 20th century, combining the deductive and inductive traditions of mathematics to devise rigorous approaches to thinking about data and its role in scientific inquiry. But in the second half of the last century, new specialized disciplines such as computer science, mathematical statistics, control theory, information theory, and signal processing arose and were pursued individually, rather than as part of a larger whole. 

But now, data science is weaving these threads back together as it requires the expertise of mathematicians, statisticians, and theoretical computer scientists, among others, to make the most effective uses of massive datasets that increasingly affect how industry, academia, and government operate. The institute’s research themes include the complex interactions between decision makers, the data they use, and competing actors, as well as methods for making use of vast amounts of data, and the economic, social, and ethical implications of automated data analysis and decision-making.

“Under the banner of data science, those disciplines are now coming back together and we need to look at the theoretical foundations of all of them, across the breadth of issues raised by data science problems,” said Bartlett, who is also Associate Director of the Simons Institute for the Theory of Computing. “We're starting to see a confluence of efforts in pursuing a better understanding of how to solve scientific and societal problems by leveraging all of these disciplines. It’s important to consider possible solutions from many different perspectives.”

FODSI aims to meet this challenge, bringing together experts from many cognate academic disciplines to lay the theoretical foundations for the field of data science. It also aims to educate and mentor future leaders in data science at all levels, K-12 through postdoc, and to broaden participation and increase diversity in the data science workforce.

The new institute will convene public events, such as summer schools, research workshops and other collaborative research opportunities, that will serve the broader research community. Many of these events will be hosted by the Simons Institute for the Theory of Computing, UC Berkeley's global center for collaboration in theoretical computer science.

“It was important to bring together participants from multiple universities to create this institution,” said Bartlett, who is also in Berkeley’s Division of Computing, Data Science, and Society (CDSS). “Berkeley has a strong technical affinity with the work being done at our partner institutions. As well as the vigorous flow of ideas, these collaborations will give the institute a national reach.”

The institute will exploit strong connections with researchers at its industrial partners (Amazon, Google, LinkedIn, Microsoft, and Verizon Media) to ensure engagement with the broad range of application domains that these partners represent.

With its collaborative structure and partnerships, the project is a good fit for the CDSS portfolio. CDSS fosters cross-campus partnerships, bringing together researchers in areas ranging from economics, social welfare, climate studies, and public policy to computer science, electrical engineering, statistics, and biomedicine to apply tools like deep learning to societal problems. The new project builds on work by the EECS Department, which is jointly housed in CDSS and Berkeley’s College of Engineering, and the Statistics Department, which is in both CDSS and the Division of Mathematical and Physical Sciences.

The award was one of two made under the second phase of NSF’s Transdisciplinary Research in Principles of Data Science (TRIPODS) program. The two new projects build on 12 earlier Phase I projects and are closely tied to NSF’s Harnessing the Data Revolution (HDR) Big Idea.

In all, 20 UC Berkeley faculty will participate in the project, representing CDSS, Computer Science, Economics, Electrical Engineering, Mathematics, Statistics and the Simons Institute for the Theory of Computing. Most of these faculty are also affiliated with BAIR, the Berkeley AI Research Lab. The BAIR Lab brings together UC Berkeley researchers across the areas of computer vision, machine learning, natural language processing, planning, and robotics. BAIR includes over 45 faculty and more than 200 graduate students and postdocs.

Read the NSF news release.

About CDSS

The Division of Computing, Data Science, and Society launched in July 2019 to leverage Berkeley’s preeminence in research and excellence across disciplines to propel data science discovery, education, and impact. Core to the Division is an understanding of how the digital revolution affects equality, equity, and opportunity—and the capacity to respond to related challenges.

The Division’s dynamic structure connects Data Science Education, the School of Information, and the departments of Electrical Engineering and Computer Sciences and Statistics, and includes the Berkeley Institute for Data Science and the Data Science Commons. It’s designed to meet the opportunities and demands of a world increasingly informed and shaped by data, machine learning, and artificial intelligence in virtually every arena, from health to business to politics; from our cities to our climate to the cosmos.

About the Simons Institute for the Theory of Computing

The Simons Institute for the Theory of Computing, based at UC Berkeley, is the world's leading venue for collaborative research in theoretical computer science. The Institute brings together distinguished researchers in theoretical computer science and related fields, as well as the next generation of outstanding young scholars, to explore the foundations of computer science and other scientific and social disciplines through a computational lens. Research is facilitated through curated, semester-long programs organized and attended by international cohorts of 60-70 scientists, scholars and practitioners.