Data Collaboratives: A conversation with Tom Kalil of Schmidt Futures

January 6, 2019

Last fall, Schmidt Futures provided a grant to UC Berkeley’s Division of Data Sciences, with the goal of identifying specific opportunities for new and expanded data collaboratives by engaging our students, faculty, and external partners.

Recently, we caught up with Tom Kalil, Chief Innovation Officer at Schmidt Futures, to learn more about the motivation for this grant.  Tom Kalil is no stranger to the UC Berkeley campus, having served as Special Assistant to the Chancellor for Science and Technology from 2001 to 2008.  Previously, Tom worked for both President Clinton and President Obama, and helped to design and launch dozens of science and technology initiatives in areas such as nanotechnology, neuroscience, and data science.

Q.  What are data collaboratives?

Data collaboratives are partnerships to share and use data for the public good.  They bring together organizations that have data, organizations that can derive insights from that data, and organizations that can take some action or make a decision informed by those insights.  They often involve collaborations between companies, university researchers, non-profit organizations, and government agencies.

Much of the excitement about data collaboratives stems from the flood of data from the private sector, such as social media data, mobile Call Data Records, satellite imagery, and e-commerce transactions. There are many scenarios in which private sector data can be reused to create additional public value by helping to solve some important societal problem.

For example, Internet companies needed to decide which broadband technology they should deploy in a given developing country.  They used high-resolution satellite imagery and convolutional neural networks to create accurate population density maps.  This same information can be used to inform vaccination campaigns in sub-Saharan Africa, and reduce under-5 child mortality.

In short, data is not enough.  A successful data collaborative requires going from “data to knowledge to action.”

Q.  Why did Schmidt Futures provide a grant to UC Berkeley to identify additional data collaboratives?

Finding opportunities for high-impact data collaboratives is going to require people and teams with expertise in both data science and a particular societal problem, such as energy and climate, water, affordable housing, poverty alleviation, public health, or sustainable agriculture.

Through the Division of Data Sciences, UC Berkeley is creating opportunities for multidisciplinary teams of students to work together with faculty advisors and external partners in the public and private sectors.  Examples include Discovery Projects, modules, and datathons/data challenges.  These students and their partners are in an ideal position to identify opportunities for new or expanded data collaboratives.

On a personal note, I launched the Big Ideas program in 2006, which has supported hundreds of student-led projects to help solve important societal problems.  I’ve always been impressed by Cal students, and in particular, their creativity, entrepreneurial mindset, and commitment to the public interest.

Q.  How can data collaboratives create public value?

There are a number of ways in which data collaboratives can create public or societal value, including:

  • Real-time situational awareness or now-casting, such as improving response to natural or man-made disasters. Some social media companies are providing aggregated location, movement, and self-reported safety data to partner organizations such as the Red Cross. 
  • Identifying variance in performance, and using that to close the gap between leading and lagging performers.  For example, thanks to the Dartmouth Medical Atlas, we know that some regions in the U.S. are getting “more for less” with their Medicare dollars.  (Rates of coronary stents are three times higher in Elyria, Ohio than in nearby Cleveland, home of the Cleveland Clinic.)
  • Strengthening the monitoring and evaluation of government programs to enhance their effectiveness.
  • Improving the targeting of scarce resources.  For example, if a public health agency has limited resources to prevent childhood lead poisoning, how should it target inspections or otherwise allocate its resources?
  • Prediction and prevention – for example, reducing the high school dropout rate or college non-completion rate by identifying students at greatest risk of dropping out and intervening before they do.
  • Empowering individuals to make better decisions – for instance, better data about job prospects and placements for graduates of different training and postsecondary education programs can help individuals select more effective programs.
  • Increasing our fundamental understanding of social phenomena of interest, such as advances in computational social science that can inform policy.

Q.  What are some of the ways in which companies are participating in data collaboratives?

Some companies have created “data stewards.”  The data steward responds to requests for data, ensures that access is provided in a responsible way, manages the firm’s participation in data collaboratives, and creates a broader community around the use of the firm’s data.  LinkedIn is one company that has done this.  LinkedIn has data on 590 million members, 30 million companies, 50,000 skills, 84,000 schools, and 20 million open jobs.  It has created a cross-functional Economic Graph team with the goal of increasing employment opportunities for people around the world.  In the U.S., it is partnering with mayors, foundations, employers, and non-profits to grow the local workforce in tech, manufacturing, and health IT.

Q.  What role are researchers playing in addressing the risks and ethical issues associated with data collaboratives?

Researchers are:

  • Developing privacy-preserving approaches to data science and machine learning, such as differential privacy and secure multiparty computation;
  • Designing toolkits that highlight unintended consequences of technologies; and

Q.  What resources would help Berkeley students learn more about data collaboratives?

An organization called the GovLab has information on examples of data collaboratives and the role of data stewards, a framework for designing data collaboratives, and a report on data collaboratives involving social media firms.