March 28, 2017

Advancing data-intensive research and teaching across Berkeley

The ability to integrate, analyze, and archive mammoth amounts of data has become a mainstay for cutting-edge research and teaching across many academic disciplines. As data science takes root, the D-Lab has emerged as a nexus for the support that the campus’ faculty and researchers need to take advantage of the latest methodologies.

Photo of a D-Lab data science workshop at UC Berkeley

Operating out of brightly painted offices in Barrows Hall, the D-Lab supports data-intensive research across Berkeley, with special strength in the social sciences and digital humanities. D-Lab helps Berkeley researchers -- faculty, postdocs, graduate and undergraduate students from every school and college across campus -- integrate the latest software and technology for applying qualitative and quantitative methods to their research and teaching.

Each year, D-Lab delivers more than 200 workshops serving more than 6,000 researchers from departments across campus.  D-Lab also offers one-on-one consulting for those conducting data-intensive research; last year, D-Lab’s 40 consultants received more than 1,100 consultation requests. D-Lab also convenes 15 working groups, open to all Berkeley affiliates, on topics such as Computational Text Analysis, Digital Humanities, and Machine Learning.

“Our trainings are where students learn to execute on complex areas of data science,” said Justin McCrary, faculty director of the D-Lab and a faculty member at Berkeley Law. “If you find yourself looking at your grad students’ code in dismay,  you’re doing it wrong. You should have already sent them down to D-Lab.”

Empowering graduate student researchers

Faculty research typically requires graduate students. But when professors undertake data-intensive research projects, their graduate students may lack the data science skills -- from programming to using the latest statistics software -- needed to achieve the project’s objectives. That’s where the D-Lab comes in.

“When I was a graduate student here, there was no support for me to learn the methodology I needed,” said Claudia von Vacano, who received her doctorate at Berkeley and is now D-Lab’s executive director. “The D-Lab now provides what was missing for me.”

Claudia von Vacano, UC Berkeley D-Lab

Claudia von Vacano, D-Lab's executive director

When faculty want to undertake a data-intensive research project, they can go to D-Lab to access free consulting services, take workshops themselves, or send their graduate student researchers (GSRs) to take a workshop, free of charge, for the specific skillset needed for the project. Topics might include qualitative research, the Python programming language, text analysis, the data visualization tool Tableau, and Drupal. In turn, the GSRs emerge with new skills and are able to produce accurate, replicable results.

Making data-intensive courses possible

With a philosophy of “it’s OK not to know,” D-Lab is known for providing the consulting and training necessary for faculty and other researchers to incorporate data-centric techniques into their classrooms. Using a peer-to-peer model, D-Lab pairs researchers with one another so they can learn pedagogical techniques for teaching data analysis. It also convenes working groups around common goals.

D-Lab workshop

This semester, Scott McGinnis, a Berkeley Ph.D. student in history, is teaching a Data Science connector course that explores the history of Japanese-American internment through the lens of data science. Prior to teaching the course, he’d had little experience with Python, the programming tool used in most Data Science Education Program courses.

“D-Lab is where I learned Python,” said McGinnis, who also coordinates a Digital Humanities working group that meets at the D-Lab. “I took a workshop with other faculty and researchers, and we were able to help each other learn.”

Rochelle Terman, who received her Ph.D. in political science at Berkeley with an emphasis in gender & women’s studies, credits the D-Lab with enabling her to teach courses using a mix of quantitative, qualitative and computational methods.  Now a postdoc at the Center for International Security and Cooperation at Stanford, she conducts research that examines international norms around gender and advocacy, with a focus on the Muslim world.

“After gaining confidence at the D-Lab, I was able to develop and teach my own course on computational social science and digital humanities,” said Terman.

Enabling data access and archiving

Supporting the data needs of Berkeley researchers is a key part of the D-Lab’s mission. D-Lab can help faculty learn about options for computation, how to leverage data to which the D-Lab has access (including the US Census, the California/Field Poll, and other datasets), and how and where to store sensitive data.

Reed Walker, Assistant Professor of Business and Public Policy and Economics, works on research that explores the social costs of environmental externalities such as air pollution and how related regulations can contribute to gains and/or losses to the economy. He has worked closely with the D-Lab to secure the data needed for his research. 

Reed Walker, UC Berkeley

Professor Reed Walker, Haas School of Business

“D-Lab has been instrumental in supporting access to confidential, administrative data for my own research. Social science research is increasingly relying on large, confidential datasets to explore both new and age-old questions in ways that were never before possible,” said Walker, who received a 2017 Sloan Foundation Research Fellowship and the 2015 IZA Young Labor Economist Award. “These data products and data sharing agreements are necessarily sensitive, and D-Lab has been willing to offer assurances and financial resources that make data sharing agreements possible for many leading researchers at Berkeley.”

Similarly, Professors Kenneth Ayotte of Berkeley Law and Jared Ellias of UC Hastings College of Law are currently working on a project that uses large-scale text analysis of bankruptcy court documents to learn more about the incentives and strategies of the parties in a bankruptcy case. D-Lab consultant Christopher Hench has helped collect and organize the unstructured data using UC Berkeley’s access to remote computing resources. In the project’s next phase, Hench will also help to parse and build models of the text.

Kenneth Ayotte, UC Berkeley

Professor Kenneth Ayotte, Berkeley Law

“Chris Hench from D-Lab has been enormously helpful to us. He’s written the scripts to automate the downloading of the court documents, and we never could have done this on our own,” Ayotte said.

Breaking down barriers to entry

The D-Lab has a long-standing commitment to making data-intensive approaches more accessible and welcoming to a diverse swath of the campus community, said Executive Director Claudia von Vacano.

Photo of participants in a D-Lab workshop at UC Berkeley

“As a queer Latina, and as an immigrant who came to the United States as a political refugee, I have experienced firsthand the barriers to entry into the academy generally and specifically within data-intensive social science,” said von Vacano.

More than half of D-Lab workshop attendees are women, and more than one in five attendees identifies as African-American, Latino/a, or Other/Multi-racial, a fraction substantially larger than the 13 percent of the 2016 entering graduate student body identifying as under-represented minorities.

Moving forward, D-Lab is expanding its partnerships with other data science-related initiatives at Berkeley. In addition, D-Lab’s team is focused on listening to graduate students and using their input to guide and execute future workshops and events.

Learn more about the D-Lab at dlab.berkeley.edu.