UC Berkeley Statistics Professor Deborah Nolan readily admits that writing is difficult for her. And based on what she has heard and seen in 30+ years of teaching, most of her students face the same challenge.
Several years ago, she decided to do something about it. She successfully applied for an Art of Writing Seminar Fellowship to teach a class in writing about statistics. One of the requirements was to co-design the class with a graduate student. Several people on campus recommended Sara Stoudt, a Ph.D. student in statistics at Berkeley, and a partnership was born.
In addition to developing the class, Nolan and Stoudt co-authored "Communicating with Data: The Art of Writing for Data Science," a book to be published in March 2021 by Oxford University Press. Their article "Reading to write," which presents "a framework for learning the art of statistical storytelling," will appear in the December 2020 issue of Significance, the journal of the Royal Statistical Society in Great Britain.
"I didn't expect to teach a class about writing or write a book, but there seemed to be a void," said Nolan, who is the Associate Dean for Undergraduate Studies in the Division of Computing, Data Science, and Society. "I still do find it difficult to write, but it's less painful than it used to be."
Despite this, Nolan has co-authored five books on statistics and data and edited another three, as well as writing 63 journal articles.
The project will come full circle during the Berkeley Spring 2021 semester when Writing Data Stories will be a Data Science connector course in the Data 88 class series. Adam Anderson will be the instructor.
Getting out the word about data
Stoudt is now a lecturer in the Statistical & Data Sciences Program at Smith College, where she earned her bachelor's degree in math with an emphasis in statistics. She said that "everything is data these days and it's important to be able to explain why data analysis is important."
She got a first-hand look at the process as a summer intern on the data desk at the Los Angeles Times in 2019. She developed a tool to aggregate census data into the specific districts that reporters were interested in, and the reporters then used that information in their newspaper articles.
"In return, they trained me to think like a journalist," Stoudt said. "The experience got me to think a lot about what I could do with statistics."
One of the challenges in writing about statistics is that the discipline has very specific words with very specific meanings, like confidence, significance and p-values, she said.
"The writing needs to be faithful to the research, but not so much that it turns people away from our work," said Stoudt, who was also a Berkeley Institute for Data Science Fellow. "Knowing what language works takes practice; there are not a lot of examples of writing about statistics in the wild. We want students to realize they can be both a writer and a statistician."
It also takes constant attention, Stoudt said, adding that her students sometimes call her out "when I get sloppy in my explanations."
But the fact that she is even teaching statistics and data science students about writing still surprises her.
"I was always self-conscious about my writing--it was the thing I dreaded most," Stoudt admitted. "I still sometimes think 'How am I teaching writing?'."
Determining what's important
Nolan said she has been encouraging her students to write since the early 1990s. In those courses, students would take on a particular problem and play a specific role in the analysis and interpretation. This helped them identify what was important and why it was important to communicate what they discovered. The write-up could take the form of a consumer guide, a memo to a supervisor, or some other targeted piece.
"The focus was on how to present their analysis," Nolan said. "I wanted more technical writing support, but couldn't find it on campus. When I saw the opportunity to teach an Art of Writing seminar, I jumped on it."
Both Nolan and Stoudt say that a key part of teaching students to write convincingly is persuading them to take a position in their writing. "The idea of making an argument in a technical paper is not that common to data scientists," Nolan said. "But you need to put your findings in a particular light to convince the reader of the importance of the work. At the same time, you can't over-reach."
Stoudt agrees, adding, "We need to learn what we can and cannot say. You also need to be able to articulate why you like something or don't like it."
Reading to write, or peeling the onion
In their article in Significance, Nolan and Stoudt offer a "Reading to write" template based on repeated reading of an article, looking for different things each time, before carefully reading the piece from start to finish. The first step is to map the organization of the article and mark specific points, such as descriptions, graphs and tables, and conclusions. The second step is to identify the statistical elements, including the analysis. The final step is to examine the author's argument, including looking at his or her choice of words and how they either support or weaken that argument.
"Reading an article is like peeling an onion," Nolan said. "You have to go over it several times to fully understand it. Reading, like writing, is an iterative process."
When the two got down to writing their book, iteration was a critical part of the work. The months they spent developing the writing seminar provided the basis, and it took about a year for them to write the first draft. They then invited a diverse group of their peers to read it: professors from statistics, English, rhetoric and civil engineering.
"The idea with convening such a review panel is to speak as little as possible and to let them be in charge," Nolan said. "On their advice we did a substantial rewrite, eliminating some parts, rearranging others, adding a new chapter and explaining what we meant in various parts.
"When we could not possibly imagine doing any more work on it, we knew the book was done," Nolan said.
Looking ahead to Data 88: Writing Data Stories
In preparing his syllabus for the upcoming Spring 2021 class, Anderson used an advance copy of the book, calling it a "wonderful, helpful guide." He joined UC Berkeley in 2017 as a Mellon Postdoctoral Fellow in the Digital Humanities and is now a lecturer in Digital Humanities and Data Science.
A linguist by training, Anderson will take a mixed-methods approach to the class. Typically, data scientists are interested in the shortest possible description, especially when it comes to coding, where less is more, he said. In the humanities, when it comes to more detail, the answer is "yes, please."
"There are definitely two different minds converging," Anderson said, adding that his usual data sources are ancient texts, not numbers. "I've explored a number of different ways of writing and they all require thinking about your audience."
Whether looking at basketball statistics, rap lyrics or Twitter feeds, words are quantifiable, Anderson said, and it can be hard to wrap your head around that concept. In the upcoming class, students will analyze numbers, then translate the findings back into regular language.
"Goethe wrote that science and art started as the same thing, then grew apart, but would someday come back together," Anderson said. "I think we're there, but we don't know we're there.”
"It's really great to be able to teach this subject to undergraduates, to show them different ways to write for different audiences," Anderson said. "Learning to write empowers you to publish your research, which helps others understand your ideas."
1 The German reads: "Man vergaß, daß Wissenschaft sich aus Poesie entwickelt habe, man bedachte nicht, daß, nach einem Umschwung von Zeiten, beide sich wieder freundlich, zu beiderseitigem Vorteil, auf höherer Stelle, gar wohl wieder begegnen könnten" (Goethe 1817, 493)