This project will center around analyzing a database of sources for an emerging multidisciplinary field of practice in hopes of gaining helpful insight into the field’s major trends, focus areas, and trajectory. The bibliographic database consists of 200-300 sources (journals, books, blogs, etc.) related to the environmental impacts of the digital economy. While there is a considerable amount of scholarship focused on the digital economy generally, there continues to be a significant gap in our knowledge of the environmental and energy dimension. Our team, in partnership with the Yale School of the Environment and the Environmental Law Institute, is seeking a better understanding of the current landscape to inform our support of future research endeavors.
Problem and Goal
The digital economy is an interdisciplinary field that has risen to prominence in recent decades, giving shape to the innumerable economic activities and interactions that are rooted in today's diverse digital technologies. A large body of scientific research has been generated around the digital economy, but there is still a significant gap in our knowledge of its environmental and energy dimension.
This semester, our research group aimed to explore different visualization methods and to design the best methodology for analyzing and displaying research on the environmental and energy impacts of the digital economy.
We used data from two sources: a Zotero database compiled by project partners and the Web of Science, a multidisciplinary research database with over 79 million records. The Zotero database consists of 260 manually selected sources related to the environmental impacts of the digital economy. Our second dataset was drawn from the Web of Science by querying its Core Collection for papers related to the digital economy/Internet and the environment. Because of limitations in our visualization software, we pruned the records from this query to a subset of the 5,000 papers that the Web of Science ranked as most relevant.
To explore knowledge gaps, disciplinary silos, and emerging collaboration networks, we applied many different data visualization tools to our data. Here are some of the visualizations we made:
A map created using 364 keywords extracted from the automatic tags in Zotero:
A map created using 3,237 keywords extracted from the most relevant 5,000 records on the Web of Science:
These two keyword maps are factor maps created using VantagePoint. They represent the results of Principal Component Analysis (PCA) graphically and show the keywords that frequently occur together in our databases, which helps us understand how different fields of study are correlated with each other.
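To illustrate the idea behind a factor map, here is a minimal sketch of PCA applied to keyword occurrence data. It is not VantagePoint's implementation: the toy records, the function names, and the use of power iteration to find only the first component are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy records: each record is the set of keywords attached to one source.
records = [
    {"ai", "blockchain"},
    {"ai", "blockchain"},
    {"energy", "sustainability"},
    {"energy", "sustainability"},
    {"ai", "blockchain"},
    {"energy", "sustainability"},
]


def keyword_covariance(records):
    """Return (sorted keywords, covariance matrix of their 0/1 occurrence columns)."""
    keywords = sorted(set().union(*records))
    n = len(records)
    # Binary record-by-keyword matrix, stored column by column.
    columns = [[1.0 if kw in rec else 0.0 for rec in records] for kw in keywords]
    # Center each column so the covariance measures co-variation around the mean.
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    cov = [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
        for ci in centered
    ]
    return keywords, cov


def first_component(cov, iterations=100):
    """Approximate the leading eigenvector of cov by power iteration."""
    # Deterministic start vector (an arbitrary choice for this sketch).
    v = [float(i + 1) for i in range(len(cov))]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space of the matrix")
        v = [x / norm for x in w]
    return v


keywords, cov = keyword_covariance(records)
loadings = dict(zip(keywords, first_component(cov)))
for kw, value in loadings.items():
    print(f"{kw:15s} {value:+.2f}")
```

Keywords that tend to appear in the same records receive loadings of the same sign on the component, while keywords that rarely appear together receive opposite signs. The overall sign of a component is arbitrary, so only the relative signs and magnitudes are meaningful.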
A keyword map created using author-added and automatically generated keywords from all 7,235 records from the Web of Science Core Collection query, separated by year:
This keyword map was created using Gephi and displays only keywords with a degree larger than 24. It visualizes which keywords have risen to prominence in recent years, how connections have been created and changed, and the growing complexity of this field.
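The degree filter can be sketched in a few lines. This is an illustration only, not Gephi's code: it assumes that two keywords are connected when they appear in the same record and that a keyword's degree is the number of distinct keywords it is connected to. The function names and sample records are hypothetical.

```python
from itertools import combinations


def cooccurrence_neighbors(records):
    """Map each keyword to the set of keywords it shares at least one record with."""
    neighbors = {}
    for keywords in records:
        unique = sorted(set(keywords))
        for kw in unique:
            neighbors.setdefault(kw, set())  # keep keywords that appear alone
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def filter_by_degree(neighbors, min_degree):
    """Keep keywords whose degree is strictly larger than min_degree."""
    kept = {kw for kw, nbrs in neighbors.items() if len(nbrs) > min_degree}
    # Drop edges that point at removed keywords.
    return {kw: neighbors[kw] & kept for kw in kept}


# Hypothetical toy records. The map in the text uses a threshold of 24;
# this tiny sample uses 1 so that some keywords survive.
sample_records = [
    ["internet", "energy", "sustainability"],
    ["internet", "energy"],
    ["internet", "blockchain"],
    ["e-waste"],
]
graph = filter_by_degree(cooccurrence_neighbors(sample_records), min_degree=1)
print(sorted(graph))
```

The filter is applied once, using degrees from the full graph, so a surviving keyword can end up with fewer visible connections than the threshold after its weaker neighbors are removed.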
A line graph created using 7,235 records from the Web of Science Core Collection query:
This line graph displays the ten most prominent keywords in environmental digital economy research and their frequencies over the years. It supports the map of keywords over time by quantifying the sizes of the largest nodes, in addition to showing when these words became popular.
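The counts behind a graph like this can be computed as in the sketch below. It assumes each record has been reduced to a publication year and a list of cleaned keywords; the function name and sample data are hypothetical and do not come from the project's actual pipeline.

```python
from collections import Counter, defaultdict


def top_keyword_trends(records, top_n=10):
    """Return {keyword: {year: count}} for the top_n most frequent keywords overall."""
    totals = Counter()
    per_year = defaultdict(Counter)
    for year, keywords in records:
        unique = set(keywords)  # count a keyword at most once per record
        totals.update(unique)
        per_year[year].update(unique)
    top = [kw for kw, _ in totals.most_common(top_n)]
    years = sorted(per_year)
    # Years in which a keyword never appears are reported as 0.
    return {kw: {y: per_year[y][kw] for y in years} for kw in top}


# Hypothetical sample: (publication year, keywords) pairs.
sample = [
    (2018, ["internet", "energy"]),
    (2019, ["internet", "blockchain"]),
    (2020, ["internet", "blockchain", "energy"]),
    (2020, ["blockchain", "sustainability"]),
]
trends = top_keyword_trends(sample, top_n=2)
for keyword, series in trends.items():
    print(keyword, series)
```

Each inner dictionary is one line of the graph: years on the horizontal axis and the number of records using that keyword on the vertical axis.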
A keyword map created using a smaller query (digital economy and environmental/energy) of 513 records from the Web of Science Core Collection:
This map was created using Gephi with artificial intelligence colored red, blockchain colored blue, sustainability colored green, and other nodes automatically colored based on their connections to these three keywords. The dimension of color helps identify connections and gaps between subjects.
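One way to derive colors for the remaining nodes is to blend the three seed colors in proportion to how strongly a node is connected to each seed keyword. The sketch below illustrates that idea only. It is an assumption about how such a coloring could work, not a description of Gephi's algorithm, and the edge weights, the neutral gray, and the function name are hypothetical.

```python
SEED_COLORS = {
    "artificial intelligence": (255, 0, 0),  # red
    "blockchain": (0, 0, 255),  # blue
    "sustainability": (0, 255, 0),  # green
}
NEUTRAL = (128, 128, 128)  # assumed color for nodes with no link to any seed


def blend_color(node, edge_weights):
    """Blend the seed colors in proportion to the node's edge weight to each seed."""
    if node in SEED_COLORS:
        return SEED_COLORS[node]
    weights = {}
    for seed in SEED_COLORS:
        # Edges are undirected and stored once, so look the pair up in both orders.
        w = edge_weights.get((node, seed), 0) + edge_weights.get((seed, node), 0)
        if w > 0:
            weights[seed] = w
    total = sum(weights.values())
    if total == 0:
        return NEUTRAL
    return tuple(
        round(sum(w * SEED_COLORS[seed][channel] for seed, w in weights.items()) / total)
        for channel in range(3)
    )


# Hypothetical co-occurrence counts between keyword pairs.
edges = {
    ("smart grid", "artificial intelligence"): 1,
    ("smart grid", "sustainability"): 3,
    ("bitcoin", "blockchain"): 5,
}
for name in ("smart grid", "bitcoin", "e-waste"):
    print(name, blend_color(name, edges))
```

Under this scheme a node linked mostly to sustainability comes out mostly green with a tint of red, which makes bridging keywords and isolated clusters easier to spot.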
We have gained a thorough understanding of the digital economy as an emerging and important academic discipline, as well as the significance of its environmental impacts. Throughout the project, we became familiar with several visualization tools, including VantagePoint, Displayr, and Gephi. In addition, we learned about working with text data and the steps involved in collecting and cleaning keywords.
To further our understanding of this topic in the future, we will create more data visualizations using parameters other than keywords and time, such as citations. It would also be worthwhile to repeat the analyses using different algorithms and compare the results.
Finally, we would like to thank the Discovery Program for giving us the opportunity to be involved in data science research, and Dave Rejeski, Jordan Diamond, and Cait Cady for their guidance and help on this project.
Sophie Chen, Jamie Ip