Creative Commons has collected graph data linking web properties that use Creative Commons licenses. Nodes are domains (rather than, e.g., individual pages). For each node, we record the number of CC Licenses found at that domain and their types. Edges are links between two domains, weighted by the number of links between them. The graph currently has about 250,000 nodes.
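
To make the data model concrete, here is a minimal sketch of how such a graph might be held in memory. The class name `DomainGraph`, its methods, the `.example` domains, and the license codes are all hypothetical and used only for illustration. The sketch also assumes edges are directed (from the linking domain to the linked domain), which the description above does not state.

```python
from collections import Counter, defaultdict


class DomainGraph:
    """Directed, weighted graph whose nodes are domains (assumed layout)."""

    def __init__(self):
        # domain -> Counter mapping license type -> number of licenses found
        self.licenses = defaultdict(Counter)
        # source domain -> Counter mapping target domain -> number of links
        self.links = defaultdict(Counter)

    def add_licenses(self, domain, license_type, count=1):
        self.licenses[domain][license_type] += count

    def add_links(self, source, target, count=1):
        # Touch both endpoints so domains without licenses are still nodes.
        self.licenses[source]
        self.licenses[target]
        self.links[source][target] += count

    def nodes(self):
        return set(self.licenses)

    def license_total(self, domain):
        # Total number of licenses of any type recorded for the domain.
        return sum(self.licenses.get(domain, Counter()).values())


# Hypothetical example data.
graph = DomainGraph()
graph.add_licenses("a.example", "by-sa", 3)
graph.add_licenses("a.example", "by", 2)
graph.add_links("a.example", "b.example", 4)
```

Storing link counts per ordered pair keeps the edge weight available for any later analysis that needs it.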

- How can we quantify the ‘influence’ of a node within the ‘Commons’, i.e., the set of domains hosting CC Licensed content? It would be interesting to integrate a standard metric (PageRank) with some metric that takes into account the number of CC Licensed works hosted at a domain.
- How can we define and find communities within the Commons using this graph? One option is k-cores, though those are usually defined on undirected graphs.
- Are there choke points or ‘narrow’ spots in the graph? (Think minimum cuts, or perhaps minimum-width tree decompositions.)
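
As one possible starting point for the first question above, PageRank could be personalized so that the random surfer teleports to domains in proportion to the number of CC Licensed works they host. This is only a sketch of one way to combine the two signals, not a method the project prescribes. The function name, parameter names, and default values are assumptions, as is the choice to weight transitions by link count.

```python
def license_weighted_pagerank(links, license_counts, damping=0.85,
                              iterations=100, tol=1e-10):
    """Personalized PageRank on a weighted, directed domain graph.

    links: dict source -> dict target -> number of links (edge weight)
    license_counts: dict domain -> number of CC Licensed works found there
    """
    nodes = set(license_counts) | set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)

    # Teleport distribution proportional to license counts (assumption);
    # fall back to uniform if no licenses were recorded at all.
    total = sum(license_counts.get(n, 0) for n in nodes)
    if total > 0:
        teleport = {n: license_counts.get(n, 0) / total for n in nodes}
    else:
        teleport = {n: 1 / len(nodes) for n in nodes}

    out_weight = {n: sum(links.get(n, {}).values()) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        # Rank held by domains with no outgoing links is redistributed
        # according to the teleport distribution.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        new_rank = {n: (1 - damping + damping * dangling) * teleport[n]
                    for n in nodes}
        for source in nodes:
            if out_weight[source] == 0:
                continue
            for target, weight in links[source].items():
                share = weight / out_weight[source]
                new_rank[target] += damping * rank[source] * share
        change = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical three-domain example.
example_links = {
    "a.example": {"b.example": 2},
    "b.example": {"a.example": 1, "c.example": 1},
}
example_counts = {"a.example": 10, "b.example": 0, "c.example": 0}
scores = license_weighted_pagerank(example_links, example_counts)
```

The scores sum to 1, so they can be read as the share of the surfer's time spent at each domain. Other combinations, such as multiplying a plain PageRank score by a function of the license count, would be equally reasonable to compare against.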

We’d also be open to other ideas for analyzing the graph as appropriate. In particular, it would be interesting to explore metrics that either Creative Commons, or domains hosting CC Licensed content, could use when seeking funding from donors.

View project submission here.

Term: Fall 2020
Topic: Industry/Economics, Platforms/Infrastructure