Large science projects and simulations move large volumes of data to scientific clusters for analysis. Much of this data travels through the Open Science Data Federation (OSDF), whose role in the open science community is growing rapidly. At any given time, thousands of data movement and data access operations run concurrently. However, many of these accesses and transfers are repeated, for example by a single user who reruns an analysis on the same set of files while debugging, or by different users in the same research group who work on related topics. OSDF software could reduce these redundant transfers by sharing data among users. OSDF also has regional caches that reduce traffic on the wide-area network backbones. These caches also lower data access latency, which improves overall application performance. This work aims to quantify these anticipated benefits and to develop patterns that could be used in planning the next generation of OSDF.

Fall 2022