Macro View
The Research Data Culture Conversation confirms that research-intensive universities face common challenges when setting out to understand and manage their research data. One challenge appeared particularly urgent: an expected uncontrolled, exponential growth in the volume of research data, driven by improved sensing technology and by more opportunities and mechanisms for generating data.
A dramatic scale-up in data volume has ramifications for how institutions can operationally sustain their data services. It would also sharply increase the urgency of being able to migrate data into cheaper, less accessible retention services and, ultimately, to delete data.
A Macro View, the combined view of the data situation in practice, has been developed to better understand the actual Australian academic research data environment in relation to anecdotal expectations of a “data tsunami”.
In 2017, in response to rapidly rising demand, Monash undertook a review of its retained data volumes going back to 2009. An unsustainable compound annual growth rate of 75% was observed.
Since then, a focus on managing data at a collection layer has been introduced. This appears to have had an effect: today Monash is able to project a lower overall growth rate, nearer 40%. Even the postulated 40% is higher than what is observed in practice, where growth in data under management at Monash has been below 30% in each of the last three years.
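To make the difference between these growth rates concrete, the following is a minimal sketch of the compound growth arithmetic. The function names are illustrative assumptions and are not taken from the report. Only the rates (75%, 40%, 30%) come from the text above.

```python
import math


def project_volume(initial_volume, annual_growth_rate, years):
    """Volume after `years` of compound growth (0.75 means 75% per year)."""
    return initial_volume * (1.0 + annual_growth_rate) ** years


def doubling_time_years(annual_growth_rate):
    """Years needed for a volume to double at a constant compound rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth_rate)


def compound_annual_growth_rate(start_volume, end_volume, years):
    """Constant annual rate that takes start_volume to end_volume in `years`."""
    return (end_volume / start_volume) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Rates quoted in the text; the starting volume of 1.0 is a unitless placeholder.
    for rate in (0.75, 0.40, 0.30):
        print(
            f"rate {rate:.0%}: x{project_volume(1.0, rate, 5):.1f} after 5 years, "
            f"doubles every {doubling_time_years(rate):.2f} years"
        )
```

At 75% a holding doubles roughly every 1.2 years, while at 30% it doubles roughly every 2.6 years, which is why the observed rate matters so much for the sustainability of data services.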
This observation motivated discussion around the macro-scale growth rate observable in other universities.
All universities are developing and maturing their data services.
Like Monash, UQ has operated corporate services for retained data for some time, and it reports the second-largest volume. Melbourne and UNSW reported significant volumes of data transitioning into retained data management, as well as their adoption of commercial services. The University of Sydney has also transitioned onto new services.
Note that the volumes reported are first copy data volumes. The replication usually provided would double or triple the raw volume held in the storage systems.
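The relationship between first copy volume and raw storage can be sketched as a simple multiplication. This is an illustrative sketch only: the function name and the 5.0 PB figure are hypothetical placeholders and do not come from the report.

```python
def raw_storage_volume(first_copy_volume, copies):
    """Raw storage consumed when each unique dataset is held `copies` times.

    `copies` counts the first copy itself, so 2 means one replica
    and 3 means two replicas.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return first_copy_volume * copies


if __name__ == "__main__":
    first_copy_pb = 5.0  # hypothetical first copy volume in petabytes
    for copies in (2, 3):
        print(f"{copies} copies -> {raw_storage_volume(first_copy_pb, copies)} PB raw")
```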
The expected high growth rate in data volume was a key concern motivating a macro view. Graph 3 illustrates an important observation:
The larger data holdings exhibit steady growth rates at a manageable (refer to footnote 1) scale
The growth rate for the volume of unique (first copy) data, averaged across the four universities, is relatively modest (refer to footnote 2) and steady
On the assumption that unique data volumes at the macro scale are proportional to the scale of research activity, this graph makes a broad extrapolation from 40% of Australian university research activity to estimate the unique data volume at 100% of Australian university research activity.
The top line is based on the three largest holdings
The lowest line is based on the three smallest holdings
The middle line is based on all four
The scaling factors were calculated from the Australian Government’s research block grant tables for 2021.
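The extrapolation described above can be sketched as follows, assuming volume is strictly proportional to research activity share. All university names, volumes, and shares in this sketch are hypothetical placeholders chosen so that the shares sum to 40%. They are not the report's figures, and the report's own scaling factors come from the block grant tables.

```python
def select_holdings(holdings, count, largest=True):
    """Return the `count` largest (or smallest) holdings as a new dict."""
    ranked = sorted(holdings.items(), key=lambda item: item[1], reverse=largest)
    return dict(ranked[:count])


def extrapolate_to_national(holdings, activity_share):
    """Scale the combined volume of a subset up to 100% of research activity.

    holdings: mapping of university name -> unique (first copy) volume.
    activity_share: mapping of university name -> fraction of national
    research activity (for example, derived from block grant tables).
    """
    covered_share = sum(activity_share[name] for name in holdings)
    if covered_share <= 0.0:
        raise ValueError("covered activity share must be positive")
    return sum(holdings.values()) / covered_share


if __name__ == "__main__":
    # Hypothetical placeholder data, NOT taken from the report.
    holdings_pb = {"Uni A": 10.0, "Uni B": 6.0, "Uni C": 3.0, "Uni D": 1.0}
    shares = {"Uni A": 0.12, "Uni B": 0.11, "Uni C": 0.09, "Uni D": 0.08}

    top = extrapolate_to_national(select_holdings(holdings_pb, 3, True), shares)
    middle = extrapolate_to_national(holdings_pb, shares)
    low = extrapolate_to_national(select_holdings(holdings_pb, 3, False), shares)
    print(f"top line: {top:.1f} PB, middle: {middle:.1f} PB, lowest: {low:.1f} PB")
```

Computing the estimate from three different subsets gives a rough band rather than a single number, which mirrors the top, middle, and lowest lines described above.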
Footnotes:
1 Manageable here means below the steady 40% technology improvement reported for capacity per unit cost in storage systems.
2 Modest here means very much lower than was expected based on anecdotal claims regarding the rate of data growth (including the concept of a data tsunami).
All graphs are obtained from:
Soo, Ai-Lin; Quenette, Steve; Francis, Rhys (2022): Research Data Culture Conversation - A Macro View of Retained Australian Academic Research Data. Monash University. Report. https://doi.org/10.26180/20235570