Characterising Australia’s Experience with Research Data at Scale


eResearch Australasia 2022

As part of eResearch Australasia 2022, the RDCC jointly hosted a workshop with the ARDC at the Brisbane Convention and Exhibition Centre. The RDCC also hosted a Birds of a Feather session.

  • This workshop was subsidised by the ARDC.

    Morning: Expanding the Macro View

    The Macro View of data burden, developed by the RDCC, provides a first-ever estimate of the scale of research data managed at Australian universities. Prior to the workshop, this first perspective on data at scale will be improved by expanding beyond the small group of research-intensive institutions to other universities and sector stakeholders (MRIs, NCRIS facilities, CSIRO etc).

    The objectives of the morning session are to:

    - Review accuracy of estimates for Australia's retained research data; 

    - Characterise macro level trends in the types of data being retained; and 

    - Define learnings and identify limitations for the macro view. 

    Afternoon: From LifeCycles to Functions and Decision Points

    Data are treated differently as they pass through their own particular lifecycle. Efficient management of those data requires pragmatic decisions, which carry increasingly significant operational consequences. As the scale of data management increases, these decisions need to become easier.

    The ARDC Data Retention Project found that one familiar decision point, assigning a DOI, was a challenge for many university systems, processes and policies.  Foundational information on retained data was often distributed across internal operational units.

    During the workshop the relationship between three core functions will be explored along with the decision points that transition data between them:

    - Meeting organisational obligations and operational requirements;

    - Publishing research data; and

    - Curating research data assets.

    Workshop participants will help develop a Macro View of these functions by identifying their role within them and, drawing on their particular institutional experiences, useful decision points that support those functions, such as determining whether data is sensitive.

    The workshop goal is to propose a future design for an RDMP-2.0, which together with the Macro View will inform an approach to data infrastructure management useful to both local systems and national coherence.

Thank you to everyone who attended the workshop! It was a great day, with over 60 participants from across the country gathering for a full-day workshop to discuss the Macro View of Australia’s research data. Highlights of the day included presentations from Monash University, The University of Queensland, The University of New South Wales, Australian Data Archives, CSIRO and the Australian Antarctic Division, as well as a thought exercise in the afternoon built around the key concepts.

Together, we explored the Green Space and Pink Space, a concept arising out of the Macro View that seeks to articulate the differences between the components of a data system: the Green Space for an organisation's obligations and the Pink Space for research communities. To explore the content of the workshop as well as the outcomes, see the resources below.

Thank you again to the ARDC, the presenters and AeRO for the great workshop!

  • The Research Data Culture Conversation (RDCC) is an ongoing discussion held between Monash University, University of Melbourne, University of New South Wales, University of Queensland and the University of Sydney aiming to understand and improve research data culture. In 2021 the RDCC constructed a “Macro View” of their research data holdings (https://doi.org/10.26180/20235570.v1).

    Primary findings of this first attempt at a Macro View were:

    – The five universities held a total of 72 petabytes (PB) of unique research data in 2021

    – Data holdings showed a compound growth rate of ~31% over the last six years

    – Extrapolating to all universities, the 2021 volume of unique data is estimated at between 137 and 176 PB
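
As a rough illustration of how the figures above relate to one another, the sketch below back-calculates the volume implied six years before 2021 and the scale-up factors implied by the sector-wide range. It assumes that the ~31% figure is an annual rate applied uniformly across six annual steps, which the findings do not state explicitly; the function names are hypothetical and are not taken from the RDCC report.

```python
# Illustrative arithmetic only. Assumption: ~31% is a constant annual
# compound growth rate over six annual steps (not confirmed by the source).

def implied_start_volume(end_volume_pb: float, annual_rate: float, years: int) -> float:
    """Volume at the start of the period implied by constant compound growth."""
    return end_volume_pb / (1.0 + annual_rate) ** years


def scale_up_factors(sample_pb: float, low_pb: float, high_pb: float) -> tuple[float, float]:
    """Ratios of the extrapolated sector range to the sampled volume."""
    return low_pb / sample_pb, high_pb / sample_pb


REPORTED_2021_PB = 72.0        # five universities, 2021 (from the findings)
REPORTED_GROWTH = 0.31         # ~31% compound growth (from the findings)
YEARS = 6                      # "the last six years" (from the findings)
SECTOR_LOW_PB, SECTOR_HIGH_PB = 137.0, 176.0  # extrapolated range (from the findings)

start_pb = implied_start_volume(REPORTED_2021_PB, REPORTED_GROWTH, YEARS)
low_factor, high_factor = scale_up_factors(REPORTED_2021_PB, SECTOR_LOW_PB, SECTOR_HIGH_PB)

print(f"Implied volume six years before 2021: {start_pb:.1f} PB")
print(f"Sector scale-up factor: {low_factor:.2f} to {high_factor:.2f}")
```

Under these assumptions the five universities would have held roughly 14 PB six years before 2021, and the sector-wide estimate corresponds to scaling the five-university total by a factor of about 1.9 to 2.4.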

    In 2022 the RDCC, in partnership with the ARDC Data Retention and Institutional Underpinnings programmes, is extending the Macro View to more of Australia’s research data landscape, including national research infrastructures, CSIRO, Medical Research Institutes and a broader set of universities. That work is revealing a key distinction that may exist between the approach to retaining and curating researcher files and the approach to retaining and curating research data.

    A quick recap of the Macro View will be given, then the floor will be opened to topics arising such as:

    - When and how do researcher files become research data?

    - What are the key differences between curating files and curating data?

    - How do research repositories and services such as FigShare compare on this topic?

    - If data curation is content- or domain-specific, can institutions curate data independently?

To wrap up the RDCC's activities at eResearch Australasia 2022, a BoF was held on the last day of the conference to summarise the learnings from the workshop. A short presentation gave an update on activities to date and was followed by an open-floor discussion.

Thank you to all who participated in the discussion!