eResearch Infrastructure Implementation Frequently Asked Questions

The Project

This project will deliver fit-for-purpose storage and computing infrastructure by deploying new hardware, software, and expertise to provide:

  1. A central storage space for all AgResearch’s research data.

  2. A more performant compute environment to replace our ageing legacy High Performance Computing (HPC) system.

  3. Application support for researchers, through our partnership with the talented people at NeSI (New Zealand eScience Infrastructure).

Relevant Gateway articles: 

https://agresearchnz.sharepoint.com/sites/Gateway/SitePages/eResearch-platform-update.aspx  

https://agresearchnz.sharepoint.com/sites/Gateway/SitePages/Green-light-for-eResearch---our-first-Enabling-Platform.aspx  

The eResearch enabling platform has four objectives for improving AgResearch’s eResearch capability. One of those objectives is to provide fit-for-purpose storage and computing infrastructure.

A central storage space will help with: 

  • Data discoverability - the central storage space will work with our new Outputs Management System (OMS) to ensure all research data are catalogued, making them easier to discover.

  • Data organisation - the central storage space will make it easier to locate and access data, and will ensure that data are stored consistently and securely.

  • Data protection - robust backup and recovery systems will protect valuable research data from loss, corruption, and malware.

  • Collaboration - we will be able to give our collaborators access to the system, making it easier for researchers to collaborate on projects.

The HPC will enable our researchers to: 

  • process large data sets, so they can analyse and visualise their data in a timely manner.

  • develop, train, and scale up Machine Learning and Deep Learning models.

  • model and simulate complex systems.

  • collaborate: the new system will also give our collaborators access to the data and compute system. This is especially important because most of our research is now moving into transdisciplinary fields, where we need expertise from a range of areas to tackle complex problems.

What’s changing

If you work with research data at AgResearch, things will be changing for you.

Data will be stored in a different place 

As mentioned earlier, one of the main components of this project is centralising all our organisation’s research data. As your data is moved to the new infrastructure, file paths (i.e. the links you use to access data) will break. To minimise the impact on you, we will carry out the migration for you, communicate with you throughout the change, and be ready to help if you need it.

Data will be stored in ‘Projects’ or ‘Datasets’

Projects and Datasets are the first two logical resources provided by the eResearch Infrastructure. Both have well-defined ownership models, access models, and lifecycles, which will help ensure that a consistent baseline of data management is maintained and that our infrastructure is used efficiently.

Both Projects and Datasets provide access to central storage on the eResearch Infrastructure.

Projects are intended for active, ongoing work, whereas Datasets are intended to enable collaboration on, sharing of, and reference to research data. A single research activity or project might require both a Project and one or more Datasets on the eResearch Infrastructure. At the end of a research activity, a final step might be to turn the Project into a Dataset for archiving (after some tidying up and additional description work). With your help, we aim to develop best-practice guidance for different types of work over time.

A Project will include a project directory, a scratch directory (for high-performance working storage within the HPC environment), and a prioritised share of access to HPC (i.e. compute/analysis) resources. Projects also have an owner and a team, where each member of the team will have full access to the contents of the project’s storage.

A Dataset includes a dataset directory only: no scratch storage and no compute/analysis resources. Datasets have an owner/custodian and a team of contributors, where each member of the team will have full access to the contents of the dataset’s storage. Datasets can also grant read-only access, either to a defined group of individuals or to all AgResearch users. Through this read-only access we can build well-known reference collections and make data more discoverable.
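The ownership and access rules for Projects and Datasets can be summarised as a small data model. The sketch below is purely illustrative: the class names, fields, and the `archive_project` helper are hypothetical and do not correspond to any real eResearch Infrastructure tool or API. It also assumes that the owner has the same full access as team members, which the description above does not state explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical model of a Project: active, ongoing work."""
    name: str
    owner: str
    team: set[str] = field(default_factory=set)
    # Projects come with scratch storage and a prioritised HPC share.
    has_scratch: bool = field(default=True, init=False)
    has_hpc_share: bool = field(default=True, init=False)

    def can_write(self, user: str) -> bool:
        # Assumption: the owner counts as a full-access member.
        return user == self.owner or user in self.team

    def can_read(self, user: str) -> bool:
        # No separate read-only tier is described for Projects.
        return self.can_write(user)


@dataclass
class Dataset:
    """Hypothetical model of a Dataset: sharing and reference."""
    name: str
    owner: str
    team: set[str] = field(default_factory=set)
    readers: set[str] = field(default_factory=set)  # defined read-only group
    readable_by_all_staff: bool = False             # read-only for all AgResearch users
    # Datasets have a directory only: no scratch, no compute resources.
    has_scratch: bool = field(default=False, init=False)
    has_hpc_share: bool = field(default=False, init=False)

    def can_write(self, user: str) -> bool:
        return user == self.owner or user in self.team

    def can_read(self, user: str, is_agresearch_user: bool = False) -> bool:
        if self.can_write(user) or user in self.readers:
            return True
        return self.readable_by_all_staff and is_agresearch_user


def archive_project(project: Project, readers: set[str] | None = None) -> Dataset:
    """Hypothetical end-of-activity step: turn a Project into a Dataset."""
    return Dataset(
        name=project.name,
        owner=project.owner,
        team=set(project.team),
        readers=set(readers or ()),
    )
```

For example, with made-up user names, `archive_project(Project("demo", owner="alice", team={"bob"}), readers={"carol"})` yields a Dataset that `carol` can read but not modify, and that has no scratch storage or HPC share.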

Research Data

Accessing and getting support for the eResearch Infrastructure

The Compute Environment

New questions for us to answer