...

The Project

What is this project about and why are we doing it?

This project is about delivering fit-for-purpose storage and computing infrastructure. We will deliver it by deploying new hardware, software, and expertise to provide:

  1. A central storage space for all AgResearch’s research data.

  2. A more performant compute environment to replace our ageing legacy High Performance Computing (HPC) system.

  3. Application support for researchers through our partnership with the talented people at NeSI (New Zealand eScience Infrastructure).

Relevant Gateway articles: 

https://agresearchnz.sharepoint.com/sites/Gateway/SitePages/eResearch-platform-update.aspx  

https://agresearchnz.sharepoint.com/sites/Gateway/SitePages/Green-light-for-eResearch---our-first-Enabling-Platform.aspx  

What is the difference between AgResearch’s eResearch Platform and the eResearch Infrastructure?

The eResearch Enabling Platform has four objectives to improve AgResearch’s eResearch capability. One of those objectives is to provide fit-for-purpose storage and computing infrastructure, and the eResearch Infrastructure is how we will deliver it.

A central storage space will help with: 

  • Data discoverability - the central storage space will work with our new Outputs Management System (OMS) to ensure all research data are catalogued and therefore easier to find.

  • Data organisation - data will be easier to locate and access, and will be stored in a consistent and secure manner.

  • Data protection - robust backup and recovery systems will protect valuable research data from loss, corruption, and malware.

  • Collaboration - we will be able to give our collaborators access, making it easier for researchers to work together on projects.

The HPC will enable our researchers to:

  • Process large data sets, so they can analyse and visualise their data in a timely manner.

  • Develop, train and scale up models in Machine Learning and Deep Learning.

  • Model and simulate complex systems.

  • Collaborate: the new system will also give our collaborators access to the data and compute system. This is especially important because most of our research is now moving towards transdisciplinary fields, and we will need expertise from a range of areas to tackle complex problems.

What’s changing

As we start using the eResearch Infrastructure, how will my research and I be affected?

If you work with research data at AgResearch, things will be changing for you.

Data will be stored in a different place 

As mentioned earlier, one of the main components of this project is centralising all our organisation’s research data. As your data is moved to the new infrastructure, file paths (i.e. the links you use to access data) will break. To minimise the impact on you, we will do the migration for you, communicate throughout the change, and be ready to help if you need it.
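If your own scripts contain hard-coded file paths, one way to prepare is to route every path through a single remapping function, so that only one place needs updating after migration. The sketch below is illustrative only: `OLD_ROOT`, `NEW_ROOT` and `remap` are hypothetical names, and the two prefixes are placeholders because the real old and new locations are not stated in this FAQ.

```python
from pathlib import PurePosixPath

# Hypothetical placeholders: substitute the real locations once they are confirmed.
OLD_ROOT = PurePosixPath("/old/storage")
NEW_ROOT = PurePosixPath("/new/storage")


def remap(path: str) -> str:
    """Return the path under NEW_ROOT if it lives under OLD_ROOT, else unchanged."""
    p = PurePosixPath(path)
    try:
        # relative_to raises ValueError when p is not inside OLD_ROOT.
        return str(NEW_ROOT / p.relative_to(OLD_ROOT))
    except ValueError:
        return path


if __name__ == "__main__":
    print(remap("/old/storage/projA/reads.fastq"))
    print(remap("/tmp/scratch.txt"))
```

Paths outside the old prefix are returned untouched, so the function is safe to apply to every path a script uses.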

Data will be stored in ‘Projects’ or ‘Datasets’

...

I am delivering on projects with a deadline of 30 June 2023, can my data and workflow be migrated after 30 June?

Absolutely. We will work with you to plan the data and workflow migration. For time-critical projects, our priority will be to minimise any disruption. This means you can continue working on the existing HPC infrastructure to finish your project, and we will migrate your data and workflows some time well after 30 June.

Research Data

Where will my data be sitting?

Our compute and primary data storage is located within NeSI’s Flexible HPC platform at Waipapa Taumata Rau’s (University of Auckland’s) Tāmaki Data Centre in Tāmaki Makaurau, Auckland. Geographically distinct back-up copies of the data are being made on AgResearch Infrastructure at NIWA’s High Performance Computing Facility (HPCF) Data Centre at Greta Point, Wellington.  

...

Will my collaborators be able to get access to this data?

Absolutely. Globus is our data sharing tool of choice, and a new version will be deployed to support sharing.

Access to and support for the eResearch Infrastructure

Will projects be charged to use the eResearch Infrastructure?

We are not intending to charge for use; it will be treated as overhead.

We do intend to account for all use, both compute and storage, to build a picture of how the infrastructure is utilised and by whom.

The one caveat is around fair use: if there is going to be a significantly large request for resource, we may ask for a capex contribution to extend capacity. Ideally these sorts of requests go through the eResearch Platform Advisory service so they can be picked up before funding has been allocated.

We will support expansions to cover standard growth once we have enough usage data to forecast how we are tracking. We are hopeful 3PB will give us a good starting point for the storage infrastructure.

While we will not be charging our internal users/projects for using the infrastructure, if you would like to pass some/all costs for use of the infrastructure through to your external customers we will have a mechanism for understanding usage and will be developing pricing as the need arises.  

...

How do I get help?

The eResearch Infrastructure is supported by a Collaborative Support Desk staffed by experts from AgResearch and NeSI. You can reach this support by emailing support@cloud.nesi.org.nz or via the support portal here.

We know that you are already used to contacting AgResearch’s Support Desk and we have channels open with them so if your ticket lands there, the right people will still get it. 

Comprehensive support documentation for the eResearch Infrastructure will be developed before the infrastructure goes live and will be made available from the eResearch Platform’s intranet site.

The Compute Environment

Will I still be able to use Conda to install some tools in the new environment?

Yes! We will have Conda and Apptainer (formerly known as Singularity) for this sort of work. If there is some other approach you’d like to use, please get in touch so we can understand your needs a little better.

My work is urgent/has commercial deadlines, how will the new platform support that?

The general compute will be managed via Fairshare (here is an explanation of how this runs on NeSI at the moment). We are aware of various workflows for which this approach may not provide sufficient service level guarantees, e.g. where there are urgent deadlines. Fortunately, the Slurm scheduler has various mechanisms to support these requirements (e.g. reservations or quality-of-service) while still ensuring fair sharing of resources in the general case. We will work with users to find the most appropriate approach as we migrate workflows to the new platform.
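To give a feel for how fair sharing affects job priority, here is a minimal Python sketch of the classic fair-share factor described in Slurm’s multifactor priority documentation, F = 2^(-usage/shares). This is an assumption for illustration: the eResearch Infrastructure may be configured with a different algorithm (for example Fair Tree) and different weights, and the share and usage values below are made up.

```python
def fairshare_factor(effective_usage: float, normalized_shares: float) -> float:
    """Classic Slurm-style fair-share factor: F = 2 ** (-usage / shares).

    Both arguments are fractions of the whole cluster (0.0 to 1.0).
    A result near 1.0 means the account is under-served and its jobs are
    favoured; a result near 0.0 means it has used well beyond its share.
    """
    if normalized_shares <= 0:
        # An account with no allocated share gets no fair-share boost.
        return 0.0
    return 2.0 ** (-effective_usage / normalized_shares)


if __name__ == "__main__":
    # Hypothetical team entitled to 25% of the cluster.
    shares = 0.25
    for usage in (0.0, 0.25, 0.5):
        print(f"usage={usage:.2f} -> factor={fairshare_factor(usage, shares):.2f}")
```

In this sketch a team that has used exactly its share gets a factor of 0.5, and the factor halves again each time usage grows by another full share. Reservations and quality-of-service settings sit alongside this factor, which is why urgent work can still be accommodated.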

New questions for us to answer

What is the difference between the eResearch Infrastructure’s HPC cluster and NeSI’s National HPC platforms? When should I use one or the other?

The eResearch Infrastructure (eRI) HPC cluster is hosted by NeSI but is otherwise independent: it is AgResearch’s own infrastructure. In contrast, HPC clusters such as Mahuika are hosted and owned by NeSI and make up the existing national HPC service, which is separate from AgResearch’s eRI. More technical details can be found in Differences in Platforms.


Is the eResearch Infrastructure as secure as our existing eResearch options?

(To be developed)

What is Open OnDemand?

Open OnDemand is the portal for the eResearch Infrastructure. It is accessible via a web browser at https://ondemand.eri.agresearch.co.nz/