The Project
What is this project about and why are we doing it?
This project is about delivering fit-for-purpose storage and computing infrastructure. We will deliver this by deploying new hardware, software, and expertise to create:
A central storage space for all AgResearch’s research data.
A more performant compute environment to replace our ageing legacy High Performance Computing (HPC) cluster.
Application support for researchers through our partnership with the talented people at NeSI (New Zealand eScience Infrastructure).
Relevant Gateway articles:
https://agresearchnz.sharepoint.com/sites/Gateway/SitePages/eResearch-platform-update.aspx
The eResearch Enabling Platform has four objectives to improve AgResearch’s eResearch capability. One of those objectives is to provide fit-for-purpose storage and computing infrastructure. A central storage space will help with:
The HPC will enable our researchers to:
What’s changing
If you work with research data at AgResearch, things will be changing for you.

Data will be stored in a different place. As mentioned earlier, one of the main components of this project is centralising all our organisation’s research data. As your data is moved to the new infrastructure, file paths (i.e. the links you use to access data) will break. To minimise the impact on you, we will do the migration for you, communicate throughout the change, and be ready to help if you need it.

Data will be stored in ‘Projects’ or ‘Datasets’.
Projects and Datasets are the first two logical resources provided by the eResearch Infrastructure. Both concepts have well-defined ownership and access models, as well as lifecycles, which will help ensure a basic, consistent level of data management is maintained and that our infrastructure is used efficiently.
Both Projects and Datasets provide access to central storage on the eResearch Infrastructure. Projects are intended for active/ongoing work, whereas Datasets enable collaboration on, sharing of, and reference to research data. A single research activity/project might require both a Project and one or more Datasets on the eResearch Infrastructure. At the end of a research activity, a final step might involve turning the Project into a Dataset for archive (after some tidy-up and additional description work). With your help we aim to develop best-practice guidance for different types of work over time.

A Project will include a project directory, a scratch directory (for high-performance working storage within the HPC environment), and a prioritised share of access to HPC (i.e. compute/analysis) resources. Projects also have an owner and a team, where each member of the team has full access to the contents of the project’s storage.

A Dataset will include a dataset directory only: no scratch storage and no computing/analysis resources. Datasets have an owner/custodian and a team of contributors, where each member of the team has full access to the contents of the dataset’s storage. Datasets also offer read-only access, either to a defined group of individuals or to all AgResearch users. Through this read-only access we can build well-known reference collections and make data more discoverable.
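As a loose illustration of how this kind of access model is often realised on POSIX storage (the path, group behaviour, and permission bits below are hypothetical, not confirmed eRI details):

```shell
# Create a stand-in for a Project directory (using /tmp in place of real eRI storage)
mkdir -p /tmp/demo_project

# Full access for the owner and the project team's group, no access for others;
# the setgid bit (the leading 2) keeps new files owned by the team's group.
chmod 2770 /tmp/demo_project

# A Dataset offering wider read-only access might instead use chmod 2775,
# so the contributor group can write while everyone else can only read.
ls -ld /tmp/demo_project    # shows drwxrws--- : owner/group full access, others none
```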
When will I be able to use the infrastructure?
So far, all the necessary hardware has been delivered, the storage service has been developed, and we are starting to migrate the many petabytes of AgResearch research data from the existing legacy systems over to the new infrastructure.
We are starting the migration with our archive datasets in order to relieve pressure on the legacy infrastructure while we wait for the compute service to become available. We will be in contact with you directly if/when your data is being migrated.
Our current service development focus is on getting the compute service up and running. As an implementation team we are aiming to have an early version of this working with a couple of friendly test users by the end of Q1, 2023.
Once we have the compute service up and running, we will be ready to move your active datasets (i.e. those you use day-to-day for your research) and to start having you run standard compute jobs (HPC jobs via Slurm).
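When the time comes, a standard Slurm job submission might look something like this sketch (the job, module, and script names are placeholders, not confirmed details of the new cluster):

```shell
#!/bin/bash -e
#SBATCH --job-name=example-analysis   # name shown in the queue
#SBATCH --time=01:00:00               # wall-clock limit (hh:mm:ss)
#SBATCH --cpus-per-task=4             # CPU cores for the task
#SBATCH --mem=8G                      # memory for the job
#SBATCH --output=%x.%j.out            # log file (%x = job name, %j = job ID)

# Load the software stack and run the analysis (module and tool names assumed)
module load R
srun Rscript analysis.R
```

You would save this as, say, `analysis.sl`, submit it with `sbatch analysis.sl`, and check its progress with `squeue -u $USER`.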
While we don’t have specific dates for each group in terms of a migration plan, we will give you warning when you are about to be migrated, and we will then work with you to get you over to, and up and running on, the new kit.
After our standard storage and compute services are sorted, we will start implementing areas such as Globus for data transfer, Jupyter Notebooks, and Virtual Environments.
Throughout all of this, if you are concerned about timelines, need to be migrated earlier or later for some reason or just want to learn more about what is happening we encourage you to reach out to the project team.
Absolutely. We will work with you to plan the data and workflow migration. For time-critical projects, our priority will be to minimise any disruption. This means you can continue working on the existing HPC infrastructure to finish your project, and we will migrate your data and workflows some time well after 30 June.
Research Data
Our compute and primary data storage is located within NeSI’s Flexible HPC platform at Waipapa Taumata Rau’s (University of Auckland’s) Tāmaki Data Centre in Tāmaki Makaurau, Auckland. Geographically distinct back-up copies of the data are being made on AgResearch infrastructure at NIWA’s High Performance Computing Facility (HPCF) Data Centre at Greta Point, Wellington.
As mentioned above, all research data will be stored within New Zealand to aid AgResearch in complying with Māori Data Sovereignty requirements. The OMS metadata will flag datasets that raise Māori Data Sovereignty concerns, ensuring that these datasets are not accessible outside of New Zealand or made available as open data.
We know that research data storage has not always been straightforward at AgResearch, and as we move to the new infrastructure we are aiming to get all of our research data into one safe, secure space. If you have digital research data hiding somewhere cringeworthy (on your desktop, in a box of hard drives in the corner of the office, on a personal memory stick, etc.), no judgement; please just get in touch and we will work with you to get the data stored safely on the new infrastructure.
We get it: it can be confusing trying to navigate the many storage solutions available at AgResearch and to know what to use each system for. The following guidelines can help with this decision-making process:
For non-HPC users, research data should be stored in shared/network drives (as it is now). These drives will point to data stored in the central storage space.
We are not currently open to taking new Projects/Datasets, but when we are there will be a short form to fill out. If your request is within default amounts (we will work with users to understand what these should be), it will be automatically provisioned and access granted. Where users ask for particularly large amounts or non-standard services, we will contact you directly so that we understand your needs before finalising the request.
Absolutely. Globus is our data-sharing tool of choice, and a new version will be deployed to support sharing.
Accessing and support for the eResearch Infrastructure
We are not intending to charge for use; it will be treated as overhead. We do intend to account for all use, both compute and storage, to build a picture of how the infrastructure is utilised and by whom.

The one caveat is around fair use: if there is going to be a significantly large request for resource, we may ask for a capex contribution to extend capacity. Ideally these sorts of requests go through the eResearch Platform Advisory service so they can be picked up before funding has been allocated. We will support standard growth in capacity once we have the data to forecast how we are tracking, and we are hopeful 3 PB will give us a good starting point for the storage infrastructure.

While we will not be charging our internal users/projects for using the infrastructure, if you would like to pass some or all of the costs of using the infrastructure through to your external customers, we will have a mechanism for understanding usage and will develop pricing as the need arises.
Before migrating users’ data and workflows, we will hold onboarding workshops for existing HPC users to familiarise them with the new environment and their support team. This will ensure a smooth transition to the new infrastructure. One of the important goals of the eResearch Platform is to equip researchers with the necessary skills to conduct their research effectively. With HPC skills becoming increasingly important, we will provide basic and advanced HPC training workshops for researchers who need to develop these skills. Additional information about these workshops will be available closer to the scheduled date.
The eResearch Infrastructure is supported by a Collaborative Support Desk staffed by experts from AgResearch and NeSI. Access to this support is via email to support@cloud.nesi.org.nz or via the support portal here. We know that you are already used to contacting AgResearch’s Support Desk, and we have channels open with them, so if your ticket lands there the right people will still get it. Comprehensive support documentation for the eResearch Infrastructure will be developed before the infrastructure goes live and will be made available from the eResearch Platform’s Intranet site.
The Compute Environment
Yes! We will have Conda and Apptainer (a version of Singularity) for this sort of work. If there is some other approach you’d like to use, please get in touch so we can understand your needs a little better.
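For instance, a containerised or Conda-based workflow might be set up along these lines (the image, environment, and package names are illustrative only):

```shell
# Pull a container image (Apptainer can convert Docker images directly)
apptainer pull rocky9.sif docker://rockylinux:9

# Run a command inside the container
apptainer exec rocky9.sif cat /etc/os-release

# Alternatively, build an isolated Conda environment for a tool stack
conda create --name demo-env python=3.11 numpy
conda activate demo-env
```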
General compute will be managed via Fairshare (here is an explanation of how this runs on NeSI at the moment). We are aware of various workflows for which this approach may not provide sufficient service-level guarantees, e.g. where there are urgent deadlines. Fortunately, the Slurm scheduler has various mechanisms to support these requirements (e.g. reservations or quality-of-service) alongside ensuring fair sharing of resources in the general case. We will work with users to find the most appropriate approach as we migrate workflows to the new platform.
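In practice, that flexibility would surface as extra options at job submission time, for example (the QoS name, reservation name, and jobscript below are hypothetical until the service is configured):

```shell
# Ask for a higher quality-of-service tier for an urgent deadline
sbatch --qos=urgent analysis.sl

# Or run inside a pre-arranged reservation of nodes
sbatch --reservation=conference_deadline analysis.sl
```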
New questions for us to answer
The eRI HPC cluster is hosted by NeSI but is otherwise independent; that is, the eRI HPC cluster is AgResearch’s own infrastructure. In contrast, HPC clusters such as Mahuika are hosted and owned by NeSI and form the existing national HPC service, separate from AgResearch’s eRI. More technical details can be found here: Differences in Platforms.
(To be developed)
https://ondemand.eri.agresearch.co.nz/ is the portal for the eResearch Infrastructure, accessible via web browser.