Getting started on AgResearch eRI
Work in Progress: active data transfer/staging is still happening in the background. Hence, not all of your datasets will have read/write access at this point in time.
We are referring to this process as the “cutover process”. Please raise a request to clarify the status for your dataset.
Getting started
When moving over to eRI, it helps to have the following information ready, as it will smooth the process.
New AgResearch staff (or those who have not used the existing, legacy HPC) who plan to use the compute resources will need a username created and a home directory (/home/agresearch.co.nz/username) provisioned.
Do this by emailing NeSI’s support desk at support@cloud.nesi.org.nz or via the eRI Support Portal.
Do you have legacy data that needs to move (be cut over)? This could take some time and requires activation by AgResearch.
Background: Most data has already been transferred from the legacy HPC to the new eRI HPC. Any data that has not been migrated will not be readable/writeable until it has been cut over. This ensures both data sets remain in sync.
Please make a request to support@cloud.nesi.org.nz.
Do you need a new dataset/project (with compute allocation) created or will you be accessing another workspace?
If you are unsure what the difference between a dataset and a project is, check out the frequently asked questions page for more information.
Go to ColdFront at https://coldfront.eri.agresearch.co.nz/, a self-service tool that allows you to set up a project, apply for a share of compute resources and perform administrative tasks such as managing other project members and checking the progress of the project all in one place. For more information about this tool and how to connect to it, click here.
Important: when setting up a new project, you need to give it a title following the format yyyy-nameofproject, for example, 2077-micetrial.
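The naming rule can be checked before submitting a project request. The following is a minimal sketch, not an official ColdFront check: the function name is hypothetical, and the assumption that the name part contains only letters, digits and underscores is inferred from the examples on this page.

```shell
# Hypothetical helper: succeeds if a title looks like yyyy-nameofproject.
# Assumption: four digits, a hyphen, then letters, digits or underscores.
is_valid_project_title() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[A-Za-z0-9_]+$'
}

if is_valid_project_title "2077-micetrial"; then
    echo "2077-micetrial: ok"
else
    echo "2077-micetrial: does not match yyyy-nameofproject"
fi
```

If ColdFront accepts other characters in the name part, the pattern would need to be widened accordingly.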
If you experience difficulty getting set up with ColdFront, you can reach out for more support by emailing NeSI’s support desk.
How do I know if there is already data or a project?
What software/version might you need?
If you are unsure what software is available, please ask and we can help build and/or make it available.
Finally, NeSI also offers regular online office hour sessions, hosted via Zoom. These sessions are open to anyone - you don't need to be an existing NeSI user. Information on when the next office hour will be hosted is here.
Logging in
OPTIONAL: If working from outside the AgResearch local network, connect to the AgR VPN (or to Inscrutable or Iramohio first, e.g. via a Terminal application):
ssh inscrutable.agresearch.co.nz
See also Connecting to the eRI compute cluster from Windows
Once logged on, continue to connect to an eRI login node:
ssh username@agresearch.co.nz@login-0.eri.agresearch.co.nz
(or login-1). Enter your password if prompted.
List the contents of your sandbox project folder:
ls -la /agr/persist/projects/XXXX-abc_defghijklm
List the contents of your scratch folder:
ls -la /agr/scratch/projects/XXXX-abc_defghijklm
Change into the scratch location:
cd /agr/scratch/projects/XXXX-abc_defghijklm
More technical details are given here on how to connect to the AgResearch HPC from your computer.
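Because the login name itself contains an "@", the SSH target is easy to mistype. As a small sketch (the helper name is hypothetical and not part of any eRI tooling), the target can be assembled from the username and the login node name:

```shell
# Hypothetical helper: builds the SSH target for an eRI login node.
# The user part is the AgResearch username followed by @agresearch.co.nz,
# so the full target contains two "@" characters.
# $1 = username, $2 = login node (login-0 or login-1; defaults to login-0)
eri_ssh_target() {
    printf '%s@agresearch.co.nz@%s.eri.agresearch.co.nz\n' "$1" "${2:-login-0}"
}

# Print the command rather than running it, so it can be reviewed first.
echo "ssh $(eri_ssh_target username login-1)"
```

Printing the command first keeps the sketch safe to run anywhere; replace echo with a direct call once the target looks right.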
Accessing storage resources
Every AgResearch user can access the storage. Some datasets have tighter access restrictions.
Please raise a request to clarify the access permissions for your dataset.
What is the difference between a dataset and a project provisioning type?
Both types are filesets on the filesystem (storage).
The dataset consists of one fileset in /tdc_persist/datasets/.
The project consists of two filesets, one in /tdc_persist/projects/ and the other in /tdc_scratch/projects/.
On AgR eRI login or compute nodes
Symbolic links are:
/agr/projects
/agr/datasets
/agr/scratch
Links are:
/tdc_persist/projects
/tdc_persist/datasets
/tdc_scratch/projects
Mount points are:
/mnt/gpfs/persist/ for both datasets and projects
/mnt/gpfs/scratch/ for projects' temporary working space
See also How to access the AgResearch eRI compute cluster
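To see where one of the paths above actually ends up on a node, the chain of links can be followed with readlink. This is a minimal sketch assuming GNU readlink is available; the helper name is hypothetical, and the example path is only an illustration, so substitute a path that exists on your node.

```shell
# Hypothetical helper: prints the fully resolved location of a path,
# following every symbolic link along the way (GNU readlink -f).
resolve_path() {
    readlink -f -- "$1"
}

# Example (only meaningful on an eRI login or compute node):
#   resolve_path /agr/scratch
```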
Locating legacy data on eRI
In order to ease the transition from HPC to eRI, a legacy link farm has been created, with symbolic links from the legacy paths to the new locations. This is mounted on all eRI nodes (login and compute).
This addresses the problem of the very many legacy scripts which have hardcoded paths to the legacy data locations.
active -> /agr/persist/projects/{uniqueId}/active
scratch -> /agr/persist/projects/{uniqueId}/scratch
itmp -> /agr/scratch/projects/{uniqueId}
archive -> /agr/persist/datasets/{uniqueId}
Example:
login-0$ ls -l /dataset/blastdata/
total 2
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 43 Jul 18 16:29 active -> /agr/persist/projects/2002-blastdata/active
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 44 Jul 18 16:29 scratch -> /agr/persist/projects/2002-blastdata/scratch
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 36 Jul 18 16:29 itmp -> /agr/scratch/projects/2002-blastdata
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 36 Jul 18 16:29 archive -> /agr/persist/datasets/2002-blastdata
login-0$ ls -l /dataset/blastdata/active/
454temp hs_faa_bak plant.protein.faa.psi swissprot.00.pnd temp UniVec.nhr
agall.seq.exists junkx1 plant.protein.faa.psq swissprot.00.pni tigr UniVec.nin
agfilter1.pl ln_mirror.sh plant.rna.fna.nhr swissprot.00.pog Tobacco_MF_assembl.exists UniVec.nsd
bgi_sheep ln_mirror.sh.bu1 plant.rna.fna.nin swissprot.00.ppd Tobacco_MF_assembly.fa.exists UniVec.nsi
blastdata.exists log plant.rna.fna.nnd swissprot.00.ppi UMD3_OA_v1 UniVec.nsq
BTA_OA_ver.2.exists merged.dmp plant.rna.fna.nni swissprot.00.psd uniprot_kb.fa UniVec.prop
bt_faa_bak mirror plant.rna.fna.nsd swissprot.00.psi uniprot_kb.fasta.exists unpack.sh
citations.dmp names.dmp plant.rna.fna.nsi swissprot.00.psq uniprot_sprot.fa.exists vector.fa
cs08.seq.exists nodes.dmp plant.rna.fna.nsq swissprot.pal uniprot_swissprot.fasta.exists vector.nhr
delnodes.dmp OA_chromosomes_ver.1.0 public_readonly taxbti.bti.bak UniVec vector.nin
division.dmp OAR_chromosomes_ver.1.0.exists readme.txt taxdb.btd UniVec_Core.exists vector.nnd
est.exists OARv3.0_masked_with_SNPs_and_indels.exists reorg1.sh.bu1 taxdb.btd.bu1 UniVec_Core.fa.exists vector.nni
gc.prt obsolete_deleteafter01112008 riceensembl taxdb.bti UniVec_Core.nhr vector.nsd
gencode.dmp plant.protein.faa.phr sheep_chr_OAR.exists taxdb.bti.bak UniVec_Core.nin vector.nsi
geneious_blast_template plant.protein.faa.pin sheep.v3.0.14th.final.fa.exists taxdb.bti.bu1 UniVec_Core.nsd vector.nsq
geneious_blast_template.0 plant.protein.faa.pnd species taxdb.tar.gz UniVec_Core.nsi vector.tar.gz
geneious_blast_testdb plant.protein.faa.pni stampfiles taxdb.tar.gz.1 UniVec_Core.nsq Wrightson_ESTs.exists
gi plant.protein.faa.pog swissprot.00.phr taxdb.tar.gz.2 UniVec_Core.prop
hg17 plant.protein.faa.psd swissprot.00.pin taxdump.tar.gz UniVec.exists
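When updating a legacy script by hand instead of relying on the link farm, the four mappings listed above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and the unique ID (for example 2002-blastdata) has to be looked up for each dataset, because it cannot be derived from the legacy name.

```shell
# Hypothetical helper: maps a legacy link name and a unique ID to the
# eRI location, following the four mappings listed above.
# $1 = legacy link name (active, scratch, itmp or archive)
# $2 = unique ID, e.g. 2002-blastdata
legacy_link_target() {
    case "$1" in
        active|scratch) printf '/agr/persist/projects/%s/%s\n' "$2" "$1" ;;
        itmp)           printf '/agr/scratch/projects/%s\n' "$2" ;;
        archive)        printf '/agr/persist/datasets/%s\n' "$2" ;;
        *)              echo "unknown legacy link: $1" >&2; return 1 ;;
    esac
}

legacy_link_target active 2002-blastdata
# prints /agr/persist/projects/2002-blastdata/active
```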
From a Windows client
S: \\storage.eri.agresearch.co.nz\datasets (S drive)
M: \\storage.eri.agresearch.co.nz\projects (M drive)
For standard workstations, a Group Policy will automatically map these with drive letters as above.
They can also be accessed as a network location using the options “Add a network location”, “Quick access”, or “Map network drive…” in the Windows File Explorer while connected to the AgResearch LAN (or via VPN).
Data Recovery
Weekly snapshots of the scratch directory are kept (4 weeks in total) in
/mnt/gpfs/scratch/.snapshots
which contains 4 subdirectories with names like scratch@GMT-2024.10.04-09.00.50
In the listing below, the latest is /mnt/gpfs/scratch/.snapshots/scratch@GMT-2024.10.04-09.00.50/projects
login-0 ~ $ ll /mnt/gpfs/scratch/.snapshots
total 2
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.09.13-10.00.48
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.09.20-10.00.49
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.09.27-10.00.50
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.10.04-09.00.50
Example: if file.txt is lost from my project's scratch directory, it can be searched for with:
ls -l /mnt/gpfs/scratch/.snapshots/scratch@GMT-2024.10.04-09.00.50/projects/<project_id>/
and then copied back to where it is required:
cp /mnt/gpfs/scratch/.snapshots/scratch@GMT-2024.10.04-09.00.50/projects/<project_id>/file.txt /where_i_need_it/
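Since the snapshot names change every week, it can be convenient to pick the newest one automatically rather than typing the timestamp. A minimal sketch, assuming the names keep the scratch@GMT-YYYY.MM.DD-hh.mm.ss pattern shown above so that they sort in date order; the function name is hypothetical.

```shell
# Hypothetical helper: prints the newest snapshot directory found under the
# given .snapshots directory, or nothing if there are none. Relies on the
# timestamp in the name (scratch@GMT-YYYY.MM.DD-hh.mm.ss) sorting by date.
latest_snapshot() {
    for snap in "$1"/scratch@GMT-*; do
        [ -d "$snap" ] && printf '%s\n' "$snap"
    done | LC_ALL=C sort | tail -n 1
}

# On an eRI node (placeholders as in the example above):
#   snap="$(latest_snapshot /mnt/gpfs/scratch/.snapshots)"
#   ls -l "$snap/projects/<project_id>/"
#   cp "$snap/projects/<project_id>/file.txt" /where_i_need_it/
```

If the file is not in the newest snapshot, the older subdirectories can be checked in the same way.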
Accessing compute resources
Compute access is currently handled via a service request.
How to access the AgResearch eRI compute cluster
Accessing Open OnDemand (OOD)
Browser Access - https://ondemand.eri.agresearch.co.nz/
Login - one of several formats will work for the login
userid@agresearch.co.nz THIS IS MOST LIKELY TO WORK
agresearch\userid
Password - your AgResearch password