Getting started on AgResearch eRI

Work in progress: data transfer and staging are still running in the background, so not all of your datasets will have read/write access yet.
We refer to this as the “cutover process”. Please raise a request to clarify the status of your dataset.

Getting started

When moving over to eRI, there is some information you can provide, or think about in advance, to help smooth the process.

Logging in

  • OPTIONAL: If working from outside the AgResearch local network, first connect to the AgR VPN
    (or to Inscrutable or Iramohio, e.g. from a Terminal application:
    ssh inscrutable.agresearch.co.nz)
    See also Connecting to the eRI compute cluster from Windows
    Once logged on, continue to the next step.

  • Connect to an eRI login node:
    ssh username@agresearch.co.nz@login-0.eri.agresearch.co.nz
    (or login-1). Enter your password if prompted.

  • List the contents of your sandbox project folder:
    ls -la /agr/persist/projects/XXXX-abc_defghijklm

  • List the contents of your scratch folder:
    ls -la /agr/scratch/projects/XXXX-abc_defghijklm

  • Change into the scratch location:
    cd /agr/scratch/projects/XXXX-abc_defghijklm

  • More technical details are given here on how to connect to the AgResearch HPC from your computer.
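
The login step above can be wrapped in a small helper. This is only a sketch: the function names eri_login_target and eri_connect are made up for illustration, and it assumes both login-0 and login-1 accept the same credentials. The retry relies on the documented OpenSSH behaviour that ssh exits with status 255 when the connection itself fails.

```shell
# Build the SSH destination for an eRI login node.
# $1: AgResearch user ID (the part before @agresearch.co.nz)
# $2: login node index, 0 or 1 (defaults to 0)
eri_login_target() {
    printf '%s@agresearch.co.nz@login-%s.eri.agresearch.co.nz\n' "$1" "${2:-0}"
}

# Try login-0 first and fall back to login-1 only if the connection fails.
# ssh exits with 255 on a connection error; any other status comes from
# the remote session, so it is passed straight back to the caller.
# The 10 second ConnectTimeout is an arbitrary choice.
eri_connect() {
    for node in 0 1; do
        ssh -o ConnectTimeout=10 "$(eri_login_target "$1" "$node")"
        status=$?
        if [ "$status" -ne 255 ]; then
            return "$status"
        fi
    done
    return 255
}
```

After defining the functions in a shell session, eri_connect username (with your own user ID in place of username) would attempt the connection.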

Accessing storage resources

Every AgResearch user can access the storage. Some datasets have tighter access restrictions.
Please raise a request to clarify the access permissions for your dataset.

See Accessing eRI storage

What is the difference between a dataset and a project provisioning type?

Both types are filesets on the filesystem (storage).
The dataset consists of one fileset in /tdc_persist/datasets/.
The project consists of two filesets, one in /tdc_persist/projects/ and the other in /tdc_scratch/projects/.

On AgR eRI login or compute nodes

Symbolic links are:

/agr/projects
/agr/datasets
/agr/scratch

Links are:

/tdc_persist/projects
/tdc_persist/datasets
/tdc_scratch/projects

Mount points are:

/mnt/gpfs/persist/ for both datasets and projects, and
/mnt/gpfs/scratch/ for temporary project working space
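
To see how one of the paths above resolves on a particular node, a helper along the following lines may be useful. It is a sketch only: describe_path is a hypothetical name, and it assumes GNU readlink (with the -f option) is available, as is usual on Linux.

```shell
# Report whether a path is a symbolic link and, if so, where it finally resolves.
# $1: path to inspect, e.g. /agr/projects
describe_path() {
    path=$1
    if [ -L "$path" ]; then
        # readlink -f follows every link in the chain to the real location
        printf '%s is a symbolic link to %s\n' "$path" "$(readlink -f "$path")"
    elif [ -e "$path" ]; then
        printf '%s is not a symbolic link\n' "$path"
    else
        printf '%s does not exist\n' "$path"
        return 1
    fi
}
```

On a login or compute node it could be called as describe_path /agr/projects, and likewise for /agr/datasets and /agr/scratch.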

See also How to access the AgResearch eRI compute cluster

Locating legacy data on eRI

In order to ease the transition from HPC to eRI, a legacy link farm has been created, with symbolic links from the legacy paths to the new locations. This is mounted on all eRI nodes (login and compute).

This addresses the problem of the many legacy scripts that have hardcoded paths to the legacy data locations. Each legacy directory (for example /dataset/blastdata/) contains the following links:

active -> /agr/persist/projects/{uniqueId}/active
scratch -> /agr/persist/projects/{uniqueId}/scratch
itmp -> /agr/scratch/projects/{uniqueId}
archive -> /agr/persist/datasets/{uniqueId}

Example:

login-0$ ls -l /dataset/blastdata/
total 2
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 43 Jul 18 16:29 active -> /agr/persist/projects/2002-blastdata/active
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 44 Jul 18 16:29 scratch -> /agr/persist/projects/2002-blastdata/scratch
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 36 Jul 18 16:29 itmp -> /agr/scratch/projects/2002-blastdata
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 36 Jul 18 16:29 archive -> /agr/persist/datasets/2002-blastdata

login-0$ ls -l /dataset/blastdata/active/
454temp  hs_faa_bak  plant.protein.faa.psi  swissprot.00.pnd  temp  UniVec.nhr
agall.seq.exists  junkx1  plant.protein.faa.psq  swissprot.00.pni  tigr  UniVec.nin
agfilter1.pl  ln_mirror.sh  plant.rna.fna.nhr  swissprot.00.pog  Tobacco_MF_assembl.exists  UniVec.nsd
bgi_sheep  ln_mirror.sh.bu1  plant.rna.fna.nin  swissprot.00.ppd  Tobacco_MF_assembly.fa.exists  UniVec.nsi
blastdata.exists  log  plant.rna.fna.nnd  swissprot.00.ppi  UMD3_OA_v1  UniVec.nsq
BTA_OA_ver.2.exists  merged.dmp  plant.rna.fna.nni  swissprot.00.psd  uniprot_kb.fa  UniVec.prop
bt_faa_bak  mirror  plant.rna.fna.nsd  swissprot.00.psi  uniprot_kb.fasta.exists  unpack.sh
citations.dmp  names.dmp  plant.rna.fna.nsi  swissprot.00.psq  uniprot_sprot.fa.exists  vector.fa
cs08.seq.exists  nodes.dmp  plant.rna.fna.nsq  swissprot.pal  uniprot_swissprot.fasta.exists  vector.nhr
delnodes.dmp  OA_chromosomes_ver.1.0  public_readonly  taxbti.bti.bak  UniVec  vector.nin
division.dmp  OAR_chromosomes_ver.1.0.exists  readme.txt  taxdb.btd  UniVec_Core.exists  vector.nnd
est.exists  OARv3.0_masked_with_SNPs_and_indels.exists  reorg1.sh.bu1  taxdb.btd.bu1  UniVec_Core.fa.exists  vector.nni
gc.prt  obsolete_deleteafter01112008  riceensembl  taxdb.bti  UniVec_Core.nhr  vector.nsd
gencode.dmp  plant.protein.faa.phr  sheep_chr_OAR.exists  taxdb.bti.bak  UniVec_Core.nin  vector.nsi
geneious_blast_template  plant.protein.faa.pin  sheep.v3.0.14th.final.fa.exists  taxdb.bti.bu1  UniVec_Core.nsd  vector.nsq
geneious_blast_template.0  plant.protein.faa.pnd  species  taxdb.tar.gz  UniVec_Core.nsi  vector.tar.gz
geneious_blast_testdb  plant.protein.faa.pni  stampfiles  taxdb.tar.gz.1  UniVec_Core.nsq  Wrightson_ESTs.exists
gi  plant.protein.faa.pog  swissprot.00.phr  taxdb.tar.gz.2  UniVec_Core.prop
hg17  plant.protein.faa.psd  swissprot.00.pin  taxdump.tar.gz  UniVec.exists
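
When updating a legacy script, the link layout listed above can be expressed as a small lookup. This is a sketch under the assumption that every migrated dataset follows exactly the four-link layout shown; legacy_target is a hypothetical helper name, not an eRI tool.

```shell
# Print the eRI location that a legacy link name points to.
# $1: legacy link name (active, scratch, itmp or archive)
# $2: the unique ID of the project or dataset, e.g. 2002-blastdata
legacy_target() {
    link_name=$1
    unique_id=$2
    case "$link_name" in
        active)  printf '/agr/persist/projects/%s/active\n' "$unique_id" ;;
        scratch) printf '/agr/persist/projects/%s/scratch\n' "$unique_id" ;;
        itmp)    printf '/agr/scratch/projects/%s\n' "$unique_id" ;;
        archive) printf '/agr/persist/datasets/%s\n' "$unique_id" ;;
        *)       printf 'unknown legacy link: %s\n' "$link_name" >&2
                 return 1 ;;
    esac
}
```

On an eRI node the result can be compared with the real link, for example the output of legacy_target active 2002-blastdata against readlink /dataset/blastdata/active.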

From a Windows client

S: drive -> \\storage.eri.agresearch.co.nz\datasets
M: drive -> \\storage.eri.agresearch.co.nz\projects

For standard workstations, a Group Policy will automatically map these with drive letters as above.

The shares can also be accessed as a network location using the options “Add a network location”, “Quick access”, or “Map network drive…” in the Windows File Explorer while connected to the AgResearch LAN (or via VPN).

Data Recovery

Weekly snapshots of the scratch directory are kept for 4 weeks under

/mnt/gpfs/scratch/.snapshots

which holds 4 subdirectories with names like scratch@GMT-2024.10.04-09.00.50

In the listing below, the latest is /mnt/gpfs/scratch/.snapshots/scratch@GMT-2024.10.04-09.00.50/projects

login-0 ~ $ ll /mnt/gpfs/scratch/.snapshots
total 2
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.09.13-10.00.48
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.09.20-10.00.49
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.09.27-10.00.50
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.10.04-09.00.50

Example: if file.txt is lost from my project's scratch directory, it can be searched for with
ls -l /mnt/gpfs/scratch/.snapshots/scratch@GMT-2024.10.04-09.00.50/projects/<project_id>/
and then copied back to where it is required:
cp /mnt/gpfs/scratch/.snapshots/scratch@GMT-2024.10.04-09.00.50/projects/<project_id>/file.txt /where_i_need_it/
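
Because the snapshot names change every week, the recovery steps above could be scripted so that the newest snapshot is found automatically. The following is a sketch only: latest_snapshot and restore_from_snapshot are hypothetical helper names, and it assumes every snapshot directory keeps the scratch@GMT-YYYY.MM.DD-HH.MM.SS naming shown in the listing.

```shell
# Print the newest snapshot directory under a snapshot root.
# $1: snapshot root (defaults to /mnt/gpfs/scratch/.snapshots)
# The names embed a zero-padded timestamp, so a plain lexical sort
# is also a chronological sort.
latest_snapshot() {
    snapshot_root=${1:-/mnt/gpfs/scratch/.snapshots}
    for dir in "$snapshot_root"/scratch@GMT-*; do
        [ -d "$dir" ] && printf '%s\n' "$dir"
    done | LC_ALL=C sort | tail -n 1
}

# Copy one file out of the newest snapshot, preserving mode and timestamps.
# $1: project ID
# $2: file path relative to the project's scratch directory
# $3: destination file or directory
# $4: snapshot root (optional, same default as above)
restore_from_snapshot() {
    snapshot=$(latest_snapshot "${4:-/mnt/gpfs/scratch/.snapshots}")
    if [ -z "$snapshot" ]; then
        echo "no snapshot found" >&2
        return 1
    fi
    cp -p "$snapshot/projects/$1/$2" "$3"
}
```

For the example above this would be called as restore_from_snapshot <project_id> file.txt /where_i_need_it/. If the file was already missing when the newest snapshot was taken, an older snapshot has to be checked by hand.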

Accessing compute resources

Compute access is currently handled via a service request.

See How to access the AgResearch eRI compute cluster

Accessing Open OnDemand (OOD)

Browser Access - https://ondemand.eri.agresearch.co.nz/

Login - try one of the following formats

first.last@agresearch.co.nz

userid@agresearch.co.nz (this is the most likely to work)

agresearch\userid

Password - your AgResearch password

Authentication

How to get help?

See How do I get help?