Accessing eRI storage

What is the difference between a dataset and a project provisioning type?

Both types are filesets on the filesystem (storage).
A dataset consists of one fileset, in /tdc_persist/datasets/.
A project consists of two filesets: one in /tdc_persist/projects/ and the other in /tdc_scratch/projects/.

On AgR eRI login or compute nodes

Symbolic links are:

/agr/projects
/agr/datasets
/agr/scratch

Links are:

/tdc_persist/projects
/tdc_persist/datasets
/tdc_scratch/projects

Mount points are:

/mnt/gpfs/persist/ for both datasets and projects
/mnt/gpfs/scratch/ for the temporary working space of projects
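
To see how one of the paths above resolves on a given node, a small helper along these lines may be useful. This is only a sketch: show_target is a hypothetical name, not an eRI tool, and it assumes a readlink that supports -f (as GNU coreutils does).

```shell
# show_target PATH
# Print PATH followed by the physical location it resolves to.
# (Hypothetical helper for illustration; assumes readlink supports -f.)
show_target() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "$path: not found" >&2
        return 1
    fi
    printf '%s -> %s\n' "$path" "$(readlink -f -- "$path")"
}

# On an eRI login or compute node you might then run:
#   for p in /agr/projects /agr/datasets /agr/scratch; do show_target "$p"; done
```

The function only reads link information, so it is safe to run on any path you have permission to list.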

See also How to access the AgResearch eRI compute cluster

Locating legacy data on eRI

To ease the transition from HPC to eRI, a legacy link farm has been created, with symbolic links from the legacy paths to the new locations. It is mounted on all eRI nodes (login and compute).

This addresses the problem of the many legacy scripts that have hardcoded paths to the legacy data locations.

active -> /agr/persist/projects/{uniqueId}/active
scratch -> /agr/persist/projects/{uniqueId}/scratch
itmp -> /agr/scratch/projects/{uniqueId}
archive -> /agr/persist/datasets/{uniqueId}

Example:

login-0$ ls -l /dataset/blastdata/
total 2
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 43 Jul 18 16:29 active -> /agr/persist/projects/2002-blastdata/active
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 44 Jul 18 16:29 scratch -> /agr/persist/projects/2002-blastdata/scratch
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 36 Jul 18 16:29 itmp -> /agr/scratch/projects/2002-blastdata
lrwxrwxrwx. 1 eri_migration@iam.flexi.nesi.org.nz eri_migration@iam.flexi.nesi.org.nz 36 Jul 18 16:29 archive -> /agr/persist/datasets/2002-blastdata

login-0$ ls -l /dataset/blastdata/active/
454temp  hs_faa_bak  plant.protein.faa.psi  swissprot.00.pnd  temp  UniVec.nhr
agall.seq.exists  junkx1  plant.protein.faa.psq  swissprot.00.pni  tigr  UniVec.nin
agfilter1.pl  ln_mirror.sh  plant.rna.fna.nhr  swissprot.00.pog  Tobacco_MF_assembl.exists  UniVec.nsd
bgi_sheep  ln_mirror.sh.bu1  plant.rna.fna.nin  swissprot.00.ppd  Tobacco_MF_assembly.fa.exists  UniVec.nsi
blastdata.exists  log  plant.rna.fna.nnd  swissprot.00.ppi  UMD3_OA_v1  UniVec.nsq
BTA_OA_ver.2.exists  merged.dmp  plant.rna.fna.nni  swissprot.00.psd  uniprot_kb.fa  UniVec.prop
bt_faa_bak  mirror  plant.rna.fna.nsd  swissprot.00.psi  uniprot_kb.fasta.exists  unpack.sh
citations.dmp  names.dmp  plant.rna.fna.nsi  swissprot.00.psq  uniprot_sprot.fa.exists  vector.fa
cs08.seq.exists  nodes.dmp  plant.rna.fna.nsq  swissprot.pal  uniprot_swissprot.fasta.exists  vector.nhr
delnodes.dmp  OA_chromosomes_ver.1.0  public_readonly  taxbti.bti.bak  UniVec  vector.nin
division.dmp  OAR_chromosomes_ver.1.0.exists  readme.txt  taxdb.btd  UniVec_Core.exists  vector.nnd
est.exists  OARv3.0_masked_with_SNPs_and_indels.exists  reorg1.sh.bu1  taxdb.btd.bu1  UniVec_Core.fa.exists  vector.nni
gc.prt  obsolete_deleteafter01112008  riceensembl  taxdb.bti  UniVec_Core.nhr  vector.nsd
gencode.dmp  plant.protein.faa.phr  sheep_chr_OAR.exists  taxdb.bti.bak  UniVec_Core.nin  vector.nsi
geneious_blast_template  plant.protein.faa.pin  sheep.v3.0.14th.final.fa.exists  taxdb.bti.bu1  UniVec_Core.nsd  vector.nsq
geneious_blast_template.0  plant.protein.faa.pnd  species  taxdb.tar.gz  UniVec_Core.nsi  vector.tar.gz
geneious_blast_testdb  plant.protein.faa.pni  stampfiles  taxdb.tar.gz.1  UniVec_Core.nsq  Wrightson_ESTs.exists
gi  plant.protein.faa.pog  swissprot.00.phr  taxdb.tar.gz.2  UniVec_Core.prop
hg17  plant.protein.faa.psd  swissprot.00.pin  taxdump.tar.gz  UniVec.exists
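
When updating a legacy script to use the new locations directly, the mapping of the four link names above can be written as a small function. This is a minimal sketch: legacy_target is a hypothetical helper, and it encodes only the four link names listed in this article.

```shell
# legacy_target UNIQUE_ID NAME
# Print the new eRI location for a legacy link name (active, scratch, itmp, archive).
# (Hypothetical helper; the mapping is copied from the list in this article.)
legacy_target() {
    unique_id="$1"
    name="$2"
    case "$name" in
        active|scratch) printf '/agr/persist/projects/%s/%s\n' "$unique_id" "$name" ;;
        itmp)           printf '/agr/scratch/projects/%s\n' "$unique_id" ;;
        archive)        printf '/agr/persist/datasets/%s\n' "$unique_id" ;;
        *)              echo "unknown legacy link name: $name" >&2; return 1 ;;
    esac
}
```

For example, legacy_target 2002-blastdata itmp prints /agr/scratch/projects/2002-blastdata, which matches the itmp link in the listing above.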

From a Windows client

S: \\storage.eri.agresearch.co.nz\datasets (S drive)
M: \\storage.eri.agresearch.co.nz\projects (M drive)

On standard workstations, a Group Policy automatically maps these shares to the drive letters above.

The shares can also be accessed as a network location using “Add a network location”, “Quick access”, or “Map network drive…” in Windows File Explorer while connected to the AgResearch LAN (directly or via VPN).

 

Data Recovery

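Weekly snapshots of the scratch directory are kept for four weeks in:

/mnt/gpfs/scratch/.snapshots

This directory holds four subdirectories, one per snapshot, with names like scratch@GMT-2024.10.04-09.00.50.

In the listing below, the latest snapshot of the projects scratch space is therefore /mnt/gpfs/scratch/.snapshots/scratch@GMT-2024.10.04-09.00.50/projects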

login-0 ~ $ ll /mnt/gpfs/scratch/.snapshots
total 2
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.09.13-10.00.48
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.09.20-10.00.49
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.09.27-10.00.50
drwxr-xr-x. 8 root root 4096 Feb 19 2024 scratch@GMT-2024.10.04-09.00.50


Example: if file.txt has been lost from a project's scratch directory, it can be searched for in a snapshot with:
ls -l /mnt/gpfs/scratch/.snapshots/scratch@GMT-2024.10.04-09.00.50/projects/<project_id>/
and then copied back to where it is required:
cp /mnt/gpfs/scratch/.snapshots/scratch@GMT-2024.10.04-09.00.50/projects/<project_id>/file.txt /where_i_need_it/
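
If the file is missing from the latest snapshot too, the older snapshots can be checked in the same way. The sketch below finds the newest snapshot that still contains a given file. latest_snapshot_copy is a hypothetical helper, and it assumes the snapshot directories are named scratch@GMT-YYYY.MM.DD-HH.MM.SS as in the listing above, so that sorted name order is also chronological order.

```shell
# latest_snapshot_copy SNAPROOT RELPATH
# Print the path of RELPATH inside the newest snapshot under SNAPROOT that contains it.
# (Hypothetical helper; assumes snapshot names sort chronologically.)
latest_snapshot_copy() {
    snaproot="$1"
    relpath="$2"
    found=""
    # The shell expands the glob in sorted order, so later matches are newer snapshots.
    for snap in "$snaproot"/scratch@GMT-*; do
        if [ -e "$snap/$relpath" ]; then
            found="$snap/$relpath"   # keep overwriting; the last match is the newest
        fi
    done
    [ -n "$found" ] || return 1
    printf '%s\n' "$found"
}

# Possible usage on an eRI node (replace the placeholder with your project id):
#   cp -p "$(latest_snapshot_copy /mnt/gpfs/scratch/.snapshots 'projects/<project_id>/file.txt')" /where_i_need_it/
```

The function returns a non-zero status and prints nothing when no snapshot contains the file, so check that before copying.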
