Source: naming_conventions.md

Version: 20232024.0209.0213

The following describes AgResearch naming conventions for any eResearch project or dataset.

...

  • So that all projects and datasets can be listed in chronological order, all names MUST start with the current year as four digits, e.g. 2023.

  • All names MUST be treated as case-insensitive. Do not rely on case sensitivity, because it differs across operating systems: e.g. Project, PROJECT, and project are three different names on Linux but the same name on Windows.

  • The ONLY accepted character between the year and the project name/title is an underscore (_). Example: 2024_myproject_title.
    Legacy records might show a hyphen instead.
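
The case-insensitivity rule above can be checked mechanically. The following is a minimal sketch, not part of any AgResearch tooling: the helper name `find_case_collisions` and the sample names are hypothetical and used only for illustration. It groups names by their lowercase form and reports every group that contains more than one spelling.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        # Names that lowercase to the same key would clash on a
        # case-insensitive file system.
        groups[name.lower()].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical names, for illustration only.
names = ["2023_Project", "2023_PROJECT", "2023_project", "2024_myproject_title"]
collisions = find_case_collisions(names)
print(collisions)
```

Any non-empty result indicates names that would be distinct on Linux but identical on Windows.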

Note

Names are stored in lowercase in the projects database.

...

Code Block
([0-9]{4})([a-z0-9_-]{4,60})
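
As a rough sketch of how this pattern might be applied to a whole name, the example below lowercases the candidate first, in line with names being stored in lowercase, and then uses `re.fullmatch` so that the entire string must fit the pattern rather than only part of it. The function name `is_valid_name` and the constant `NAME_PATTERN` are assumptions made for illustration.

```python
import re

# Pattern from the convention: a four-digit year followed by 4 to 60
# characters drawn from lowercase letters, digits, underscore and hyphen.
NAME_PATTERN = re.compile(r"([0-9]{4})([a-z0-9_-]{4,60})")


def is_valid_name(name):
    """Return True if the lowercased name matches the whole pattern."""
    return NAME_PATTERN.fullmatch(name.lower()) is not None


print(is_valid_name("2024_myproject_title"))  # True
print(is_valid_name("24_myproject_title"))    # False: year is not four digits
print(is_valid_name("2024_ab"))               # False: fewer than 4 characters after the year
```

Note that the pattern as written accepts any allowed character directly after the year, so it does not by itself enforce the underscore separator; a stricter check would need an additional test.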

The following sample Python code illustrates the naming convention regular expression in action, using the shortest and longest dataset names that currently exist on the AgResearch HPC:

Code Block
    import re

    # Same pattern as above, with A-Z added because these names are not lowercased first.
    common = r"([0-9]{4})([a-zA-Z0-9_-]{4,60})"
    sample_names = [
        "2010_SVS",
        "2011-_KCCG",
        "2012_GCAM",
        "2013_Hyperspectral_Curiosity_HSI_Lamb",
        "2014-_Clostridium_butyricum_OPP0023470",
        "2015-_Rumen_Microbes_Genome_SequenceData",
    ]
    for sample_name in sample_names:
        # Each match is a (year, remainder) tuple, one per capture group.
        matches = re.findall(common, sample_name)
        print(matches)

...