Source: naming_conventions.md
Version: 2024.09.13
The following notebook describes AgResearch naming conventions for any eResearch project or dataset.
Core needs for the naming conventions:
Requirements
All names MUST fulfill the needs of their users, user groups, or Communities of Practice.
All names MUST be interoperable across Linux, Windows and Mac operating systems.
All names MUST be parseable by a set of regular expressions defined in this notebook.
Common Needs
To be able to list all projects and datasets in chronological order, all names MUST start with the four-digit form of the current year, e.g. 2023 or 2022.
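Because the year is a fixed-width, four-digit prefix, a plain string sort already gives chronological order. A minimal sketch, assuming a few made-up names (none of them are real AgResearch projects):

```python
# Hypothetical project names; only the four-digit year prefix matters here.
names = ["2023_soil_survey", "2021_rumen_microbes", "2022_pasture_growth"]

# With a fixed-width year prefix, lexicographic order equals chronological order.
chronological = sorted(names)
print(chronological)
```

Names that share a year fall back to alphabetical order on the rest of the name.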
All names MUST be case insensitive. Do not assume case sensitivity across operating systems, e.g. Project, PROJECT and project are three different names on Linux, but the same name on Windows.
The ONLY accepted character between the year and the project name/title is _ (an underscore). Example: 2024_myproject_title.
Legacy records might show a hyphen instead.
Note |
---|
We are storing the name in lowercase in the projects database. |
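The case rule and the legacy hyphen can be checked together. The sketch below assumes a hypothetical helper, `to_stored_form`, that lowercases a name and turns a hyphen directly after the year into an underscore. It then groups names that end up with the same stored form. Treating the legacy hyphen as equivalent to the underscore is an assumption made for illustration, not a rule stated in this notebook.

```python
import re
from collections import defaultdict


def to_stored_form(name):
    """Lowercase a name and replace a legacy hyphen after the year with '_'."""
    return re.sub(r"^([0-9]{4})-", r"\1_", name.lower())


def find_collisions(names):
    """Map each stored form to the different spellings that collapse onto it."""
    groups = defaultdict(set)
    for name in names:
        groups[to_stored_form(name)].add(name)
    # Only stored forms reached by more than one spelling are collisions.
    return {
        stored: sorted(spellings)
        for stored, spellings in groups.items()
        if len(spellings) > 1
    }


# Hypothetical names used only to exercise the helper.
collisions = find_collisions(
    ["2024_Project", "2024_PROJECT", "2024-project", "2024_other"]
)
print(collisions)
```

Any key in the result marks a set of names that would be distinct on Linux but would clash on a case-insensitive file system.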
All names MUST contain only lowercase letters, digits, underscore and hyphen characters to achieve interoperability across the operating systems.
All names MUST have between 8 and 64 characters in total. The minimum number of characters is derived from the current dataset names, and the maximum of 64 was agreed during an eRI project review on 2 February 2023.
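The character-set and length rules can be checked independently of the year prefix. A minimal sketch, assuming a hypothetical helper `check_name` that returns a list of violated rules:

```python
import re

# Lowercase letters, digits, underscore and hyphen, as listed above.
ALLOWED = re.compile(r"[a-z0-9_-]+")


def check_name(name):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if not 8 <= len(name) <= 64:
        problems.append("length must be between 8 and 64 characters")
    if not ALLOWED.fullmatch(name):
        problems.append("only lowercase letters, digits, '_' and '-' are allowed")
    return problems


print(check_name("2024_myproject_title"))  # passes both checks
print(check_name("2024_My Project"))       # uppercase letters and a space
print(check_name("2024_ab"))               # only 7 characters
```

Returning every violation, rather than stopping at the first, lets a user fix a name in one pass.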
RegExp for Common Needs
We can derive the following regular expression from the common needs:
Code Block |
---|
([0-9]{4})([a-z0-9_]{4,60}) |
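As a sketch of how the two capture groups can be used, the hypothetical helper `parse_name` below splits a name into its year and the remainder. It uses `re.fullmatch`, so the whole name has to match the expression, not just a part of it.

```python
import re

# The common-needs expression: group 1 is the year, group 2 is the rest.
COMMON = re.compile(r"([0-9]{4})([a-z0-9_]{4,60})")


def parse_name(name):
    """Return (year, rest) when the whole name matches, otherwise None."""
    match = COMMON.fullmatch(name)
    if match is None:
        return None
    # The rest still starts with the underscore that separates it from the year.
    return int(match.group(1)), match.group(2)


print(parse_name("2024_myproject_title"))
print(parse_name("24_short"))
```

The second call returns `None` because the name does not start with four digits.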
Here is sample Python code that illustrates the naming convention regular expression in action for the existing datasets with the shortest and longest names that currently exist on the AgResearch HPC:
Code Block |
---|
import re |
# Year (four digits) followed by 4 to 60 allowed characters. |
common = r"([0-9]{4})([a-zA-Z0-9_-]{4,60})" |
# Existing dataset names, from the shortest to the longest. |
sample_names = [ |
    "2010_SVS", |
    "2011_KCCG", |
    "2012_GCAM", |
    "2013_Hyperspectral_Curiosity_HSI_Lamb", |
    "2014_Clostridium_butyricum_OPP0023470", |
    "2015_Rumen_Microbes_Genome_SequenceData", |
] |
for sample_name in sample_names: |
    # findall returns a list of (year, rest) tuples, one per match. |
    matches = re.findall(common, sample_name) |
    print(matches) |
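One caveat when reusing the sample: `re.findall` reports matches anywhere inside a string, so a name longer than 64 characters still produces a match on its first part. A minimal sketch of the difference, using a made-up 75-character name; `re.fullmatch` is the stricter choice when the goal is to validate a name, not to extract parts of it:

```python
import re

common = r"([0-9]{4})([a-zA-Z0-9_-]{4,60})"

# Hypothetical name with 75 characters, above the 64-character limit.
too_long = "2016_" + "x" * 70

# findall matches only the first 64 characters and silently ignores the rest.
partial = re.findall(common, too_long)

# fullmatch requires the whole name to fit the expression, so it returns None.
whole = re.fullmatch(common, too_long)

print(len(partial), whole)
```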
Bioinformatics - bio
ToDo
limit the names to max practical length based on the operating system constraints
year | team | leader / species
expand the regexp
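One possible direction for the expanded regexp is sketched below, purely as an illustration. The `year_team_rest` layout, the named groups, and the length bounds for each part are assumptions and have not been agreed; `bio` is used as the team code only because it appears above for Bioinformatics.

```python
import re

# Hypothetical layout: year, team code and free-form rest, joined by underscores.
# The upper bounds are chosen so the total never exceeds 64 characters
# (4 + 1 + 8 + 1 + 50).
EXPANDED = re.compile(
    r"(?P<year>[0-9]{4})_(?P<team>[a-z]{2,8})_(?P<rest>[a-z0-9_-]{1,50})"
)

# Made-up example name, not an existing dataset.
match = EXPANDED.fullmatch("2024_bio_rumen_microbes")
fields = match.groupdict() if match else None
print(fields)
```

Named groups keep the parsing code readable if further fields, such as leader or species, are added later.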
Links
https://www.uab.edu/research/home/irb-irap/naming-convention-overview
https://esha.com/blog/best-practices-database-naming-conventions/
https://www.ed.ac.uk/records-management/guidance/records/practical-guidance/naming-conventions
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file
https://services.anu.edu.au/files/document-collection/ERMS-Naming-Conventions-v2.pdf