This notebook describes the AgResearch naming conventions for eResearch projects
and datasets.
Core needs for the naming conventions:
All names MUST fulfil the needs of their users, user groups, or
Communities of Practice.
All names MUST be interoperable across the Linux, Windows and macOS operating systems.
All names MUST be parseable by a set of regular expressions defined in this notebook.
Common Needs
So that all projects and datasets can be listed in chronological order, all names MUST start with the four-digit current year, e.g.
2022
All names MUST be case insensitive, i.e. they must remain unique when letter case is ignored. Do not rely on case sensitivity, because it differs across operating systems, e.g.
Project, PROJECT, project
are three different names on Linux, but they are
the same name on Windows.
All names MUST contain only uppercase or lowercase letters, digits, underscores and hyphens to achieve interoperability across the operating systems.
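The case and ordering rules above can be checked with a few lines of Python. This is a minimal sketch, not part of the convention itself: the helper name find_case_collisions and the sample names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        # casefold() gives a case-insensitive key for comparison.
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample names, used only for illustration.
names = ["2022-Project", "2021-soil", "2022-PROJECT", "2023-bio_rumen"]

# Names that would clash on a case-insensitive file system.
collisions = find_case_collisions(names)
print(collisions)

# Because every name starts with a four-digit year, a plain
# case-insensitive sort also orders the names chronologically.
chronological = sorted(names, key=str.casefold)
print(chronological)
```

Any group reported by find_case_collisions would need one of its names changed before the set is safe to copy between Linux and Windows.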
RegExp for Common Needs
We can derive the following regular expression from the common needs:
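([0-9]{4})([a-zA-Z0-9_-]+)
Python example:
import re
# Group 1: four-digit year. Group 2: letters, digits, underscores or hyphens.
# Digits are included in group 2 because the common needs allow them in names.
common = r"([0-9]{4})([a-zA-Z0-9_-]+)"
sample_name = "2022-abc_cde"
matches = re.findall(common, sample_name)
print(matches)  # [('2022', '-abc_cde')]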
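Note that re.findall returns any matching substring, so it does not reject a name that contains forbidden characters elsewhere. To validate a whole name, a stricter check could look like the sketch below. The function name parse_name is hypothetical, and the character class assumes that digits are allowed after the year, as the common needs state.

```python
import re

# Assumption: digits are allowed after the year, per the common needs.
NAME_PATTERN = re.compile(r"([0-9]{4})([a-zA-Z0-9_-]+)")


def parse_name(name):
    """Return (year, remainder) if the whole name is valid, else None."""
    # fullmatch() requires the pattern to cover the entire string.
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(parse_name("2022-abc_cde"))     # valid: year prefix, allowed characters
print(parse_name("2022 my project"))  # invalid: contains spaces
print(parse_name("abc-2022"))         # invalid: does not start with the year
```

Returning None for invalid names keeps the sketch simple; raising an exception would be an equally reasonable choice.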
List of AgResearch Teams
Bioinformatics -
bio
ToDo
limit the names to max practical length based on the operating system constraints
year | team | leader / species
expand the regexp
get a list of all AgResearch teams (32+)
Links
https://www.uab.edu/research/home/irb-irap/naming-convention-overview
https://esha.com/blog/best-practices-database-naming-conventions/
https://www.ed.ac.uk/records-management/guidance/records/practical-guidance/naming-conventions
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file
https://services.anu.edu.au/files/document-collection/ERMS-Naming-Conventions-v2.pdf