The format of the database for reporting metadata was defined as a simple Microsoft Excel sheet. Microsoft Excel allows the categories of a variable to be standardized through its “Data validation” function, and this was applied to all variables. Colleagues from different institutions had to download the metadata sheet from the COMPARE share site, enter their metadata into the sheet and then upload it back to the COMPARE share site. Information was not always added to the same version of the metadata sheet, and the formatting and standardized categories defined for the different variables were often violated because partners used different software versions. Many working hours were therefore spent combining the metadata into one sheet and validating it. Validation included simple corrections such as making sure every primary source was spelled identically (many computer programs distinguish between, e.g., upper- and lower-case letters), standardizing the date of sampling, which can be reported in many different formats, and ensuring that at least the time, place and source of each sample were reported through the variables data provider, country of sampling origin, date of sampling, Salmonella serotype, travel information, and whether a Salmonella case was travel-related or part of an outbreak. Managing, sharing and storing data in Microsoft Excel files is not sustainable, and the risk of mistakenly introducing errors is high. Setting up the metadata in a sustainable, internet-based database format would have simplified metadata reporting and validation to some extent. Here, sustainable refers to a database in which:
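The manual validation steps described above (harmonizing the spelling of primary sources, standardizing sampling dates and checking that the minimum time, place and source fields are filled in) can be automated. The following is a minimal sketch, not the procedure used in COMPARE: the field names, the list of accepted date formats and the day-first reading of dates such as 03/04/2016 are all hypothetical assumptions.

```python
from datetime import datetime

# Hypothetical column names; the actual COMPARE sheet headers may differ.
REQUIRED_FIELDS = [
    "data_provider",
    "country_of_sampling_origin",
    "date_of_sampling",
    "salmonella_serotype",
    "travel_information",
    "travel_related",
    "outbreak",
]

# Assumed date formats seen in submissions. Slash and dot dates are read
# day-first (an assumption); month-first dates would be misread or rejected.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%Y%m%d"]


def normalize_source(value):
    """Collapse repeated whitespace and lower-case, so 'Broiler  MEAT' == 'broiler meat'."""
    return " ".join(value.split()).lower()


def standardize_date(value):
    """Return an ISO 8601 date string, or None if no accepted format matches."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted format
    return None


def validate_record(record):
    """Return a cleaned copy of one metadata row plus a list of problems found."""
    cleaned = dict(record)
    problems = []
    # Minimum information about time, place and source must be present.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing {field}")
    if "primary_source" in record:
        cleaned["primary_source"] = normalize_source(record["primary_source"])
    if record.get("date_of_sampling"):
        iso = standardize_date(str(record["date_of_sampling"]))
        if iso is None:
            problems.append(f"unrecognized date: {record['date_of_sampling']!r}")
        else:
            cleaned["date_of_sampling"] = iso
    return cleaned, problems
```

Running every submitted row through a check like this, instead of reviewing the merged sheet by hand, reports problems per row and leaves the original values untouched in the input.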
1. information can be added by different partners at the same time
2. information added by different partners is combined into one version automatically
3. dropdown menus of possible categories are linked to every variable in the metadata
4. the date is divided into three separate variables: year, month and day
5. open text fields are only included if strictly necessary
6. new variables can be added if needed
7. subsets of data can be extracted easily, without having to extract the entire metadata sheet
8. the user can filter on more than one variable at a time
9. a unique identifier links metadata and sequence
10. data is easily available to all involved partners
11. notifications are sent to partners when the database is updated
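Several of these requirements map directly onto ordinary relational-database features: a lookup table or CHECK constraint acts as a dropdown of allowed categories (3), the date is stored as separate year, month and day columns so partial dates can be reported (4), a shared primary key links metadata and sequence (9), and a parameterized query filters on several variables at once and extracts only the matching subset (7, 8). The sketch below is illustrative only: it uses Python's built-in `sqlite3` as a local stand-in for a shared, internet-based database server, all table and column names are hypothetical, and concurrent editing, automatic merging and notifications (1, 2, 11) depend on the server and web front end and are not shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE country (name TEXT PRIMARY KEY);      -- allowed categories for 'country'
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY,                    -- unique ID shared with the sequence
    provider  TEXT NOT NULL,
    country   TEXT NOT NULL REFERENCES country(name),
    year      INTEGER NOT NULL CHECK (year BETWEEN 1900 AND 2100),
    month     INTEGER CHECK (month BETWEEN 1 AND 12),  -- NULL allowed: partial dates
    day       INTEGER CHECK (day BETWEEN 1 AND 31),
    serotype  TEXT NOT NULL,
    outbreak  TEXT NOT NULL CHECK (outbreak IN ('yes', 'no', 'unknown'))
);
CREATE TABLE sequence (
    sample_id TEXT PRIMARY KEY REFERENCES sample(sample_id),
    read_file TEXT NOT NULL
);
"""

# Only these columns may be used as filters; this also keeps the SQL below safe.
FILTERABLE = {"provider", "country", "year", "month", "serotype", "outbreak"}


def open_db():
    """Create an in-memory database with foreign-key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def filter_samples(conn, **criteria):
    """Return (sample_id, read_file) rows for samples matching ALL criteria."""
    unknown = set(criteria) - FILTERABLE
    if unknown:
        raise ValueError(f"cannot filter on {sorted(unknown)}")
    where = " AND ".join(f"s.{col} = ?" for col in criteria) or "1"
    sql = (
        "SELECT s.sample_id, q.read_file FROM sample AS s "
        "LEFT JOIN sequence AS q ON q.sample_id = s.sample_id "
        f"WHERE {where} ORDER BY s.sample_id"
    )
    return conn.execute(sql, list(criteria.values())).fetchall()
```

For example, `filter_samples(conn, country="Denmark", serotype="Enteritidis")` returns only the Danish Enteritidis samples together with their linked read files, and an insert with a country that is not in the `country` table is rejected with `sqlite3.IntegrityError`.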
It is recommended that future projects of this type incorporate a sustainable, internet-based database and a data curator.