The lot of any epidemiologist

Managing the reporting and validation of metadata, together with data sharing, was time-consuming, which is the unavoidable lot of any epidemiologist.


The database for reporting metadata was defined as a simple Microsoft Excel sheet. Excel's “Data validation” function allows the categories for a variable to be standardized, and it was applied to all variables. Colleagues from different institutions had to download the metadata sheet from the COMPARE share site, enter their metadata and then upload the sheet to the share site again. Information was not always added to the same version of the metadata sheet, and the format and the standardized categories defined for the variables were often violated because different software versions were used. Many working hours were therefore spent combining the metadata into one sheet and validating it.

Validation included simple corrections such as spelling every primary source identically (because many computer programs distinguish between, for example, upper- and lower-case letters), harmonizing the date of sampling, which can be reported in many different formats, and ensuring that at least information about time, place and source was reported through the variables data provider, country of sampling origin, date of sampling, Salmonella serotype, travel information and whether a Salmonella case was travel related or part of an outbreak.

Managing, sharing and storing data in Microsoft Excel files is not sustainable, and the risk of mistakenly introducing errors is high. Setting up the metadata in a sustainable, internet-based database would have simplified metadata reporting and validation to some extent. Here, sustainable refers to a database in which:

1. information can be added by different partners at the same time

2. the database can combine information added by different partners into one
version automatically

3. dropdown menus of possible categories are linked to every variable in the
metadata

4. information about date is divided into three separate variables: year, month
and day

5. open text fields are only included if strictly necessary

6. new variables can be added if needed

7. subsets of data can be extracted easily, without having to extract the entire
metadata sheet

8. the user can filter on more than one variable at a time

9. a unique identifier links metadata and sequence

10. data is easily available to all involved partners

11. notifications are sent to partners when the database is updated
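The manual corrections described above (identical spelling of sources, one date format, a minimum set of reported variables) are the kind of checks a database could run automatically on entry. The following is only a minimal Python sketch of that idea, not the validation actually used in the project: the field names, the list of allowed sources and the accepted date layouts are all hypothetical.

```python
from datetime import datetime

# Hypothetical controlled vocabulary; the project's real category lists are not given here.
ALLOWED_SOURCES = {"human", "pig", "poultry", "cattle"}

# Hypothetical field names standing in for the minimum variables named in the text.
REQUIRED_FIELDS = ("data_provider", "country", "sampling_date", "serotype")

# Assumed date layouts. Day-first is assumed for slashes and dots, which is
# exactly the ambiguity that separate year/month/day variables would avoid.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def split_date(text):
    """Return (year, month, day) for the first matching format, else None."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        return parsed.year, parsed.month, parsed.day
    return None


def validate_record(record):
    """Return (cleaned_record, list_of_problems) for one metadata row."""
    cleaned = {
        key: "" if value is None else str(value).strip()
        for key, value in record.items()
    }
    problems = []

    for field in REQUIRED_FIELDS:
        if not cleaned.get(field):
            problems.append(f"missing {field}")

    # Lower-case the source so that "Pig" and "pig" count as one category.
    source = cleaned.get("primary_source", "").lower()
    cleaned["primary_source"] = source
    if source not in ALLOWED_SOURCES:
        problems.append(f"unknown primary_source: {source!r}")

    date_text = cleaned.get("sampling_date", "")
    if date_text:
        parts = split_date(date_text)
        if parts is None:
            problems.append(f"unparseable sampling_date: {date_text!r}")
        else:
            cleaned["year"], cleaned["month"], cleaned["day"] = parts

    return cleaned, problems
```

Returning a list of problems instead of raising on the first error lets a whole sheet be checked in one pass, so that a partner receives every correction request at once.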

We recommend that future projects of this type incorporate a sustainable, internet-based
database and a dedicated data curator.
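To make the division of labour between database and curator concrete, the sketch below shows in Python how rows from several partners could be combined automatically through a unique identifier, with disagreements collected for a curator instead of being overwritten silently. It is an illustration under assumptions, not a description of any existing system: the `sample_id` column and the function names are hypothetical.

```python
def merge_partner_rows(partner_rows):
    """Combine rows from several partners into one table keyed by sample ID.

    partner_rows maps a partner name to a list of dict rows. Returns
    (merged, conflicts); conflicts lists every disagreement so that a
    curator can resolve it by hand.
    """
    merged = {}
    conflicts = []
    for partner, rows in partner_rows.items():
        for row in rows:
            sample_id = row["sample_id"]  # hypothetical unique identifier column
            target = merged.setdefault(sample_id, {})
            for field, value in row.items():
                if value in (None, ""):
                    continue  # an empty cell never overrides existing data
                if field in target and target[field] != value:
                    # Keep the first value and record the disagreement.
                    conflicts.append(
                        (sample_id, field, target[field], value, partner)
                    )
                else:
                    target[field] = value
    return merged, conflicts


def select(merged, **criteria):
    """Return the IDs of records that match every field=value criterion."""
    return [
        sample_id
        for sample_id, record in merged.items()
        if all(record.get(field) == value for field, value in criteria.items())
    ]
```

A call such as `select(merged, country="Denmark", serotype="Typhimurium")` filters on two variables at once and returns only the matching identifiers, so a subset can be extracted without exporting the whole sheet.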


Nanna Munck

Postdoc, National Food Institute, Technical University of Denmark
