Dataset de-identification levels

From ECRIN-MDR Wiki

Last updated: 19/04/2022

These categories indicate the level of de-identification that has been applied to a data set. The possible values are:

id | name | description | source
0 | Not known | No clear information is available about the de-identification, if any, applied to the data. | ECRIN
1 | No de-identification | Confirmed that no de-identification measures have been applied to the data set. | ECRIN
2 | De-identification applied | Some de-identification measures have been applied. Details should be described in comments and/or indicated in the linked boolean fields, or in separate documents. | ECRIN
3 | De-identification applied, primary outcomes re-assessed | Some de-identification measures have been applied and are described. In addition, the data have been re-analysed against the primary outcomes and the results described. | ECRIN
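
One possible way to represent the four categories in code is sketched below. It uses a Python IntEnum whose values mirror the id column of the table. The class, member and function names (DeidentificationLevel, LEVEL_NAMES, level_from_id) are hypothetical and are not part of the ECRIN-MDR schema itself.

```python
from enum import IntEnum


class DeidentificationLevel(IntEnum):
    """Hypothetical enum; values mirror the 'id' column of the category table."""

    NOT_KNOWN = 0
    NO_DEIDENTIFICATION = 1
    DEIDENTIFICATION_APPLIED = 2
    APPLIED_PRIMARY_OUTCOMES_REASSESSED = 3


# Display names taken from the 'name' column of the table.
LEVEL_NAMES = {
    DeidentificationLevel.NOT_KNOWN: "Not known",
    DeidentificationLevel.NO_DEIDENTIFICATION: "No de-identification",
    DeidentificationLevel.DEIDENTIFICATION_APPLIED: "De-identification applied",
    DeidentificationLevel.APPLIED_PRIMARY_OUTCOMES_REASSESSED: (
        "De-identification applied, primary outcomes re-assessed"
    ),
}


def level_from_id(raw_id: int) -> DeidentificationLevel:
    """Convert a stored integer id to the enum, rejecting ids outside 0-3."""
    try:
        return DeidentificationLevel(raw_id)
    except ValueError:
        raise ValueError(f"unrecognised de-identification level id: {raw_id}") from None
```

For example, level_from_id(2) returns DeidentificationLevel.DEIDENTIFICATION_APPLIED, and looking that member up in LEVEL_NAMES gives "De-identification applied". Using an IntEnum keeps the stored value a plain integer while still rejecting ids that are not in the table.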

The categories are used alongside five more specific boolean data points, which allow particular de-identification measures to be indicated, and a textual description that should give details of the de-identification process (or reference a URL or other data object where such details can be found). The five specific data items are:

  • Direct Identifiers removed?
  • US HIPAA de-identification criteria applied?
  • Dates rebased or replaced by integers?
  • Narrative text fields removed?
  • k-anonymisation achieved?
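
The sketch below shows how the category, the five boolean data points and the textual description might be held together in a single record. It uses a Python dataclass, and all field and method names are hypothetical. Two further choices are assumptions, not part of the ECRIN definition:

  • None is used to mean "not recorded", reflecting how rarely these details are available.
  • The checks in problems() are illustrative rules inferred from the category descriptions, not an official ECRIN validation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeidentificationRecord:
    """Hypothetical record combining the category id with the five flags."""

    level_id: int  # 0-3, as in the category table
    # None = not recorded (an assumption; the source only says "boolean")
    direct_identifiers_removed: Optional[bool] = None
    hipaa_criteria_applied: Optional[bool] = None
    dates_rebased_or_replaced: Optional[bool] = None
    narrative_text_removed: Optional[bool] = None
    k_anonymisation_achieved: Optional[bool] = None
    details: str = ""  # free-text description, URL, or reference to a data object

    def flags(self) -> List[Optional[bool]]:
        """Return the five specific data items in the order listed above."""
        return [
            self.direct_identifiers_removed,
            self.hipaa_criteria_applied,
            self.dates_rebased_or_replaced,
            self.narrative_text_removed,
            self.k_anonymisation_achieved,
        ]

    def problems(self) -> List[str]:
        """Return illustrative consistency warnings (empty list if none found)."""
        issues: List[str] = []
        if self.level_id not in (0, 1, 2, 3):
            issues.append(f"unknown level id {self.level_id}")
        any_measure = any(flag is True for flag in self.flags())
        has_details = bool(self.details.strip())
        # Level 1 confirms no measures were applied, so no flag should be True.
        if self.level_id == 1 and any_measure:
            issues.append("level 1 (no de-identification) but a measure is flagged")
        # Levels 2 and 3 expect measures to be indicated or described somewhere.
        if self.level_id in (2, 3) and not any_measure and not has_details:
            issues.append("de-identification applied but no flags or details given")
        # Level 3 also expects the re-analysis against primary outcomes to be described.
        if self.level_id == 3 and not has_details:
            issues.append("level 3 but the primary outcome re-analysis is not described")
        return issues
```

For instance, DeidentificationRecord(level_id=2, direct_identifiers_removed=True).problems() returns an empty list. A record with level_id=1 and dates_rebased_or_replaced=True is flagged as inconsistent.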

Please note: we recognise that information on de-identification at the level of detail described above will rarely be obtainable during the retrospective 'harvesting' of data for the ECRIN metadata repository, at least at present, when de-identification details, if available at all, are usually very limited. The data points are nevertheless provided for prospective use by repositories and others who wish to structure such information in the future.