Dataset de-identification levels
These categories indicate the level of de-identification that has been applied to a data set. The possible values are:
id | name | description | source |
---|---|---|---|
0 | Not known | No clear information available about the de-identification, if any, applied to the data. | ECRIN |
1 | No de-identification | Confirmed that no de-identification measures have been applied to the data set. | ECRIN |
2 | De-identification applied | Some de-identification measures have been applied. Details should be described in comments and / or indicated in the linked boolean fields, or in separate documents. | ECRIN |
3 | De-identification applied, primary outcomes re-assessed | Some de-identification measures have been applied and are described. In addition, the data has been re-analysed against the primary outcomes and the results described. | ECRIN |
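As an illustration only, the ids in the table could be represented in code as an enumeration, so that an id outside the defined range is rejected rather than silently stored. The names `DeidentLevel`, `LEVEL_NAMES` and `level_name` are hypothetical and are not part of any ECRIN schema or library; this is a minimal sketch in Python.

```python
from enum import IntEnum


class DeidentLevel(IntEnum):
    """De-identification levels, keyed by the ids in the table (hypothetical names)."""
    NOT_KNOWN = 0
    NONE_APPLIED = 1
    APPLIED = 2
    APPLIED_OUTCOMES_REASSESSED = 3


# Display names for each level id.
LEVEL_NAMES = {
    DeidentLevel.NOT_KNOWN: "Not known",
    DeidentLevel.NONE_APPLIED: "No de-identification",
    DeidentLevel.APPLIED: "De-identification applied",
    DeidentLevel.APPLIED_OUTCOMES_REASSESSED:
        "De-identification applied, primary outcomes re-assessed",
}


def level_name(level_id: int) -> str:
    """Return the display name for an id; raises ValueError for an id not in the table."""
    return LEVEL_NAMES[DeidentLevel(level_id)]


print(level_name(2))
```

Using an `IntEnum` keeps the stored value identical to the id column, so records remain compatible with any system that holds the level as a plain integer.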
The categories are used alongside five more specific boolean data points, which allow individual de-identification measures to be indicated, and a textual description, which should be used to give details of the de-identification process (or to reference a URL or other data object where such details can be found). The five specific data items are:
- Direct Identifiers removed?
- US HIPAA de-identification criteria applied?
- Dates rebased or replaced by integers?
- Narrative text fields removed?
- k-anonymisation achieved?
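The combination of a level id, the five boolean data points and a free-text description could be held in a single record, sketched below in Python. All class, field and method names are hypothetical and do not come from an ECRIN schema. Each boolean is optional, on the assumption that "not stated" needs to be distinguishable from "no". The check in `problems()` (no measure should be flagged when the level is 1, "No de-identification") is an illustrative consistency rule, not a documented ECRIN requirement.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical field names for the five boolean data points listed above.
FLAG_NAMES = (
    "direct_identifiers_removed",
    "hipaa_criteria_applied",
    "dates_rebased",
    "narrative_text_removed",
    "k_anonymised",
)


@dataclass
class DeidentificationInfo:
    level_id: int = 0                                  # 0-3; 0 = Not known
    direct_identifiers_removed: Optional[bool] = None  # None = not stated
    hipaa_criteria_applied: Optional[bool] = None
    dates_rebased: Optional[bool] = None
    narrative_text_removed: Optional[bool] = None
    k_anonymised: Optional[bool] = None
    details: Optional[str] = None                      # free text, or a URL reference

    def measures_indicated(self) -> List[str]:
        """Names of the measures explicitly flagged as applied."""
        return [name for name in FLAG_NAMES if getattr(self, name) is True]

    def problems(self) -> List[str]:
        """Illustrative consistency checks (assumed rules, not ECRIN's)."""
        found = []
        if self.level_id not in (0, 1, 2, 3):
            found.append("level_id must be between 0 and 3")
        if self.level_id == 1 and self.measures_indicated():
            found.append("measures flagged although level is 'No de-identification'")
        return found


info = DeidentificationInfo(
    level_id=2,
    direct_identifiers_removed=True,
    dates_rebased=True,
    details="Dates rebased to days since enrolment.",
)
print(info.measures_indicated(), info.problems())
```

Keeping the flags separate from the level means a record can carry level 2 with no flags set, which matches the case where measures are known to have been applied but are described only in the text or in a separate document.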
Please note: We realise that obtaining information on de-identification at the level of detail described above will be unusual during the retrospective 'harvesting' of such data for the ECRIN metadata repository. At least at the moment, de-identification details, when present at all, are usually very limited. The data points are provided, however, for prospective use by repositories and others who wish to structure such information in the future.