Difference between revisions of "Dataset de-identification levels"
Latest revision as of 14:53, 10 May 2022
Last updated: 19/04/2022
These categories indicate the level of de-identification that has been applied to a data set. The possible values are:
| id | name | description | source |
|---|---|---|---|
| 0 | Not known | No clear information available about the de-identification, if any, applied to the data. | ECRIN |
| 1 | No de-identification | Confirmed that no de-identification measures have been applied to the data set. | ECRIN |
| 2 | De-identification applied | Some de-identification measures have been applied. Details should be described in comments and/or indicated in the linked boolean fields, or in separate documents. | ECRIN |
| 3 | De-identification applied, primary outcomes re-assessed | Some de-identification measures have been applied and are described. In addition the data has been re-analysed against the primary outcomes and the results described. | ECRIN |
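
As an illustration only, the four categories could be held in code as a small lookup. The sketch below assumes Python; the class name `DeidentLevel`, the member names and the helper `level_name` are hypothetical and not part of the ECRIN schema. Only the ids and display names come from the table.

```python
from enum import IntEnum


class DeidentLevel(IntEnum):
    """Hypothetical enum mirroring the ids in the table above."""
    NOT_KNOWN = 0
    NO_DEIDENTIFICATION = 1
    DEIDENTIFICATION_APPLIED = 2
    DEIDENTIFICATION_APPLIED_OUTCOMES_REASSESSED = 3


# Display names copied from the 'name' column of the table.
LEVEL_NAMES = {
    DeidentLevel.NOT_KNOWN: "Not known",
    DeidentLevel.NO_DEIDENTIFICATION: "No de-identification",
    DeidentLevel.DEIDENTIFICATION_APPLIED: "De-identification applied",
    DeidentLevel.DEIDENTIFICATION_APPLIED_OUTCOMES_REASSESSED:
        "De-identification applied, primary outcomes re-assessed",
}


def level_name(level_id: int) -> str:
    """Return the display name for an id; raises ValueError for unknown ids."""
    return LEVEL_NAMES[DeidentLevel(level_id)]
```

Calling `level_name(2)` returns the name for category 2, while an id outside 0 to 3 is rejected rather than silently stored.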
The categories are used alongside five more specific boolean data points that allow individual de-identification measures to be indicated, and a textual description that should be used to give details of the de-identification process (or to reference a URL or other data object where such details can be found). The five specific data items are:
- Direct Identifiers removed?
- US HIPAA de-identification criteria applied?
- Dates rebased or replaced by integers?
- Narrative text fields removed?
- k-anonymisation achieved?
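
One way to picture how the category, the five boolean items and the textual description fit together is the sketch below. It is a minimal illustration in Python, not the ECRIN data model: the class `DeidentInfo`, its field names, the use of `None` for "not recorded", and the advisory checks in `consistency_notes` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical field names for the five boolean data items listed above.
FLAG_FIELDS = (
    "direct_identifiers_removed",
    "hipaa_criteria_applied",
    "dates_rebased",
    "narrative_text_removed",
    "k_anonymised",
)


@dataclass
class DeidentInfo:
    level_id: int = 0  # 0-3, as in the category table
    # None means "not recorded" (an assumption; the source only says boolean).
    direct_identifiers_removed: Optional[bool] = None
    hipaa_criteria_applied: Optional[bool] = None
    dates_rebased: Optional[bool] = None
    narrative_text_removed: Optional[bool] = None
    k_anonymised: Optional[bool] = None
    description: str = ""  # free text, or a URL / reference to another data object


def measures_indicated(info: DeidentInfo) -> List[str]:
    """Names of the boolean items explicitly set to True."""
    return [name for name in FLAG_FIELDS if getattr(info, name) is True]


def consistency_notes(info: DeidentInfo) -> List[str]:
    """Advisory notes only; these rules are illustrative, not ECRIN policy."""
    notes = []
    applied = measures_indicated(info)
    if info.level_id == 1 and applied:
        notes.append("level 1 (no de-identification) but measures are flagged: "
                     + ", ".join(applied))
    if info.level_id in (2, 3) and not applied and not info.description.strip():
        notes.append("level %d but no measures flagged and no description given"
                     % info.level_id)
    return notes
```

The notes are deliberately advisory: for category 2 the details may legitimately live in separate documents, so an empty record is flagged for review rather than rejected.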
Please note: We recognise that de-identification information at the level of detail described above will rarely be obtainable during the retrospective 'harvesting' of such data for the ECRIN metadata repository, at least at present, when de-identification details, if present at all, are usually very limited. The data points are provided, however, for prospective use by repositories and others who wish to structure such information in the future.