Difference between revisions of "Dataset de-identification levels"

From ECRIN-MDR Wiki
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
These categories indicate the identifiers present in the data (whether or not pseudonymising keys are present) and thus give an indication of the level of '''de-identification'''. They are included to clarify the degree of further processing that might be required. The possible values are:
+
<p style="color:blue; text-align:right"><small>'''''Last updated: 19/04/2022'''''</small></p>
 +
These categories indicate the level of de-identification that has been applied. The possible values are:
 
<br>
 
<br>
 
{| class="wikitable" style="width: 95%;"
 
{| class="wikitable" style="width: 95%;"
Line 6: Line 7:
  
 
|- style="vertical-align:top;"
 
|- style="vertical-align:top;"
| 1|| None|| style="padding-bottom:10px;" | A dataset with no direct or indirect identifiers. Would be rare as scientific utility is likely to be severely affected, but could be a subset of data used for a particular purpose. || style="text-align:center;" | ECRIN
+
| 0 || Not known || style="padding-bottom:10px;" | No clear information available about the de-identification, if any, applied to the data. || style="text-align:center;" | ECRIN
  
 
|- style="vertical-align:top;"
 
|- style="vertical-align:top;"
| 2|| De-identified|| style="padding-bottom:10px;" | A dataset with no direct identifiers, and with indirect identifiers modified by established de-identification steps (e.g. amalgamation of categories, rebasing of dates, removal of text comments) so that it is no longer possible to identify any individuals within the data set. || style="text-align:center;" | ECRIN
+
| 1 || No de-identification || style="padding-bottom:10px;" | Confirmed that no de-identification measures have been applied to the data set. || style="text-align:center;" | ECRIN
  
 
|- style="vertical-align:top;"
 
|- style="vertical-align:top;"
| 3|| Has Indirect Identifiers|| style="padding-bottom:10px;" | Dataset contains no direct identifiers, but does contain data fields that when considered in combination might be used to identify some of the individuals. In some cases, access would also be required to other systems. || style="text-align:center;" | ECRIN
+
| 2 || De-identification applied || style="padding-bottom:10px;" | Some de-identification measures have been applied. Details should be described in comments and / or indicated in the linked boolean fields, or in separate documents. || style="text-align:center;" | ECRIN
  
 
|- style="vertical-align:top;"
 
|- style="vertical-align:top;"
| 4|| Has Direct Identifiers|| style="padding-bottom:10px;" | The dataset contains at least one direct identifier, i.e. a name, code, system id or other data that allow the individual to be identified unambiguously – in some cases requiring access to an additional system. This would be very rare in the context of shared data. || style="text-align:center;" | ECRIN
+
| 3 || De-identification applied, primary outcomes re-assessed || style="padding-bottom:10px;" | Some de-identification measures have been applied and are described. In addition the data has been re-analysed against the primary outcomes and the results described. || style="text-align:center;" | ECRIN
  
|- style="vertical-align:top;"
+
|}
| 9|| Comment on identifiers present|| style="padding-bottom:10px;" | Indicators or comment on identifiers present but not classifiable as one of types 1-4. Details field should be used for the comment. || style="text-align:center;" | ECRIN
+
 
 +
The categories are used alongside 5 more specific boolean data points that allow specific de-identification measures to be indicated, and a textual description that should be used to give details of the de-identification process (or to reference a URL or other data object where such details can be found). The 5 specific data items are:
 +
* ''Direct Identifiers removed?''
 +
* ''US HIPAA de-identification criteria applied?''
 +
* ''Dates rebased or replaced by integers?''
 +
* ''Narrative text fields removed?''
 +
* ''k-anonymisation achieved?''
  
|- style="vertical-align:top;"
+
'''Please note:''' We recognise that obtaining information on de-identification at the level of detail described above will be unusual during the retrospective 'harvesting' of such data for the ECRIN metadata repository, at least at present, when de-identification details, if present at all, are usually very limited. The data points are provided, however, for use ''prospectively'', by repositories and others who wish to structure such information in the future.
| 0|| Not yet known|| style="padding-bottom:10px;" | Dummy value supplied by default on entity creation. || style="text-align:center;" | ECRIN
 
|}
 

Latest revision as of 14:53, 10 May 2022

Last updated: 19/04/2022

These categories indicate the level of de-identification that has been applied. The possible values are:

{| class="wikitable" style="width: 95%;"
! id !! name !! description !! source
|- style="vertical-align:top;"
| 0 || Not known || No clear information available about the de-identification, if any, applied to the data. || ECRIN
|- style="vertical-align:top;"
| 1 || No de-identification || Confirmed that no de-identification measures have been applied to the data set. || ECRIN
|- style="vertical-align:top;"
| 2 || De-identification applied || Some de-identification measures have been applied. Details should be described in comments and / or indicated in the linked boolean fields, or in separate documents. || ECRIN
|- style="vertical-align:top;"
| 3 || De-identification applied, primary outcomes re-assessed || Some de-identification measures have been applied and are described. In addition the data has been re-analysed against the primary outcomes and the results described. || ECRIN
|}
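
For consumers of the metadata, the four level ids listed above could be represented as an enumeration. The following is a minimal sketch only: the class name, member names and the <code>level_from_id</code> helper are hypothetical and not part of the ECRIN schema, and falling back to "Not known" for missing or unrecognised ids is an assumption made for illustration.

```python
from enum import IntEnum


class DeidentLevel(IntEnum):
    """De-identification levels, using the ids from the table (names are hypothetical)."""
    NOT_KNOWN = 0
    NO_DEIDENTIFICATION = 1
    DEIDENTIFICATION_APPLIED = 2
    DEIDENTIFICATION_APPLIED_OUTCOMES_REASSESSED = 3


# Display names copied from the 'name' column of the table.
LEVEL_NAMES = {
    DeidentLevel.NOT_KNOWN: "Not known",
    DeidentLevel.NO_DEIDENTIFICATION: "No de-identification",
    DeidentLevel.DEIDENTIFICATION_APPLIED: "De-identification applied",
    DeidentLevel.DEIDENTIFICATION_APPLIED_OUTCOMES_REASSESSED:
        "De-identification applied, primary outcomes re-assessed",
}


def level_from_id(raw_id):
    """Map a raw id to a level.

    Assumption: missing or unrecognised ids are treated as NOT_KNOWN (0).
    """
    try:
        return DeidentLevel(int(raw_id))
    except (TypeError, ValueError):
        return DeidentLevel.NOT_KNOWN
```

A call such as <code>LEVEL_NAMES[level_from_id(2)]</code> then yields the display name for a stored id.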

The categories are used alongside 5 more specific boolean data points that allow specific de-identification measures to be indicated, and a textual description that should be used to give details of the de-identification process (or to reference a URL or other data object where such details can be found). The 5 specific data items are:

  • Direct Identifiers removed?
  • US HIPAA de-identification criteria applied?
  • Dates rebased or replaced by integers?
  • Narrative text fields removed?
  • k-anonymisation achieved?
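
As an illustration of how the five boolean data items and the textual description might be held together in one record, here is a rough sketch. The class and field names are hypothetical (they are not taken from the ECRIN schema), and using <code>None</code> to mean "not stated" is an assumption, since the source describes the items simply as booleans.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DeidentDetails:
    """Hypothetical record of the five boolean data items plus free text.

    Assumption: None means the information was not stated by the source.
    """
    direct_identifiers_removed: Optional[bool] = None
    hipaa_criteria_applied: Optional[bool] = None
    dates_rebased: Optional[bool] = None
    narrative_text_removed: Optional[bool] = None
    k_anonymised: Optional[bool] = None
    # Free-text details, or a reference to a URL / data object describing them.
    description: Optional[str] = None

    def indicated_measures(self) -> List[str]:
        """Return the names of the measures explicitly flagged as True."""
        return [f.name for f in fields(self) if getattr(self, f.name) is True]


details = DeidentDetails(
    direct_identifiers_removed=True,
    dates_rebased=True,
    description="Details given in a separate document.",
)
flagged = details.indicated_measures()
```

Here <code>flagged</code> contains only the two measures that were set to true; the description field is never reported as a measure because it is not a boolean.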

Please note: We recognise that obtaining information on de-identification at the level of detail described above will be unusual during the retrospective 'harvesting' of such data for the ECRIN metadata repository, at least at present, when de-identification details, if present at all, are usually very limited. The data points are provided, however, for use prospectively, by repositories and others who wish to structure such information in the future.