Lookup Data

From ECRIN-MDR Wiki
Latest revision as of 14:49, 10 May 2022

Last updated: 19/04/2022

Controlled Terminology

Many of the data fields in the MDR system are categorised, i.e. they are constrained to hold one of a predefined set of possible values.
For example, a clinical study may fall into one of several different main ‘types’, according to the broad methodology used:

  • Interventional
  • Observational
  • Observational [Patient Registry]
  • Expanded access
  • Funded programme

Although clinical studies are usually categorised in similar ways, different systems can use different terms for those categories. Using the MDR’s controlled terminology, studies labelled as ‘clinical trial’, ‘randomised trial’, or ‘active intervention’ would all be mapped to a study type of ‘Interventional’, while other studies, described variously as ‘off-label’, ‘compassionate use’, or ‘pre-licence’, would all be mapped to ‘Expanded access’. The options available for any categorised data item are stored in a ‘look up’ table.
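The mapping described above can be sketched in a few lines of Python. This is only an illustration, not the MDR's actual harvesting code: the dictionary `STUDY_TYPE_MAP`, the function `map_study_type`, and the fallback to 'Not yet known' for unrecognised terms are all assumptions made for the example.

```python
# Hypothetical mapping from terms found in source systems to MDR study types.
# The source terms are the examples quoted in the text above.
STUDY_TYPE_MAP = {
    "clinical trial": "Interventional",
    "randomised trial": "Interventional",
    "active intervention": "Interventional",
    "off-label": "Expanded access",
    "compassionate use": "Expanded access",
    "pre-licence": "Expanded access",
}


def map_study_type(source_term: str) -> str:
    """Return the controlled study type for a term found in a source system."""
    key = source_term.strip().lower()
    # Assumption: terms that are not recognised fall back to a placeholder value.
    return STUDY_TYPE_MAP.get(key, "Not yet known")


print(map_study_type("Randomised trial"))
print(map_study_type("Compassionate use"))
```

Normalising case and surrounding whitespace before the lookup keeps the mapping table small; a real harvester would probably need a separate mapping for each source.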
The controlled terminologies are used in three main contexts.
   a) When harvesting data from external sources, they are used to make the various source data points as consistent as possible, by mapping the categories found to a single set of terms, as described above.
   b) When searching or filtering records in the MDR portal, they can be used to select studies or data objects of a certain type, e.g. only studies with a status of 'completed', or studies that have associated protocols.
   c) Within direct data entry of the metadata, e.g. by a researcher using a web portal to enter metadata, they can be used as the options available for many of the attributes of studies and objects.
In the data entry context only a subset of the available categories may be required. For instance, the 'not known' option is unlikely to be needed when the data is added by a researcher familiar with a study or digital object, whereas it is required for data that has been harvested from an external source such as a trial registry. Conversely, some data that exists in systems, such as internal dates linked to PubMed processing, will not be known to a researcher adding the metadata manually. Many of the lookup tables therefore contain a field ('use_in_data_entry') that indicates whether or not each category should be offered, for example in a drop down box, as one of the options available within data entry.

Structure of Look Up tables

The MDR system stores the available categories in a collection of ‘look up’ tables, all belonging to the 'lup' schema of the context database. Each look up table lists the options available for that data field, and almost all have a core common structure:

  • An integer id field, that provides a key for each option. The database only has to store the id to represent the category being used, making storage much more efficient.
  • A string name field, with the short name of the category, as it would appear – for instance – in a dropdown box that displayed the alternatives available.
  • A text description field, that provides, when necessary, a brief explanation of what the category means.
  • A boolean use_in_data_entry field, that indicates if the category should be offered in a data entry context, as opposed to a data harvesting or filtering context.
  • An integer list_order field, which allows the default order in a display or report (normally that of the id, as the primary key) to be over-ridden, by ordering the records according to the value of this field.

A few look up tables have one or two additional fields but the great majority follow the pattern above.
The categories used are normally those that already exist within DataCite (for object attributes) or ClinicalTrials.gov (for study attributes), augmented if necessary by values from a few other key systems, or by ECRIN, to better cover the full range of categories required for clinical research data objects. For instance, the listed study types are based on the study type classification found within the ClinicalTrials.gov trials registry.
The lookup tables contain two additional ‘audit’ fields:

  • A string source field, denoting the system from which the term was originally taken. Terms created by the MDR team are labelled as ‘ECRIN’.
  • A date date_added field, that indicates the date a category was agreed as being required by the MDR system (though full implementation may be later). If a term has been proposed but not yet agreed this date is null.
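As a rough sketch of the structure described above, the following Python snippet builds a lookup table with the seven fields in an in-memory SQLite database and joins it to a table that stores only the integer id. Several things here are assumptions for illustration: the `studies` table and its columns are hypothetical, the column types are SQLite approximations, and SQLite has no equivalent of the 'lup' schema, so the table is created without one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study_types (
    id                INTEGER PRIMARY KEY,
    name              TEXT NOT NULL,
    description       TEXT,
    use_in_data_entry INTEGER,  -- SQLite has no boolean type; 0 or 1 is used
    list_order        INTEGER,
    source            TEXT,
    date_added        TEXT      -- ISO date; NULL if the term is not yet agreed
);
-- Hypothetical table holding records that reference the lookup table by id.
CREATE TABLE studies (
    id            INTEGER PRIMARY KEY,
    title         TEXT,
    study_type_id INTEGER REFERENCES study_types (id)
);
""")

conn.execute(
    "INSERT INTO study_types VALUES (?, ?, ?, ?, ?, ?, ?)",
    (11, "Interventional", "A clinical trial.", 1, 10,
     "ClinicalTrials.gov", "2019-02-08"),
)
conn.execute("INSERT INTO studies VALUES (?, ?, ?)", (1, "Example study", 11))

# Only the integer id is stored with the study; the name is recovered by a join.
row = conn.execute("""
    SELECT s.title, t.name
    FROM studies AS s
    JOIN study_types AS t ON t.id = s.study_type_id
""").fetchone()
print(row)
```

Storing the id rather than the category name is what allows a name or description to be revised in one place without touching the records that use it.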

The study_types table, below, illustrates how this structure works in the case of study categories.

id | name | description | use_in_data_entry | list_order | source | date_added
0 | Not yet known | Dummy value supplied by default on entity creation. | false | 99 | ECRIN | 2019-02-08
11 | Interventional | A clinical trial. | true | 10 | ClinicalTrials.gov | 2019-02-08
12 | Observational | Any form of non-interventional research. | true | 20 | ClinicalTrials.gov | 2019-02-08
13 | Observational Patient Registry | Collecting data for a designated registry. | true | 30 | ClinicalTrials.gov | 2019-02-08
14 | Expanded access | Off label usage of a new product for individuals. | true | 40 | ClinicalTrials.gov | 2019-02-08
15 | Funded programme | With a single or linked series of grants. | false | 50 | ClinicalTrials.gov | 2019-02-08
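To show how the use_in_data_entry and list_order fields might drive a user interface, here is a minimal Python sketch using the study_types values listed above. The `LookupRow` type and the `options` function are hypothetical names introduced for this example; how the MDR portal actually builds its lists is not described here.

```python
from typing import NamedTuple


class LookupRow(NamedTuple):
    id: int
    name: str
    use_in_data_entry: bool
    list_order: int


# Values copied from the study_types table (description and audit fields omitted).
STUDY_TYPES = [
    LookupRow(0, "Not yet known", False, 99),
    LookupRow(11, "Interventional", True, 10),
    LookupRow(12, "Observational", True, 20),
    LookupRow(13, "Observational Patient Registry", True, 30),
    LookupRow(14, "Expanded access", True, 40),
    LookupRow(15, "Funded programme", False, 50),
]


def options(rows, data_entry_only=False):
    """Return (id, name) pairs ordered by list_order, optionally for data entry only."""
    selected = [r for r in rows if r.use_in_data_entry or not data_entry_only]
    return [(r.id, r.name) for r in sorted(selected, key=lambda r: r.list_order)]


# Options offered to a researcher entering metadata manually.
for option_id, name in options(STUDY_TYPES, data_entry_only=True):
    print(option_id, name)
```

With `data_entry_only=True` the 'Not yet known' and 'Funded programme' rows are left out. Without it, all six rows are returned, and ordering by list_order places 'Not yet known' last even though it has the lowest id.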

Listing of Look Up tables

At present the MDR system contains 29 look up tables. Most indicate by their name that they refer to categorised fields: usually <something>_types, occasionally <something>_codes, _categories or _classes.
The values in each of them (id, name, description, use in data entry and source) can be found by following the links listed below.

  • study attributes (study_types, study_statuses, gender_eligibility_types)
  • study features (feature_types and categories of phase, allocation method, masking, time perspective etc.)