Difference between revisions of "Contextual Data"
Revision as of 15:48, 17 November 2020
Introduction
All the systems and databases within the MDR need to access so-called 'contextual' data, which is of two main types:
- Context data: Data relating to the 'landscape' in which clinical research is carried out - organisations and people, countries and regions, systems for classifying keywords and topics, etc.
- Controlled terminology: Data relating to the options available for categorised questions within the MDR itself.
Both these data types are stored in the context database, in the ctx and lup (look up) schemas respectively. Postgres does not allow a statement issued in one database to reference objects in another database directly, so when processes in one of the source data databases require access to contextual data, one or both context schemas are imported into that database within a 'foreign table wrapper'. This allows the data, but not the table designs, to be manipulated. It does, however, risk cluttering the databases with additional schemas, so within the MDR such foreign table wrappers are always created temporarily, when required, and are torn down again once processing is complete (see the Managing Links between databases section in Data_Structures).
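As an illustration of this temporary set-up and tear-down, the sketch below builds the SQL statements that could be used, assuming the link is made with the standard Postgres postgres_fdw extension and IMPORT FOREIGN SCHEMA. The server name, local schema names, and connection details are hypothetical and are not taken from the MDR code.

```python
# Sketch only: server, schema and credential names below are hypothetical.
SERVER = "context_link"


def fdw_setup_sql(remote_schemas, host="localhost", port=5432,
                  dbname="context", user="mdr_user", password="changeme"):
    """Return SQL that exposes schemas of the context database as foreign tables."""
    statements = [
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw;",
        f"CREATE SERVER {SERVER} FOREIGN DATA WRAPPER postgres_fdw "
        f"OPTIONS (host '{host}', dbname '{dbname}', port '{port}');",
        f"CREATE USER MAPPING FOR CURRENT_USER SERVER {SERVER} "
        f"OPTIONS (user '{user}', password '{password}');",
    ]
    for schema in remote_schemas:
        local = f"context_{schema}"  # local schema that holds the foreign tables
        statements.append(f"CREATE SCHEMA IF NOT EXISTS {local};")
        statements.append(
            f"IMPORT FOREIGN SCHEMA {schema} FROM SERVER {SERVER} INTO {local};"
        )
    return statements


def fdw_teardown_sql(remote_schemas):
    """Return SQL that removes the temporary link once processing is complete."""
    statements = [
        f"DROP SCHEMA IF EXISTS context_{schema} CASCADE;"
        for schema in remote_schemas
    ]
    statements.append(
        f"DROP USER MAPPING IF EXISTS FOR CURRENT_USER SERVER {SERVER};"
    )
    statements.append(f"DROP SERVER IF EXISTS {SERVER} CASCADE;")
    return statements


if __name__ == "__main__":
    for sql in fdw_setup_sql(["ctx", "lup"]) + fdw_teardown_sql(["ctx", "lup"]):
        print(sql)
```

Running the set-up statements before a process and the tear-down statements after it keeps the extra schemas out of the source database between runs.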
The sections below provide further details on the various types of contextual data in the system.
Contextual Data
Organisation Data
Geographical Data
Topic Data
Publisher Data
Controlled Terminology
Many of the data fields in the MDR system are categorised, i.e. they are constrained to hold one of a predefined set of possible values.
For example, a clinical study may fall into one of several different main ‘types’, according to the broad methodology used:
- Interventional
- Observational
- Observational [Patient Registry]
- Expanded access
- Funded programme
Although clinical studies are usually categorised in similar ways, different systems can use different terms for those categories. Using the MDR’s controlled terminology, studies labelled as ‘clinical trial’, ‘randomised trial’, or ‘active intervention’ would all be mapped to a study type of ‘Interventional’, while other studies, described variously as ‘off-label’, ‘compassionate use’, or ‘pre-licence’, would all be mapped to ‘Expanded access’.
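The mapping described above can be sketched as a simple dictionary look-up. The source terms are the examples given in the text; the function name, the normalisation step, and the fall-back to ‘Not yet known’ for unrecognised terms are assumptions made for illustration, not the MDR's actual matching logic.

```python
# Source terms (lower case) mapped to the MDR's controlled study type names.
STUDY_TYPE_SYNONYMS = {
    "clinical trial": "Interventional",
    "randomised trial": "Interventional",
    "active intervention": "Interventional",
    "off-label": "Expanded access",
    "compassionate use": "Expanded access",
    "pre-licence": "Expanded access",
}


def map_study_type(source_term):
    """Map a source system's term to a controlled study type name."""
    key = source_term.strip().lower()  # tolerate case and surrounding spaces
    # Assumption: unrecognised terms fall back to the default dummy value.
    return STUDY_TYPE_SYNONYMS.get(key, "Not yet known")


if __name__ == "__main__":
    for term in ["Clinical Trial", "compassionate use", "cohort"]:
        print(term, "->", map_study_type(term))
```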
The MDR system uses a set of ‘look up’ tables, each one providing the possible values of a categorised data field within the system. In the context of a user interface, each lookup table would often be used as the source of a dropdown or list box that displayed the categories for search or filtering purposes.
Each look up table lists the options available for that data field, and almost all have a core common structure:
- An integer id field, that provides a key for each option. The database only has to store the id to represent the category being used, making storage much more efficient.
- A string name field, with the short name of the category, as it would appear – for instance – in a dropdown box that displayed the alternatives available.
- A text description field, that provides, when necessary, a brief explanation of what the category means.
- An integer list_order field, which allows the default order in a display or report (normally that of the id, as the primary key) to be overridden, by ordering the records according to the value of this field.
A few look up tables have one or two additional fields but the great majority follow the pattern above.
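The core structure can be sketched with an in-memory SQLite database (the MDR itself uses Postgres). The study_types columns follow the four fields listed above; the studies table and its study_type_id column are hypothetical, included only to show a record storing the id rather than the category name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study_types (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    list_order  INTEGER
);
-- Hypothetical table: only the integer id of the category is stored.
CREATE TABLE studies (
    id            INTEGER PRIMARY KEY,
    title         TEXT,
    study_type_id INTEGER REFERENCES study_types (id)
);
""")
conn.executemany(
    "INSERT INTO study_types VALUES (?, ?, ?, ?)",
    [
        (0, "Not yet known",
         "Dummy value supplied by default on entity creation.", 99),
        (11, "Interventional", "A clinical trial.", 10),
        (12, "Observational", "Any form of non-interventional research.", 20),
    ],
)
conn.execute("INSERT INTO studies VALUES (1, 'Example study', 11)")

# Source for a dropdown: list_order overrides the default id order,
# so the dummy 'Not yet known' option is listed last rather than first.
dropdown = [
    row[0]
    for row in conn.execute("SELECT name FROM study_types ORDER BY list_order")
]

# Resolving the stored id back to its display name.
study_type = conn.execute(
    "SELECT t.name FROM studies s "
    "JOIN study_types t ON t.id = s.study_type_id WHERE s.id = 1"
).fetchone()[0]

print(dropdown)
print(study_type)
```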
The categories used are normally those that already exist within DataCite or other key systems, augmented if necessary by ECRIN to better cover the full range of categories required for clinical research data objects. For instance, the listed study types are taken from the study type classification found within the ClinicalTrials.gov trials registry.
The lookup tables contain two additional ‘audit’ fields:
- A string source field, denoting the system from which the term was originally taken. Terms created by the MDR team are labelled as ‘ECRIN’.
- A date date_added field, that indicates the date a category was agreed as being required by the MDR system (though full implementation may be later). If a term has been proposed but not yet agreed, this date is null.
The study_types table, below, illustrates how this structure works in the case of study categories.
id | name | description | list_order | source | date_added |
---|---|---|---|---|---|
0 | Not yet known | Dummy value supplied by default on entity creation. | 99 | ECRIN | 2019-02-08 |
11 | Interventional | A clinical trial. | 10 | ClinicalTrials.gov | 2019-02-08 |
12 | Observational | Any form of non-interventional research. | 20 | ClinicalTrials.gov | 2019-02-08 |
13 | Observational Patient Registry | Collecting data for a designated registry. | 30 | ClinicalTrials.gov | 2019-02-08 |
14 | Expanded access | Off label usage of a new product for individuals. | 40 | ClinicalTrials.gov | 2019-02-08 |
15 | Funded programme | With a single or linked series of grants. | 50 | ClinicalTrials.gov | 2019-02-08 |
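As a rough sketch of how the audit fields might be used, the rows above can be held as typed records and filtered by source or by whether a term has been agreed. The record type, the helper functions, and the final 'proposed' row are hypothetical; that row is not an MDR term and is included only to show a null date_added.

```python
from datetime import date
from typing import NamedTuple, Optional


class LookupTerm(NamedTuple):
    id: int
    name: str
    description: str
    list_order: int
    source: str
    date_added: Optional[date]  # None while a term is proposed but not agreed


AGREED = date(2019, 2, 8)
STUDY_TYPES = [
    LookupTerm(0, "Not yet known",
               "Dummy value supplied by default on entity creation.",
               99, "ECRIN", AGREED),
    LookupTerm(11, "Interventional", "A clinical trial.",
               10, "ClinicalTrials.gov", AGREED),
    LookupTerm(12, "Observational", "Any form of non-interventional research.",
               20, "ClinicalTrials.gov", AGREED),
    LookupTerm(13, "Observational Patient Registry",
               "Collecting data for a designated registry.",
               30, "ClinicalTrials.gov", AGREED),
    LookupTerm(14, "Expanded access",
               "Off label usage of a new product for individuals.",
               40, "ClinicalTrials.gov", AGREED),
    LookupTerm(15, "Funded programme",
               "With a single or linked series of grants.",
               50, "ClinicalTrials.gov", AGREED),
    # Hypothetical row, not in the MDR: a proposed term awaiting agreement.
    LookupTerm(99, "Example proposed type", "Illustrative only.",
               60, "ECRIN", None),
]


def agreed_terms(terms):
    """Terms with a date_added value, in display (list_order) order."""
    return sorted((t for t in terms if t.date_added is not None),
                  key=lambda t: t.list_order)


def names_from(terms, source):
    """Names of the terms that were taken from the given source system."""
    return [t.name for t in terms if t.source == source]


if __name__ == "__main__":
    print([t.name for t in agreed_terms(STUDY_TYPES)])
    print(names_from(STUDY_TYPES, "ECRIN"))
```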
At present the MDR system contains 29 look up tables. Most indicate by their name that they refer to categorised fields: usually <something>_types, occasionally _codes, _categories or _classes.
- study attributes (study_types, study_statuses, gender_eligibility_types)
- study features (feature_types and categories of phase, allocation method, masking, time perspective etc.)
- study_relationship_types
- object_classes
- object_types
- object_filter_types
- object_access_types
- object_instance_types
- object_relationship_types
- contribution_types
- dataset_consent_types
- dataset_de-identification_levels
- dataset_record_key_types
- date_types
- description_types
- doi_status_types
- identifier_types
- language_codes
- language_usage_types
- resource_types
- title_types
- topic_types
- topic_vocabularies
- units (for size and time)
- composite_hash_types
Each of these tables is described in more detail on its own wiki page.
Another potential use of controlled vocabularies relates to organisations and people. References to both of these entities are often confused because any organisation or person may be known by a variety of names. The very large numbers of organisations and people in the MDR system, however, mean that it is impractical to provide codes for them all, or try to match all the names in source systems to those codes. Some degree of matching of organisations may be possible in the future. The problems of people and organisation representation are described in another section.