Project Overview

From ECRIN-MDR Wiki

Latest revision as of 12:04, 28 October 2020

The need to make clinical research data and documents FAIR

In recent years there has been a growing acceptance that to accurately assess the results of trials and other clinical research, and in particular to combine the results from different trials in meta-analyses, it is much better to have access to the original source data, the “individual participant data” (IPD), as well as the result summaries found in published papers.[1]
In addition, to make sure that the IPD can be fully understood and properly analysed, a variety of other study documents (protocols, analysis plans, etc.) are required. As a result, under pressure from funders and journal editors, more and more researchers are making such material (generically, “clinical trial data objects”) available for sharing with others. The datasets are rarely freely available; instead, a variety of access mechanisms (e.g. individual request and review, membership of pre-authorised groups, or web-based self-attestation) are used in combination with different access types (e.g. download versus in-situ perusal). Furthermore, the various data objects are stored in a wide variety of locations: a rapidly growing number of general and specialised data repositories, trial registries, publications, the original researchers’ institutions, etc.
The researcher or reviewer wishing to locate relevant data objects for a study is therefore faced with a bewildering mosaic of possible source locations and access mechanisms, and this problem of ‘discoverability’ will almost certainly become much worse as more and more materials are made available for sharing. Systems are therefore required to make the data and associated documents generated by clinical research more FAIR: Findable, Accessible, Interoperable and Reusable. The ECRIN Clinical Research Metadata Repository, or MDR, is designed to be one such system.

The role of the MDR

The principal aim of the MDR is to make the data objects generated from clinical research easier to locate, and to describe how each of those data objects can be accessed, providing direct links to them where possible. The central idea is to develop systems that can collect the metadata about the data objects, including object provenance, location and access details, from a variety of source systems (e.g. trial registries, data repositories, bibliographic systems) and aggregate it into a single MetaData Repository, the MDR. The system is designed first to assemble the metadata, on a global scale, using a variety of methods, e.g. files obtained through API calls, direct file downloads, and web scraping (for further details see Data Collection Overview). It then standardises that metadata into a single schema, devised by ECRIN to capture the essential information about each object's discoverability, access and provenance (see The ECRIN Metadata Schemas). Finally, the MDR provides access to the standardised metadata through a single web portal. The portal indexes the metadata comprehensively to support easy searching and filtering, so that researchers can quickly identify the data objects of interest to them (described in more detail in Portal Functionality).
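
The collect, standardise and search steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the DataObject fields, the raw field names of the two source types and the example records are hypothetical, and they do not reproduce the ECRIN metadata schema or any real source system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """Hypothetical common record; NOT the real ECRIN schema."""
    title: str
    object_type: str
    source: str       # provenance: the system the metadata came from
    url: str          # location of the object
    access_type: str  # access details, e.g. "public" or "managed"


def from_registry(raw: dict) -> DataObject:
    # Field names of the raw registry record are assumed for illustration.
    return DataObject(
        title=raw["brief_title"].strip(),
        object_type="registry entry",
        source=raw["registry"],
        url=raw["link"],
        access_type="public",
    )


def from_repository(raw: dict) -> DataObject:
    # Field names of the raw repository record are assumed for illustration.
    return DataObject(
        title=raw["name"].strip(),
        object_type=raw.get("kind", "dataset"),
        source=raw["repo"],
        url=raw["landing_page"],
        access_type=raw.get("access", "unknown"),
    )


def search(objects: list, term: str) -> list:
    # Case-insensitive substring match on the title; a real portal would
    # use a proper search index instead.
    term = term.lower()
    return [o for o in objects if term in o.title.lower()]


def build_demo() -> list:
    # Invented example records standing in for two different source systems.
    registry_rows = [{"brief_title": " Asthma trial A ",
                      "registry": "Registry X",
                      "link": "https://example.org/reg/1"}]
    repo_rows = [{"name": "Asthma trial A IPD",
                  "repo": "Repository Y",
                  "landing_page": "https://example.org/repo/9",
                  "access": "managed"}]
    return ([from_registry(r) for r in registry_rows]
            + [from_repository(r) for r in repo_rows])


if __name__ == "__main__":
    for obj in search(build_demo(), "asthma"):
        print(obj.source, obj.access_type, obj.url)
```

The point of the sketch is the shape of the pipeline: one small adapter per source type, a single target record, and a search that only ever sees the standardised form.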

Implementation and Partners

The MDR system was initially designed and developed by ECRIN (the European Clinical Research Infrastructure Network), in collaboration with ONEDATA and INFN (Istituto Nazionale di Fisica Nucleare) at Bologna. This was in the context of the H2020 eXtreme-DataCloud (XDC) project, funded by the EU under grant agreement 777367.
In the initial implementation, which served as a successful proof of concept, metadata from a variety of data sources were collected by ECRIN and stored in a relational DB on a test server at INFN. Data was then exported as JSON files to the OneData file management system and indexed via Elasticsearch to make it available to the web portal (also developed by OneData).
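
As a rough illustration of the export step, the sketch below turns rows of a relational table into one JSON document per row, the kind of output that could then be handed to a file store or a search index. It uses an in-memory SQLite database purely for convenience; the table name, its columns and the example rows are hypothetical and do not describe the actual MDR database or the OneData interfaces.

```python
import json
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # In-memory stand-in for the relational DB; table and columns are assumed.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_objects ("
        "id INTEGER PRIMARY KEY, title TEXT, access_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO data_objects (title, access_type) VALUES (?, ?)",
        [("Study protocol", "public"), ("IPD dataset", "managed")],
    )
    conn.commit()
    return conn


def export_rows_as_json(conn: sqlite3.Connection) -> list:
    # sqlite3.Row lets each row be converted to a dict keyed by column name.
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT * FROM data_objects ORDER BY id").fetchall()
    # One JSON document per row, with keys sorted for stable output.
    return [json.dumps(dict(row), sort_keys=True) for row in rows]


if __name__ == "__main__":
    for doc in export_rows_as_json(make_demo_db()):
        print(doc)
```

Producing one self-contained document per object keeps the export independent of whichever indexing service later consumes it.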
The next stage of development of the MDR is taking place within the H2020 EOSC-Life project, under EU grant agreement 824087, and is being largely carried out by ECRIN. This stage should expand the data collection process to include a greater number and variety of data sources, and automate more of the MDR's functioning, making it easier to keep the data as up to date as possible. Data is now collected onto ECRIN-managed servers (supplied by OVH). An ECRIN-managed portal has been designed to replace that originally provided by OneData. The intention is to integrate that portal with the European Open Science Cloud's EOSC-hub services.

Notes

  1. For a discussion of the advantages of access to IPD and data objects in clinical research, and some of the issues around secondary use of data, see Ohmann C, Banzi R, Canham S, et al. Sharing and reuse of individual participant data from clinical trials: principles and recommendations. BMJ Open 2017;7:e018647. doi: 10.1136/bmjopen-2017-018647