Difference between revisions of "Background and History"

From ECRIN-MDR Wiki

Revision as of 14:48, 28 October 2020

Initial Planning, 2016-2017

Beginning in 2016, ECRIN's work within the H2020 CORBEL project, in particular its leadership of a group looking at 'data sharing' issues within clinical research, highlighted the need to improve the FAIRness of clinical research data. It became clear that if researchers made more and more data objects available to others, as they were being encouraged to do, those objects would often be in a wide variety of places and available under a wide range of conditions. Even discovering where the various data objects associated with a study were located might become difficult and time-consuming, and therefore costly. Once found, there would be the additional problem of understanding how to access them, because many such objects would only be available under controlled access. The concept of a 'metadata repository' (MDR) that could bring all this discoverability, access and provenance (DAP) metadata together evolved out of these concerns.
The initial task was seen as the creation of a metadata schema that focused on the required discoverability, access and provenance data points. The first version of such a schema [1] was published in late 2016. In fact that metadata schema (now at version 5) has evolved into a combination of two schemas, one for studies and the other for the associated data objects. The first is based on a subset of the data points within the ClinicalTrials.gov trial registry (by far the largest trial registry in the world) and the second is based on DataCite. Two separate schemas are necessary because the relationship between studies and data objects in clinical research is many-to-many. It is therefore necessary to store study details and data object details separately, with a separate 'link' table indicating which data objects are associated with which study.
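
The many-to-many arrangement described above can be sketched with three tables. This is only an illustrative sketch using Python's built-in sqlite3 module: the table names, column names and sample records are hypothetical and are not taken from the ECRIN schemas, which contain many more data points.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with studies, data objects and a link table.

    All table names, column names and rows are hypothetical examples.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE studies (
            study_id INTEGER PRIMARY KEY,
            title    TEXT NOT NULL
        );
        CREATE TABLE data_objects (
            object_id INTEGER PRIMARY KEY,
            title     TEXT NOT NULL
        );
        -- The link table records which data objects belong to which study.
        CREATE TABLE study_object_links (
            study_id  INTEGER NOT NULL REFERENCES studies(study_id),
            object_id INTEGER NOT NULL REFERENCES data_objects(object_id),
            PRIMARY KEY (study_id, object_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO studies VALUES (?, ?)",
        [(1, "Study A"), (2, "Study B")],
    )
    conn.executemany(
        "INSERT INTO data_objects VALUES (?, ?)",
        [(10, "Protocol"), (11, "Pooled dataset")],
    )
    # Study A has two objects; the pooled dataset is shared by both studies.
    conn.executemany(
        "INSERT INTO study_object_links VALUES (?, ?)",
        [(1, 10), (1, 11), (2, 11)],
    )
    return conn


def objects_for_study(conn, study_id):
    """Return the titles of all data objects linked to one study."""
    rows = conn.execute(
        """
        SELECT o.title
        FROM data_objects AS o
        JOIN study_object_links AS l ON l.object_id = o.object_id
        WHERE l.study_id = ?
        ORDER BY o.object_id
        """,
        (study_id,),
    )
    return [row[0] for row in rows]


def studies_for_object(conn, object_id):
    """Return the titles of all studies linked to one data object."""
    rows = conn.execute(
        """
        SELECT s.title
        FROM studies AS s
        JOIN study_object_links AS l ON l.study_id = s.study_id
        WHERE l.object_id = ?
        ORDER BY s.study_id
        """,
        (object_id,),
    )
    return [row[0] for row in rows]
```

Because neither the study table nor the data object table holds a reference to the other, a study can have any number of objects and an object can belong to any number of studies; only the link table changes when an association is added or removed.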

The XDC project and the pilot MDR, 2017-2020

The opportunity to actually build a demonstrator MDR came in 2017, when the H2020 project Extreme Data Cloud (XDC) was developed, with the MDR as one of the proposed use cases. This project focused on developing services for very large or very heterogeneous data sets. Clinical research data is not large in volume (not compared to the huge volumes generated by, for example, high energy and particle physics research) but it is extremely heterogeneous in nature, with many hundreds of thousands of small files, in different formats, located in many different places. ECRIN therefore set about specifying the MDR portal, as well as developing systems to collect and extract data from different sources. The system was to be developed with two partners in XDC: ONEDATA, based in Poland, and INFN (Istituto Nazionale di Fisica Nucleare) at Bologna. INFN would provide the IT infrastructure and carry out indexing of the collected data using Elasticsearch, whereas OneData would provide the file storage system and also the web portal (to ECRIN's specification).


The MDR system was initially designed and developed by ECRIN (the European Clinical Research Infrastructure Network), in collaboration with ONEDATA and INFN (Istituto Nazionale di Fisica Nucleare) at Bologna. This was in the context of the H2020 eXtreme-DataCloud (XDC) project, funded by the EU under grant agreement 777367.
In the initial implementation, which served as a positive proof of concept, metadata from a variety of data sources were collected by ECRIN and stored in a relational DB on a test server at INFN. The data were then exported as JSON metadata files to the OneData file management system and indexed via Elasticsearch to make them available to the web portal (also developed by OneData).
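
The export step in this pipeline, turning a relational record into a JSON metadata document, might look roughly like the following. It is a sketch only, using Python's standard json module: the function name, field names and sample values are hypothetical, and the code does not reproduce the pilot's actual export routine or its hand-off to OneData and Elasticsearch.

```python
import json


def study_to_json(study, linked_objects):
    """Serialise one study record and its linked data objects as a JSON string.

    `study` and each item of `linked_objects` are plain dicts standing in for
    rows read from the relational database (hypothetical field names).
    """
    document = {
        "id": study["id"],
        "title": study["title"],
        # Embed the linked objects so that the document is self-contained
        # when it is later indexed.
        "linked_data_objects": [
            {"id": obj["id"], "title": obj["title"]} for obj in linked_objects
        ],
    }
    return json.dumps(document, sort_keys=True)


# Hypothetical usage with made-up records.
example = study_to_json(
    {"id": 1, "title": "Study A"},
    [{"id": 10, "title": "Protocol"}],
)
```

Embedding the linked objects in each study document is one possible design for a search index; it trades some duplication for the ability to retrieve a study and its objects in a single lookup.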


The European Open Science Cloud and the MDR, 2020 onwards



The current progress of the project within EOSC-Life is tabulated in Progress (EOSC Life).

Notes

  1. Canham, S., Ohmann, C. A metadata schema for data objects in clinical research. Trials 17, 557 (2016). https://doi.org/10.1186/s13063-016-1686-5