Difference between revisions of "Project Overview"
Revision as of 16:48, 10 November 2019
Background
In recent years there has been a growing acceptance that to accurately assess the results of trials, and in particular to combine the results from different trials in meta-analyses, it is necessary to have access to the original source data, the “individual participant data” (IPD), as well as the result summaries found in published papers.
In addition, to make sure that the IPD can be fully understood and properly analysed, a variety of other study documents (protocols, analysis plans, etc.) are required. As a result, under pressure from funders and journal editors, more and more researchers are making such material (generically, “clinical trial data objects”) available for sharing with others. The datasets are rarely freely available; instead, a variety of access mechanisms (e.g. individual request and review, membership of pre-authorised groups, or web-based self-attestation) are used in combination with different access types (e.g. download versus in-situ perusal). Furthermore, the various data objects are stored in a wide variety of different locations: a rapidly growing number of general and specialised data repositories, trial registries, publications, the original researchers’ institutions, etc.
The researcher or reviewer wishing to locate relevant data objects for a study is therefore faced with a bewildering mosaic of possible source locations and access mechanisms, and this problem of ‘discoverability’ will almost certainly become much worse in the future as more and more material is made available for sharing.
The Project's Aims
To maximise the discoverability of all these data objects, it is necessary to collect the metadata about them, including object provenance, location and access details, into a single system.
To that end this project will attempt to develop an MDR (MetaData Repository) to assemble, standardise and display the metadata about clinical studies and the data objects generated by them, providing access to that metadata through a single system, accessed via a web portal.
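As a rough illustration of what a standardised metadata record for one data object might hold, the sketch below models the three groups of detail named above (provenance, location and access). It is not the project's actual schema: the class name, field names and example values are all hypothetical, and the two enumerations simply restate the access mechanisms and access types listed in the Background section.

```python
from dataclasses import dataclass, asdict
from enum import Enum
import json


class AccessMechanism(Enum):
    # Mechanisms taken from the Background text; the identifiers are hypothetical.
    INDIVIDUAL_REQUEST = "individual request and review"
    GROUP_MEMBERSHIP = "membership of pre-authorised group"
    SELF_ATTESTATION = "web-based self-attestation"


class AccessType(Enum):
    DOWNLOAD = "download"
    IN_SITU = "in-situ perusal"


@dataclass
class DataObjectRecord:
    """Hypothetical metadata record for a single clinical trial data object."""
    study_id: str            # identifier of the parent study (assumed field)
    object_type: str         # e.g. "protocol", "analysis plan", "IPD dataset"
    provenance: str          # where the metadata was obtained from
    location_url: str        # where the object itself can be found
    access_mechanism: AccessMechanism
    access_type: AccessType

    def to_json(self) -> str:
        # Enum members are not JSON serialisable, so store their string values.
        data = asdict(self)
        data["access_mechanism"] = self.access_mechanism.value
        data["access_type"] = self.access_type.value
        return json.dumps(data, sort_keys=True)


# Made-up example values, for illustration only.
record = DataObjectRecord(
    study_id="EXAMPLE-0001",
    object_type="protocol",
    provenance="example trial registry",
    location_url="https://example.org/objects/1",
    access_mechanism=AccessMechanism.INDIVIDUAL_REQUEST,
    access_type=AccessType.DOWNLOAD,
)
print(record.to_json())
```

Keeping the access mechanism and access type as separate fields reflects the point made in the Background section that the two are used in combination.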
Implementation
The web portal is developed in collaboration with ONEDATA (see https://www.onedata.org/#/home) and INFN at Bologna (Istituto Nazionale di Fisica Nucleare, Sezione di Bologna; see http://www.bo.infn.it/). Development of the whole project has taken place within the H2020 eXtreme - DataCloud (XDC) project (see http://www.extreme-datacloud.eu/the-project/), funded by the EU under grant agreement 777367.
Metadata from a variety of data sources have been collected by ECRIN using different methods (e.g. database download, import of XML files through an API, scraping of web pages) and stored in a relational database on the test bed server at INFN. The data are then exported as JSON file metadata to the OneData file management system and indexed via Elasticsearch to make them available to the web portal.
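The export-and-index step described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: an in-memory SQLite database stands in for the relational database (which the text does not name), the `data_objects` table, its columns, the example rows and the index name `mdr-demo` are all hypothetical, and the OneData step is left out entirely. The sketch reads rows, turns each into a JSON document, and builds a newline-delimited payload in the format accepted by the Elasticsearch `_bulk` endpoint (an action line followed by the document itself).

```python
import json
import sqlite3


def fetch_records(conn):
    """Read all rows from the (hypothetical) data_objects table as dicts."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT id, title, object_type, url FROM data_objects ORDER BY id"
    )
    return [dict(row) for row in rows]


def to_bulk_ndjson(records, index_name):
    """Build a newline-delimited JSON payload for the Elasticsearch bulk API."""
    lines = []
    for rec in records:
        # Action line: index this document under its database id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(rec["id"])}}))
        # Source line: the document itself.
        lines.append(json.dumps(rec))
    # The bulk format requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


# Stand-in database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_objects "
    "(id INTEGER PRIMARY KEY, title TEXT, object_type TEXT, url TEXT)"
)
conn.executemany(
    "INSERT INTO data_objects VALUES (?, ?, ?, ?)",
    [
        (1, "Example study protocol", "protocol", "https://example.org/objects/1"),
        (2, "Example analysis plan", "analysis plan", "https://example.org/objects/2"),
    ],
)

records = fetch_records(conn)
payload = to_bulk_ndjson(records, "mdr-demo")
print(payload)
```

In a real deployment the payload would be sent to the `_bulk` endpoint of an Elasticsearch cluster with the content type `application/x-ndjson`; that call is omitted here so the sketch runs without any external service.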