Sources and Contributors

From ECRIN-MDR Wiki
Revision as of 14:28, 13 January 2021 by Admin

Data Sources

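At the moment (01/2021) the MDR obtains study data from:
ClinicalTrials.gov (CTG)
the EU Clinical Trials Register (EUCTR)
the ISRCTN registry
the WHO International Clinical Trials Registry Platform (ICTRP)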

Together these sources cover all the study registrations within the WHO's global network of registries. Data is downloaded directly as XML files from the CTG web site, 'scraped' from the web pages of the EUCTR and ISRCTN, and downloaded as CSV files from the WHO ICTRP. The WHO data is used only for studies not found in CTG, the EUCTR or the ISRCTN. A study registered in more than one registry is entered only once in the system (using the first registry entry encountered), though metadata on its associated data objects may be drawn from a variety of sources.
The intention is to increase the number of individual trial registries used as sources (because the data they provide is usually more detailed than the ICTRP data), while retaining the CSV downloads from the WHO for registries that cannot be included directly. It is also planned to use listings of observational studies and/or post-authorisation studies when they become available (e.g. from EU PAS).
The study data provides entries for all studies registered globally, lagging the source registries by up to 7 days. It also provides linked 'data objects' in the form of trial registry entries and results entries and, where listed in the original registry entry, references to other documents such as protocols, information sheets and CSRs, as well as a limited number of datasets (though these are usually not publicly available). Finally, it provides links to journal articles.
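The de-duplication rule described above can be sketched as follows. This is a minimal illustration, not the MDR's actual code: it assumes each record carries the set of registry IDs it lists (its own ID plus any secondary IDs), and that sources are processed in the hypothetical order CTG, EUCTR, ISRCTN, then WHO, so that the WHO data only fills in studies not already found elsewhere. `StudyRecord`, `SOURCE_ORDER` and `deduplicate` are illustrative names, and the IDs in the usage part are made up.

```python
from dataclasses import dataclass

# Hypothetical processing order (an assumption): the registries that are
# downloaded or scraped directly come first, the WHO ICTRP CSV data last.
SOURCE_ORDER = ["CTG", "EUCTR", "ISRCTN", "WHO"]


@dataclass(frozen=True)
class StudyRecord:
    source: str                    # source the record was obtained from
    registry_ids: frozenset[str]   # every registry ID listed for the study
    title: str = ""


def deduplicate(records: list[StudyRecord]) -> list[StudyRecord]:
    """Keep one record per study: the first one encountered."""
    seen_ids: set[str] = set()
    kept: list[StudyRecord] = []
    # sorted() is stable, so records from the same source keep their order.
    for rec in sorted(records, key=lambda r: SOURCE_ORDER.index(r.source)):
        # A record is new only if none of its IDs has been seen before.
        is_new = seen_ids.isdisjoint(rec.registry_ids)
        # Remember all IDs, so later records linked to any of them are skipped.
        seen_ids.update(rec.registry_ids)
        if is_new:
            kept.append(rec)
    return kept


# Usage with made-up IDs: one study registered in CTG and ISRCTN (and also
# present in the WHO data), plus one study known only from the WHO data.
records = [
    StudyRecord("WHO", frozenset({"NCT00000001", "ISRCTN00000001"}), "Study A"),
    StudyRecord("ISRCTN", frozenset({"ISRCTN00000001", "NCT00000001"}), "Study A"),
    StudyRecord("CTG", frozenset({"NCT00000001"}), "Study A"),
    StudyRecord("WHO", frozenset({"ChiCTR0000000001"}), "Study B"),
]
kept = deduplicate(records)
print([(r.source, r.title) for r in kept])
# → [('CTG', 'Study A'), ('WHO', 'Study B')]
```

Because every ID of a skipped record is still remembered, a later record that links to the same study through any of those IDs is skipped as well.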
The sources used chiefly for data object metadata are:

XML files can be downloaded directly from PubMed, but data must be scraped from the other two web sites. The intention is to significantly expand the number of data repositories used as sources of material.

Contributing Organisations

During the first phase of the MDR's development (2017 - 2020, in the XDC project) the work was divided up between three organisations:
ECRIN (https://www.ecrin.org/) was responsible for collecting the data from the source sites, transforming it into a common format (the ECRIN metadata schema) and loading it into a single database. It then generated the JSON files that were transferred to Onedata.
Onedata (https://onedata.org/) was responsible for developing the web portal, written in EmberJS, which accessed the data via Elasticsearch queries. It also provided the file system that held the data (as file metadata).
INFN (Istituto Nazionale di Fisica Nucleare, http://www.bo.infn.it/) provided the physical infrastructure of the system and also indexed the metadata of the OneCloud files using Elasticsearch.

During the second phase of the system's development (April 2020 onwards, as part of EOSC-Life), the development work has largely been carried out by ECRIN, though continued support was available from Onedata and INFN until early 2021.
In particular, ECRIN took on the tasks of developing a web portal and indexing the data, whilst continuing to develop the data extraction and aggregation systems.

Funding

The MDR was initially funded (2017 - 2020) within the H2020 project Extreme Data Cloud (XDC): http://www.extreme-datacloud.eu/the-project/, EU grant agreement 777367.

Since April 2020 it has been funded by the H2020 EOSC-Life project (Work Package 1), https://www.eosc-life.eu/, under grant agreement number 824087, to further develop the mechanisms for collecting and aggregating data from a wider set of sources. EOSC-Life brings together the 13 Life Science ‘ESFRI’ research infrastructures (LS RIs) and is designed to create 'an open, digital and collaborative space for biological and medical research.' The project will promote and publish ‘FAIR’ data and provide a catalogue of services from participating RIs for the management, storage and reuse of data in the European Open Science Cloud (EOSC).

The project (in particular the development of the ECRIN portal) is also funded directly from ECRIN's own non-project resources, with the intention of integrating the portal into the EOSC-hub project (https://www.eosc-hub.eu/, grant agreement number 777536).