JSON files v6 to v7 changes

From ECRIN-MDR Wiki
Revision as of 11:19, 11 November 2022 by Admin (talk | contribs) (Clarification of Study and Object Contributors (v7.1))

Addition of country and location attributes

For studies, the country or countries where participants were recruited are now included.
In addition, where the data exists in the source material (for the moment only in ClinicalTrials.gov data), the clinical sites for the study are also listed, including the city and country of each site and its status as of the most recent data harvest.
Internally, the system uses integer Geonames ids for countries and cities (see https://geocode.xyz/). For display, and within the schema, the city and country names are also included.
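
As an illustration only, the pairing of an integer id with a display name might be modelled as in the sketch below. All field names (`country_id`, `country_name`, `city_id`, `city_name`, `facility`, `status`), the container keys and the id values are hypothetical placeholders and are not taken from the published schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json


@dataclass
class StudyCountry:
    country_id: int      # integer Geonames id (placeholder value used below)
    country_name: str    # name stored alongside the id for display


@dataclass
class StudySite:
    facility: str                 # hypothetical field: name of the clinical site
    city_id: Optional[int]        # integer Geonames id of the city, if known
    city_name: Optional[str]
    country_id: Optional[int]
    country_name: Optional[str]
    status: Optional[str]         # status as of the most recent data harvest


# Placeholder values, not real Geonames ids.
country = StudyCountry(country_id=1001, country_name="Exampleland")
site = StudySite("Example Hospital", 2002, "Example City",
                 1001, "Exampleland", "Recruiting")

study_fragment = {
    "study_countries": [asdict(country)],
    "study_locations": [asdict(site)],
}
print(json.dumps(study_fragment, indent=2))
```

Carrying the name next to the id means a consumer can display a site without a separate Geonames lookup, at the cost of some redundancy in the file.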

Changes for topic records

For ‘topic’ records – both study_topics and object_topics – the identifier of the original controlled terminology (CT) and the term's code within that terminology have been restored to the schema (these were never removed from the data). In most cases the CT will be MeSH (code = 14), but in some cases MedDRA or ICD codes, and very occasionally codes from other CTs, are used. Returning these data points to the schema simply allows them to be displayed if and when required.
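
A minimal sketch of how a consumer might use the restored data points for display is given below. The field names `topic_value`, `original_ct_id` and `original_ct_code` are assumptions made for the example, not names confirmed by the schema; only the value 14 for MeSH comes from the text above.

```python
MESH_CT_ID = 14  # from the text above: MeSH is identified by code 14


def describe_topic(topic: dict) -> str:
    """Return a display string that includes the CT code when one is present."""
    label = topic["topic_value"]
    ct_code = topic.get("original_ct_code")
    if ct_code is None:
        # No controlled terminology recorded: show the topic text only.
        return label
    ct_name = "MeSH" if topic.get("original_ct_id") == MESH_CT_ID else "other CT"
    return f"{label} ({ct_name}: {ct_code})"


# Illustrative records; the second has no CT information at all.
topics = [
    {"topic_value": "Asthma", "original_ct_id": 14, "original_ct_code": "D001249"},
    {"topic_value": "Example free-text topic"},
]
for t in topics:
    print(describe_topic(t))
```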

Inclusion of study start time (v7.1)

The year and month in which the study started have always been part of the data extracted, where they exist in the source material. This information has now been added to the schema and so is available for 'export' to the MDR UI and other systems.

Clarification of Study and Object Contributors (v7.1)

Within the MDR databases there has always been a clear distinction between study contributors (study leads, sponsors, funders, etc.) and object contributors (chiefly authors of papers). They are extracted and stored separately. When the data objects were exported as JSON files, however, this distinction was muddled. Study contributors were not exported at all, and instead were 'given' to linked data objects, as contributors to the generation of the object. In particular, study organisational contributors were given to all data objects, including journal papers (which have their own authors), and both organisational and individual study contributors were given to non-article data objects, which in almost all cases do not have any contributors specified in the source data.

This was not unreasonable: study contributors do contribute, indirectly, to the generation of all data objects, and the arrangement allowed the object data to match the expectations of the DataCite schema more easily. It is now beginning to cause problems, however, and so has been replaced by a simpler organisation of the exported data, in which study contributors and object contributors are kept separate. The reasons for the change include:

  • With the advent of the RMS, and perhaps other mechanisms for capturing object metadata directly, there is an opportunity to identify the real contributors to any particular object, rather than assume that every object was created in some way by the whole study management team.
  • At a time when there is increased interest in attributing data generation and rewarding its re-use, capturing accurate data on object authorship, beyond paper authorship, is increasingly important.
  • The previous arrangement was confusing for potential external users of the data, who could not see from previously published schema descriptions that separate study contributor data existed. Without a good understanding of how the contributor data was constructed, they risked misinterpreting it.
  • There is not yet any significant use of this data for searching or filtering purposes. The re-organisation therefore needs to take place now, rather than after filtering / searching systems that use contributor data have been constructed. Having separate study and object contributor data, rather than having it all in one place, may make such mechanisms a little more complex, but it makes the data itself much more accurate.
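
The difference between the earlier and the revised treatment of contributors can be sketched as follows. This is an illustration of the rules described in this section, not the MDR's export code; the function names and the `is_organisation`, `is_journal_article` and `contributors` keys are hypothetical.

```python
def object_contributors_before(study_contributors, obj):
    """Earlier export: study contributors were 'given' to linked objects."""
    own = list(obj.get("contributors", []))
    if obj["is_journal_article"]:
        # Papers kept their authors and gained organisational study contributors.
        inherited = [c for c in study_contributors if c["is_organisation"]]
    else:
        # Other objects gained organisational and individual study contributors.
        inherited = list(study_contributors)
    return own + inherited


def object_contributors_after(obj):
    """Revised export: an object lists only its own contributors."""
    return list(obj.get("contributors", []))


study_contributors = [
    {"name": "Example Sponsor Org", "is_organisation": True},
    {"name": "Example Study Lead", "is_organisation": False},
]
paper = {
    "is_journal_article": True,
    "contributors": [{"name": "Example Author", "is_organisation": False}],
}
dataset = {"is_journal_article": False}  # no contributors in the source data

names_before = [c["name"] for c in object_contributors_before(study_contributors, paper)]
names_after = [c["name"] for c in object_contributors_after(paper)]
print(names_before)
print(names_after)
```

Under the revised arrangement the study contributors would be exported with the study record itself, so a consumer wanting the earlier combined view can still reconstruct it by joining the two.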

Note that study topic data is still added to object data, other than for journal papers, to allow searches such as 'find all datasets or protocols on topic X' to take place more directly.
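
The topic rule can be sketched in the same way. Again this is only an illustration under assumed names (`topics`, `is_journal_article`), with topics reduced to plain strings for brevity.

```python
def exported_object_topics(study_topics, obj):
    """Return the topics exported with an object.

    Journal papers keep only their own topics; every other object type
    also receives the topics of its parent study.
    """
    own = list(obj.get("topics", []))
    if obj.get("is_journal_article", False):
        return own
    # Append study topics, skipping any the object already carries.
    return own + [t for t in study_topics if t not in own]


study_topics = ["Topic X", "Topic Y"]
protocol = {"is_journal_article": False}
paper = {"is_journal_article": True, "topics": ["Topic Z"]}

print(exported_object_topics(study_topics, protocol))
print(exported_object_topics(study_topics, paper))
```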