JSON files v1 to v2 changes

From ECRIN-MDR Wiki

Changes affecting both Study and Data Object Definitions
1. Lower case names. In both the Study and the Data Object files, element names were changed from PascalCase (e.g. "ScientificTitle", "URLDirectlyAccessible") to lower case with underscores, sometimes known as ‘snake_case’ (e.g. "scientific_title", "url_direct_access"). This brought the element names in line with database names in PostgreSQL, which normally uses a snake_case naming convention.
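The renaming can be illustrated with a minimal Python sketch. It is not the conversion actually used for the MDR files: the function names and the RENAMES table are hypothetical. It assumes that most names were simply re-cased, while a few (such as "URLDirectlyAccessible") were also reworded and therefore need an explicit override.

```python
import re

# Hypothetical override table for names that were reworded, not just re-cased.
RENAMES = {"URLDirectlyAccessible": "url_direct_access"}


def to_snake_case(name: str) -> str:
    """Convert a PascalCase element name to snake_case."""
    if name in RENAMES:
        return RENAMES[name]
    # Split an acronym from a following capitalised word: "URLDirect" -> "URL_Direct".
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lower-case letter or digit from a following capital: "cT" -> "c_T".
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.lower()


def rename_keys(value):
    """Recursively apply to_snake_case to every key of a parsed JSON value."""
    if isinstance(value, dict):
        return {to_snake_case(k): rename_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [rename_keys(v) for v in value]
    return value
```

Calling rename_keys on the result of json.load for a version 1 file would return the same structure with snake_case keys, leaving all values untouched.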

2. Compound Lookup Values. In version 1, references to lookup values, i.e. controlled terminology listed in lookup tables, such as "file_type", (object) "class", or "access_type", used an integer reference to the value key. In version 2 those integers are normally replaced by a compound structure that includes both the integer key and the string value (or decode) of the value in question. Though strictly speaking the text is redundant, it makes the JSON files much more readable for humans, and allows the receiving system to store the text as well as the integer if it chooses.
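As a rough sketch of the difference, the Python below expands an integer key into a compound value. The "id" and "name" keys follow the "(id and name)" wording used for the compound elements described further down the page; the lookup table contents and the function name are purely illustrative and are not taken from the MDR lookup tables.

```python
# Illustrative lookup table only; real keys and terms come from the MDR lookup tables.
EXAMPLE_FILE_TYPES = {1: "example type A", 2: "example type B"}


def to_compound(key, lookup):
    """Expand a version 1 integer reference into a version 2 compound value."""
    # The name is None when the key is unknown, so the integer is never lost.
    return {"id": key, "name": lookup.get(key)}


v1_value = 2                                         # version 1: integer key only
v2_value = to_compound(v1_value, EXAMPLE_FILE_TYPES)  # version 2: key plus decode
```

A receiving system that only needs the integer can read the "id" member and ignore the rest.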

3. More flexible Organisation and Person data structures. In version 1 the over-optimistic assumption was that organisations and people added to the system from various data sources could all be included within a ‘context’ database, given an Id in that database, and that Id would then be used within the JSON files to refer to the organisation or person concerned.
In fact the numbers of organisations and people being harvested make this impractical. Two new structures were therefore defined, one for organisations and one for people. Each allows either an Id to be included, with the associated names derived from the context database (giving consistency to the data), or simply passes on the names found in the source data, with no attempt to link them to a record stored elsewhere.
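The two ways of identifying an organisation can be sketched as follows. This is an assumption-laden illustration, not the actual version 2 structure: the class name, the field names and the context_names dictionary (standing in for the context database) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrgRef:
    """Hypothetical organisation reference: an Id, a source name, or both."""
    id: Optional[int] = None
    name: Optional[str] = None


def resolve_org(ref: OrgRef, context_names: dict) -> dict:
    """Return the id and name to write out for an organisation."""
    if ref.id is not None and ref.id in context_names:
        # Linked case: the name is taken from the context data for consistency.
        return {"id": ref.id, "name": context_names[ref.id]}
    # Unlinked case: pass on the name exactly as found in the source data.
    return {"id": None, "name": ref.name}
```

The same pattern would apply to people, with a person's name fields in place of the single organisation name.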

Changes affecting the Study Definition
4. Study type. A study_type compound element was added (id and name), giving a broad indication of study type, using the terms defined in ClinicalTrials.gov.

5. Study status. A study_status compound element was added (id and name), giving a broad indication of the recruitment status of a study, using the terms defined in ClinicalTrials.gov.

6. Topic data with controlled terminology data. The topic structure for studies was extended to include a reference to any controlled terminology (e.g. MeSH, ICD) being used. "TopicTypeId" became "topic_source_type" (id and name) and indicates how the topic was categorised in the source data. "topic_ct" (id and name) was added and indicates the controlled terminology scheme being used (when one is defined in the source data).
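A minimal sketch of the two topic elements, in Python, is given below. Only the element names "topic_source_type" and "topic_ct" and their "id" and "name" members come from the description above; the function name, the lookup dictionary and the use of None for a missing scheme are assumptions made for illustration.

```python
def topic_type_fields(type_id, source_type_names, ct=None):
    """Build the two compound topic elements from a version 1 TopicTypeId.

    source_type_names is a hypothetical id-to-name lookup; ct is an optional
    (id, name) pair for the controlled terminology scheme, if one is known.
    """
    fields = {
        "topic_source_type": {"id": type_id, "name": source_type_names.get(type_id)},
        # Assumption: topic_ct is None when the source data defines no scheme.
        "topic_ct": None,
    }
    if ct is not None:
        ct_id, ct_name = ct
        fields["topic_ct"] = {"id": ct_id, "name": ct_name}
    return fields
```

The returned dictionary would be merged into the rest of the topic record, whose other members are not shown here.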

7. Addition of language code. A two-letter language code was added to the scientific_title.

8. The Study Relationships element was dropped. In retrospect it is not clear why: it may have been felt that little data would be available for it, or it may simply have been omitted in error.

Changes affecting the Data Object Definition
9. Object_topics added. Although originally deprecated, with the emphasis on study topics instead, it became clear that for some data objects (especially journal articles) it would be necessary to collect and store the topics related to the data objects. The study_topics structure was therefore duplicated within the data object JSON structure.

10. CreationYear became publication_year. The central importance of publication_year, as the year a resource became available (though not necessarily publicly), and thus a key part of its citation, was recognised by changing the name and definition of this field. It also brings the data object definition more in line with DataCite.

11. Addition of language code. A two-letter language code was added to the data_object_other_titles.

12. The IsPerson field was dropped. In the data_object_contributors structure, the field indicating whether the contributor was a person or an organisation was dropped (the reason is not clear). The suggestion is to re-introduce it, or an equivalent indicator field.

13. Addition of date comments. A comments field was added to the object_dates structure.

14. Addition of data_object_rights. A structure representing the copyright and distribution rights that can be attached to a data object was added. This helps to bring the data object structure closer to that defined within DataCite.