ClinicalTrials.gov SourceMetadata

From ECRIN-MDR Wiki

Introduction

The structure of the XML files provided by the new CTG API varies slightly, depending on whether the files are downloaded in a block, as an 'AllXML' download, or are retrieved using an API query. Most elements of the structure are the same, but:

  • The files downloaded as a block do not have an XML declaration (<?xml version="1.0"?>) at their head. One needs to be added to each file before most tools will recognise the file as valid XML.
  • Files downloaded as a block each start at the level of an individual study, i.e. the root element is FullStudy.
  • The API-retrieved files contain one or more FullStudy records within a FullStudyList, which is simply a sequence of FullStudy elements. The FullStudyList is itself embedded in the FullStudyResponse root element, which has additional elements that hold information about the query and the records found against it.
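The differences listed above can be smoothed over before any further processing. The following is a minimal sketch only, using Python's standard library; the function names ensure_xml_declaration and iter_full_studies are hypothetical and are not taken from the MDR code.

```python
import xml.etree.ElementTree as ET

XML_DECLARATION = '<?xml version="1.0"?>\n'


def ensure_xml_declaration(text):
    """Prepend an XML declaration if the text does not already start with one."""
    stripped = text.lstrip()
    if stripped.startswith("<?xml"):
        return stripped
    return XML_DECLARATION + stripped


def iter_full_studies(xml_text):
    """Yield each FullStudy element, whether it is the root or nested in a response."""
    root = ET.fromstring(ensure_xml_declaration(xml_text))
    if root.tag == "FullStudy":
        # Block ('AllXML') download: the root element is the study itself.
        yield root
    else:
        # API result: FullStudyResponse > FullStudyList > FullStudy.
        yield from root.iter("FullStudy")
```

Downstream code can then treat both kinds of file in the same way, by looping over iter_full_studies(text) and handling one FullStudy element at a time.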

This section describes the structure of the FullStudy element, which contains only three element types: Struct, List and Field.
Fields are simple name-value pairs. A Field element has a Name attribute that indicates what the value is about, and the element's content (always a string) gives the Field's value, e.g.

<Field Name="Gender">All</Field>
<Field Name="MinimumAge">70 Years</Field>
<Field Name="MaximumAge">82 Years</Field>

Lists are exactly that, i.e. sequences of Fields, or Structs, or Lists. They have a name attribute to indicate what the List is about and then the list of child elements, e.g.

<List Name="ConditionMeshList">
	<Struct Name="ConditionMesh"> … … </Struct>
	<Struct Name="ConditionMesh"> … … </Struct>
	<Struct Name="ConditionMesh"> … … </Struct>
</List>

Structs are complex elements that contain one or more fields, and / or one or more lists, and / or one or more structs. Again a name attribute indicates what the Struct is about, e.g.

<Struct Name="DesignModule">
	<Field Name="StudyType">Interventional</Field>
	<List Name="PhaseList">
		<Field Name="Phase">Not Applicable</Field>
	</List>
	<Struct Name="DesignInfo">
		<Field Name="DesignAllocation">Randomized</Field>
		<Field Name="DesignInterventionModel">Parallel Assignment</Field>
		<Field Name="DesignPrimaryPurpose">Prevention</Field>
		<Struct Name="DesignMaskingInfo">
			<Field Name="DesignMasking">Triple</Field>
			<List Name="DesignWhoMaskedList">
				<Field Name="DesignWhoMasked">Participant</Field>
				<Field Name="DesignWhoMasked">Care Provider</Field>
				<Field Name="DesignWhoMasked">Investigator</Field>
			</List>
		</Struct>
	</Struct>
	<Struct Name="EnrollmentInfo">
		<Field Name="EnrollmentCount">1400</Field>
		<Field Name="EnrollmentType">Anticipated</Field>
	</Struct>
</Struct>

Structs and Lists together therefore create the structure and hierarchy within the xml document, in which the Field values are inserted.
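Because only three element types occur, a FullStudy fragment can be converted into nested native data structures with a short recursive function. The sketch below is illustrative only: the function name to_python is hypothetical, and it assumes that the Name attributes of the children of any one Struct are unique (otherwise later children would overwrite earlier ones).

```python
import xml.etree.ElementTree as ET


def to_python(elem):
    """Convert a Struct, List or Field element into a dict, list or string."""
    if elem.tag == "Field":
        return elem.text or ""
    if elem.tag == "List":
        return [to_python(child) for child in elem]
    if elem.tag == "Struct":
        # Children of a Struct are keyed by their Name attribute
        # (assumed unique within the Struct).
        return {child.get("Name"): to_python(child) for child in elem}
    raise ValueError(f"unexpected element: {elem.tag}")


sample = """
<Struct Name="DesignModule">
  <Field Name="StudyType">Interventional</Field>
  <List Name="PhaseList">
    <Field Name="Phase">Not Applicable</Field>
  </List>
  <Struct Name="EnrollmentInfo">
    <Field Name="EnrollmentCount">1400</Field>
    <Field Name="EnrollmentType">Anticipated</Field>
  </Struct>
</Struct>
"""

design = to_python(ET.fromstring(sample))
print(design["StudyType"])                          # Interventional
print(design["PhaseList"])                          # ['Not Applicable']
print(design["EnrollmentInfo"]["EnrollmentCount"])  # 1400
```

Note that every Field value stays a string, so a count such as 1400 has to be converted explicitly if a number is needed.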

Overall structure

The FullStudy element has 4 top level Struct elements, designated as Sections…

	<Struct Name="ProtocolSection"> … … </Struct>
	<Struct Name="ResultsSection"> … … </Struct>
	<Struct Name="DocumentSection"> … … </Struct>
	<Struct Name="DerivedSection"> … … </Struct>

The protocol section is present for all records; the presence of the other three depends on whether relevant content is available.
Each of the top level sections is split up into further Structs, called Modules. The top level structure of FullStudy is shown below:

	<Struct Name="ProtocolSection">
		<Struct Name="IdentificationModule">  … … </Struct>
		<Struct Name="StatusModule">  … … </Struct>
		<Struct Name="SponsorCollaboratorsModule">  … … </Struct>
		<Struct Name="OversightModule">  … … </Struct>
		<Struct Name="DescriptionModule">  … … </Struct>
		<Struct Name="ConditionsModule">  … … </Struct>
		<Struct Name="DesignModule">  … … </Struct>
		<Struct Name="ArmsInterventionsModule">  … … </Struct>
		<Struct Name="OutcomesModule">  … … </Struct>
		<Struct Name="EligibilityModule">  … … </Struct>
		<Struct Name="ContactsLocationsModule">  … … </Struct>
		<Struct Name="ReferencesModule">  … … </Struct>
		<Struct Name="IPDSharingStatementModule">  … … </Struct>
	</Struct>
	<Struct Name="ResultsSection">
		<Struct Name="ParticipantFlowModule">  … … </Struct>
		<Struct Name="BaselineCharacteristicsModule">  … … </Struct>
		<Struct Name="OutcomeMeasuresModule">  … … </Struct>
		<Struct Name="AdverseEventsModule">  … … </Struct>
		<Struct Name="MoreInfoModule">  … … </Struct>
	</Struct>
	<Struct Name="DocumentSection">
		<Struct Name="LargeDocumentModule">  … … </Struct>
	</Struct>
	<Struct Name="DerivedSection">
		<Struct Name="MiscInfoModule">  … … </Struct>
		<Struct Name="ConditionBrowseModule">  … … </Struct>
		<Struct Name="InterventionBrowseModule">  … … </Struct>
	</Struct>

Each of these modules is described in more detail below, with the emphasis on those that are worth extracting for the MDR (if not always for direct mapping). The Field descriptions are taken (or adapted) from the official ClinicalTrials.gov descriptions (especially at https://prsinfo.clinicaltrials.gov/definitions.html).
For each module, module level Fields are described first, then the Lists and Structures and the fields within them.
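Since every section and module is a Struct identified by its Name attribute, a module can be located with a simple path expression. The sketch below is one possible approach using Python's standard library; find_module and field_value are hypothetical helper names, and the identifier and title in the sample are placeholders rather than a real record.

```python
import xml.etree.ElementTree as ET


def find_module(study, module_name):
    """Return the first Struct with the given Name anywhere under study, or None."""
    return study.find(f".//Struct[@Name='{module_name}']")


def field_value(struct, field_name):
    """Return the text of a direct child Field with the given Name, or None."""
    field = struct.find(f"./Field[@Name='{field_name}']")
    return None if field is None else (field.text or "")


sample = ET.fromstring(
    '<FullStudy>'
    '<Struct Name="ProtocolSection">'
    '<Struct Name="IdentificationModule">'
    '<Field Name="NCTId">NCT00000000</Field>'       # placeholder identifier
    '<Field Name="BriefTitle">Example title</Field>'  # placeholder title
    '</Struct>'
    '</Struct>'
    '</FullStudy>'
)

identification = find_module(sample, "IdentificationModule")
print(field_value(identification, "NCTId"))       # NCT00000000
print(find_module(sample, "ResultsSection"))      # None (section not present)
```

Returning None for an absent Struct matters here, because only the protocol section is guaranteed to be present in every record.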

The Identification Module

Field Name = "NCTId"
The unique code ('NCT' plus an 8 digit number) assigned by ClinicalTrials.gov to each clinical study. It is required and acts as the source data study identifier (sd_sid), linking the records of each study in the extracted data tables. It will also be combined with base URLs to indicate the links to the study's protocol entry on ClinicalTrials.gov, and the results entry if one exists.
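As the NCTId is used as the key linking all extracted records, it may be worth checking its format before use. The following is a small sketch of such a check, based only on the 'NCT plus 8 digits' description above; the function name is_valid_nct_id is hypothetical.

```python
import re

# 'NCT' followed by exactly eight ASCII digits.
NCT_ID_PATTERN = re.compile(r"NCT[0-9]{8}")


def is_valid_nct_id(value):
    """Return True if value is 'NCT' followed by exactly eight digits."""
    return NCT_ID_PATTERN.fullmatch(value) is not None
```

The check is case-sensitive and rejects surrounding whitespace, so values may need trimming before they are tested.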

Field Name = "BriefTitle"
A short title of the clinical study written in language intended for the lay public. The title should include information on the participants, condition being evaluated, and intervention(s) studied. (Limit: 300 characters.). Extracted and also used as the default 'display title' for the study.

Field Name = "OfficialTitle"
The title of the clinical study, corresponding to the title of the protocol. (Limit: 600 characters.). Extracted.

Field Name = "Acronym"
An acronym or abbreviation used publicly to identify the clinical study, if any. (Limit: 14 characters.). Extracted.

Struct Name = "OrgStudyIdInfo"
This includes the following Fields:

  • OrgStudyId – defined as any unique identifier assigned to the protocol by the sponsor.
  • OrgStudyIdType – type will always be "sponsor's protocol id". Appears rarely used in the source data. Extracted to check content.
  • OrgStudyIdDomain – rarely used, as the domain would be the organisation, which is identified elsewhere.
  • OrgStudyIdLink – for research funded by NIH grants, appears to link to the reporting page on the NIH website where further grant information is displayed.

All this data is extracted, as the components of a study identifier.

Struct Name = "Organization"
This provides information on the organisation (the trial sponsor) that provided the Org Study Id. It has two Fields:

  • OrgFullName: The name of the organisation.
  • OrgClass: e.g. 'INDUSTRY', 'OTHER'

The name but not the class is extracted. The information may also be supplied elsewhere, as the lead sponsor organisation is given in the sponsor collaborators module.

List Name = "SecondaryIdInfoList"
If it exists, this list comprises one or more Structs called "SecondaryIdInfo", each of which is very similar to the 'primary' organisation as described in the 'OrgStudyIdInfo' Struct. Each Struct contains the fields:

  • SecondaryId – The Id itself. This includes any study identifiers assigned by other clinical trial registries. If the clinical study is funded in whole or in part by a U.S. Federal Government agency, the complete grant or contract number must be submitted as a Secondary ID.
  • SecondaryIdType - A description of the type of Secondary ID. Can be one of:
    • U.S. National Institutes of Health (NIH) Grant/Contract Award Number: Includes an activity code, institute code, and 6-digit serial number.
    • Other Grant/Funding Number: Identifier assigned by a funding organization other than the U.S. NIH - in this case the name of the funding organisation is also required.
    • Registry Identifier: Number assigned by a clinical trial registry; the name of the clinical trial registry is also required.
    • EudraCT Number: Identifier assigned by the European Medicines Agency Clinical Trials Database (EudraCT).
    • Other Identifier: The name of the organisation that issued the identifier is also required.
  • SecondaryIdDomain - If a Secondary ID Type of "Other Grant/Funding Number," "Registry Identifier," or "Other Identifier" is selected, the name of the funding organization, clinical trial registry, or organisation that issued the identifier.
  • SecondaryIdLink – for research funded by NIH grants, appears to link to the reporting page on the NIH website where further grant information is displayed.
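The rule that some Secondary ID types also require a domain can be expressed as a simple check. This is a sketch only: it assumes that the SecondaryIdType values in the XML match the labels quoted above exactly, which has not been verified against the source data, and the function name domain_is_missing is hypothetical.

```python
# Secondary ID types for which the issuing organisation must also be named
# (labels assumed to appear in the data exactly as in the definitions).
DOMAIN_REQUIRED_TYPES = {
    "Other Grant/Funding Number",
    "Registry Identifier",
    "Other Identifier",
}


def domain_is_missing(secondary_id_type, secondary_id_domain):
    """Return True if the type requires a domain but none has been supplied."""
    needs_domain = secondary_id_type in DOMAIN_REQUIRED_TYPES
    has_domain = bool((secondary_id_domain or "").strip())
    return needs_domain and not has_domain
```

A check of this kind could be used to flag incomplete identifiers during extraction rather than to reject them.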


List Name = "NCTIdAliasList"
This refers to NCT 'alias identifiers', which were used historically for some studies. It is not extracted.

The Status Module

Field Name = "StatusVerifiedDate"
The date on which the responsible party last verified the clinical study information in the entire ClinicalTrials.gov record for the clinical study, even if no additional or updated information is being submitted. Extracted and added to any data sharing statement, as a final '(as of ...)' to indicate date of statement. Not used elsewhere.

Field Name = "OverallStatus"
Overall Recruitment Status - The recruitment status for the clinical study as a whole, based upon the status of the individual sites. If at least one facility in a multi-site clinical study has an Individual Site Status of "Recruiting," then the Overall Recruitment Status for the study is "Recruiting." One of:

  • Not yet recruiting: Participants are not yet being recruited
  • Recruiting: Participants are currently being recruited, whether or not any participants have yet been enrolled
  • Enrolling by invitation: Participants are being (or will be) selected from a predetermined population
  • Active, not recruiting: Study is continuing, meaning participants are receiving an intervention or being examined, but new participants are not currently being recruited or enrolled
  • Completed: The study has concluded normally; participants are no longer receiving an intervention or being examined (that is, last participant’s last visit has occurred)
  • Suspended: Study halted prematurely but potentially will resume
  • Terminated: Study halted prematurely and will not resume; participants are no longer being examined or receiving intervention
  • Withdrawn: Study halted prematurely, prior to enrollment of first participant

Needs to be extracted and then coded as the study status.

Field Name = "LastKnownStatus" and Field Name = "DelayedPosting"
No definitions available – not extracted.

Field Name = "WhyStopped"
A brief explanation of the reason(s) why a clinical study was stopped (for a study that is "Suspended," "Terminated," or "Withdrawn"). Not extracted.

Field Name = "StudyFirstSubmitDate"
Field Name = "StudyFirstSubmitQCDate"
Field Name = "ResultsFirstSubmitDate"
Field Name = "ResultsFirstSubmitQCDate"
Field Name = "DispFirstSubmitDate"
Field Name = "DispFirstSubmitQCDate"
Field Name = "LastUpdateSubmitDate"
Most of these internally generated dates, relating to submission and QC checks, do not need to be extracted.
The exception is the StudyFirstSubmitDate, which has been taken as the date the NCT Id is assigned (though this is not known for certain).

Struct Name = "StartDateStruct"
Includes the two fields

  • StartDate – The estimated date on which the clinical study will be open for recruitment of participants, or the actual date on which the first participant was enrolled.
  • StartDateType – 'Actual' or 'Anticipated'.

Extracted and retained in the database as Study Start Year and Month, but not currently included in the ECRIN metadata.
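Deriving the start year and month requires the date string to be parsed. The sketch below assumes, as an illustration only, that dates appear either as month and year ('March 2019') or as a full date ('March 15, 2019'); the date format is not described on this page, so this should be checked against the source data. The function name start_year_month is hypothetical.

```python
from datetime import datetime

# Candidate formats, assumed rather than documented: full date, then month and year.
DATE_FORMATS = ("%B %d, %Y", "%B %Y")


def start_year_month(start_date):
    """Return (year, month) from a date string, or None if it cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(start_date.strip(), fmt)
        except ValueError:
            continue
        return parsed.year, parsed.month
    return None
```

Returning None for an unparseable value leaves it to the caller to decide whether to log or skip the record.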

Struct Name = "PrimaryCompletionDateStruct"
Includes the two fields

  • PrimaryCompletionDate – The date that the final participant was examined or received an intervention for the purposes of final collection of data for (all) the primary outcome.
  • PrimaryCompletionDateType – 'Actual' or 'Anticipated'.

Neither data point is extracted.

Struct Name = "CompletionDateStruct"
Includes the two fields

  • CompletionDate – The date the final participant was examined or received an intervention for purposes of final collection of data for the primary and secondary outcome measures and adverse events (for example, last participant’s last visit), whether the clinical study concluded according to the pre-specified protocol or was terminated.
  • CompletionDateType – 'Actual' or 'Anticipated'.

Neither data point is extracted.

Struct Name = "StudyFirstPostDateStruct"
Includes the two fields

  • StudyFirstPostDate – Date the registry entry first posted on the CTG site.
  • StudyFirstPostDateType – 'Actual' or 'Anticipated'. Not extracted but used to decide if the associated date should be used.

The date is extracted if it has 'Actual' status, as the date the data object was made available.
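The 'use the date only if its type is Actual' rule applies to several of the date Structs in this module, so it can be written once. The following is a sketch under the assumption that each such Struct holds a date Field and a matching Field with the suffix 'Type', as described above; actual_date is a hypothetical helper name and the sample date is illustrative.

```python
import xml.etree.ElementTree as ET


def actual_date(struct, date_name):
    """Return the date text from a date Struct only if its type is 'Actual'."""
    fields = {f.get("Name"): (f.text or "") for f in struct.findall("Field")}
    if fields.get(date_name + "Type") == "Actual":
        return fields.get(date_name)
    return None


sample = ET.fromstring(
    '<Struct Name="StudyFirstPostDateStruct">'
    '<Field Name="StudyFirstPostDate">March 15, 2019</Field>'
    '<Field Name="StudyFirstPostDateType">Actual</Field>'
    '</Struct>'
)
print(actual_date(sample, "StudyFirstPostDate"))  # March 15, 2019
```

The same helper could be applied to the ResultsFirstPostDateStruct and LastUpdatePostDateStruct elements by passing the corresponding date name.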

Struct Name = "ResultsFirstPostDateStruct"
Includes the two fields

  • ResultsFirstPostDate – For the Results registry entry, needs to be extracted if type is Actual, as the date the results data object was made available.
  • ResultsFirstPostDateType – 'Actual' or 'Anticipated'. Not extracted but used to decide if the associated date should be used.

The existence of a results section should be checked if this date is extracted.

Struct Name = "DispFirstPostDateStruct"
Includes the two fields

  • DispFirstPostDate – Unclear what this date refers to.
  • DispFirstPostDateType – 'Actual' or 'Anticipated'.

Not extracted.

Struct Name = "LastUpdatePostDateStruct"
Includes the two fields

  • LastUpdatePostDate – Date of last update of record.
  • LastUpdatePostDateType – 'Actual' or 'Anticipated'. Used to decide if the associated date should be used.

The date should be extracted if Actual, as the date the record was last updated. Also needs to be used when filtering records to identify new edits, although this is better done in the context of an API call.

Struct Name = "ExpandedAccessInfo"
Includes the fields

  • HasExpandedAccess – Whether there is expanded access to the investigational product for patients who do not qualify for enrollment in a clinical trial. One of Yes, No, Unknown.
  • ExpandedAccessNCTId – If expanded access is available, the NCT number of the expanded access record.
  • ExpandedAccessStatusForNCTId – No definition available, not extracted.

Extracted as it represents a 'study relationship', with one study being the 'expanded access' version of another.

The SponsorCollaborators Module

Struct Name = "ResponsibleParty"
This Struct contains the following fields

  • ResponsiblePartyType – An indication of whether the responsible party (i.e. for the CTG data) is the sponsor, the sponsor-investigator, or a principal investigator designated by the sponsor to be the responsible party. One of Sponsor, Principal Investigator, or Sponsor-Investigator.

If the Responsible Party, by Official Title, is either "Principal Investigator" or "Sponsor-Investigator", the following is required:

  • ResponsiblePartyInvestigatorFullName – Name of the investigator, including first and last name
  • ResponsiblePartyInvestigatorTitle – The official title of the investigator at the primary organizational affiliation.
  • ResponsiblePartyInvestigatorAffiliation – Primary organizational affiliation of the individual.
  • ResponsiblePartyOldNameTitle – Unclear if the 'Old' refers to system usage or means 'Previous'. Requires investigation. Listed as the official title, and name of the investigator, including first and last name.
  • ResponsiblePartyOldOrganization – Unclear if the 'Old' refers to system usage or means 'Previous'. Requires investigation. Listed as Primary organizational affiliation of the individual

This data is extracted, excluding the 'Old' fields, as study contributor data.

Struct Name = "LeadSponsor"

  • LeadSponsorName – The name of the entity or the individual who is the sponsor of the clinical study.
  • LeadSponsorClass – A very broad classification of the organisation type. Too coarse a category to be useful so not extracted.


List Name = "CollaboratorList"
This is a list of Structs named "Collaborator", representing other organizations (if any) providing support. Support may include funding, design, implementation, data analysis or reporting. Each Collaborator Struct has the same fields as those of the LeadSponsor Struct:

  • CollaboratorName – The name of the entity providing support. Unfortunately the nature of the support is not required so this is of limited value.
  • CollaboratorClass – A very broad classification of the organisation type. Too coarse a category to be useful so not extracted.


The Oversight Module

Field Name = "OversightHasDMC"
Indicates whether a data monitoring committee has been appointed for this study. Not extracted.

Field Name = "IsFDARegulatedDrug" and Field Name = "IsFDARegulatedDevice"
Indicates that a clinical study is studying a drug product (including a biological product) or approved device product regulated by the FDA. Neither field is extracted.

Field Name = "IsUnapprovedDevice"
Indication that at least one device product studied in the clinical study has not been previously approved or cleared by the U.S. Food and Drug Administration (FDA) for one or more uses. Not extracted.

Field Name = "IsPPSD"
Indicates that a study including a U.S. FDA-regulated device product is a pediatric postmarket surveillance of a device product. Not extracted.

Field Name = "IsUSExport"
Whether any drug product (including a biological product) or device product studied in the clinical study is manufactured in the United States or one of its territories and exported for study in a clinical study in another country. Not extracted.

The Description Module

Field Name = "BriefSummary"
A short description of the clinical study, including a brief statement of the clinical study's hypothesis, written in language intended for the lay public. Extracted as a 'brief description' attribute of the study, as this information is useful for people to quickly assess the relevance of a study to their search.

Field Name = "DetailedDescription"
An extended description of the protocol, including more technical information (as compared to the Brief Summary), if desired. Do not include the entire protocol; do not duplicate information recorded in other data elements, such as Eligibility Criteria or outcome measures. Not extracted as this level of detail is not required.

The Conditions Module

List Name = "ConditionList"
A list of a single field

  • Condition

Representing the name(s) of the disease(s) or condition(s) studied in the clinical study, or the focus of the clinical study. Terms from NLM's Medical Subject Headings (MeSH) controlled vocabulary thesaurus, or terms from another vocabulary, should be used. Extracted as a study topic, of type condition, but not extracted if the same term has already been found in the (MeSH) browse list.
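The rule of skipping a condition that is already present in the MeSH browse list can be sketched as below. This is an illustration only: the comparison is made case-insensitively after trimming whitespace, which is an assumption rather than a documented rule, and conditions_to_extract is a hypothetical function name.

```python
def conditions_to_extract(conditions, mesh_terms):
    """Return the conditions not already present among the MeSH browse terms."""
    # Normalise by trimming and lower-casing (an assumed matching rule).
    seen = {term.strip().lower() for term in mesh_terms}
    kept = []
    for condition in conditions:
        key = condition.strip().lower()
        if key and key not in seen:
            kept.append(condition.strip())
            seen.add(key)  # also avoids repeating a condition listed twice
    return kept


print(conditions_to_extract(
    ["Diabetes Mellitus", "Type 2 Diabetes", "type 2 diabetes"],
    ["Diabetes Mellitus"],
))  # ['Type 2 Diabetes']
```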

List Name = "KeywordList"
A list of a single field

  • Keyword

Representing words or phrases that best describe the protocol. Keywords help users find studies in the database. Appropriate descriptors from NLM's Medical Subject Headings (MeSH) controlled vocabulary thesaurus, or terms from another vocabulary, should be used. Extracted as a study topic.

The Design Module

Field Name = "StudyType"
The nature of the investigation or investigational use for which clinical study information is being submitted. Can be

  • Interventional (clinical trial): Participants are assigned prospectively to an intervention or interventions according to a protocol to evaluate the effect of the intervention(s) on biomedical or other health related outcomes.
  • Observational: Studies in human beings in which biomedical and/or health outcomes are assessed in pre-defined groups of individuals. Participants in the study may receive diagnostic, therapeutic, or other interventions, but the investigator does not assign specific interventions to the study participants. This includes when participants receive interventions as part of routine medical care, and a researcher studies the effect of the intervention.
  • Expanded Access: An investigational drug product (including biological product) available through expanded access for patients who do not qualify for enrollment in a clinical trial. Expanded Access includes all expanded access types: (1) for individual patients, including emergency use; (2) for intermediate-size patient populations; and (3) under a treatment IND or treatment protocol.

For extraction as the study's type attribute.

Field Name = "PatientRegistry"
This is a Yes / No field that is only applicable to observational studies. If Yes the study type can be changed from 'Observational' to 'Observational Patient Registry'.
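The adjustment of the study type by the PatientRegistry field can be written as a small function. This sketch assumes the field value is the literal string 'Yes' when set; resolve_study_type is a hypothetical name.

```python
def resolve_study_type(study_type, patient_registry=None):
    """Return the study type, refined to a patient registry where applicable."""
    # PatientRegistry only applies to observational studies.
    if study_type == "Observational" and patient_registry == "Yes":
        return "Observational Patient Registry"
    return study_type
```

For any other study type the PatientRegistry value is ignored, so an unexpected 'Yes' on an interventional study does not change the result.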

Field Name = "TargetDuration"
For Patient Registries, the anticipated time period over which each participant is to be followed. A number is required plus a unit of time (years, months, weeks, days). Not extracted.

Struct Name = "ExpandedAccessTypes"
Has 3 fields but usage unclear, possibly Yes / No

  • ExpAccTypeIndividual – For individual participants, including for emergency use
  • ExpAccTypeIntermediate – For intermediate-size participant populations
  • ExpAccTypeTreatment – Under a treatment IND or treatment protocol

Not extracted.

N.B. The PhaseList, DesignAllocation, DesignInterventionModel, DesignPrimaryPurpose, and DesignMaskingInfo data only apply to Interventional studies.
The DesignObservationalModel, DesignTimePerspective, BioSpecRetention and BioSpecDescription data only apply to observational studies.

List Name = "PhaseList"
This is a List in the XML structure, although the completion instructions say to select a single value. It contains a single field, Phase, that indicates, for a clinical trial of a drug product (including a biological product), the numerical phase of such clinical trial, taken from:

  • Phase
    • N/A: Trials without phases (for example, studies of devices or behavioral interventions).
    • Early Phase 1 (Formerly listed as "Phase 0"): Exploratory trials, involving very limited human exposure, with no therapeutic or diagnostic intent (e.g., screening studies, microdose studies). See FDA guidance on exploratory IND studies for more information.
    • Phase 1: Includes initial studies to determine the metabolism and pharmacologic actions of drugs in humans, the side effects associated with increasing doses, and to gain early evidence of effectiveness; may include healthy participants and/or patients.
    • Phase 1/Phase 2: Trials that are a combination of phases 1 and 2.
    • Phase 2: Includes controlled clinical studies conducted to evaluate the effectiveness of the drug for a particular indication or indications in participants with the disease or condition under study and to determine the common short-term side effects and risks.
    • Phase 2/Phase 3: Trials that are a combination of phases 2 and 3.
    • Phase 3: Includes trials conducted after preliminary evidence suggesting effectiveness of the drug has been obtained, and are intended to gather additional information to evaluate the overall benefit-risk relationship of the drug.
    • Phase 4: Studies of FDA-approved drugs to delineate additional information including the drug's risks, benefits, and optimal use.

For extraction, as a study feature of type 'phase'.

Struct Name = "DesignInfo"
Includes a variety of Fields, Lists and Structs…

  • DesignAllocation – The method by which participants are assigned to arms in a clinical trial. Can be
    • N/A (not applicable): For a single-arm trial
    • Randomized: Participants are assigned to intervention groups by chance
    • Nonrandomized: Participants are expressly assigned to intervention groups through a non-random method, such as physician choice

    For extraction, as a study feature of type 'allocation type'.

  • DesignInterventionModel – The strategy for assigning interventions to participants. Can be
    • Single Group: Clinical trials with a single arm
    • Parallel: Participants are assigned to one of two or more groups in parallel for the duration of the study
    • Crossover: Participants receive one of two (or more) alternative interventions during the initial phase of the study and receive the other intervention during the second phase of the study
    • Factorial: Two or more interventions, each alone and in combination, are evaluated in parallel against a control group
    • Sequential: Groups of participants are assigned to receive interventions based on prior milestones being reached in the study, such as in some dose escalation and adaptive design studies

   For extraction, as a study feature of type 'intervention model'.

  • DesignInterventionModelDescription – Provides details about the Interventional Study Model. Not extracted.
  • DesignPrimaryPurpose – The main objective of the intervention(s) being evaluated by the clinical trial. Can be one of
    • Treatment: One or more interventions are being evaluated for treating a disease, syndrome, or condition.
    • Prevention: One or more interventions are being assessed for preventing the development of a specific disease or health condition.
    • Diagnostic: One or more interventions are being evaluated for identifying a disease or health condition.
    • Supportive Care: One or more interventions are evaluated for maximizing comfort, minimizing side effects, or mitigating against a decline in the participant's health or function.
    • Screening: One or more interventions are assessed or examined for identifying a condition, or risk factors for a condition, in people who are not yet known to have the condition or risk factor.
    • Health Services Research: One or more interventions for evaluating the delivery, processes, management, organization, or financing of healthcare.
    • Basic Science: One or more interventions for examining the basic mechanism of action (for example, physiology or biomechanics of an intervention).
    • Device Feasibility: An intervention of a device product is being evaluated in a small clinical trial (generally fewer than 10 participants) to determine the feasibility of the product; or a clinical trial to test a prototype device for feasibility and not health outcomes. Such studies are conducted to confirm the design and operating specifications of a device before beginning a full clinical trial.
    • Other: None of the other options applies.

   For extraction, as a study feature of type 'primary purpose'.

List Name = "DesignObservationalModelList" (in DesignInfo)
Includes the single field

  • DesignObservationalModel – The Primary strategy for participant identification and follow-up. One of
    • Cohort: Group of individuals, initially defined and composed, with common characteristics (for example, condition, birth year), who are examined or traced over a given time period.
    • Case-Control: Group of individuals with specific characteristics (for example, conditions or exposures) compared to group(s) with different characteristics, but otherwise similar.
    • Case-Only: Single group of individuals with specific characteristics.
    • Case-Crossover: Characteristics of case immediately prior to disease onset (sometimes called the hazard period) compared to characteristics of same case at a prior time (that is, control period).
    • Ecologic or Community Studies: Geographically defined populations, such as countries or regions within a country, compared on a variety of environmental (for example, air pollution intensity, hours of sunlight) and/or global measures not reducible to individual level characteristics (for example, healthcare system, laws or policies median income, average fat intake, disease rate).
    • Family-Based: Studies conducted among family members, such as genetic studies within families or twin studies and studies of family environment.
    • Other: Explain in Detailed Description.

For extraction, as a study feature of type 'observational model'.

List Name = "DesignTimePerspectiveList" (in DesignInfo)
Has a single field

  • DesignTimePerspective – For observational studies, describes the temporal relationship of observation period to time of participant enrollment. One of:
    • Retrospective: Look back using observations collected predominantly prior to subject selection and enrollment
    • Prospective: Look forward using periodic observations collected predominantly following subject enrollment
    • Cross-sectional: Observations or measurements made at a single point in time, usually at subject enrollment
    • Other: Explain in Detailed Description

For extraction, as a study feature of type 'time perspective'.

Struct Name = "DesignMaskingInfo" (in DesignInfo)
A Struct providing Masking (blinding) information with various fields and Lists. Includes

  • DesignMasking – The party or parties involved in the clinical trial who are prevented from having knowledge of the interventions assigned to individual participants.
  • DesignMaskingDescription – Information about other parties who may be masked in the clinical trial, if any.
  • DesignWhoMaskedList (List) with single Field DesignWhoMasked. May be Participant, Care Provider, Investigator, Outcomes Assessor (The individual who evaluates the outcome(s) of interest), No Masking.

DesignMasking is for extraction, as a study feature of type 'masking'. The other elements are not extracted.

Struct Name = "BioSpec"
Includes the fields

  • BioSpecRetention – Indicates whether samples of material from research participants are retained in a biorepository. May be: None Retained / Samples With DNA / Samples Without DNA
  • BioSpecDescription – Specify all types of biospecimens to be retained (e.g., whole blood, serum, white cells, urine, tissue). Limit: 1000 characters.

The retention field is extracted, as study feature of type 'biospecimens retained'. The description field is not currently extracted.
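The Design Module fields marked above as study features can be collected in a single pass. The sketch below is illustrative: the mapping uses the feature type names given in this section, the function name study_features is hypothetical, and the sample reuses the DesignModule example from the Introduction.

```python
import xml.etree.ElementTree as ET

# Field name -> study feature type, as described in this section.
FEATURE_TYPES = {
    "Phase": "phase",
    "DesignAllocation": "allocation type",
    "DesignInterventionModel": "intervention model",
    "DesignPrimaryPurpose": "primary purpose",
    "DesignObservationalModel": "observational model",
    "DesignTimePerspective": "time perspective",
    "DesignMasking": "masking",
    "BioSpecRetention": "biospecimens retained",
}


def study_features(design_module):
    """Return (feature type, value) pairs for the Fields treated as study features."""
    features = []
    for field in design_module.iter("Field"):  # document order, at any depth
        feature_type = FEATURE_TYPES.get(field.get("Name"))
        if feature_type and field.text:
            features.append((feature_type, field.text.strip()))
    return features


sample = ET.fromstring("""
<Struct Name="DesignModule">
  <Field Name="StudyType">Interventional</Field>
  <List Name="PhaseList">
    <Field Name="Phase">Not Applicable</Field>
  </List>
  <Struct Name="DesignInfo">
    <Field Name="DesignAllocation">Randomized</Field>
    <Field Name="DesignInterventionModel">Parallel Assignment</Field>
    <Field Name="DesignPrimaryPurpose">Prevention</Field>
    <Struct Name="DesignMaskingInfo">
      <Field Name="DesignMasking">Triple</Field>
    </Struct>
  </Struct>
</Struct>
""")

for feature in study_features(sample):
    print(feature)
```

Fields not listed in the mapping, such as StudyType or DesignWhoMasked, are simply skipped.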

Struct Name = "EnrollmentInfo"
Includes the fields

  • EnrollmentCount – The estimated total number of participants to be enrolled (target number) or the actual total number of participants that are enrolled in the clinical study. Note: "Enrolled" means a participant’s, or their legally authorized representative’s, agreement to participate in a clinical study following completion of the informed consent process. Extracted.
  • EnrollmentType – Actual or Anticipated

The count is extracted as a 'keyword' of type enrolment. The enrolment type can be actual or anticipated but is often null and therefore has limited use. It is not extracted.

The ArmsInterventions Module

List Name = "ArmGroupList"
Includes a Struct called "ArmGroup", that includes the fields

  • ArmGroupLabel – The short name used to identify the arm. (Limit: 62 characters.)
  • ArmGroupType – The role of each arm in the clinical trial. May be one of Experimental, Active Comparator, Placebo Comparator, Sham Comparator, No Intervention, Other
  • ArmGroupDescription – If needed, additional descriptive information (including which interventions are administered in each arm) to differentiate each arm from other arms in the clinical trial. (Limit: 999 characters.)
  • ArmGroupInterventionList – A List with the single Field "ArmGroupInterventionName", presumably acting as the cross-reference between Arm Groups and Interventions.

Not extracted at present.

List Name = "InterventionList"
Includes a Struct called " Intervention", that includes the fields

  • InterventionType - For each intervention studied in the clinical study, the general type of intervention. One of Drug: Including placebo, Device: Including sham, Biological/Vaccine, Procedure/Surgery, Radiation, Behavioral: For example, psychotherapy, lifestyle counselling, Genetic: Including gene transfer, stem cell and recombinant DNA, Dietary Supplement: For example, vitamins, minerals, Combination Product: Combining a drug and device, a biological product and device; a drug and biological product; or a drug, biological product, and device, Diagnostic Test: For example, imaging, in-vitro, Other
  • InterventionName – A brief descriptive name used to refer to the intervention(s) studied in each arm of the clinical study. A non-proprietary name of the intervention must be used, if available. If a non-proprietary name is not available, a brief descriptive name or identifier must be used. (Limit: 200 characters.)
  • InterventionDescription – Details that can be made public about the intervention, other than the Intervention Name(s) and Other Intervention Name(s), sufficient to distinguish the intervention from other, similar interventions studied in the same or another clinical study. For example, interventions involving drugs may include dosage form, dosage, frequency, and duration. (Limit: 1000 characters.)
  • InterventionArmGroupLabelList – A List with a single Field, "InterventionArmGroupLabel". Not clear what this indicates.
  • InterventionOtherNameList – A List with a single Field, "InterventionOtherName" – presumed to be alternate names for the intervention.

This data was investigated as a possible source of keywords, but was found to be too detailed and often expressed in technical / internal terminology. Its usefulness was therefore limited and it was decided not to extract it.

The Outcomes Module

Consists of 3 Lists, each of which is made up of similar Structs

List Name = "PrimaryOutcomeList"
A sequence of "PrimaryOutcome" Structs, each of which has the Fields

  • PrimaryOutcomeMeasure – Name of the specific primary outcome measure.
  • PrimaryOutcomeDescription – Description of the metric used to characterize the specific primary outcome measure, if not included in the primary outcome measure title.
  • PrimaryOutcomeTimeFrame – Time point(s) at which the measurement is assessed for the specific metric used. The description of the time point(s) of assessment must be specific to the outcome measure and is generally the specific duration of time over which each participant is assessed (not the overall duration of the study).

Too much detail for the MDR's purpose - not extracted.

List Name = "SecondaryOutcomeList"
A sequence of "SecondaryOutcome" Structs, each of which has the Fields

  • SecondaryOutcomeMeasure – Name of the specific secondary outcome measure.
  • SecondaryOutcomeDescription – Description of the metric used to characterize the specific secondary outcome measure, if not included in the secondary outcome measure title.
  • SecondaryOutcomeTimeFrame – Time point(s) at which the measurement is assessed for the specific metric used. The description of the time point(s) of assessment must be specific to the outcome measure and is generally the specific duration of time over which each participant is assessed (not the overall duration of the study).

Not extracted.

List Name = "OtherOutcomeList"
A sequence of "OtherOutcome" Structs, each of which has the Fields

  • OtherOutcomeMeasure – Name of the specific other outcome measure.
  • OtherOutcomeDescription – Description of the metric used to characterize the specific other outcome measure, if not included in the other outcome measure title.
  • OtherOutcomeTimeFrame – Time point(s) at which the measurement is assessed for the specific metric used. The description of the time point(s) of assessment must be specific to the outcome measure and is generally the specific duration of time over which each participant is assessed (not the overall duration of the study).

Not extracted.

The Eligibility Module

Field Name = "EligibilityCriteria"
A limited list of criteria for selection of participants in the clinical study, provided in terms of inclusion and exclusion criteria and suitable for assisting potential participants in identifying clinical studies of interest. Use a bulleted list for each criterion below the headers "Inclusion Criteria" and "Exclusion Criteria".
Too much detail for extraction.

Field Name = "HealthyVolunteers"
Indication that participants who do not have a disease or condition, or related conditions or symptoms, under study in the clinical study are permitted to participate in the clinical study. Yes or No. Not extracted.

Field Name = "Gender"
The sex of the participants eligible to participate in the clinical study. Extracted as 'gender eligibility'.

Field Name = "GenderBased"
If applicable, indicate whether participant eligibility is based on gender. Note: "Gender" means a person's self-representation of gender identity.

  • Yes: Eligibility is based on gender
  • No: Eligibility is not based on gender

Not extracted.

Field Name = "GenderDescription"
If eligibility is based on gender, provide descriptive information about Gender criteria. Not for extraction.

Field Name = "MinimumAge"
The numerical value, if any, for the minimum age a potential participant must meet to be eligible for the clinical study. Unit of Time – one of Years, Months, Weeks, Days, Hours, Minutes, N/A (=No limit). Extracted as 'minimum age' and 'minimum age units'.

Field Name = "MaximumAge"
The numerical value, if any, for the maximum age a potential participant can be to be eligible for the clinical study. Unit of Time – one of Years, Months, Weeks, Days, Hours, Minutes, N/A (=No limit). Extracted as 'maximum age' and 'maximum age units'.
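
As a minimal sketch of how the age value and its units might be separated, the Python below splits a string such as "70 Years" at the first space and treats "N/A" as no limit. The helper names (split_age, field_text) and the wrapping Struct name "EligibilityModule" are assumptions made for illustration; they are not taken from the MDR code.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only. The Struct name is an assumption; the Field
# values follow the Field example given in the Introduction.
SAMPLE = """
<Struct Name="EligibilityModule">
    <Field Name="Gender">All</Field>
    <Field Name="MinimumAge">70 Years</Field>
    <Field Name="MaximumAge">N/A</Field>
</Struct>
"""


def field_text(parent, name):
    """Return the text of the direct child Field with the given Name, or None."""
    for field in parent.findall("Field"):
        if field.get("Name") == name:
            return field.text
    return None


def split_age(text):
    """Split an age string such as '70 Years' into (value, units).

    Returns (None, None) for a missing Field, for 'N/A' (no limit),
    or for any value that does not start with a whole number.
    """
    if text is None:
        return None, None
    text = text.strip()
    if not text or text.upper() == "N/A":
        return None, None
    value, _, units = text.partition(" ")
    if not value.isdigit():
        return None, None  # unexpected format, left for manual inspection
    return int(value), units.strip() or None


module = ET.fromstring(SAMPLE.strip())
min_age, min_age_units = split_age(field_text(module, "MinimumAge"))
max_age, max_age_units = split_age(field_text(module, "MaximumAge"))
print(min_age, min_age_units, max_age, max_age_units)
```

Returning a pair of None values for "N/A" keeps 'no limit' distinct from an age of zero.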

List Name = "StdAgeList"
Consists of a list of a single Field

  • StdAge – No documentation on what this signifies. Not extracted.


Field Name = "StudyPopulation"
(For observational studies only) A description of the population from which the groups or cohorts will be selected (for example, primary care clinic, community sample, residents of a certain town). Not extracted.

Field Name = "SamplingMethod"
(For observational studies only) Indicates the method used for the sampling approach, which should be explained in the Detailed Description. One of Probability Sample or Non-Probability Sample. Not extracted.

The ContactsLocations Module

List Name = "CentralContactList"
Contains a Struct called "CentralContact", which has Fields…

  • CentralContactName – First Name & Middle Initial & Last Name or Official Title & Degree
  • CentralContactRole – The role of the contact, which is to act as the contact for enrolment data
  • CentralContactPhone – Not extracted
  • CentralContactPhoneExt – Not extracted
  • CentralContactEMail – Extracted for internal use but not mapped to the MDR

Apart from the email address noted above, no part of the list is extracted

List Name = "OverallOfficialList"
Contains a Struct called "OverallOfficial", which has Fields…

  • OverallOfficialName – First Name & Middle Initial & Last Name & Degree
  • OverallOfficialAffiliation – Full name of the official's organization.
  • OverallOfficialRole – One of Study Chair, Study Director, Study Principal Investigator

This data should be extracted, under the generic study contributor type 'study lead'.
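
The following Python is a rough sketch of how each OverallOfficial Struct could be turned into a contributor record of type 'study lead'. The function names and the dictionary keys (contributor_type, name, affiliation, role) are hypothetical, and the person and organisation in the sample are invented placeholders.

```python
import xml.etree.ElementTree as ET

# Invented sample data, shaped as described above.
SAMPLE = """
<List Name="OverallOfficialList">
    <Struct Name="OverallOfficial">
        <Field Name="OverallOfficialName">Jane Doe, MD</Field>
        <Field Name="OverallOfficialAffiliation">Example University Hospital</Field>
        <Field Name="OverallOfficialRole">Study Director</Field>
    </Struct>
    <Struct Name="OverallOfficial">
        <Field Name="OverallOfficialName">John Roe, PhD</Field>
        <Field Name="OverallOfficialRole">Study Chair</Field>
    </Struct>
</List>
"""


def fields_of(struct):
    """Map each direct child Field's Name to its (stripped) string value."""
    return {f.get("Name"): (f.text or "").strip() for f in struct.findall("Field")}


def study_leads(official_list):
    """Return one 'study lead' contributor record per OverallOfficial Struct."""
    leads = []
    for struct in official_list.findall("Struct[@Name='OverallOfficial']"):
        values = fields_of(struct)
        leads.append({
            "contributor_type": "study lead",  # generic type named in the text
            "name": values.get("OverallOfficialName"),
            "affiliation": values.get("OverallOfficialAffiliation"),  # may be absent
            "role": values.get("OverallOfficialRole"),
        })
    return leads


for lead in study_leads(ET.fromstring(SAMPLE.strip())):
    print(lead)
```

The original role (Study Chair, Study Director, etc.) is kept alongside the generic type so that it is not lost if it proves useful later.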

List Name = "LocationList"
Contains a Struct called "Location", for the location of clinical sites involved in the study.
It has Fields…

  • LocationFacility – Full name of the organization where the clinical study is being conducted.
  • LocationStatus – The recruitment status of each participating facility in a clinical study. One of Not yet recruiting, Recruiting, Enrolling by invitation
  • LocationCity
  • LocationState – Required for U.S. locations (including territories of the United States)
  • LocationZip – Required for U.S. locations (including territories of the United States)
  • LocationCountry
  • LocationContactList – A List with structured contact details

No part of the list is extracted

The References Module

This module has 3 Lists, each of which contains a sequence of Structs.

List Name = "ReferenceList"
Contains a Struct called "Reference", which has Fields…

  • ReferencePMID – PMID for the citation in MEDLINE
  • ReferenceType – Indicates if the reference provided reports on results from this clinical study. Types include 'background', 'derived' and 'result'.
  • ReferenceCitation – A bibliographic reference in NLM's MEDLINE format
  • And a List called "RetractionList", which contains a Struct called "Retraction". This has Fields
    • RetractionPMID
    • RetractionSource

The references of type 'result' are extracted and the PMID IDs used to access further details. Any retraction lists should also be extracted (although these are rare).
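
A minimal sketch of this selection in Python is given below: only references of type 'result' are kept, together with any retractions attached to them. The function names and output keys are hypothetical, and the PMIDs and citations in the sample are placeholders rather than real records.

```python
import xml.etree.ElementTree as ET

# Placeholder data only; none of these PMIDs refer to real citations.
SAMPLE = """
<List Name="ReferenceList">
    <Struct Name="Reference">
        <Field Name="ReferencePMID">00000001</Field>
        <Field Name="ReferenceType">background</Field>
        <Field Name="ReferenceCitation">Placeholder citation A.</Field>
    </Struct>
    <Struct Name="Reference">
        <Field Name="ReferencePMID">00000002</Field>
        <Field Name="ReferenceType">result</Field>
        <Field Name="ReferenceCitation">Placeholder citation B.</Field>
        <List Name="RetractionList">
            <Struct Name="Retraction">
                <Field Name="RetractionPMID">00000003</Field>
                <Field Name="RetractionSource">Placeholder source</Field>
            </Struct>
        </List>
    </Struct>
</List>
"""


def text_of(struct, name):
    """Return the stripped text of the named child Field, or None if absent/empty."""
    value = struct.findtext(f"Field[@Name='{name}']")
    return value.strip() if value and value.strip() else None


def result_references(reference_list):
    """Return the 'result' references, each with its list of retractions."""
    results = []
    for ref in reference_list.findall("Struct[@Name='Reference']"):
        if (text_of(ref, "ReferenceType") or "").lower() != "result":
            continue
        retractions = [
            {"pmid": text_of(r, "RetractionPMID"),
             "source": text_of(r, "RetractionSource")}
            for r in ref.findall("List[@Name='RetractionList']/Struct[@Name='Retraction']")
        ]
        results.append({
            "pmid": text_of(ref, "ReferencePMID"),
            "citation": text_of(ref, "ReferenceCitation"),
            "retractions": retractions,
        })
    return results


for ref in result_references(ET.fromstring(SAMPLE.strip())):
    print(ref["pmid"], ref["retractions"])
```

The PMIDs returned here would then be the keys used to look up further details of each article.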

List Name = "SeeAlsoLinkList"
Contains a Struct called "SeeAlsoLink", which has Fields…

  • SeeAlsoLinkLabel – Title or brief description of the linked page
  • SeeAlsoLinkURL – Complete URL, including http:// or https://

The URL and label can be extracted, though only a small proportion point to useful objects, e.g. study websites.
The free text nature of the link label makes it difficult to select the useful links automatically. Some entries can be identified as useful manually, after extraction.

List Name = "AvailIPDList"
Contains a Struct called "AvailIPD", which lists the IPD and supporting documents currently available. It has the Fields…

  • AvailIPDId – No definition available
  • AvailIPDType – The type of data set or supporting information being shared. May be Individual Participant Data Set, Study Protocol, Statistical Analysis Plan, Informed Consent Form, Clinical Study Report, Analytic Code, etc.
  • AvailIPDURL – The web address used to request or access the data set or supporting information.
  • AvailIPDComment – Additional information about the data set or supporting information, e.g. the name of the repository where it is held or instructions for obtaining access.

This data is extracted as it indicates a possible data object. In many cases detailed access arrangements are not provided - the user is directed to a web site instead (often CSDR).

The IPDSharingStatement Module

Refers to future plans for IPD sharing - and is a series of related statements.
When an IPD sharing statement is made, the data can usefully be extracted as concatenated statements that together form an additional study item. The record's 'last verified' date can be used to add an '(as of...)' statement at the end.

Field Name = "IPDSharing"
Yes, No or Undecided.

Field Name = "IPDSharingDescription"
An overall statement describing planned approach. Often the only statement completed (if this section is completed at all).

Field Name = "IPDSharingTimeFrame"
Description of probable time frame.

Field Name = "IPDSharingAccessCriteria"
Description of possible criteria to be applied.

Field Name = "IPDSharingURL"
Actual or planned URL for data sharing information and / or access.

List Name = "IPDSharingInfoTypeList"
Has a single field, repeated as necessary within the list

  • IPDSharingInfoType

Lists the information types to be made available - needs to be concatenated into a list.
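
One possible way to concatenate the statements in this module is sketched below in Python. The labels placed in front of each statement, the function name, the Struct name "IPDSharingStatementModule" and the last_verified parameter (supplied by the caller from elsewhere in the record) are all assumptions for illustration; the sample values are placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder content; the Struct name is assumed from the module title.
SAMPLE = """
<Struct Name="IPDSharingStatementModule">
    <Field Name="IPDSharing">Yes</Field>
    <Field Name="IPDSharingDescription">Placeholder description of the plan.</Field>
    <List Name="IPDSharingInfoTypeList">
        <Field Name="IPDSharingInfoType">Study Protocol</Field>
        <Field Name="IPDSharingInfoType">Statistical Analysis Plan (SAP)</Field>
    </List>
    <Field Name="IPDSharingTimeFrame">Placeholder time frame.</Field>
</Struct>
"""

# (Field name, hypothetical label) pairs, in the order they are concatenated.
LABELS = [
    ("IPDSharing", "Plan to share IPD"),
    ("IPDSharingDescription", "Description"),
    ("IPDSharingTimeFrame", "Time frame"),
    ("IPDSharingAccessCriteria", "Access criteria"),
    ("IPDSharingURL", "URL"),
]


def ipd_statement(module, last_verified=None):
    """Concatenate the completed IPD sharing Fields into one statement.

    Returns None when nothing in the module has been completed.
    """
    parts = []
    for name, label in LABELS:
        text = module.findtext(f"Field[@Name='{name}']")
        if text and text.strip():
            parts.append(f"{label}: {text.strip()}")
    info_types = [
        f.text.strip()
        for f in module.findall("List[@Name='IPDSharingInfoTypeList']/Field")
        if f.text and f.text.strip()
    ]
    if info_types:
        parts.append("Information types: " + ", ".join(info_types))
    if not parts:
        return None
    statement = "\n".join(parts)
    if last_verified:
        statement += f"\n(as of {last_verified})"
    return statement


print(ipd_statement(ET.fromstring(SAMPLE.strip()), last_verified="March 2020"))
```

Fields that are absent or empty are simply skipped, which suits the common case where only the description has been completed.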

The Results Section

The results section includes a lot of very detailed information concerning the study's results, but none of this is required for the MDR.

The LargeDocument Module

This deals with uploaded documents, and comprises a single List ("LargeDocList") of the Struct "LargeDoc". The Struct has the Fields:

  • LargeDocTypeAbbrev – Type of uploaded study document. One of Study Protocol, Statistical Analysis Plan (SAP), Informed Consent Form (ICF), Study Protocol with SAP and/or ICF
  • LargeDocHasProtocol – Presumed to be a boolean, needs checking
  • LargeDocHasSAP – Presumed to be a boolean, needs checking
  • LargeDocHasICF – Presumed to be a boolean, needs checking
  • LargeDocLabel – Not documented, presumed to be the document title (needs checking)
  • LargeDocDate – The date on which the uploaded document was most recently updated and, if needed, approved by a human subjects protection review board.
  • LargeDocUploadDate – The date the document was uploaded
  • LargeDocFilename – The File Name (format needs checking)

All of this information needs to be extracted and examined further, as linked data objects.
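
Since several of these Fields still need checking, one way to begin is to tally the distinct values each Field takes across the downloaded files. The Python below is a sketch of that idea only. The function name is hypothetical and every value in the sample is invented, including the "Yes" / "No" flags, whose real representation is exactly what remains to be confirmed.

```python
from collections import Counter, defaultdict
import xml.etree.ElementTree as ET

# Invented sample values; the real formats are what this tally would reveal.
SAMPLE = """
<List Name="LargeDocList">
    <Struct Name="LargeDoc">
        <Field Name="LargeDocHasProtocol">Yes</Field>
        <Field Name="LargeDocHasSAP">No</Field>
        <Field Name="LargeDocLabel">Placeholder label 1</Field>
    </Struct>
    <Struct Name="LargeDoc">
        <Field Name="LargeDocHasProtocol">Yes</Field>
        <Field Name="LargeDocHasSAP">Yes</Field>
        <Field Name="LargeDocLabel">Placeholder label 2</Field>
    </Struct>
</List>
"""


def tally_field_values(doc_list, tallies=None):
    """Count how often each value occurs for each Field of each LargeDoc.

    Pass the returned mapping back in as `tallies` to accumulate counts
    over many study files.
    """
    if tallies is None:
        tallies = defaultdict(Counter)
    for struct in doc_list.findall("Struct[@Name='LargeDoc']"):
        for field in struct.findall("Field"):
            tallies[field.get("Name")][(field.text or "").strip()] += 1
    return tallies


tallies = tally_field_values(ET.fromstring(SAMPLE.strip()))
for name, counts in tallies.items():
    print(name, dict(counts))
```

A Field that only ever shows two values across a large sample would support the presumption that it is a boolean.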

The MiscInfo Module

Field Name = "VersionHolder"
Includes the date of the data - i.e. the date it was downloaded, or prepared for download. Not extracted.

List Name = "RemovedCountryList"
Not extracted.

The ConditionBrowse Module

This allows Conditions to be more formally defined using MESH terms and codes.
The ConditionMeshList is extracted before the 'ordinary' condition list as it gives more standardised information.
Terms in the non MESH coded list are not extracted unless they are additional to the terms listed here.

List Name = "ConditionMeshList"
A List of a Struct called "ConditionMesh", which has the fields

  • ConditionMeshId – Mesh id for condition
  • ConditionMeshTerm – Mesh term for condition

Extracted and merged with the Condition data from elsewhere in the record
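
The merge might look something like the Python sketch below: MeSH coded terms are taken first, and names from the 'ordinary' condition list are added only if they are not already present. The function name, the output keys and the case-insensitive comparison are assumptions, the free-text condition names are passed in as a plain list so that nothing is assumed about how they are stored, and the ids in the sample are placeholders rather than real MeSH ids.

```python
import xml.etree.ElementTree as ET

# Placeholder ids; the terms are illustrative only.
SAMPLE = """
<List Name="ConditionMeshList">
    <Struct Name="ConditionMesh">
        <Field Name="ConditionMeshId">X000001</Field>
        <Field Name="ConditionMeshTerm">Diabetes Mellitus</Field>
    </Struct>
    <Struct Name="ConditionMesh">
        <Field Name="ConditionMeshId">X000002</Field>
        <Field Name="ConditionMeshTerm">Hypertension</Field>
    </Struct>
</List>
"""


def merge_conditions(mesh_list, condition_names):
    """Return MeSH coded conditions first, then any additional free-text ones."""
    topics = []
    seen = set()  # lower-cased terms already added
    for struct in mesh_list.findall("Struct[@Name='ConditionMesh']"):
        term = (struct.findtext("Field[@Name='ConditionMeshTerm']") or "").strip()
        mesh_id = (struct.findtext("Field[@Name='ConditionMeshId']") or "").strip()
        if term and term.lower() not in seen:
            seen.add(term.lower())
            topics.append({"term": term, "mesh_id": mesh_id or None})
    for name in condition_names:
        name = name.strip()
        if name and name.lower() not in seen:
            seen.add(name.lower())
            topics.append({"term": name, "mesh_id": None})  # not MeSH coded
    return topics


mesh_list = ET.fromstring(SAMPLE.strip())
for topic in merge_conditions(mesh_list, ["diabetes mellitus", "Obesity"]):
    print(topic)
```

An exact (case-insensitive) match is a deliberately simple test of 'additional'; near-duplicates such as plurals or synonyms would still pass through and need separate handling.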

List Name = "ConditionAncestorList"
A List of a Struct called "ConditionAncestor", which has the fields

  • ConditionAncestorId – Mesh Id of the immediate ancestors of condition terms
  • ConditionAncestorTerm – Name of the immediate ancestors of condition terms

Not extracted.

List Name = "ConditionBrowseLeafList"
A List of a Struct called "ConditionBrowseLeaf", which has the fields

  • ConditionBrowseLeafId – Mesh id of the listed condition
  • ConditionBrowseLeafName – Mesh name of the listed condition
  • ConditionBrowseLeafAsFound – Original name as given in the submission
  • ConditionBrowseLeafRelevance – high or low

Not extracted - the data is too detailed and its significance is unclear.

List Name = "ConditionBrowseBranchList"
A List of a Struct called "ConditionBrowseBranch", which has the fields

  • ConditionBrowseBranchAbbrev – A Mesh abbreviation of the main Condition branches covered by the conditions
  • ConditionBrowseBranchName – Name of the main Condition branches covered by the conditions

Not extracted.

The InterventionBrowse Module

This allows Interventions to be more formally defined using MESH terms and codes.

List Name = "InterventionMeshList"
A List of a Struct called "InterventionMesh", which has the fields

  • InterventionMeshId – Mesh id for Intervention
  • InterventionMeshTerm – Mesh term for Intervention

Extracted and used as the source of Intervention data for the study, as study topics (the data in the ArmsInterventions module being too detailed).

List Name = "InterventionAncestorList"
A List of a Struct called "InterventionAncestor", which has the fields

  • InterventionAncestorId – Mesh Id of the immediate ancestors of intervention terms
  • InterventionAncestorTerm – Name of the immediate ancestors of intervention terms

Not extracted.

List Name = "InterventionBrowseLeafList"
A List of a Struct called "InterventionBrowseLeaf", which has the fields

  • InterventionBrowseLeafId – Mesh id of the listed intervention
  • InterventionBrowseLeafName – Mesh name of the listed intervention
  • InterventionBrowseLeafAsFound – Original name as given in the submission
  • InterventionBrowseLeafRelevance – high or low

Not extracted - the data is too detailed and its significance is unclear.

List Name = "InterventionBrowseBranchList"
A List of a Struct called "InterventionBrowseBranch", which has the fields

  • InterventionBrowseBranchAbbrev – A Mesh abbreviation of the main Intervention branches covered by the interventions
  • InterventionBrowseBranchName – Name of the main Intervention branches covered by the interventions

Not extracted.