Difference between revisions of "System Scheduling"

From ECRIN-MDR Wiki

Revision as of 17:56, 13 January 2021

Introduction

The data extraction and data aggregation processes are all scheduled on a weekly basis. Each is carried out by a console application, which is called with the relevant parameters to perform the specified operation on the target source. Windows Task Scheduler controls the scheduling. Details will change as more sources are added, but the situation as of January 2021 is described below. Several days of the week are each assigned to a particular task, with a few spare days available for catch-up.
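As an illustration of how one weekly call could be registered from the command line, the sketch below builds a `schtasks /Create` argument list for a single entry. It is only a sketch: the wiki does not say whether the tasks were set up through the Task Scheduler GUI or by script, the task name is hypothetical, and the executable path is left elided exactly as it is in the tables on this page.

```python
def weekly_task_command(task_name, day, start_time, call):
    """Build a schtasks argument list that registers one weekly task.

    task_name  : label shown in Task Scheduler (hypothetical naming)
    day        : three-letter day code understood by schtasks, e.g. "MON"
    start_time : 24-hour start time, "HH:mm"
    call       : the full command line the task should run
    """
    return [
        "schtasks", "/Create",
        "/SC", "WEEKLY",   # schedule type: once a week
        "/D", day,         # day of the week
        "/ST", start_time, # start time
        "/TN", task_name,  # task name
        "/TR", call,       # command to run
    ]


if __name__ == "__main__":
    # The leading "..." stands for the elided installation path.
    cmd = weekly_task_command(
        "MDR download Yoda",
        "MON",
        "07:00",
        r"...\DataDownloader.exe -s 101901 -t 102",
    )
    print(cmd)
```

The list could be passed to `subprocess.run` on the server once the real executable path is filled in; printing it first makes it easy to check each entry against the schedule.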

Downloads (Monday)

Most of the downloads are scheduled for Monday. Because they run weekly, most complete quite quickly; the main exception is EUCTR, which does not provide a 'date last revised' field. The current schedule is as follows, with the relevant parameters given:

Time   Target              Call
07:00  Yoda                ...\DataDownloader.exe -s 101901 -t 102
07:30  BioLINCC            ...\DataDownloader.exe -s 101900 -t 102
09:00  ClinicalTrials.gov  ...\DataDownloader.exe -s 100120 -t 111
11:00  ISRCTN              ...\DataDownloader.exe -s 100126 -t 112
13:00  WHO                 ...\DataDownloader.exe -s 100115 -t 113 -f "C:\MDR_sources\WHO\<file name>.csv"
14:00  EUCTR               ...\DataDownloader.exe -s 100123 -t 142

Note that the WHO 'download' involves processing a specific named file (with a different date-stamp each week), and the file name must therefore be manually inserted into the call each week.
The EUCTR download is placed last because it is by far the longest and also the one most prone to errors, because of apparent issues and maintenance work on the web site. It may therefore need to be re-run the following day.
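The weekly manual step for the WHO file could in principle be reduced by looking up the file name programmatically. The sketch below is one possible approach, not part of the existing system: it assumes the file wanted is simply the most recently modified `.csv` in the WHO folder, and the helper names are hypothetical. The `-s`, `-t` and `-f` values are those shown in the table above.

```python
from pathlib import Path


def latest_csv(folder):
    """Return the most recently modified .csv file in folder, or None if there is none."""
    files = sorted(Path(folder).glob("*.csv"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def who_download_args(csv_path):
    """Parameters for the WHO 'download' call, with the file name filled in."""
    return ["-s", "100115", "-t", "113", "-f", str(csv_path)]


if __name__ == "__main__":
    newest = latest_csv(r"C:\MDR_sources\WHO")
    if newest is None:
        print("No WHO file found; nothing to process.")
    else:
        print(who_download_args(newest))
```

Checking the printed file name before running the call remains sensible, since the assumption about modification times would fail if an older file were re-saved.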

Harvests (Wednesday)

Harvests are relatively straightforward and are scheduled for Wednesdays. The source ids are passed as a string array and are therefore enclosed in quotes, even when there is only one id. (The Downloader exe, by contrast, expects a single integer source id.)

Time   Target              Call
07:00  BioLINCC            ...\DataHarvester.exe -s "101900" -t 1
07:30  Yoda                ...\DataHarvester.exe -s "101901" -t 1
08:00  ClinicalTrials.gov  ...\DataHarvester.exe -s "100120" -t 2
09:00  ISRCTN              ...\DataHarvester.exe -s "100126" -t 2
10:00  EUCTR               ...\DataHarvester.exe -s "100123" -t 2
13:00  WHO A               ...\DataHarvester.exe -s "100116, 100117, 100118, 100119" -t 2
13:00  WHO B               ...\DataHarvester.exe -s "100121, 100122, 100124, 100125" -t 2
13:00  WHO C               ...\DataHarvester.exe -s "100127, 100128, 100129, 100130, 100131, 100132, 101989" -t 2

The WHO A, B and C harvests refer to different collections of WHO registries. A processes sources 100116 (the Australia / New Zealand registry), 100117 (the Brazilian registry), 100118 (the Chinese registry) and 100119 (the South Korean registry). B processes 100121 (the Indian registry), 100122 (the Cuban registry), 100124 (the German DRKS registry) and 100125 (the Iranian registry). C processes the rest: 100127 through to 100132 (registries in Japan, Africa, Peru, Sri Lanka, Thailand and the Netherlands respectively), plus 101989 (the Lebanese registry).
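The grouping described above can be written down as a small lookup, which makes it easier to build the quoted `-s` argument consistently and to check that no registry appears in two groups. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the ids are those listed in the paragraph above.

```python
# WHO registry source ids, grouped as in the description above.
WHO_GROUPS = {
    "WHO A": [100116, 100117, 100118, 100119],
    "WHO B": [100121, 100122, 100124, 100125],
    "WHO C": [100127, 100128, 100129, 100130, 100131, 100132, 101989],
}


def source_arg(ids):
    """Join source ids into the comma-separated string passed after -s."""
    return ", ".join(str(i) for i in ids)


def harvest_args(group):
    """Parameters for harvesting one WHO group (harvest type 2, as in the table)."""
    return ["-s", source_arg(WHO_GROUPS[group]), "-t", "2"]


if __name__ == "__main__":
    all_ids = [i for ids in WHO_GROUPS.values() for i in ids]
    # Each registry should belong to exactly one group.
    assert len(all_ids) == len(set(all_ids))
    for name in WHO_GROUPS:
        print(name, harvest_args(name))
```

When the argument list is handed to a process launcher as separate items, the `-s` value stays a single argument, which corresponds to the quoting used in the scheduled calls.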

Imports (Thursday)

Imports are also relatively straightforward and take place on Thursdays. The parameters (just the source id(s)) and the organisation of the calls are very similar to those for harvests.

Time   Target              Call
09:00  BioLINCC            ...\DataImporter.exe -s "101900"
09:20  Yoda                ...\DataImporter.exe -s "101901"
09:40  ISRCTN              ...\DataImporter.exe -s "100126"
10:00  EUCTR               ...\DataImporter.exe -s "100123"
11:00  ClinicalTrials.gov  ...\DataImporter.exe -s "100120"
12:00  WHO A               ...\DataImporter.exe -s "100116, 100117, 100118, 100119"
12:30  WHO B               ...\DataImporter.exe -s "100121, 100122, 100124, 100125"
13:00  WHO C               ...\DataImporter.exe -s "100127, 100128, 100129, 100130, 100131, 100132, 101989"

Processing PubMed data (Friday)

Processing of the PubMed data is best done after all the other (study-based) sources have been imported, because one of the two mechanisms for identifying relevant PubMed records uses references inside the other source databases. It is therefore scheduled for Friday, and runs through all stages of the extraction process, including two initial downloads, as shown below.

Time   Target  Call
09:00  PubMed  ...\DataDownloader.exe -s 100135 -t 114 -q 10003
10:30  PubMed  ...\DataDownloader.exe -s 100135 -t 114 -q 10004
17:00  PubMed  ...\DataHarvester.exe -s "100135" -t 2
18:00  PubMed  ...\DataImporter.exe -s "100135"

Aggregating data (Sunday)