Difference between revisions of "System Scheduling"

Revision as of 17:59, 13 January 2021

The various data extraction and data aggregation processes are all scheduled on a weekly basis. Each is run by a console application, which is invoked with the relevant parameters to carry out the specified operation on the target source. Windows Task Scheduler is used to control the scheduling. Details of the scheduling will vary as more sources are added, but the situation as of January 2021 is described below. Several days of the week are each assigned to a particular task, with a few spare days available for catch-up.
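As an illustration of how one weekly slot might be registered, the sketch below builds the argument list for the Windows `schtasks /Create` command. This is an assumption: the page only says that Windows Task Scheduler is used, not how the tasks were created (they may equally have been set up through the GUI). The task name is hypothetical, and `<install folder>` stands in for the path that the schedule tables elide.

```python
def schtasks_create_args(task_name, day, start_time, command):
    """Build the argument list for a weekly `schtasks /Create` call.

    day is a three-letter code such as MON; start_time is HH:mm (24-hour).
    """
    return [
        "schtasks", "/Create",
        "/SC", "WEEKLY",      # weekly schedule
        "/D", day,            # day of the week
        "/TN", task_name,     # task name (hypothetical naming scheme)
        "/TR", command,       # command line the task runs
        "/ST", start_time,    # start time
    ]


# Placeholder path: the real install folder is not given in this page.
yoda_download = schtasks_create_args(
    "MDR Download Yoda",
    "MON",
    "07:00",
    r"<install folder>\DataDownloader.exe -s 101901 -t 102",
)
print(yoda_download)
```

The list form can be passed to `subprocess.run` on the scheduling machine, which avoids hand-quoting the nested command.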

Downloads (Monday)

Most of the downloads are scheduled for Monday. Because they take place on a weekly basis, most can be done quite quickly. The main exception is the EUCTR download, because that source does not provide a 'date last revised' field. The current schedule is as follows, with the relevant parameters given:

Time	Target	Call
07:00	Yoda	...\DataDownloader.exe -s 101901 -t 102
07:30	BioLINCC	...\DataDownloader.exe -s 101900 -t 102
09:00	ClinicalTrials.gov	...\DataDownloader.exe -s 100120 -t 111
11:00	ISRCTN	...\DataDownloader.exe -s 100126 -t 112
13:00	WHO	...\DataDownloader.exe -s 100115 -t 113 -f "C:\MDR_sources\WHO\<file name>.csv"
14:00	EUCTR	...\DataDownloader.exe -s 100123 -t 142

Note that the WHO 'download' involves processing a specific named file (with a different date-stamp each week), and the file name must therefore be manually inserted into the call each week.
The EUCTR download is placed last because it is by far the longest and also the one most prone to errors, because of apparent issues and maintenance work on the web site. It may therefore need to be re-run the following day.
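Because only the WHO file name changes from week to week, the rest of that call can be kept fixed. The sketch below is one hedged way to do so: `who_download_args` is a hypothetical helper, not part of the system, and the executable path is supplied by the caller because the install folder is elided in the table above. The source id, type id and folder are taken from the schedule.

```python
from pathlib import PureWindowsPath

# Folder taken from the schedule table; the weekly file name is supplied by hand.
WHO_FOLDER = PureWindowsPath(r"C:\MDR_sources\WHO")


def who_download_args(exe_path, file_name):
    """Build the argument list for the weekly WHO 'download' call."""
    if not file_name.lower().endswith(".csv"):
        raise ValueError("expected a .csv file name, got: " + file_name)
    return [
        exe_path,
        "-s", "100115",                      # WHO source id
        "-t", "113",                         # type id used for WHO in the table
        "-f", str(WHO_FOLDER / file_name),   # full path to this week's file
    ]


# 'example.csv' is a stand-in; the real date-stamped name must be entered each week.
print(who_download_args("DataDownloader.exe", "example.csv"))
```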

Harvests (Wednesday)

Harvests are relatively straightforward and are scheduled for Wednesdays. The source ids are passed as a string array and are therefore enclosed in quotes, even when there is only one of them. (The Downloader exe, by contrast, expects only a single integer source id.)

Time	Target	Call
07:00	BioLINCC	...\DataHarvester.exe -s "101900" -t 1
07:30	Yoda	...\DataHarvester.exe -s "101901" -t 1
08:00	ClinicalTrials.gov	...\DataHarvester.exe -s "100120" -t 2
09:00	ISRCTN	...\DataHarvester.exe -s "100126" -t 2
10:00	EUCTR	...\DataHarvester.exe -s "100123" -t 2
13:00	WHO A	...\DataHarvester.exe -s "100116, 100117, 100118, 100119" -t 2
13:00	WHO B	...\DataHarvester.exe -s "100121, 100122, 100124, 100125" -t 2
13:00	WHO C	...\DataHarvester.exe -s "100127, 100128, 100129, 100130, 100131, 100132, 101989" -t 2

The WHO A, B and C harvests refer to different collections of WHO registries. A processes sources 100116 (the Australia / New Zealand registry), 100117 (the Brazilian registry), 100118 (the Chinese registry) and 100119 (the South Korean registry). B processes 100121 (the Indian registry), 100122 (the Cuban registry), 100124 (the German DRKS registry) and 100125 (the Iranian registry). C processes the rest - 100127 through to 100132 (registries in Japan, Africa, Peru, Sri Lanka, Thailand and the Netherlands respectively), plus 101989 (the Lebanese registry).
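The grouping described above can be written down as data, with the comma-separated `-s` value generated from it instead of being typed by hand. This is a minimal sketch under stated assumptions: the dictionary and both helper functions are hypothetical names, the ids follow the prose description (100127 through to 100132, plus 101989), and the executable path is left to the caller.

```python
# WHO registry source ids, grouped as in the description above.
WHO_GROUPS = {
    "WHO A": [100116, 100117, 100118, 100119],
    "WHO B": [100121, 100122, 100124, 100125],
    "WHO C": [100127, 100128, 100129, 100130, 100131, 100132, 101989],
}


def source_arg(ids):
    """Format source ids as the comma-separated string passed to -s."""
    return ", ".join(str(source_id) for source_id in ids)


def harvester_args(exe_path, ids, harvest_type):
    """Build the argument list for one DataHarvester call."""
    return [exe_path, "-s", source_arg(ids), "-t", str(harvest_type)]


for group_name, ids in WHO_GROUPS.items():
    print(group_name, harvester_args("DataHarvester.exe", ids, 2))
```

Passing the arguments as a list means the whole id string reaches the program as a single value, which is the job the quotes do in the scheduled calls.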

Imports (Thursday)

Imports are also relatively straightforward and take place on Thursdays. The parameters (just the source id or ids) and the organisation of the calls are very similar to those for harvests.

Time	Target	Call
09:00	BioLINCC	...\DataImporter.exe -s "101900"
09:20	Yoda	...\DataImporter.exe -s "101901"
09:40	ISRCTN	...\DataImporter.exe -s "100126"
10:00	EUCTR	...\DataImporter.exe -s "100123"
11:00	ClinicalTrials.gov	...\DataImporter.exe -s "100120"
12:00	WHO A	...\DataImporter.exe -s "100116, 100117, 100118, 100119"
12:30	WHO B	...\DataImporter.exe -s "100121, 100122, 100124, 100125"
13:00	WHO C	...\DataImporter.exe -s "100127, 100128, 100129, 100130, 100131, 100132, 101989"

Processing PubMed data (Friday)

Processing of the PubMed data is best done after all the other (study based) sources have been imported, because one of the two mechanisms for identifying relevant PubMed records uses references inside other source databases. It is therefore scheduled for Friday, and runs through all aspects of the extraction process, including two initial downloads, as shown below.

Time	Target	Call
09:00	PubMed	...\DataDownloader.exe -s 100135 -t 114 -q 10003
10:30	PubMed	...\DataDownloader.exe -s 100135 -t 114 -q 10004
17:00	PubMed	...\DataHarvester.exe -s "100135" -t 2
18:00	PubMed	...\DataImporter.exe -s "100135"
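The schedule above relies on start times to keep the four PubMed steps in order. As an alternative illustration only, the sketch below chains the same calls and stops at the first failure. It assumes that each console application returns exit code 0 on success, which this page does not state. `run_in_order` is a hypothetical helper, and the executable names are shown without the install folder that the table elides.

```python
import subprocess

# The four Friday calls, in schedule order, with parameters from the table.
PUBMED_STEPS = [
    ["DataDownloader.exe", "-s", "100135", "-t", "114", "-q", "10003"],
    ["DataDownloader.exe", "-s", "100135", "-t", "114", "-q", "10004"],
    ["DataHarvester.exe", "-s", "100135", "-t", "2"],
    ["DataImporter.exe", "-s", "100135"],
]


def run_in_order(steps, run=subprocess.call):
    """Run each step in turn; return the index of the first failing step.

    Returns None if every step reports exit code 0 (assumed to mean success).
    The `run` parameter lets a stand-in be substituted for dry runs.
    """
    for index, step in enumerate(steps):
        if run(step) != 0:
            return index
    return None


# Dry run with a stand-in runner that reports success for every step.
print(run_in_order(PUBMED_STEPS, run=lambda step: 0))
```

Stopping at the first failure prevents the harvest and import from running against an incomplete download.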

Aggregating data (Sunday)

The final phase is the aggregation of the data, which takes place in the following sequence: a) aggregation from source databases, b) creation of new core tables, c) creation of statistics of the aggregation, d) creation of JSON files, and e) zipping of JSON files. The detailed schedule is as below:
