Difference between revisions of "Logging and Tracking"

From ECRIN-MDR Wiki

Revision as of 00:02, 3 November 2020

The need for logging and tracking

The separate components all need to refer to a common 'map' of the status of the system.
This makes it easier to see what is happening, for the system, and what has happened, for users.
Therefore the system keeps records of each of the 4 main processes, a map of data package statuses, and a log of issues.

The source data tables

<< diagram of table structure>>
study-study links (??? - in nk surely)
logging data layer - the logging repo
standard functions for doing the tracking

Logging of data download is critical because it provides the basis for orchestrating processes later on in the extraction pathway. A record is created for each study that is downloaded (in study-based sources like trial registries) or for each data object downloaded (in object-based sources like PubMed). The **'data source record'** that is established includes:

  • the source id,
  • the object's own id, in the source data (e.g. a registry identifier or PubMed id),
  • the URL of its record on the web - if it has one. This applies even to data that is not collected directly from the web, such as from WHO csv files.
  • the local path where the XML file downloaded or created is stored
  • the datetime that the record was last revised, if available
  • a boolean indicating if the record is assumed complete (used when no revision date is available)
  • the download status - an integer - where 0 indicates found in a search but not yet (re)downloaded, and 2 indicates downloaded.
  • the id of the fetch / search event in which it was last downloaded / created
  • the date-time of that fetch / search
  • the id of the harvest event in which it was last harvested
  • the date-time of that harvest
  • the id of the import event in which it was last imported
  • the date-time of that import
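
As an illustration only, the fields listed above could be modelled as a single record type. The sketch below is in Python and is not taken from the system's own code: the class name, field names and the two helper functions are all hypothetical, and only the two status values (0 and 2) come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# The two download status values described in the text.
# Any other values are not described there, so none are defined here.
STATUS_FOUND_NOT_DOWNLOADED = 0
STATUS_DOWNLOADED = 2


@dataclass
class DataSourceRecord:
    """Hypothetical model of one 'data source record' (all names are assumptions)."""
    source_id: int                              # the source id
    sd_id: str                                  # the object's own id in the source data
    remote_url: Optional[str] = None            # URL of its record on the web, if any
    local_path: Optional[str] = None            # where the XML file is stored locally
    last_revised: Optional[datetime] = None     # last revision datetime, if available
    assume_complete: Optional[bool] = None      # used when no revision date is available
    download_status: int = STATUS_FOUND_NOT_DOWNLOADED
    last_fetch_id: Optional[int] = None         # id of the last fetch / search event
    last_fetched: Optional[datetime] = None     # date-time of that fetch / search
    last_harvest_id: Optional[int] = None       # id of the last harvest event
    last_harvested: Optional[datetime] = None   # date-time of that harvest
    last_import_id: Optional[int] = None        # id of the last import event
    last_imported: Optional[datetime] = None    # date-time of that import


def needs_download(rec: DataSourceRecord) -> bool:
    """True if the record was found in a search but not yet (re)downloaded."""
    return rec.download_status == STATUS_FOUND_NOT_DOWNLOADED


def mark_downloaded(rec: DataSourceRecord, fetch_id: int,
                    when: datetime, local_path: str) -> None:
    """Record a completed download against a (hypothetical) fetch event."""
    rec.download_status = STATUS_DOWNLOADED
    rec.last_fetch_id = fetch_id
    rec.last_fetched = when
    rec.local_path = local_path


# Illustrative values only; these are not real source or study identifiers.
rec = DataSourceRecord(source_id=1, sd_id="EXAMPLE-0001")
mark_downloaded(rec, fetch_id=7, when=datetime(2020, 11, 3),
                local_path="example/EXAMPLE-0001.xml")
```

Keeping the harvest and import fields on the same record as the download fields is what would let later processes compare the date-times and decide which records still need harvesting or importing.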

The source parameters

The source parameters table holds a central position.
Sources are organisations, so the original record for each source is in the contexts organisation table.
That table provides the source ids.
Processing-specific data, however, is held in this table.
It is used in all phases of processing to know what tables to expect to find / process in each database.

The event records

The events tables and types
Creating and filling event records
statistics linked to aggregation

Extraction notes

Purpose and usage
Extraction tables
feedback and notes, serialising feedback