from original 'Requirements'...
Portal Core Functionality
1) The core purpose of the metadata repository is to allow users to identify the data objects linked to a clinical research study, or group of such studies, and obtain information on the accessibility of those data objects.
Here a study is any clinical research study with humans as study participants, and which is therefore subject to ethical approval, whether the study is interventional (a 'clinical trial'), observational (including disease registry data), or a case study.
A data object is any information available in electronic form (a 'data stream') and may be a document (e.g. a pdf), a dataset in one of a variety of formats (spreadsheet, csv files, database file, XML etc.), a media (audio, video) file, an image (e.g. a conference poster) or simply a web page with useful text.
2) Information on all studies, and on all the data objects linked to each study, should be available within a single portal system that accesses and aggregates data from a wide variety of source systems. In other words the user should not have to leave the main web portal when carrying out a search for study / data object information.
3) The portal is designed to be public and completely freely available. There is no user identification or registration required, and therefore no authentication or authorisation mechanisms are needed. No information need be stored about users.
4) Because initial searches (see below) are likely to use study identifiers and characteristics, basic information on studies should be included and visible within the portal system - including names, identifiers, a brief description, status, type, associated topics (keywords), important contributors, and design characteristics. Any statement from the study sponsors / study leads on data sharing should also be visible in the system.
The system does not, however, need further details, for example of study design, outcome measures, inclusion / exclusion criteria, and clinical sites because such information is available from following links, e.g. to trial registry pages.
5) Where a data object is freely and publicly available, e.g. a link to an open access journal article, or a trial registry entry, the system should provide a direct link to the object so that the user can proceed to immediately view / download the data object.
6) Where a data object is available under restricted access (e.g. requiring application to the investigator for data, or to a metadata / data repository acting on behalf of the sponsor), then as far as possible the details of the access arrangements should be displayed within the portal. These should be supplemented by a url that links to a web page with a more detailed description of the access procedure.
7) Any link within the system should open the target web page within an additional tab or browser window. The portal interface should remain open and not be replaced by the linked page.
8) Any link within the system should be checked to ensure that it is still valid - i.e. gives a 200 success response rather than a 404 (though this does not guarantee that the content is as described by the system). Link checking is expensive in terms of time and needs to be run independently of the data collection and aggregation processes.
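The link check in point 8 could be run as a separate batch job along the following lines. This is only a sketch under stated assumptions: the function names (`fetch_status`, `classify_status`, `check_links`), the use of HEAD requests and the three result labels are illustrative and not part of the requirements. The fetch function is passed in as a parameter so that the checker can be scheduled and tested independently of data collection.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a URL, or 0 if no response was obtained."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error code, e.g. 404
    except (urllib.error.URLError, OSError, ValueError):
        return 0  # DNS failure, timeout, refused connection or malformed URL


def classify_status(status: int) -> str:
    """Map a status code to a label (the labels are hypothetical)."""
    if status == 200:
        return "valid"
    if status in (404, 410):
        return "broken"
    return "unknown"  # e.g. server error or no response; worth retrying later


def check_links(urls: Iterable[str],
                fetch: Callable[[str], int] = fetch_status) -> Dict[str, str]:
    """Check each URL and return a mapping of URL to label."""
    return {url: classify_status(fetch(url)) for url in urls}
```

Some servers do not support HEAD requests, so a real job might fall back to GET before marking a link as anything other than valid.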
Portal Basic Layout
1) The portal is envisaged as a single page application (SPA). A fixed header and a 'sticky' footer strip sit either side of the central working area and have links to static content and forms (see section on static content below). The central working area takes the user's search input and displays the results of those searches (see figure 1 below).
2) The portal should be called a 'Metadata Repository for clinical trial data objects'. The main header logo is 'MDR' - see the mock-up of the interface below.
3) Logos of participating organisations should be included in the footer strip.
4) Within the central working area, the top area should be used for inputting search parameters to identify studies of interest. In Figure 1, the top area has been used to input a study id.
5) Within the central working area, the left and right margins should be used for filtering returned records, and / or requesting serialisation of search results to local storage or files. Vertical scrolling of the results will therefore be restricted to the central area of the screen.
Search Capability Required
1) There should be three ways of searching for studies, and hence for their associated data objects:
a) From a study identifier - most often a trial registry id but other identifiers should be possible.
b) Using keywords, found within study titles or the topics / keywords linked to studies
c) From an identifiable linked data object - i.e. an object with a doi or a unique name, such as a published article.
2) Users should be able to first select the search method they want to use, and then insert the relevant parameters. The interface will therefore have to change to reflect the initial choice made. The study search parameters are entered into input boxes at the top of the working area.
3) When identifying via a study id the user should first select the identifier type (e.g. registry id, funder's id, sponsor's id - the options will be available from the identifier_types lookup table), then insert the identifier value, and then click on Find to return the study, as in Figure 1.
4) When identifying a group of studies using keywords the user needs to input those terms, in either the title or the topics boxes, or both. The system should then use a text search for those words to bring back a list of related studies (see figure 2). If both text boxes are used the returned list should be the union of the studies found with each, with duplicates removed - using both boxes is therefore equivalent to an 'OR' clause combining the title and the topic search.
5) It should be possible to use boolean logic to combine text based search terms in various ways - in particular
- to signify an OR condition using the keyword OR, or an equivalent symbol, such as '|'.
- to signify an AND condition using the keyword AND, or an equivalent symbol, such as '&'.
- to signify a NOT condition using the keyword NOT, or an equivalent symbol, such as '!'.
- to combine boolean clauses in a particular order using brackets
In figure 2, the search is for any study with a title containing 'celiac' or 'coeliac', and also containing either 'screen', 'prevent' or 'health' and 'promot'. In addition, any study with associated topics that include 'coeliac disease' or 'celiac disease', but excluding those that also contain 'genetic' or 'omic' in their topic list, should be added to the list.
6) The text based searching should be case and accent insensitive, and allow for fuzzy matching and have sufficient ‘intelligence’ to cope with common misspellings and spelling variations, some translated terms, and common acronyms and synonyms. The doubling of 'coeliac' and 'celiac' should not be required in the mature system. (A list needs to be prepared).
7) When using search via a data object the user is requested to insert a doi (recommended) or the title of the object - expected to usually be a journal article. The system should then return the study, or studies, which are linked to that data object (and in the process a list of all the other data objects linked to that study) - see figure 3.
8) The search methods may produce single (a and c) or multiple (b or c) study records in response to the search criteria used. The system should allow the aggregation of results from successive search events - e.g. a user may insert 5 study identifiers, each returning a study into the results area, as a steadily growing 'stack' of returned studies. The user should be able to delete any specific study or the whole stack. In Figure 2 the red cross on each study header bar is designed to be used as a button to remove that study from the list found.
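The boolean and case / accent insensitive matching described in points 5 and 6 can be illustrated with a small evaluator. This is a sketch under several assumptions, none of which is fixed by the requirements: the names `fold`, `tokenise` and `matches` are hypothetical; only upper case OR, AND and NOT are treated as operators; NOT binds tighter than AND, which binds tighter than OR; double quotes mark a phrase; and a term matches if it occurs anywhere in the text (so 'promot' matches 'promotion'). Fuzzy matching, synonyms and acronyms are not covered.

```python
import re
import unicodedata
from typing import List

# A quoted phrase, a bracket / operator symbol, or a bare word.
TOKEN_PATTERN = re.compile(r'"[^"]*"|[()|&!]|[^\s()|&!"]+')
OPERATORS = {"OR": "|", "AND": "&", "NOT": "!"}


def fold(text: str) -> str:
    """Strip accents and ignore case, e.g. 'Sjögren' becomes 'sjogren'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def tokenise(query: str) -> List[str]:
    """Split a query into tokens, mapping keyword operators to their symbols."""
    return [OPERATORS.get(tok, tok) for tok in TOKEN_PATTERN.findall(query)]


def matches(query: str, text: str) -> bool:
    """Evaluate a boolean query against one piece of text (title or topic list)."""
    tokens = tokenise(query)
    haystack = fold(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or() -> bool:
        result = parse_and()
        while peek() == "|":
            advance()
            right = parse_and()
            result = result or right
        return result

    def parse_and() -> bool:
        result = parse_not()
        while peek() == "&":
            advance()
            right = parse_not()
            result = result and right
        return result

    def parse_not() -> bool:
        if peek() == "!":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom() -> bool:
        tok = peek()
        if tok is None or tok in ("|", "&", ")"):
            raise ValueError(f"unexpected token: {tok!r}")
        advance()
        if tok == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing bracket")
            advance()
            return result
        return fold(tok.strip('"')) in haystack  # substring match on folded text

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result
```

As a usage example, one possible reading of the figure 2 title search is `matches('(celiac | coeliac) & (screen | prevent | (health & promot))', title)`, and the topic search could be written as `matches('("coeliac disease" | "celiac disease") AND NOT (genetic | omic)', topics)`. A study would be returned if either call is true, in line with the union rule in point 4.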
Format of Returned Data
Any study returned should be represented on screen (see figure 3) by
1) A 'header bar' that shows the study title. Very long titles (e.g. > 500 characters) may need to be truncated and end with an ellipsis (...). The bar should include an arrow or similar indicator that allows the details panel (see below) to be expanded or hidden. It also should contain buttons or icons that allow an 'expand all' or a 'collapse all' format for the listed data objects.
2) A 'details panel' that presents basic information about the study and the linked data objects. The nature of each data object should be clear, e.g. using a type indicator on the left of the panel. There should then follow a brief description of the data object.
3) Note that the topmost 'data object' is not an object per se, but instead should always be a brief description of the study. This could perhaps be in a different font style to distinguish it from the data objects that follow.
4) For studies that include such a statement - currently a small minority - an explicit statement on the study's data sharing policy should be added to the results panel.
5) The various data objects found for that study are then listed. If publicly available that listing should include a hyperlink (href) pointing to the object itself.
6) Note that by default study and data object descriptions are limited to 2 lines on screen. When, as will often be the case, the description exceeds that limit, it should be truncated and ended with an ellipsis - at least on initial display. When truncation occurs the object title should have a button, link or icon that allows the user to expand the description to its full size. When expanded, the button, link or icon should change its caption and toggle the description back to a 2 line size.
7) Expansion of an object description should not represent a repeat call to the database. Instead the full descriptions should be retrieved with the study, and the display processing should take place on the client. As well as being able to expand / contract individual descriptions, the user should be able to expand / contract all descriptions for a study at the same time, by using suitable buttons or icons on the study's header bar. If the study's details panel is hidden (again by using a button on the header bar) then the system should 'remember' the configuration of the panel, in other words if the panel is re-opened the contracted / expanded state of individual data items should be restored to the state they were in when it was hidden.
8) Each listed data object (but not the study description or, if present, the data sharing statement) should also indicate its publication year, and by means of a simple traffic light system (or something similar) indicate if the object is publicly available (green), available but under managed access (orange) or not available at all, at least at the moment (red).
9) Each listed data object should also indicate, by means of a check box (that will be checked by default) whether it should be included in any saved and / or printed results list.
10) At the moment about 60% of the studies in the system only have a single data object associated with them - the original trial registry entry. It is hoped that in time the related journal articles, or for commercially sponsored studies the clinical study reports (at least) will be found for all published studies, and that results data will be catalogued as it gradually appears in public. Studies with more than 3 or 4 data objects listed are comparatively rare but do illustrate the potential value of the system.
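The display behaviour in points 6 to 8 (truncation, client-side expansion, remembered panel state and the traffic light indicator) can be modelled as follows. This is a language-neutral sketch written in Python rather than client code, and it makes assumptions: the names `StudyPanel`, `truncate` and `traffic_light` are hypothetical, the access status labels are invented for illustration, and a character limit stands in for the 2 line limit, which in a browser would more likely be handled by the layout.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical access labels mapped to the colours named in point 8.
ACCESS_COLOURS = {"public": "green", "managed": "orange", "unavailable": "red"}


def traffic_light(access_status: str) -> str:
    """Return the indicator colour for a data object's access status."""
    return ACCESS_COLOURS[access_status]


def truncate(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters and end it with an ellipsis."""
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."


@dataclass
class StudyPanel:
    """Client-side state for one study's details panel."""
    descriptions: Dict[str, str]  # object id -> full text, retrieved once with the study
    expanded: Dict[str, bool] = field(default_factory=dict)
    hidden: bool = False

    def toggle(self, object_id: str) -> None:
        """Expand or contract a single description."""
        self.expanded[object_id] = not self.expanded.get(object_id, False)

    def set_all(self, expanded: bool) -> None:
        """'Expand all' / 'collapse all' from the header bar."""
        self.expanded = {object_id: expanded for object_id in self.descriptions}

    def render(self, limit: int = 160) -> Dict[str, str]:
        """Return the text to show for each object; nothing if the panel is hidden."""
        if self.hidden:
            return {}
        return {
            object_id: text if self.expanded.get(object_id, False) else truncate(text, limit)
            for object_id, text in self.descriptions.items()
        }
```

Because hiding the panel only sets `hidden` and leaves `expanded` untouched, re-opening the panel restores each description to its earlier state, and no step requires a further database call.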
Filtering Study Lists
Once a list of studies has been obtained, it should be possible to further filter it if desired, by selecting for particular values within specific study characteristics or design points. These include:
- study type - e.g. interventional versus observational (see study_types table).
- study status - e.g. completed versus (see study_statuses table).
- gender applicability - e.g. both, male only, female only (see study_topics table).
- age range - against figures for minimum and / or maximum age eligible, usually in years
for interventional studies only (see topic_categories table)
- intervention type
- allocation model
- primary purpose
for observational studies only (see topic_categories table)
- observational model
- time perspective
- biospecimens retained
It is suggested that a 'Filter Studies' screen widget could be placed on the left hand side of the central area, above the 'Filter Data Objects' widget.
For each filter item, it should be possible to select multiple options to match, i.e. selection will probably require a check box or equivalent mechanism against a list of options, rather than a single selection from a drop down box (see figure 4).
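The multiple-option matching described above could work as in the following sketch. It assumes, as one plausible reading, that a study must satisfy every filter in use, that within a filter any ticked option counts as a match, and that a filter with nothing ticked is ignored. The function name `filter_studies` and the field names in the usage example are hypothetical, and the age range filter is not covered.

```python
from typing import Dict, List, Set

Study = Dict[str, str]


def filter_studies(studies: List[Study], selections: Dict[str, Set[str]]) -> List[Study]:
    """Keep the studies that match every filter with at least one option ticked.

    `selections` maps a study characteristic to the set of ticked options.
    """
    active = {name: options for name, options in selections.items() if options}
    return [
        study
        for study in studies
        if all(study.get(name) in options for name, options in active.items())
    ]
```

For example, `filter_studies(studies, {"study_type": {"interventional"}, "study_status": {"completed", "recruiting"}})` would keep interventional studies that are either completed or recruiting.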
Filtering Data Objects
1) Once a list of matching data objects is obtained, it should be possible to filter that list on a variety of characteristics. These include
- By object type, e.g. protocol, individual participant data, statistical analysis plan, published papers etc. (see object_classes, object_types tables - given the large number of object types a 2 stage filtering mechanism may be required - that shown in Figure 4 may therefore need to be modified.)
- By access type, e.g. public, managed using group membership, managed on a case by case basis (see object_access_types)
- By publication year (where publication year means the year it became available under some mechanism, not necessarily the year when it became public). Usually this will be by filtering for objects from before or after a particular date.
- By publisher / provider, or in terms of the MDR's metadata by 'managing organisation'. In this case the available options should include only the managing organisations present in the current list of linked data objects, for all the studies on screen.
2) Changing the selection of filtering options should automatically uncheck the checkbox next to objects that do not meet the selected criteria. In other words there is not a requirement to re-interrogate the database, only to work on the client to show the changed selection, which also provides immediate visual feedback to the user.
3) The user should also be able to filter objects on an individual basis by unchecking the check box next to the object entry. Again this does not remove them from the screen or study details panel, only excludes them from any subsequent storage or printing operations.
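The client-side behaviour in points 1 to 3 can be sketched as below: filtering changes only the check box state of each object and never removes it from the list. The class `DataObject`, its field names and the two functions are assumptions made for illustration. The sketch also assumes that an object with no known publication year fails any year filter, and that an empty or missing selection means the filter is not in use.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class DataObject:
    title: str
    object_type: str
    access_type: str
    publication_year: Optional[int]
    managing_org: str
    included: bool = True  # state of the check box, checked by default


def apply_object_filters(objects: Iterable[DataObject],
                         object_types: Optional[Set[str]] = None,
                         access_types: Optional[Set[str]] = None,
                         year_from: Optional[int] = None,
                         year_to: Optional[int] = None,
                         managing_orgs: Optional[Set[str]] = None) -> None:
    """Check or uncheck each object in place according to the selected filters."""
    for obj in objects:
        year = obj.publication_year
        obj.included = (
            (not object_types or obj.object_type in object_types)
            and (not access_types or obj.access_type in access_types)
            and (not managing_orgs or obj.managing_org in managing_orgs)
            and (year_from is None or (year is not None and year >= year_from))
            and (year_to is None or (year is not None and year <= year_to))
        )


def managing_org_options(objects: Iterable[DataObject]) -> List[str]:
    """List only the managing organisations present in the objects on screen."""
    return sorted({obj.managing_org for obj in objects})
```

An individual exclusion, as in point 3, is then just `obj.included = False`. In this sketch re-applying the filters overwrites such manual changes; whether they should instead be preserved is a design decision left open here.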
Saving and Printing Features
It should be possible to serialise the results of any search so that they are available for future reference. The difficulty is that a web site is not, in general, allowed to write to the local file system, for security reasons, though it can read some material if it is located in a predefined location. Nevertheless the system needs to be able to provide some serialisation capabilities. In particular:
1) The user should be able to 'print' out results, by clicking a button and asking for a pdf or similar document to be generated by the system (on the server), which can then be displayed through the browser and then downloaded. This means that the system needs to include a pdf generating library or something that can offer similar functionality. (An alternative might be a mechanism that created csv or similar files and made them available to download).
2) The user should be able to save any results (as a named set) into local storage. Local storage is a ring-fenced area of a client's system that is accessible to a browser, and is available for all of the major browsers. The user would be asked to name the set of results being stored, and then the data (and the parameters that generated it) would be stored locally as key-data pairs.
3) The user should be able to re-open any stored result set and redisplay both the results and the search / filter parameters. Because of the limitations of local storage the same browser would need to be used as was used when the record was created, and the same machine. Prior to selecting a particular set for redisplay, the system should be able to provide a list of the datasets available, each with a date-timestamp.
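The named result sets in points 2 and 3 could be stored along the following lines. Browser local storage holds string keys and string values, so this sketch serialises each set to JSON and uses an ordinary Python dictionary as a stand-in for the browser's store. The key prefix, the record layout and the function names are all assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, MutableMapping, Optional, Tuple

KEY_PREFIX = "mdr.resultset."  # hypothetical prefix to separate MDR entries from others


def save_result_set(storage: MutableMapping[str, str], name: str,
                    parameters: Dict[str, Any], results: List[Any],
                    saved_at: Optional[datetime] = None) -> None:
    """Store the results and the parameters that generated them under a name."""
    saved_at = saved_at or datetime.now(timezone.utc)
    record = {
        "name": name,
        "saved_at": saved_at.isoformat(),
        "parameters": parameters,
        "results": results,
    }
    storage[KEY_PREFIX + name] = json.dumps(record)


def list_result_sets(storage: MutableMapping[str, str]) -> List[Tuple[str, str]]:
    """Return (name, date-timestamp) for every stored set, sorted by name."""
    entries = []
    for key, value in storage.items():
        if key.startswith(KEY_PREFIX):
            record = json.loads(value)
            entries.append((record["name"], record["saved_at"]))
    return sorted(entries)


def load_result_set(storage: MutableMapping[str, str], name: str) -> Dict[str, Any]:
    """Re-open a stored set, including its search / filter parameters."""
    return json.loads(storage[KEY_PREFIX + name])
```

Saving under an existing name overwrites the earlier set in this sketch; a real interface might warn the user first.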
Static Content
1) A link should be available to some user help pages, that outline the options available for searching and filtering, and describe how results can be saved. To be written once the portal is in an advanced state of development, to allow for inclusion of screen shots.
2) A link should go through to an 'about' page, that simply describes
- the project - its participants and funders (can be taken from the wiki)
- the version of the portal, the technologies used to produce it, relevant dates (to be written)
- technical contact emails (to be added)
3) A 'Data sources and Contributing organisations' link that goes to a page listing the data sources used and the nature and amount of the data read from each. To be written and updated as the project proceeds.
4) A 'Contact us' link that allows users to contact the project team about including their data, and also allows people to report an error and / or request an update of data on an individual basis.
This is probably best implemented as a simple form that includes some standard subject headings, and which automatically generates an email to a project email address at ECRIN.
In the mock up the 'help' and 'about' links are in the header, the 'data sources' and 'contact us' links are in the footer, but the placement is not critical - as long as the links are somewhere in the fixed header and footer regions.