
Review Process for Linked Data

Andrew Fergusson edited this page Feb 20, 2023 · 3 revisions

This page is currently a draft and has not yet been reviewed

Reviewing Linked Data Contributions

The Integrated Data Service (IDS) Dissemination Service reviews all Linked Data contributions in the same way, regardless of the source of the data, prior to publication on IDS Data Explorer. This ensures that the quality of the data is consistent and that the data is Findable, Accessible, Interoperable, and Reusable (FAIR).

We review the observational data, its metadata, and adopted code lists and taxonomies. We also review the data's publication process to ensure that it is repeatable and can be maintained by the data owner.

Our review process

We use csvcubed's built-in inspect command to interrogate the metadata and data from a CSV-W. Using this command-line utility allows us to check conformance to our Application Profile at a high level, as well as to the various W3C standards used within it.
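As a sketch of how we invoke this step, assuming csvcubed is installed and on the PATH (the wrapper functions below are our own illustration, not part of csvcubed itself):

```python
import subprocess


def build_inspect_command(metadata_path: str) -> list[str]:
    """Build the command line for csvcubed's inspect sub-command."""
    return ["csvcubed", "inspect", metadata_path]


def inspect_csvw(metadata_path: str) -> str:
    """Run `csvcubed inspect` on a CSV-W metadata file and return its report."""
    result = subprocess.run(
        build_inspect_command(metadata_path),
        capture_output=True,
        text=True,
        check=True,  # raise if the inspect command exits non-zero
    )
    return result.stdout
```

In practice we run this against the downloaded metadata file, e.g. `inspect_csvw("my-dataset.csv-metadata.json")`, and read the resulting report alongside the CSV.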

Overview of our process

flowchart TB

    start([Start])
    step1(Collection of CSV-W)
    step2(Inspect CSV-W)
    step3(Categorise issues)
    step4(Feedback to data owner)
    finish([Finish])

    start-- Download CSV-W -->step1
    step1-- commandline -->step2
    step2-- identify improvements -->step3
    step3-- consolidate feedback -->step4
    step4-- email -->finish
    step4-- update CSV-W if required -->start

We begin a review of a CSV-W by downloading the CSV-W locally to our machine with all of its dependencies. We then run the inspect command on the CSV-W to identify any issues with the metadata and data. As the inspect command uses the standards outlined in our Application Profile to interrogate the CSV-W's data and metadata, we can ensure conformance to those standards and provide feedback on metadata encoding choices and code list reuse from this view.

Next we consolidate our feedback into a single markdown document, and return it to the data owner. Our feedback often includes comments on:

  • Improving the CSV itself (i.e. adopting tidy data)
  • The metadata describing the data set
  • Choices of code lists and other related datasets
  • Validation of dimensions, attributes, units, and measures, and whether they are faithful to the data
  • Improving use of linked data by reusing existing code lists and taxonomies
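On the first point, "tidy data" means one observation per row and one variable per column. A minimal sketch of melting a wide table into tidy form (the column names and values here are hypothetical, purely for illustration):

```python
def melt(rows, id_column, value_columns, var_name="year", value_name="value"):
    """Reshape wide rows (one column per year) into tidy rows
    (one observation per row)."""
    tidy = []
    for row in rows:
        for col in value_columns:
            tidy.append({
                id_column: row[id_column],
                var_name: col,
                value_name: row[col],
            })
    return tidy


# One wide row holding two observations...
wide = [{"area": "E92000001", "2021": 100, "2022": 105}]
# ...becomes two tidy rows, each holding exactly one observation.
tidy = melt(wide, "area", ["2021", "2022"])
```

Tidy CSVs map much more directly onto dimensions, measures, and observations in a CSV-W, which is why we often suggest this reshaping first.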

Our feedback often includes example code to address issues within the CSV, and example JSON with suggestions to improve the metadata. We try to link to existing documentation where appropriate so that data owners are empowered to build high-quality linked data themselves.
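A metadata suggestion might, for instance, take the form of a csvcubed qube-config fragment like the one below. The titles, column name, and template are illustrative only, and the exact schema URL and available templates should be checked against the csvcubed documentation:

```json
{
  "$schema": "https://purl.org/csv-cubed/qube-config/v1",
  "title": "A clear, human-readable data set title",
  "summary": "A one-sentence summary of the data set.",
  "description": "A longer description covering scope, coverage, and methodology.",
  "columns": {
    "Period": {
      "from_template": "year"
    }
  }
}
```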

From this report the data publisher can update the CSV-W, and begin the process again. We find providing written feedback allows the data publisher to make changes to the CSV-W at their own pace, but for larger changes or more complex representation issues we are available to provide live support and training to ensure the data is aligned to standards and accurate.

An example of our asynchronous feedback can be found in a trial of using linked data for Faster Economic Indicators data publication.

Review rules for IDS-D

Within IDS-D we are republishers of data; as such, we have a strict division of responsibilities for metadata markup and updates to the source data/CSV. These rules ensure that we faithfully represent the data we are republishing as Linked Data.

  1. We do not adjust the observations within the source CSV unless we are harmonising units within the data, and only if the conversion is lossless (i.e. no rounding or truncation of data).

    e.g., we will convert an observational column expressed in HH:MM:SS notation into a simple integer representation of seconds.

  2. We will unpick a combined time period column based on information within other columns. For example, a column may contain dates expressed in the format YYYY-MM-DD, with another column indicating whether the value represents that single day or the week beginning on that date. From there we use csvcubed's templates for Date/Time periods to accurately represent the data. We consider this fair wrangling of the data.

  3. If we are responsible for drafting the qube configuration file for the CSV-W, we will have another member of the team review the data before approving it for publication.

  4. A final harmonisation review takes place before publication on the Integrated Data Service Data Explorer by a Senior or Lead Data Manager.
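The lossless conversion in rule 1 can be sketched as follows (a hypothetical helper shown only to illustrate the rule, not part of our tooling):

```python
def hhmmss_to_seconds(value: str) -> int:
    """Losslessly convert an 'HH:MM:SS' duration to whole seconds.

    Raises ValueError rather than rounding or truncating, in line with
    the rule that conversions must be lossless.
    """
    hours, minutes, seconds = value.split(":")
    if not (hours.isdigit() and minutes.isdigit() and seconds.isdigit()):
        raise ValueError(f"Not a whole-second HH:MM:SS duration: {value!r}")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds)
```

For example, `hhmmss_to_seconds("01:30:15")` returns 5415, while an input such as `"01:30:15.5"` raises an error instead of silently truncating fractional seconds.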

Our goal is to push all changes to the CSV back upstream to the data publisher, along with the responsibility to maintain the qube configuration file. This ensures that the data is maintained by the data owner, who as the subject matter expert is best placed to describe their data.

Arrange a data review

The Integrated Data Service Dissemination Service is available to review your official/national statistical CSV-W. If you would like to arrange a review please contact us at idps.dissemination@ons.gov.uk.
