Review Process for Linked Data
This page is currently a draft and has not yet been reviewed.
The Integrated Data Service (IDS) Dissemination Service reviews all Linked Data contributions in the same way, regardless of the source of the data, prior to publication on IDS Data Explorer. This is to ensure that the quality of the data is consistent and that the data is Findable, Accessible, Interoperable, and Reusable (FAIR).
We review the observational data, its metadata, and adopted code lists and taxonomies. We also review the data's publication process to ensure that it is repeatable and can be maintained by the data owner.
We use csvcubed's built-in inspect command to interrogate the metadata and data from a CSV-W. Using this command-line utility allows us to check conformance, at a high level, to our Application Profile and to the various W3C standards used within it.
flowchart TB
start([Start])
step1(Collection of CSV-W)
step2(Inspect CSV-W)
step3(Categorise issues)
step4(Feedback to data owner)
finish([Finish])
start-- Download CSV-W -->step1
step1-- commandline -->step2
step2-- identify improvements -->step3
step3-- consolidate feedback -->step4
step4-- email -->finish
step4-- update CSV-W if required -->start
We begin a review of a CSV-W by downloading it to our local machine along with all of its dependencies. We then run the inspect command on the CSV-W to identify any issues with the metadata and data. Because the inspect command uses the standards outlined in our Application Profile to interrogate the CSV-W's data and metadata, we can check conformance to those standards and, from the same view, provide feedback on metadata encoding choices and code list reuse.
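The inspection step can be scripted so that the output is captured for the later feedback. The following is a minimal sketch, assuming csvcubed is installed and on the PATH and that the inspect command takes the path to the CSV-W metadata JSON file. The helper names `build_inspect_command` and `inspect_csvw` and the example file name are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_inspect_command(metadata_path: Path) -> list[str]:
    """Build the argument list for `csvcubed inspect <metadata file>`."""
    return ["csvcubed", "inspect", str(metadata_path)]


def inspect_csvw(metadata_path: Path) -> str:
    """Run the inspect command and return its combined text output."""
    # Assumption: the csvcubed executable is available on the PATH.
    if shutil.which("csvcubed") is None:
        raise RuntimeError("csvcubed is not installed or not on PATH")
    result = subprocess.run(
        build_inspect_command(metadata_path),
        capture_output=True,
        text=True,
        check=False,  # keep the output even if the command reports a problem
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Hypothetical file name for a locally downloaded CSV-W.
    print(inspect_csvw(Path("my-dataset.csv-metadata.json")))
```

Capturing the output as text makes it straightforward to quote relevant parts of the inspection in the written feedback.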
Next we consolidate our feedback into a single markdown document, and return it to the data owner. Our feedback often includes comments on:
- Improving the CSV itself (for example, adopting tidy data)
- The metadata describing the data set
- Choices of code lists and other related datasets
- Validation of dimensions, attributes, units, and measures and whether they are faithful to the data
- Improving use of linked data by reusing existing code lists and taxonomies
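As an illustration of the first point, the kind of tidy-data reshaping we might suggest can be sketched with the Python standard library alone. The function name `wide_to_tidy`, the column names, and the values are made up for this example and do not come from a real dataset.

```python
import csv
import io


def wide_to_tidy(
    wide_csv: str, id_column: str, variable_name: str, value_name: str
) -> list[dict[str, str]]:
    """Reshape a wide CSV (one column per period) into one row per observation."""
    reader = csv.DictReader(io.StringIO(wide_csv))
    tidy_rows = []
    for row in reader:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row
            tidy_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy_rows


if __name__ == "__main__":
    # Illustrative data only.
    wide = "Region,2021,2022\nWales,10,12\n"
    for tidy_row in wide_to_tidy(wide, "Region", "Year", "Value"):
        print(tidy_row)
```

Each resulting row holds a single observation, with the former column headers moved into their own column.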
Our feedback often includes example code to address issues within the CSV and example JSON with suggestions to improve the metadata. We try to link to existing documentation where appropriate, so that data owners are empowered to build high-quality linked data themselves.
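The consolidation of categorised issues into a single markdown document can be sketched as below. This is an illustration rather than the tooling we actually use; the function name `render_feedback`, the report title, and the sample comments are all hypothetical.

```python
from collections import defaultdict


def render_feedback(issues: list[tuple[str, str]]) -> str:
    """Group (category, comment) pairs and render them as one markdown report."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for category, comment in issues:
        grouped[category].append(comment)

    lines = ["# CSV-W review feedback", ""]
    # Categories appear in the order they were first encountered.
    for category, comments in grouped.items():
        lines.append(f"## {category}")
        lines.extend(f"- {comment}" for comment in comments)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative comments only.
    print(
        render_feedback(
            [
                ("Metadata", "Add a description."),
                ("Code lists", "Reuse an existing code list."),
                ("Metadata", "Add a licence."),
            ]
        )
    )
```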
From this report the data publisher can update the CSV-W, and begin the process again. We find providing written feedback allows the data publisher to make changes to the CSV-W at their own pace, but for larger changes or more complex representation issues we are available to provide live support and training to ensure the data is aligned to standards and accurate.
An example of our asynchronous feedback can be found in a trial of using linked data for Faster Economic Indicators data publication.
Within IDS-D we are republishers of data; as such, we have a strict division of responsibilities for metadata markup and updates to the source data/CSV. These rules ensure that we faithfully represent the data we are republishing as Linked Data.
- We do not adjust the observations within the source CSV unless we are harmonising units within the data, and only if the conversion is lossless (i.e. no rounding or truncation of data). For example, we will convert an observational column expressed in HH:MM:SS notation to a simple integer representation of seconds.
- We will unpick a combined time period column based on information within other columns. For example, a column may contain dates expressed in the format YYYY-MM-DD, with another column saying whether the value is the day or the beginning of the week starting on that date. From there we use csvcubed's templates for Date/Time periods to accurately represent the data. We consider this fair wrangling of the data.
- If we are responsible for drafting the qube configuration file for the CSV-W, we will have another member of the team review the data before approving it for publication.
- A final harmonisation review takes place before publication on the Integrated Data Service Data Explorer by a Senior or Lead Data Manager.
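The lossless unit conversion mentioned in the first rule above can be sketched as follows. This is an illustration, not the code we run in production: the function names are hypothetical, and the round trip only reproduces the original text when the source values are zero-padded.

```python
def hms_to_seconds(value: str) -> int:
    """Convert an HH:MM:SS string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"Not a valid HH:MM:SS duration: {value!r}")
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total_seconds: int) -> str:
    """Convert whole seconds back to zero-padded HH:MM:SS notation."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    original = "01:02:03"
    converted = hms_to_seconds(original)
    # Converting back recovers the original value, so nothing was rounded
    # or truncated along the way.
    assert seconds_to_hms(converted) == original
    print(converted)
```

Because both representations are exact integers of seconds, the conversion can be reversed, which is the property we require before adjusting an observation column.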
Our goal is to push all changes to the CSV back upstream to the data publisher, along with the responsibility for maintaining the qube configuration file. This ensures that the data is maintained by the data owner, who, as the subject matter expert, is best placed to describe their data.
The Integrated Data Service Dissemination Service is available to review your official/national statistical CSV-W. If you would like to arrange a review please contact us at idps.dissemination@ons.gov.uk.