Imagine
- Define as a URL with a JSON API
- expressing ODMv2 via the semantic web
- self-descriptive datasets without bloat
- clinical data with precise meanings, represented as a graph
- an accessible format for both humans and apps to use for specs, metadata, schema, and lookups
JSON-LD originated from W3C and ended up as Google's method of choice for linking up the web for SEO. As formats go, JSON-LD is as FAIR and interoperable as they come.
It is designed for human-readability in compacted form, while its expanded form resolves every term to an IRI for complete and unambiguous machine-readability.
A simple clinical dataset can be transformed into rich graph data by including a reference to its metadata in the form of a JSON-LD context.
Dataset-JSON serves as:
- data container
- column description (optional)
- implicit link to other metadata via StudyOID and VersionOID
Addition of a single reference to its 'Define-LD' contextualises any Dataset-JSON:
- record exactly what was transferred
- add additional metadata around the transfer e.g. authors, timestamps, source, target
- explicit reference to Define content i.e. dataset specs, data transfer agreement
- explicit reference to Dataset and ODM Schema i.e. machine-readable API spec for apps and access
- map Dataset-JSON content to RDF
Define-LD is the manifest providing precise context to the transferred files, serving as:
- explicit link between Define and its constituent datasets
- mapping between ODMv1 Define (in JSON format) and graph
- mapping between ODMv2 Define (in JSON format) and graph
- preprocessing translator to understand a variety of dataset formats and standards versions
- URL accessibility layer for server-based dataset metadata
Together they are more than the sum of their parts, contextualising the data explicitly.
Imagine a single line added to the top of your Dataset-JSON data transfer object:
"@context" : "define.json"
This could also be handled over an API for use cases that need a 'single source of truth', such as a shared repository:
"@context" : "https://library.cdisc.org/study/COSA/BP12345"
The API approach obviates the need for an accompanying file, addressing one of the criticisms of Define-XML and Dataset-XML.
The accessibility of JSON, pointing to a single source of truth that links to graphs: that sounds like what both regulators (FDA, EMA, PMDA etc.) and application developers are looking for.
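As a minimal sketch of the single "@context" line described above, the Python below places an "@context" key at the top of a Dataset-JSON-like object. The helper name `add_context` and the trimmed payload fields are illustrative assumptions, not part of this project; only the two context values come from the text above.

```python
import json


def add_context(dataset: dict, context: str) -> dict:
    """Return a copy of the dataset with "@context" as its first key."""
    # Drop any existing "@context" so the new one is placed first.
    rest = {key: value for key, value in dataset.items() if key != "@context"}
    return {"@context": context, **rest}


# Hypothetical, heavily trimmed Dataset-JSON-like payload (assumed fields).
dataset = {
    "studyOID": "BP12345",
    "name": "DM",
    "rows": [["BP12345", "001", 34]],
}

# File-based reference to an accompanying Define-LD document.
local = add_context(dataset, "define.json")

# API-based reference to a shared 'single source of truth'.
remote = add_context(dataset, "https://library.cdisc.org/study/COSA/BP12345")

print(json.dumps(local, indent=2))
```

The original object is left untouched, so the same payload can be sent with either a file-based or an API-based context.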
How JSON-LD links Dataset-JSON content all the way to the semantic web:
- Start with Dataset-JSON
- "@context" is added to the top of transfers with a link to your Define-LD study spec
- Define-LD is the JSON version of ODM/Define-XML. Its "@context" in turn links your study metadata to the Define-LD schema
- The Define-LD Schema gives context to your spec in terms of ODM and schema.org. Your Dataset-JSON is now linked data that can be explored as part of the semantic web.
- The Define-LD Schema is used to expand your data into precise entity IRIs (internationalised resource identifiers)
- The resulting JSON-LD is deserialised into RDF
- RDF triples/n-quads can be queried and loaded into the meta-model or graph of your choice
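The expansion and deserialisation steps above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the JSON-LD expansion algorithm itself: the context, the example.org IRIs and the helper names `expand` and `to_ntriples` are all assumptions made for the example, and a real pipeline would hand this work to a JSON-LD processor.

```python
import json

# Hypothetical Define-LD-style context: short term -> IRI (assumed mappings).
CONTEXT = {
    "name": "https://schema.org/name",
    "label": "https://schema.org/description",
}


def expand(doc: dict, context: dict) -> dict:
    """Replace each known key with its IRI (toy version of expansion)."""
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context itself is not data
        if key == "@id":
            expanded["@id"] = value
        elif key in context:
            expanded[context[key]] = value
        # Keys with no mapping are dropped from the expanded output.
    return expanded


def to_ntriples(expanded: dict) -> list:
    """Emit one N-Triples line per property (string literals only)."""
    subject = expanded["@id"]
    return [
        f"<{subject}> <{predicate}> {json.dumps(str(value))} ."
        for predicate, value in expanded.items()
        if predicate != "@id"
    ]


doc = {
    "@context": "define.json",  # would normally be resolved to CONTEXT
    "@id": "https://example.org/study/BP12345/dataset/DM",
    "name": "DM",
    "label": "Demographics",
    "records": 1,  # not in the context, so it does not reach the graph
}

triples = to_ntriples(expand(doc, CONTEXT))
for line in triples:
    print(line)
```

Each printed line is a subject, predicate and object triple that could be loaded into a triple store or graph database for querying.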
- Very simple standard (so much like JSON that you would be forgiven for not realising that you are actually working with RDF)
- Adds linkages, semantics and context to existing JSON
- Adds basic schema support to JSON (SHACL still better)
- When loaded into a graph, enables joins and graph search
- Is how schema.org organises the web for consumption by Google and Bing, making it by far the most popular linked data transfer format
- String internationalization https://w3c.github.io/json-ld-syntax#string-internationalization
- Index maps give meaning to array entries https://w3c.github.io/json-ld-syntax/#index-maps
- Only expresses the triples contained within the document. To infer more, it would have to be deserialized into a graph
- It's quite new - CDISC has historically leaned heavily on XML.
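The string internationalization feature listed above can be illustrated with a language map. In this sketch the term `label`, its schema.org mapping, the sample translations and the helper `expand_language_map` are assumptions for illustration; the `"@container": "@language"` keyword pair is the JSON-LD mechanism being shown.

```python
# Hypothetical context: "label" is declared as a language map.
context = {
    "label": {"@id": "https://schema.org/name", "@container": "@language"}
}

# Compact form: one value per language tag.
doc = {"label": {"en": "Demographics", "de": "Demografie"}}


def expand_language_map(language_map: dict) -> list:
    """Turn {"en": "..."} into a list of language-tagged value objects."""
    return [
        {"@value": text, "@language": lang}
        for lang, text in sorted(language_map.items())
    ]


term = context["label"]
expanded = {term["@id"]: expand_language_map(doc["label"])}
print(expanded)
```

The compact form stays readable for humans, while the expanded form carries an explicit language tag on every string.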
Open-source Clinical Trial Tools
JSON-LD Demo Server Setup (this project)
Define via Manifest File Demo (this project)