diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..2013949 --- /dev/null +++ b/.gitignore @@ -0,0 +1,14 @@ +# Jekyll output +_site/ + +# Ruby/Bundler +Gemfile +Gemfile.lock + +# Editor backups +.DS_Store + +.sass-cache/ +.jekyll-cache/ +.bundle/ +vendor/ \ No newline at end of file diff --git a/Assignments/assignment1A.md b/Assignments/assignment1A.md deleted file mode 100644 index 41d31cb..0000000 --- a/Assignments/assignment1A.md +++ /dev/null @@ -1,130 +0,0 @@ -Assignment 1: Which variant of which gene predicts a positive prognosis in colorectal cancer -================= - -[HOME](https://DeniseSl22.github.io/SPARQLTutorials/) - -During this assignment, we will have a closer look at an example SPARQL query of Wikidata, called ["Which variant of which gene predicts a positive prognosis in colorectal cancer"](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Which_variant_of_which_gene_predicts_a_positive_prognosis_in_colorectal_cancer). We will first go through the basics of a SPARQL query. Second, we will find out how to execute the query and retain or share results. Last, we will expand the query and make other (small) changes, to understand the structure of a SPARQL query better, and see what other data is available in Wikidata. - -## What goes Where - -A SPARQL query consist out of several elements, which can be considered as building blocks. -We will use the following [example](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Which_variant_of_which_gene_predicts_a_positive_prognosis_in_colorectal_cancer), which is part of the example page of SPARQL queries for Wikidata. -We will exploring the example question called **"Which variant of which gene predicts a positive prognosis in colorectal cancer"** in more detail below. 
- -### First element: SELECT - -The first element we encouter in this example, is the so-called _result clause_, which identifies what information to return from the query. -This element starts with the word SELECT, and is then followed by two words with a questionmark in front of them: - -```sparql -SELECT ?geneLabel ?variantLabel -``` - -SELECT is used to indicate with variables from the (to follow) SPARQL query you want to visualise as a result (in other words: which variables we find relevant as output to answer our biological question). In this example, the name(s) of the gene(s) predicting a positive prognosis in colorectal cancer (?geneLabel), and the name of the variant(s) belonging to this gene/these genes (?variantLabel). - -### Second element: WHERE - -The second element we encouter, is the _query pattern_, which starts with the word WHERE, with the query itself enclosed in curly brackets: {} . - -```sparql -WHERE -{ - query -} -``` - -#### Variable Names - -Within these brackets, the data from an RDF is connected to variable names. We already encoutered two variable names, ?geneLabel and ?variantLabel (indicated by the questionmark). These variables can have any name you see fit. - -**Question1:** Which other variable names are present in the following query? (Answers can be found [here](../Answers/AnswersAssignment1.md)). - -```sparql -{ - VALUES ?disease {wd:Q188874} - ?variant wdt:P3358 ?disease ; # P3358 Positive prognostic predictor - wdt:P3433 ?gene . # P3433 biological variant of -} -``` - -#### Query Details - -In between the curly brackets in the query above, we can see three lines to describe the query we want to execute: -1. VALUES ?disease {wd:Q188874} -1. ?variant wdt:P3358 ?disease ; # P3358 Positive prognostic predictor -1. wdt:P3433 ?gene . # P3433 biological variant of - -##### Line 1 -The first line starts with VALUES; this allows you to queries multiple items, specified in the data block between the brackets at the end of this line. 
Multiple items should be separated with spaces. The variable after VALUES (in this case ?disease) will be completed with the items in the data block. In this case, we are only interested in "colorectal cancer", with 'Q188874' being the identifier of the entry in Wikidata (these always start with a Q) for **Colon Cancer**. To explain to which database this identifier belongs, we add the 'wd:' before the number of the identifier. - -##### Line 2 -The second line consists of two main parts, the left part is related to the query (which ends with a semicolon ';' ), the right is a comment, to explain what the line does (starting with a hash '#'). The left part is formed in the triplet structure (which we previously encountered in the presentation): - -```sparql -{ -?variant wdt:P3358 ?disease ; -} -``` - -In this case, we are looking for variant(s) (_subject_), which are related to disease(s) (_object_). The relationship (aka _predicate_) is defined as the center part of the triplet, with 'P3358' being the identifier of the property in Wikidata (these always start with a P) for **Positive prognostic predictor**. To explain to which database this relationship belongs, we add the 'wdt:' before the identifier of the property. Notice the small difference with line one; 'wd:' is used for entries (these can serve as subjects or objects), 'wdt:' for properties (or relationships). - -This line ends with a semicolon ';' to end our first triplet. - -##### Line 3 -The third line again consists out of two parts; we will only discuss the left had side, until the point '.' . - -```sparql -{ -wdt:P3433 ?gene . -} -``` -Even though this line does not directly look like a triplet (since there are only two element), it is read as a triplet. -A point '.' at the end of a triplet really notes the end, while a semicolon ';' notes that another triplet will follow, where the _subject_ may be ommited from writing. 
Therefore, we need to look back at line 2, to find out which _subject_ is belonging to the triplet in line 3: ?variant. This line of the SPARQL query is therefore read as: - -```sparql -{ -?variant wdt:P3433 ?gene . -} -``` - -In this case, we are defining how genes are related to the variants in Wikidata. We are connecting the variant(s) (_subject_) with genes (_object_) in Wikidata, with the following relationship(_predicate_): "P3433", which is also known as "biological variant of". - -### Third element: SERVICE - -Note that the ?geneLabel and ?variantLabel are not written down in the query, however the variables ?gene and ?variant are. This is possible with the last part of the query, the SERVICE element. By typing the word "Label" (including the capital!) behind the name of a variable, we can obtain the name of that variable, in stead of the identifier used by the RDF. A name makes much more sense to us humans, and allows us to interpret the results. - -```sparql -{ - SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" } -} -``` - -Additional Remark: the SERVICE clause is not default SPARQL behaviour; it is part of the Wikidata SPARQL structure (like some of the visualisation options you see later on). Therefore, this statement will (most likely) not work when building a query -in any other database. The actual SPARQL query to retrieve labels without using SERVICE is explained [here](../Assignments/AddendumBioSb2019.md) - -### Full query - -When we combine the three elements above, we get the full query: - -```sparql -SELECT ?geneLabel ?variantLabel -WHERE -{ - VALUES ?disease {wd:Q188874} - ?variant wdt:P3358 ?disease ; # P3358 Positive prognostic predictor - wdt:P3433 ?gene . # P3433 biological variant of - SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" } -} -``` - -We wanted an answer for the following question: "Which variant of which gene predicts a positive prognosis in colorectal cancer". 
- -**Question 2A:** Which item in the SPARQL query corresponds to the disease(s) being queried? - -**Question 2B:** Which item in the SPARQL query adds a name to the results? - -(Answers can be found [here](../Answers/AnswersAssignment1.md)). - -We will now look how to run this query on the data from Wikidata, and how we can save the results from that query in the [next exercise](../Assignments/assignment1B.md). - -[HOME](https://DeniseSl22.github.io/SPARQLTutorials/) diff --git a/Assignments/assignment1C.md b/Assignments/assignment1C.md deleted file mode 100644 index 411eff6..0000000 --- a/Assignments/assignment1C.md +++ /dev/null @@ -1,93 +0,0 @@ -[HOME](https://DeniseSl22.github.io/SPARQLTutorials/) - -## Change is Coming - -### More diseases: -We have now limited ourselves to only one disease, colorectal cancer. If we would like to add another disease, such as "breast cancer" (Q128581), we would need change the VALUES line in our query (original below): - -```sparql -{ - VALUES ?disease {wd:Q188874} -} -``` - -By adding the identifier (Q128581) for the Wikidata entry called "breast cancer" to the VALUES element, we can expand our query to include two diseases. Change the line depicted above in the SPARQL endpoint to the following and click the blue run button again: - -```sparql -{ - VALUES ?disease {wd:Q188874 wd:Q128581} -} -``` - -You should now see more results, compared to our previous endeavour. - -**Question 3:** How will the line above look, when we also want to add stomach carcinoma (Q18556832) to our list? - -(Answers can be found [here](../Answers/AnswersAssignment1.md)). - -### Which diseases? -Since we are obtaining more results by adding more diseases to our query, it would be great if we know to which disease which variant is related. 
In order to obtain the disease in the results, we should change the _result clause_ section of our SPARQL query: - -```sparql -SELECT ?geneLabel ?variantLabel ?disease -``` - -Click the play button again; there should now be three columns in your results panel... However, the disease column is only giving us the identifier from Wikidata, not the name of the disease. - -**Question 4:** How should the line above look, when we want to see the name of the disease in our results panel? - -(Answers can be found [here](../Answers/AnswersAssignment1.md)). - -### Easier querying: Adding diseases with entry search function -Finding the identifiers for each entry you are interested in, can be done very easily with the entry search function. If we would like to add the disease "ovarian cancer" to our list of diseases of interest, we could do the follwing: -1. In the SPARQL endpoint, find the VALUES line. -1. Click just before the last curly bracket '}' . -1. Type a space ' ', and then 'wd:' . -1. Now hit Ctrl and the spacebar on your keyboard simultaneously (Windows, for Apple: CMD in stead of Ctrl). -This should open up the entry search field of Wikidata (see image below). -![Search Entry query 1](../Images/Search_Entry_Wikidata.jpg) - -1. Type the words 'ovarian cancer' in the search field, which should trigger a search in all entries in Wikidata (see image below). -![Search Entry query 1](../Images/Search_Entry_Wikidata_Ovarian_Cancer.jpg) - -1. Click on the entry with identifier Q172341; this adds the identifier to your list of VALUES. -1. Run your query again. - -### Adding protein images -The SPARQL endpoint of Wikidata has several interesting data visualisation options; we will use one to add protein domain images for the genes we just queried. -1. Just above the SERVICE element, add the following line: - -```SPARQL -?gene wdt:P18 ?image . -``` -2. Click on the play button... What just happened? We had 13 variants, and now the results went down to 6?! 
- -Since not all genes have an image in Wikidata, we are only retrieving the ones that have an image. This can be avoided by using an OPTIONAL statement, such as: -```SPARQL -OPTIONAL{?gene wdt:P18 ?image }. -``` - -However, we are not seeing the images in our results panel. Every time we want to see a variable that we are querying, we need to add it to the SELECT statement. -1. Change the SELECT statement to the following: -```SPARQL -SELECT ?geneLabel ?variantLabel ?diseaseLabel ?image -``` -2. Click on the play button... We do not see the images directly, we do get a link to the images in a Table. If we want to actually see the images, we need to change the visualisation options of the SPARQL endpoint. -3. Directly under the Run button, there is an option called 'Table". Click on this option, and select the option "Image Grid": - -![Image Grid query 1](../Images/Image_grid_Wikidata.jpg) - -Now, the genes in Wikidata which have an image connected to them, are displayed. - -![Image example query 1](../Images/Images_genes_Wikidata.JPG) - -If you would like to have images for all the genes you queried, you can add these to Wikidata yourself. Since the data in Wikidata is built by community efforts, everyone can get involved. If you would like to know more about becoming a database editor and/or curator for Wikidata, ask one of the instructors for more information. - -## Next assignments: - -To continue, you can do one of the following: -1. Progress to [Assignment 2](../Assignments/assignment2A.md), where we will discuss another query in more detail -1. Stay with the current query to adapt it to your own needs. Several example questions to work on are given in this [additional assignment](../Assignments/assignment1D.md). 
- -[HOME](https://DeniseSl22.github.io/SPARQLTutorials/) - diff --git a/Images/Image-NewSnorqlInterface.png b/Images/Image-NewSnorqlInterface.png new file mode 100644 index 0000000..67c375b Binary files /dev/null and b/Images/Image-NewSnorqlInterface.png differ diff --git a/Images/plantmetwiki-logo.png b/Images/plantmetwiki-logo.png new file mode 100644 index 0000000..7c4beb8 Binary files /dev/null and b/Images/plantmetwiki-logo.png differ diff --git a/README.md b/README.md index c3db611..25705f7 100644 --- a/README.md +++ b/README.md @@ -1,19 +1,110 @@ -# SPARQLing Biology: a beginners course. +# SPARQLing Plant Metabolic Pathways Wiki -This [SPARQLing Biology](index.md) workshop material in such a manner that it can be used in other workshops. +This repository contains the **SPARQLing Plant Metabolic Pathways Wiki** tutorial material, adapted from the original *SPARQLing Biology* workshop so that it can be reused in other workshops. -Read the latest version of the workshopmaterial online at [https://DeniseSl22.github.io/SPARQLTutorials/]. +- 🌱 Online tutorial: +- 🌐 PlantMetWiki SPARQL Explorer: -The material for this workshop is available under [CC-BY-SA 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/legalcode) licence. +This tutorial was adapted from the course materials available at +. -Authors: +License for this tutorial and source code: +[CC-BY-SA 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/legalcode) -* Egon Willighagen -* Marvin Martens -* Denise Slenter +## Credits +Authors of the original SPARQLing Biology material: -We would like to acknowledge the material provided at [https://github.com/egonw/fvtworkshop] by Egon Willighagen, Ruud Steltenpool and Lars Willighagen, which has been used to construct this workshop -(material is only available in Dutch). 
+* Egon Willighagen +* Marvin Martens +* Denise Slenter -* Part of this Material has been tested at the [BioSb conference](https://www.bigcat.unimaas.nl/sparqling-biology-breakout-session-at-biosb-2019/) taking place at the 3th of April 2019 in Lunteren. The specific material for this workshop can be found [here](https://bigcat-um.github.io/SPARQLTutorialBioSB2019/). +We would like to acknowledge the material provided at + by Egon Willighagen, Ruud Steltenpool and Lars Willighagen, which has been used to construct this workshop (material is only available in Dutch). + +Part of this material has been tested at the +[BioSB conference breakout session](https://www.bigcat.unimaas.nl/sparqling-biology-breakout-session-at-biosb-2019/) taking place on the 3rd of April 2019 in Lunteren. +The specific material for this workshop can be found at +. + + +--- + +## Serve this website locally for development (macOS, tested) + +These instructions assume: + +- macOS +- [Homebrew](https://brew.sh/) installed +- You are in this repository (e.g. `cd /path/to/SPARQLTutorials`) + +The site uses Jekyll with the GitHub Pages theme **`jekyll-theme-tactile`** and is best run via **Bundler**, so the local environment matches GitHub Pages. + +### 1. Install Ruby (via Homebrew) + +```bash +brew install ruby +``` + +### 2. Ensure Homebrew Ruby is on your PATH + +For Apple Silicon (M1/M2/M3): +``` +echo 'export PATH="/opt/homebrew/opt/ruby/bin:$PATH"' >> ~/.zshrc +source ~/.zshrc +``` + +For Intel Macs: +``` +echo 'export PATH="/usr/local/opt/ruby/bin:$PATH"' >> ~/.zshrc +source ~/.zshrc +``` + +You can check Ruby with: +``` +ruby -v +``` + +### 3. Install Bundler + +``` +gem install bundler +``` + +### 4. Install the site dependencies with Bundler + +From the repo root +(SPARQLTutorials): + +``` +cd /path/to/SPARQLTutorials +bundle install +``` + +This uses the Gemfile in the repository to install: +- jekyll +- jekyll-theme-tactile +- jekyll-seo-tag +- and any other required gems. + + +### 5. 
Serve the site locally +``` +bundle exec jekyll serve --port 4001 +``` + +Jekyll will print something like: +``` +Server address: http://127.0.0.1:4001/ +Server running... press ctrl-c to stop. +``` + +Open the URL in your browser (http://127.0.0.1:4001/ when using `--port 4001` as above). +You should see the tutorial rendered with the same tactile theme as on GitHub Pages. + + +## Feedback + +If you have feedback on this tutorial or find an issue, please open a GitHub issue in this repository: + +https://github.com/pathway-lod/SPARQLTutorials/issues \ No newline at end of file diff --git a/_config.yml b/_config.yml index 259a24e..9d8dce8 100644 --- a/_config.yml +++ b/_config.yml @@ -1 +1,13 @@ -theme: jekyll-theme-tactile \ No newline at end of file +theme: jekyll-theme-tactile +permalink: pretty +# ensure your custom CSS loads +plugins: [] +markdown: kramdown +kramdown: + auto_ids: true + +# made a collection of rendered pages +collections: + tutorial: + output: true + permalink: /Assignments/:name/ \ No newline at end of file diff --git a/_includes/head-custom.html b/_includes/head-custom.html new file mode 100644 index 0000000..a7612e5 --- /dev/null +++ b/_includes/head-custom.html @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/_includes/sidebar.html b/_includes/sidebar.html new file mode 100644 index 0000000..8157487 --- /dev/null +++ b/_includes/sidebar.html @@ -0,0 +1,95 @@ + + + +{%- comment -%} +------------------------------------------------------------ +3 Prev/Next buttons (auto, based on order) +This replaces page.prev/page.next front matter hardcoding. 
+------------------------------------------------------------ +{%- endcomment -%} + +{%- assign items = site.tutorial | sort: "order" -%} +{%- assign idx = nil -%} +{%- for p in items -%} + {%- if p.url == page.url -%} + {%- assign idx = forloop.index0 -%} + {%- endif -%} +{%- endfor -%} + +{%- if idx != nil -%} + {%- assign prev = items[idx | minus: 1] -%} + {%- assign next = items[idx | plus: 1] -%} +
+ {%- if prev -%} + ← {{ prev.title }} + {%- endif -%} + {%- if next -%} + {{ next.title }} → + {%- endif -%} +
+{%- endif -%} + + +
+

+ The material for this tutorial is available under + + CC-BY-SA 4.0 International + . + If you have feedback on this documentation, please submit it as a + + GitHub issue + . +

+
\ No newline at end of file diff --git a/_includes/toc.html b/_includes/toc.html new file mode 100644 index 0000000..4074b02 --- /dev/null +++ b/_includes/toc.html @@ -0,0 +1,28 @@ +{%- assign html = include.html | default: page.content -%} + +{%- comment -%} +If we were passed Markdown (no + {%- assign parts = html | split: "" -%} + {%- assign title = pieces[1] | split: "<" | first | strip -%} + + {%- if id != "" and title != "" -%} +
  • + {{ title }} +
  • + {%- endif -%} + {%- endif -%} + {%- endif -%} + {%- endfor -%} + \ No newline at end of file diff --git a/_includes/topbar.html b/_includes/topbar.html new file mode 100644 index 0000000..f5bb459 --- /dev/null +++ b/_includes/topbar.html @@ -0,0 +1,19 @@ + \ No newline at end of file diff --git a/_layouts/docs.html b/_layouts/docs.html new file mode 100644 index 0000000..2db00e1 --- /dev/null +++ b/_layouts/docs.html @@ -0,0 +1,17 @@ +--- +layout: default +--- + +{% include topbar.html %} + +
    + + +
    + {{ content }} + +
    + +
diff --git a/_tutorial/1.UnderstandingSPARQL.md b/_tutorial/1.UnderstandingSPARQL.md new file mode 100644 index 0000000..e3718d3 --- /dev/null +++ b/_tutorial/1.UnderstandingSPARQL.md @@ -0,0 +1,218 @@ +--- +layout: docs +title: "Understanding SPARQL Queries" +order: 10 +--- + +This page introduces the basic structure of a SPARQL query using a **real example from PlantMetWiki**. + +Rather than focusing on abstract syntax, we explain how a concrete biological question is translated into a SPARQL query, and how to interpret each part of the query and its results. + +By the end of this page, you should be comfortable: + +- reading a SPARQL query used in PlantMetWiki, +- understanding what biological question it answers, +- recognizing how pathway content is represented in RDF, +- following links from PlantMetWiki to external resources such as PlantCyc. + +**SPARQL endpoint**: +https://plantmetwiki.bioinformatics.nl/sparql + +**Graph used in all queries**: +`FROM ` + +We will work with the **α-solanine / α-chaconine biosynthesis pathway**, a well-known plant specialised metabolic pathway involved in glycoalkaloid production in *Solanum* species (e.g. potato and tomato). + +## Anatomy of a SPARQL query + +A SPARQL query consists of several elements, which can be considered as building blocks. + +Our PlantMetWiki question: + +> ***Which PlantCyc reactions are part of the α-solanine / α-chaconine biosynthesis pathway, and how can we validate them in PlantCyc?*** + +We will use this pathway URI throughout the tutorial: + +``` +<http://rdf-plantmetwiki.bioinformatics.nl/Pathway/RC1000_r20251206224344> +``` + +### SELECT — what do we want to see in the results? + +The SELECT clause defines what will be returned as results. 
+ +For our question, we want: + • the reaction identifier (?reactionId) + • a clickable PlantCyc link (?plantCycReactionURL) + +```sparql +SELECT ?reactionId ?plantCycReactionURL +``` + +SELECT is used to indicate which variables from the SPARQL query you want to visualise as a result (in other words: which variables we find relevant as output to answer our biological question). + +### WHERE — how do we find that information? + +The second element we encounter in a SPARQL query is the _query pattern_, which starts with the word WHERE, with the query itself enclosed in curly brackets: {} . + +The WHERE clause defines the graph pattern to match (triples in the form subject–predicate–object). + +For PlantMetWiki pathways, the key points are: + • gpml:hasInteraction (links a pathway to interactions) + • some interactions represent real PlantCyc reactions (e.g. RXN-10730) + • some interactions are GPML anchor helper nodes (contain _anchor_) and should not be linked to PlantCyc or interpreted as reactions + +```sparql +WHERE { + VALUES ?pathway { <...> } + ?pathway gpml:hasInteraction ?interaction . + ... +} +``` +This is a set of RDF triples (subject–predicate–object), just like in the Wikidata tutorial, but with PlantMetWiki predicates. + + +## Step-by-step interpretation of the query + +### Line 1 — VALUES (what are we querying about?) + +VALUES lets us “pin” the query to one (or multiple) specific items. + +```sparql +VALUES ?pathway { + <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/RC1000_r20251206224344> +} +``` +You can add more pathways inside the braces later (separated by spaces) if you want to compare multiple pathways. + +### Line 2 — Retrieve interactions from the pathway + +This line uses the pathway as the subject and gets all linked interactions: + +```sparql +?pathway gpml:hasInteraction ?interaction . +``` + +### Line 3 — Turn an interaction URI into a PlantCyc reaction link + +PlantMetWiki does not use Wikidata’s label service. 
Instead, we often extract meaningful identifiers from URIs. + +1. 
Extract the part after /Interaction/: + +``` +BIND( + STRAFTER(STR(?interaction), "/Interaction/") + AS ?reactionId +) +``` + +2. Keep only “real” reactions and exclude anchor helper nodes: + +``` +FILTER(CONTAINS(?reactionId, "RXN-")) +FILTER(!CONTAINS(?reactionId, "_anchor_")) +``` + +3. Construct a clickable PlantCyc URL: + +``` +BIND( + IRI(CONCAT( + "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=", + ?reactionId + )) + AS ?plantCycReactionURL +) +``` + +This turns the extracted identifier into a clickable external link. + +#### Full query + +```sparql +PREFIX gpml: + +SELECT ?reactionId ?plantCycReactionURL +FROM +WHERE { + VALUES ?pathway { + + } + + ?pathway gpml:hasInteraction ?interaction . + + BIND(STRAFTER(STR(?interaction), "/Interaction/") AS ?reactionId) + + FILTER(CONTAINS(?reactionId, "RXN-")) + FILTER(!CONTAINS(?reactionId, "_anchor_")) + + BIND( + IRI(CONCAT( + "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=", + ?reactionId + )) + AS ?plantCycReactionURL + ) +} +ORDER BY ?reactionId +LIMIT 200 +``` + +### Listing pathway components (genes, metabolites) + +To see which data nodes (genes, metabolites) are present in the same pathway: + +``` +PREFIX gpml: + +SELECT ?dataNodeId +FROM +WHERE { + VALUES ?pathway { + + } + + ?pathway gpml:hasDataNode ?dataNode . + BIND(STRAFTER(STR(?dataNode), "/DataNode/") AS ?dataNodeId) +} +ORDER BY ?dataNodeId +LIMIT 200 +``` + +## A note on labels and identifiers + +Unlike Wikidata, PlantMetWiki does not provide a dedicated label service (SERVICE wikibase:label). + +Instead: + + • some readable information is stored directly (e.g. gpml:name, gpml:textLabel), + • otherwise, meaningful identifiers are extracted directly from URIs using string functions such as STRAFTER(). + +This approach is used consistently throughout the tutorial. + + +## Questions + +
    + Question 1: Which part of the query selects the pathway we want to investigate? +

    Answer:
    + VALUES ?pathway { <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/RC1000_r20251206224344> } +

    +
    + +
    + Question 2: Which line retrieves all interactions that belong to the pathway? +

    Answer:
    + ?pathway gpml:hasInteraction ?interaction . +

    +
    + +
    + Question 3: Why do we filter out _anchor_ interactions? +

    Answer:
    + Interactions that contain _anchor_ are GPML helper nodes used for drawing/connecting edges. They are not real PlantCyc reaction identifiers, so PlantCyc will not recognize them. +

    +
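The identifier handling discussed in the questions above (STRAFTER, the two FILTERs, and the CONCAT that builds the PlantCyc link) can be mimicked outside SPARQL to see what each step does. The following is a minimal Python sketch, not the endpoint's implementation: the function names and the `example.org` interaction URIs are hypothetical and only chosen to contain `/Interaction/`, `RXN-` and `_anchor_` in the way the tutorial describes.

```python
# Base URL taken from the tutorial's CONCAT expression.
PLANTCYC_BASE = "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object="


def strafter(text, separator):
    """Mimic SPARQL STRAFTER: text after the first separator, or "" if absent."""
    return text.partition(separator)[2]


def reaction_links(interaction_uris):
    """Map reaction identifiers to PlantCyc links, skipping anchor helper nodes."""
    links = {}
    for uri in interaction_uris:
        reaction_id = strafter(uri, "/Interaction/")
        # Same conditions as the two FILTER lines in the query.
        if "RXN-" in reaction_id and "_anchor_" not in reaction_id:
            links[reaction_id] = PLANTCYC_BASE + reaction_id
    return links


# Hypothetical URIs for illustration only; real ones come from the endpoint.
example_uris = [
    "http://example.org/Pathway/demo/Interaction/RXN-10730",
    "http://example.org/Pathway/demo/Interaction/RXN-10730_anchor_1",
    "http://example.org/Pathway/demo/Interaction/abc123",
]

for reaction_id, url in reaction_links(example_uris).items():
    print(reaction_id, url)
```

Only the first URI survives both conditions, which is the same reason the SPARQL query returns reaction identifiers without anchor helper nodes.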
    + + + diff --git a/_tutorial/2.ExpandQueries.md b/_tutorial/2.ExpandQueries.md new file mode 100644 index 0000000..4900054 --- /dev/null +++ b/_tutorial/2.ExpandQueries.md @@ -0,0 +1,239 @@ +--- +layout: docs +title: "Exploring Species and Pathways" +order: 20 +--- + + +In the previous page, we focused on a **single plant metabolic pathway** and examined how reactions are represented and linked to PlantCyc. + +In this section, we take a step back and explore **PlantMetWiki as a collection**: +- Which plant species are represented? +- Which pathways exist per species? +- How can we navigate across species and pathways using SPARQL? + +This page introduces **exploratory queries** that help you understand the scope of the database before asking more detailed biological questions. + +**SPARQL endpoint** +https://plantmetwiki.bioinformatics.nl/sparql + +**Graph used in all queries** +```sparql +FROM +``` + +### How species are represented in PlantMetWiki + +Unlike Wikidata, PlantMetWiki stores species names directly as text literals, rather than numeric identifiers. + +This makes it easy to: + • read queries + • copy species names into VALUES blocks + • explore the database interactively + +Species information is attached to pathways using the predicate: `gpml:organism` + +So far, we have implicitly focused on a single species by querying a single pathway. +If we want to explore pathways from multiple species, we can do this by changing the VALUES line in our query. + +```sparql +{ + VALUES ?organism { "Solanum tuberosum" } +} +``` + +This restricts the query to pathways annotated for potato. + +### Discovering which species are available + +Before querying pathways for a specific plant, it is useful to know which species are present at all. + +The following query lists all species annotated in PlantMetWiki: + + +```sparql +PREFIX gpml: + +SELECT DISTINCT ?organism +FROM +WHERE { + ?pathway gpml:organism ?organism . 
+} +ORDER BY ?organism +``` + +This gives you a controlled vocabulary of species names that can be reused directly in other queries. + +### Listing pathways for a given species + +Once you know which species exist, you can retrieve the pathways associated with a specific plant. + +For example, to list pathways annotated for Solanum tuberosum (potato): + +``` +PREFIX gpml: + +SELECT ?pathway +FROM +WHERE { + ?pathway gpml:organism "Solanum tuberosum" . +} +LIMIT 200 +``` + +At this stage, the query returns pathway identifiers (URIs). + +## Making results more informative: pathway names + +To make the output easier to interpret, we can include pathway names when they are available. + +We extend the SELECT clause and add an OPTIONAL pattern: + +```sparql +PREFIX gpml: + +SELECT ?pathway ?pathwayName +FROM +WHERE { + ?pathway gpml:organism "Solanum tuberosum" . + OPTIONAL { ?pathway gpml:name ?pathwayName } +} +LIMIT 200 +``` + +Using OPTIONAL ensures that pathways without a name are still returned. + +## Comparing pathways across multiple species + +SPARQL allows you to compare species by listing them explicitly using VALUES. + +For example, to retrieve pathways for potato and Arabidopsis: +``` +PREFIX gpml: + +SELECT ?organism ?pathway ?pathwayName +FROM +WHERE { + VALUES ?organism { + "Solanum tuberosum" + "Arabidopsis thaliana" + } + + ?pathway gpml:organism ?organism . + OPTIONAL { ?pathway gpml:name ?pathwayName } +} +LIMIT 200 +``` + +This query makes the species explicit in the results, which is especially useful when comparing model plants with crop species. + + +### Questions + +
    + How would the VALUES line look if we also want to include Oryza sativa? + +

    Answer:
    + + VALUES ?organism {
    +   "Solanum tuberosum"
    +   "Arabidopsis thaliana"
    +   "Oryza sativa"
    + } +
    +

    +
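When the list of species grows, typing the VALUES block by hand becomes error-prone. As a minimal sketch (the helper names `sparql_string_literal` and `values_block` are hypothetical, and only backslashes and double quotes are escaped), a small Python script could generate the block from a list of species names:

```python
def sparql_string_literal(text):
    """Quote text as a SPARQL string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def values_block(variable, names):
    """Build a VALUES block that binds the variable to each name as a literal."""
    rows = ["  " + sparql_string_literal(name) for name in names]
    return "VALUES ?" + variable + " {\n" + "\n".join(rows) + "\n}"


species = ["Solanum tuberosum", "Arabidopsis thaliana", "Oryza sativa"]
print(values_block("organism", species))
```

The printed block has the same shape as the answer above and can be pasted into the WHERE clause of the species queries.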
    + +### Which species? + +Since we are now retrieving pathways from multiple species, it is useful to explicitly show the species in the results. +To do this, we modify the SELECT clause so that the organism is visible: + +```sparql +SELECT ?organism ?pathway +``` +If we also want to include the pathway name (when available), we can extend this further: + +```sparql +SELECT ?organism ?pathway ?pathwayName +``` + +And add the corresponding triple pattern: +``` +OPTIONAL { ?pathway gpml:name ?pathwayName } +``` + +### Updated query with pathway names +``` +PREFIX gpml: + +SELECT ?organism ?pathway ?pathwayName +FROM +WHERE { + VALUES ?organism { + "Solanum tuberosum" + "Arabidopsis thaliana" + } + + ?pathway gpml:organism ?organism . + OPTIONAL { ?pathway gpml:name ?pathwayName } +} +LIMIT 200 +``` + +### Questions + +
    + Which variable adds the species name to the results? +

    Answer:
    + ?organism, filled via ?pathway gpml:organism ?organism +

    +
+ +### Easier querying: discovering species in PlantMetWiki + +Unlike Wikidata, PlantMetWiki does not require numeric identifiers (such as Q-numbers). +Species names are stored directly as literals. + +If you are not sure which species are present in the database, you can list them: + +``` +PREFIX gpml: + +SELECT DISTINCT ?organism +FROM +WHERE { + ?pathway gpml:organism ?organism . +} +ORDER BY ?organism +``` + +This query gives you a controlled vocabulary of species that you can copy directly into a VALUES block. + +### Small expansion: count pathways per species + +We can also aggregate results to answer questions such as: + +Which species have the most pathways in PlantMetWiki? + +``` +PREFIX gpml: + +SELECT ?organism (COUNT(DISTINCT ?pathway) AS ?nPathways) +FROM +WHERE { + ?pathway gpml:organism ?organism . +} +GROUP BY ?organism +ORDER BY DESC(?nPathways) +``` + +### Notes on visualization + +Unlike Wikidata, the PlantMetWiki SPARQL endpoint does not provide built-in image visualizations. + +However, you can: + + • export results as tables + • click through to PlantCyc reaction links (as shown in Assignment 1) + • use external tools (e.g. notebooks, R, Python) to visualize pathway statistics diff --git a/_tutorial/3.GeneClusterLinks.md b/_tutorial/3.GeneClusterLinks.md new file mode 100644 index 0000000..75bf736 --- /dev/null +++ b/_tutorial/3.GeneClusterLinks.md @@ -0,0 +1,179 @@ +--- +layout: docs +title: "Linking Pathways to Biosynthetic Gene Clusters" +order: 30 +--- + +Plant specialized metabolites are often produced by **biosynthetic gene clusters (BGCs)** — groups of physically co-located genes that together encode a metabolic pathway. 
+ +PlantMetWiki provides explicit **cross-links between metabolic pathways and BGC resources**, allowing you to move from: + +- pathway-level knowledge +- to genomic context +- to specialized metabolite biosynthesis + +In this section, we explore how PlantMetWiki connects pathways to: + +- **plantiSMASH** predictions +- **MIBiG** curated BGCs +- external metadata describing gene clusters + +**SPARQL endpoint** +https://plantmetwiki.bioinformatics.nl/sparql + +**Graph used in all queries** +```sparql +FROM + +``` + +## What is a BGC cross-link in PlantMetWiki? + +A BGC cross-link connects: + + • a PlantMetWiki pathway + • to a gene cluster identifier + • originating from an external resource + +These links are derived from: + + • pathway annotations + • genomic metadata + • curated and predicted BGC databases + +PlantMetWiki does not duplicate BGC data; instead, it acts as a hub connecting pathways to specialized genomics resources. + +⸻ + +## Discovering all BGC cross-links + +To get an overview of how many pathway–BGC links exist, you can list all known cross-links: + +``` +SELECT ?pathway ?bgc +FROM +WHERE { + ?pathway ?predicate ?bgc . + FILTER(CONTAINS(STR(?bgc), "BGC")) +} +LIMIT 200 +``` + +This query reveals that: + • pathways may link to multiple BGCs + • BGC identifiers come from different external sources + + +## Linking pathways to plantiSMASH BGCs + +plantiSMASH predicts biosynthetic gene clusters directly from plant genomes. + +PlantMetWiki pathways can link to plantiSMASH BGC identifiers, allowing you to: + • move from pathway → genome + • inspect candidate gene clusters + • evaluate biosynthetic hypotheses + +Example query (from plantiSMASHLinks.rq): + +``` +SELECT ?pathway ?plantiSMASH_BGC +FROM +WHERE { + ?pathway ?p ?plantiSMASH_BGC . 
+ FILTER(CONTAINS(STR(?plantiSMASH_BGC), "plantiSMASH")) +} +LIMIT 200 +``` + + +Each returned BGC identifier can be clicked through to explore: + • predicted cluster boundaries + • gene annotations + • domain architecture + + +## Linking pathways to curated MIBiG clusters + +MIBiG is a manually curated database of experimentally validated biosynthetic gene clusters. + +PlantMetWiki links pathways to MIBiG entries when: + • a pathway is supported by experimental evidence + • a known BGC has been described in the literature + +Example query (from MIBiGLinks.rq): + +``` +SELECT ?pathway ?mibig +FROM +WHERE { + ?pathway ?p ?mibig . + FILTER(CONTAINS(STR(?mibig), "mibig")) +} +LIMIT 200 +``` + +These links allow you to: + • trace pathways to experimentally validated gene clusters + • connect pathway knowledge with publications + • assess confidence in biosynthetic assignments + +## Combining multiple BGC sources + +Some pathways link to both predicted and curated clusters. + +You can retrieve all BGC-related links regardless of source: +``` +SELECT ?pathway ?bgc +FROM +WHERE { + ?pathway ?p ?bgc . + FILTER( + CONTAINS(STR(?bgc), "plantiSMASH") || + CONTAINS(STR(?bgc), "mibig") + ) +} +LIMIT 200 +``` + +This makes it possible to: + • compare predictions with curated knowledge + • identify gaps in experimental validation + • prioritize clusters for follow-up study + +## Pathway-centric view: BGCs for a specific pathway + +You can also start from a specific MIBiG BGC and ask which pathway belongs to it: + +``` +PREFIX ro: +PREFIX wp: +PREFIX dc: +PREFIX dcterms: + +# Retrieve thalianol pathway +SELECT DISTINCT ?pw (STR(?titleLit) AS ?title) +FROM +WHERE { + ro:0000051 ?gene . + + ?interaction wp:participants ?gene ; + dcterms:isPartOf ?pw . + + ?pw dc:title ?titleLit . 
+} +ORDER BY ?title +``` + +## Why BGC cross-links matter + +By linking pathways to gene clusters, PlantMetWiki enables: + • genome-to-metabolite reasoning + • discovery of candidate biosynthetic loci + • comparison of predicted vs curated clusters + • integration with omics pipelines + +This makes PlantMetWiki especially useful for: + • plant specialized metabolism research + • natural product discovery + • functional genomics + • comparative pathway analysis \ No newline at end of file diff --git a/_tutorial/4.FederatedQueries.md b/_tutorial/4.FederatedQueries.md new file mode 100644 index 0000000..0b73ceb --- /dev/null +++ b/_tutorial/4.FederatedQueries.md @@ -0,0 +1,268 @@ +--- +layout: docs +title: "Federated Queries Across Linked Open Data" +order: 40 +--- + +One of the main strengths of SPARQL is that it allows **federated queries**: +a single query can combine data from multiple, independent knowledge bases. + +PlantMetWiki is designed to work *together* with existing Linked Open Data resources such as: +- **Wikidata** +- **ChEBI** +- **PubMed** + +In this section, we show how to move beyond PlantMetWiki alone and place plant metabolic pathways in a **broader biological knowledge graph**. + +**SPARQL endpoint** +https://plantmetwiki.bioinformatics.nl/sparql + +**Graph used in all queries** +```sparql +FROM +``` + +## What is a federated SPARQL query? + +A federated query uses the SERVICE keyword to send part of the query to a remote SPARQL endpoint. + +Conceptually: + • PlantMetWiki provides pathway context + • External endpoints provide chemical, biological, or literature metadata + • SPARQL stitches them together + + +``` +SERVICE { + ... +} +``` + +Each SERVICE block is evaluated remotely, and the results are merged with the local query. + +## Why federate from PlantMetWiki? 
+ +PlantMetWiki focuses on: + + • pathways + • species + • biosynthesis + • gene clusters + +It deliberately does not duplicate: + + • chemical ontologies + • literature databases + • encyclopedic metadata + +Federation lets you: + + • enrich pathways with chemical identifiers + • connect metabolites to publications + • reuse authoritative external resources + +⸻ + +## Example 1 — Sending metabolites to Wikidata + +Many PlantMetWiki pathways contain metabolites with identifiers that are also known to Wikidata. + +Using a federated query, we can: + + 1. extract metabolite identifiers from PlantMetWiki + 2. send them to Wikidata + 3. retrieve additional metadata + +Example (from WikidataTest.rq): +``` +PREFIX gpml: + +SELECT ?metabolite ?wikidataItem +FROM +WHERE { + ?pathway gpml:hasDataNode ?metabolite . + + SERVICE { + ?wikidataItem ?p ?metabolite . + } +} +LIMIT 100 +``` + +This demonstrates the mechanism of federation, even before refining identifiers. + +## Example 2 — Linking metabolites via InChIKeys + +Chemical identifiers such as InChIKeys provide a robust bridge between databases. + +PlantMetWiki → InChIKey → Wikidata → ChEBI + +Example (from WikidataInChiKeys.rq): + +``` +SELECT ?metabolite ?inchiKey ?wikidataItem +FROM +WHERE { + ?metabolite ?p ?inchiKey . + FILTER(CONTAINS(STR(?p), "InChIKey")) + + SERVICE { + ?wikidataItem wdt:P235 ?inchiKey . + } +} +LIMIT 100 +``` + +This pattern allows you to: + + • unify chemical identities across resources + • avoid ambiguous names + • build reliable cross-database links + +⸻ + +## Example 3 — Federating to ChEBI + +ChEBI is the authoritative ontology for chemical entities of biological interest. + +Using InChIKeys or ChEBI IDs, you can retrieve: + + • chemical classifications + • roles (e.g. alkaloid, glycoside) + • ontology relationships + +Example (from FederatedMetabolitesChEBI.rq): +``` +SELECT ?metabolite ?chebi +FROM +WHERE { + ?metabolite ?p ?chebi . 
+ FILTER(CONTAINS(STR(?chebi), "CHEBI")) + + SERVICE { + ?chebiItem wdt:P683 ?chebi . + } +} +LIMIT 100 +``` + +This enables ontology-aware pathway analysis without duplicating ChEBI locally. + + +## Example 4 — Linking pathways to publications (PubMed) + +Many pathways and gene clusters are supported by literature evidence. + +Using federated queries, you can: + + • extract PubMed IDs + • query Wikidata for article metadata + • retrieve titles, journals, and authors + +Example (from ListPubMedIDs.rq): +``` +SELECT DISTINCT ?pmid +FROM +WHERE { + ?pathway ?p ?pmid . + FILTER(CONTAINS(STR(?pmid), "pubmed")) +} +``` +Extended with federation (from WikidataLookupByInChIKeys.rq): + +``` +SERVICE { + ?article wdt:P698 ?pmid ; + rdfs:label ?title . + FILTER(LANG(?title) = "en") +} +``` + +This connects: + + • pathway → metabolite → publication + • enabling traceable biological evidence + + +## Example 5 — Bidirectional federation + +Federation does not have to start from PlantMetWiki. + +You can: + + • query Wikidata first + • then match results against PlantMetWiki + +Example (from SendInChiKeysToWikidata.rq): + +``` +SERVICE { + ?item wdt:P235 ?inchiKey . +} + +?metabolite ?p ?inchiKey . +``` + +This pattern is useful when: + + • starting from literature or chemical knowledge + • and asking whether PlantMetWiki contains related pathways + +⸻ + +## Practical considerations + +### Performance + + • Federated queries are slower than local queries + • Limit result sizes (LIMIT) + • Avoid unnecessary variables + +### Stability + + • External endpoints may change + • Wikidata enforces rate limits + • Queries should be robust to partial results + +## Design philosophy + +PlantMetWiki intentionally stays lightweight: + + • no chemical ontology duplication + • no literature mirroring + • no monolithic data model + +Federation keeps the ecosystem modular and sustainable. 
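Returning to the performance advice (bounded result sizes), the following Python sketch shows one way a script could prepare a request URL that always carries a LIMIT. It is an illustration under assumptions: the function name `build_query_url` is hypothetical, the endpoint is assumed to accept the standard SPARQL protocol `query` parameter over GET, and the example query omits the FROM graph because its IRI is not shown here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint URL as given in this tutorial.
ENDPOINT = "https://plantmetwiki.bioinformatics.nl/sparql"


def build_query_url(query: str, limit: int = 100, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL for `query`, appending a LIMIT when none is present."""
    text = query.strip()
    # Crude check: look for a standalone LIMIT keyword anywhere in the query.
    if "limit" not in text.lower().split():
        text = f"{text}\nLIMIT {limit}"
    return f"{endpoint}?{urlencode({'query': text})}"


# Hypothetical usage: a generic triple pattern, capped at 10 rows.
url = build_query_url("SELECT ?s ?p ?o WHERE { ?s ?p ?o }", limit=10)
print(url)
```

No request is sent here; the URL can be passed to any HTTP client, which keeps the sketch independent of a particular library and of endpoint availability.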
+ +⸻ + +## What you can do with federated queries + +By combining PlantMetWiki with external resources, you can: + + • trace metabolites from genome → pathway → chemistry → literature + • enrich pathway analyses with ontology information + • integrate PlantMetWiki into larger knowledge graphs + • support FAIR, reusable, interoperable workflows + +⸻ + +## Summary + +Federated SPARQL queries allow PlantMetWiki to function as: + + • a hub for plant metabolic pathways + • a connector between genomics, chemistry, and literature + • a first-class citizen of the Linked Open Data ecosystem + +This closes the loop from: +genes → pathways → metabolites → publications → knowledge + +| Tutorial section | Query file | +|-----------------|------------| +| Wikidata basics | `WikidataTest.rq` | +| InChIKey federation | `WikidataInChiKeys.rq` | +| ChEBI federation | `FederatedMetabolitesChEBI.rq` | +| PubMed links | `ListPubMedIDs.rq` | +| Reverse federation | `SendInChiKeysToWikidata.rq` | +| Advanced lookups | `WikidataLookupByInChIKeys.rq` | \ No newline at end of file diff --git a/Assignments/AddendumBioSb2019.md b/_tutorial/_archivedpages/AddendumBioSb2019.md similarity index 100% rename from Assignments/AddendumBioSb2019.md rename to _tutorial/_archivedpages/AddendumBioSb2019.md diff --git a/Assignments/assignment2A.md b/_tutorial/_archivedpages/AdditionalAnalyses.md similarity index 89% rename from Assignments/assignment2A.md rename to _tutorial/_archivedpages/AdditionalAnalyses.md index d603e6c..ebd49df 100644 --- a/Assignments/assignment2A.md +++ b/_tutorial/_archivedpages/AdditionalAnalyses.md @@ -1,7 +1,12 @@ +--- +layout: docs +title: "Additional analyses" +--- + Assignment 2: Find drugs for cancers that target genes related to cell proliferation ================= -[HOME](https://denisesl22.github.io/SPARQLTutorials/) +Return to [HOME](https://pathway-lod.github.io/SPARQLTutorials/) During this assignment, we will investigate another example SPARQL query of Wikidata, 
called ["Find drugs for cancers that target genes related to cell proliferation"](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Find_drugs_for_cancers_that_target_genes_related_to_cell_proliferation). We will first go through the basics of a SPARQL query. Second, we will find out how to execute the query and retain or share results. Last, we will expand the query and make other (small) changes, to understand the structure of a SPARQL query better, and see what other data is available in Wikidata. @@ -63,19 +68,19 @@ You can uncomment by removing the '#' sign, and run the query again. 1. Comment the line related to true positives again (from the assignment above). 1. Change the view from 'Table' to 'Scatter chart': -![Select Scatter Chart](../Images/Scatter_chart_example2.jpg) +![Select Scatter Chart](/Images/Scatter_chart_example2.jpg) The following graph should now appear (click on on of the coloured circles in the graph, to obtain the diseaseLabel): -![Select Scatter Chart](../Images/Scatter_chart_visualisation_example2.JPG) +![Select Scatter Chart](/Images/Scatter_chart_visualisation_example2.JPG) **Question 1A:** Which variables are depicted in which manner? **Question 1B:** What would change to the visualisation, if you switch the place of the variables ?geneLabel and ?biological_processLabel with one another? -(Answers can be found [here](../Answers/AnswersAssignment2.md)). +(Answers can be found [here](/Answers/AnswersAssignment2.md)). -In the [last exercise](../Answers/assignment2B.md) related to this assignment, we will look at expansion options for the query above. +In the [last exercise](/Answers/assignment2B.md) related to this assignment, we will look at expansion options for the query above. 
-[HOME](https://denisesl22.github.io/SPARQLTutorials/) +Return to [HOME](https://pathway-lod.github.io/SPARQLTutorials/) diff --git a/Assignments/assignment2B.md b/_tutorial/_archivedpages/BiologicalQuestionsWikidata.md similarity index 91% rename from Assignments/assignment2B.md rename to _tutorial/_archivedpages/BiologicalQuestionsWikidata.md index a5af293..1d97d16 100644 --- a/Assignments/assignment2B.md +++ b/_tutorial/_archivedpages/BiologicalQuestionsWikidata.md @@ -1,4 +1,7 @@ -[HOME](https://denisesl22.github.io/SPARQLTutorials/) +--- +layout: docs +title: "Answering Biological Questions on Wikidata" +--- ## Changing the Question @@ -35,4 +38,4 @@ You can probably think of some other questions you would like to ask to Wikidata of the query we are working on now, or find another example on Wikidata you want to understand and expand (ask your instructors for help if needed). -[HOME](https://denisesl22.github.io/SPARQLTutorials/) +Return to [HOME](https://pathway-lod.github.io/SPARQLTutorials/) diff --git a/Assignments/assignment1B.md b/_tutorial/_archivedpages/ResultsWikidata.md similarity index 76% rename from Assignments/assignment1B.md rename to _tutorial/_archivedpages/ResultsWikidata.md index 47bb268..8683d06 100644 --- a/Assignments/assignment1B.md +++ b/_tutorial/_archivedpages/ResultsWikidata.md @@ -1,22 +1,25 @@ -[HOME](https://DeniseSl22.github.io/SPARQLTutorials/) +--- +layout: docs +title: "Execute & retain results" +--- ## Run and Save Click on [this link](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Which_variant_of_which_gene_predicts_a_positive_prognosis_in_colorectal_cancer) to go to the example page of Wikidata. 
Below the Query titled "Which variant of which gene predicts a positive prognosis in colorectal cancer", click on the "Try it" button, which will open the following page: -![Wikidata SPARQL Endpoint](../Images/WikidaatSPARQL_endpoint.JPG) +![Wikidata SPARQL Endpoint](/Images/WikidaatSPARQL_endpoint.JPG) **Welcome to the SPARQL Endpoint of Wikidata!** Excecute the query by clicking on the blue play button. This will reveal the results of the query in a panel below the query editor: -![results query 1](../Images/Results_Query1_wikidata.JPG) +![results query 1](/Images/Results_Query1_wikidata.JPG) There are several options to work with the results of your query. To save your data, click on the Download button (red arrow in image below), and select the format you want to work with (CSV, TSV, JSON, HTML, SVG-image). To get a weblink to your results, click on the Link button (green arrow in image below). Last, there are also several code examples available (blue arrow in the image below), which could help construct a script to automate (several) queries, or combine the results of multiple queries in a workflow. Examples are available for: R, Python, Ruby, Perl, Java, JavaScript and many others! -![results query 1 Download](../Images/Results_Query1_wikidata_Download.jpg) +![results query 1 Download](/Images/Results_Query1_wikidata_Download.jpg) + +We will now make some changes to this query, to understand the structure of SPARQL even better, in the next page. -We will now make some changes to this query, to understand the structure of SPARQL even better, in the [next exercise](../Assignments/assignment1C.md). 
-[HOME](https://DeniseSl22.github.io/SPARQLTutorials/) diff --git a/Assignments/assignment1D.md b/_tutorial/_archivedpages/WikidataQueries.md similarity index 83% rename from Assignments/assignment1D.md rename to _tutorial/_archivedpages/WikidataQueries.md index b78aa3b..a41321a 100644 --- a/Assignments/assignment1D.md +++ b/_tutorial/_archivedpages/WikidataQueries.md @@ -1,3 +1,12 @@ +--- +layout: docs +title: "A more complicated query on Wikidata" +prev: "/Assignments/assignment2A/" +prev_title: "Previous page" +next: "/Assignments/assignment2A/" +next_title: "Next page" +--- + ### Additional questions Assignment 1: diff --git a/assets/css/docs.css b/assets/css/docs.css new file mode 100644 index 0000000..1f0fbb9 --- /dev/null +++ b/assets/css/docs.css @@ -0,0 +1,300 @@ +:root { + --pmw-green: #8aa34a; + --pmw-green-dark: #3a5b20; + --pmw-green-soft: #8fa35a; +} + +/* Global link color override */ +a { + color: var(--pmw-green); +} +a:hover { + color: var(--pmw-green-dark); +} + +/* ---------------- Topbar ---------------- */ +.pmw-topbar { + display: flex; + align-items: center; + gap: 12px; + margin: 12px 0 16px 0; +} + +.pmw-home { + display: inline-flex; + align-items: center; + gap: 10px; + text-decoration: none; + font-weight: 700; +} + +.pmw-home img { + height: 80px; +} + +.pmw-topbar-actions { + display: flex; + gap: 10px; + align-items: center; +} + +/* Topbar buttons */ +.pmw-btn { + background-color: var(--pmw-green); + color: #fff !important; + padding: 6px 14px; + border-radius: 6px; + font-size: 0.85rem; + font-weight: 600; + text-decoration: none; +} + +.pmw-btn:hover { + background-color: var(--pmw-green-dark); +} + +.pmw-btn-secondary { + background-color: transparent; + color: var(--pmw-green) !important; + border: 1px solid var(--pmw-green); +} + +.pmw-btn-secondary:hover { + background-color: var(--pmw-green-soft); + color: #fff !important; +} + +/* ---------------- Theme width / centering fixes ---------------- */ +/* Tactile centers a 
wrapper; make it responsive and less "pushed right" */ +.inner { + max-width: none !important; + width: min(1600px, calc(100% - 48px)) !important; + margin-left: auto !important; + margin-right: auto !important; +} + +#main_content { + width: 100% !important; + float: none !important; + margin: 0 auto !important; + padding-left: 0 !important; + padding-right: 0 !important; +} + +section { + padding-left: 0 !important; + padding-right: 0 !important; +} + +/* ---------------- Docs layout ---------------- */ +.pmw-layout { + display: grid; + grid-template-columns: clamp(220px, 24vw, 300px) minmax(0, 1fr); + column-gap: 24px; + align-items: start; + width: 100%; + margin: 0 auto; +} + +.pmw-content { + min-width: 0; +} + +/* ---------------- Sidebar ---------------- */ +.pmw-sidebar { + font-size: 0.90rem; + line-height: 1.35; + + position: sticky; + top: 16px; + align-self: start; + max-height: calc(100vh - 32px); + overflow-y: auto; +} + +/* Sidebar links */ +.pmw-sidebar a { + color: var(--pmw-green); +} + +.pmw-sidebar a:hover { + color: var(--pmw-green-dark); + text-decoration: underline; +} + +/* Active page (if you add class="active") */ +.pmw-sidebar a.active { + font-weight: 700; + color: var(--pmw-green-dark); +} + +/* Nav links wrap nicely */ +.pmw-nav a { + display: block; + white-space: normal; + word-break: break-word; +} + +/* Sidebar section titles */ +.pmw-nav-title, +.pmw-nav-section { + font-size: 0.75rem; + font-weight: 600; + text-transform: uppercase; + letter-spacing: 0.03em; + margin-top: 10px; + margin-bottom: 4px; + color: var(--pmw-green-dark); +} + +/* Sidebar prev/next buttons */ +.pmw-side-prevnext { + margin-top: 16px; + padding-top: 12px; + border-top: 1px solid rgba(0,0,0,0.12); + display: grid; + gap: 8px; +} + +.pmw-side-btn { + display: block; + padding: 6px 10px; + border-radius: 6px; + text-decoration: none; + background: rgba(110,130,59,0.12); + color: var(--pmw-green-dark); + font-weight: 600; +} + +.pmw-side-btn:hover { + 
background: rgba(110,130,59,0.18); +} + +/* License + feedback block IN sidebar (wrapped paragraph) */ +.pmw-license-inline { + margin-top: 14px; + padding-top: 10px; + border-top: 1px solid rgba(0,0,0,0.12); + font-size: 0.75rem; + line-height: 1.35; + color: #555; +} + +.pmw-nav-divider { + margin: 12px 0; + border-top: 1px solid rgba(0,0,0,0.12); +} + +.pmw-nav-muted { + font-size: 0.8rem; + color: #777; +} + +/* make TOC list compact */ +.pmw-toc ul { + margin: 6px 0 0 0; + padding-left: 16px; +} +.pmw-toc li { + margin: 4px 0; +} +.pmw-toc a { + display: block; + white-space: normal; + word-break: break-word; +} + +/* TOC nesting / depth */ +.pmw-toc { list-style: none; margin: 0; padding: 0; } +.pmw-toc li { margin: 4px 0; } + +/* H2 entries */ +.pmw-toc-l2 > a { + font-weight: 600; + font-size: 0.90rem; +} + +/* H3 entries (nested under H2 visually) */ +.pmw-toc-l3 { + padding-left: 14px; +} + +.pmw-toc-l3 > a { + font-weight: 400; + font-size: 0.82rem; + opacity: 0.95; +} + +/* Prevent any sidebar content from forcing layout wider */ +.pmw-sidebar, +.pmw-sidebar * { + min-width: 0; +} + +/* Force long strings to wrap (URLs, long tokens) */ +.pmw-license-inline { + overflow-wrap: anywhere; + word-break: break-word; +} +/* Prevent any sidebar content from forcing layout wider */ +.pmw-sidebar, +.pmw-sidebar * { + min-width: 0; +} + +/* Force long strings to wrap (URLs, long tokens) */ +.pmw-license-inline { + overflow-wrap: anywhere; + word-break: break-word; +} +.pmw-license-inline a { + overflow-wrap: anywhere; + word-break: break-word; +} + +/* ---------------- Global heading color override ---------------- */ +/* Fix: var(--pmw-dark-green) didn't exist */ +h1, h2, h3, h4, h5, h6 { + color: #333; +} + +/* ---------------- Optional generic buttons ---------------- */ +button, +.button, +a.button { + background-color: var(--pmw-green); + color: #fff !important; +} + +button:hover, +.button:hover, +a.button:hover { + background-color: var(--pmw-green-dark); 
+} + +/* ---------------- Footer tweak (theme footer) ---------------- */ +.site-footer { + margin-top: 0.5rem; +} + +/* ---------------- Responsive ---------------- */ +@media (max-width: 900px) { + .pmw-layout { + grid-template-columns: 1fr; + } + + .pmw-sidebar { + position: static; + max-height: none; + overflow: visible; + + width: auto; + max-width: none; + border-right: none; + padding-right: 0; + border-bottom: 1px solid rgba(0,0,0,0.12); + padding-bottom: 12px; + margin-bottom: 12px; + } +} + diff --git a/index.md b/index.md index 2d4b6ac..ef91b33 100644 --- a/index.md +++ b/index.md @@ -1,36 +1,105 @@ -SPARQLing Biology: a beginners course. -============================================================================================= +--- +layout: docs +title: "SPARQLing Plant Metabolic Pathways Wiki" +description: "Documentation and Tutorial for the PlantMetWiki SPARQL Explorer" +order: 0 +permalink: / +--- -[HOME](https://bigcat-um.github.io/SPARQLTutorialBioSB2019/) +## Summary +--------- +Plant Metabolic Pathways Wiki (PlantMetWiki) is an open online portal for querying linked specialized plant pathway information. PlantMetWiki is available in **Semantic Web format** as Resource Description Framework (**RDF**) and can be accessed via an easy-to-use **SNORQL user interface**. **Pre-written SPARQL queries** are available for users to execute or adapt to retrieve pathway information. **Federated queries** with other linked open data tools are supported, thereby expanding the [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page) framework. + +By structuring characterized pathways knowledge as Linked Open Data, linking it to predicted biosynthetic clusters, and supporting federated querying, PlantMetWiki supports **hypothesis generation in plant biosynthesis and natural product discovery**. 
+ + +## Using the SPARQL Explorer +--------- + +Visit our PlantMetWiki SPARQL interface at [plantmetwiki.bioinformatics.nl](https://plantmetwiki.bioinformatics.nl/). + +Follow the steps below to execute a pre-written query: + +1: **Select a query** from the list of example SPARQL queries. You can **adapt the query** by typing in the SPARQL Query box or from the source repository [pathway-lod/SPARQLQueries](https://github.com/pathway-lod/SPARQLQueries). + +2: Press the green **Query** button to execute your selected query. + +3: View the **results** on the same page. -Program +4: You can **select your own** list of example queries from GitHub, by adding the link and clicking the **refresh button**. + +

    + New Snorql Interface +

    + +* Update your SPARQL query from [this template]({{ "/ParticipantQueries/Example1/" | relative_url }}) + +## Download Results +--------- + +Output data is available for download in native RDF format (.ttl), TSV, CSV, and JSON. + +## Use Cases +--------- +### 1. Diving into Natural Products : an example from castor oil --------- -This workshop consists out of four parts: -* 30 minutes: Introduction to RDF and SPARQL ([presentation](/Presentation_introRDF.pdf)) -* 25 minutes: Gene variants in Wikidata: - * [Understanding the Basics](Assignments/assignment1A.md) - * [Execute the query and retain results](Assignments/assignment1B.md) - * [Expand and change Query](Assignments/assignment1C.md) - -* 25 minutes: Drug Targets in Wikidata - * [A more complicated query](Assignments/assignment2A.md) - * [Answering Biological Questions](Assignments/assignment2B.md) - -* 10 minutes: Recap - * Other Biological databases with RDF ([presentation](/Presentation_introRDF.pdf)) - * Update your SPARQL query [here](https://github.com/BiGCAT-UM/SPARQLTutorialBioSB2019/tree/master/ParticipantQueries) +### 2. Resolving Pathways across Species : an example in Capsicum +--------- -An [addendum](Assignments/AddendumBioSb2019.md) is available, where we added: -* Answers to questions asked during the tutorial. -* More information on where to find Biological and Chemical properties (aka relationships) to expand your query. -* More detailed explanation of the SERVICE statement (since this is not directly part of SPARQL, but constructed by Wikidata for easier querying). -The material for this workshop is available under [CC-BY-SA 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/legalcode) licence. 
+## Tutorial Pages : the SPARQL PlantMetWiki Explorer +--------- + +{% assign tutorials = site.tutorial | sort: "order" %} - +{% for p in tutorials %} +* [{{ p.title }}]({{ p.url | relative_url }}) +{% endfor %} + +## Resources +--------- + +* [Introduction to RDF and SPARQL](/Presentation_introRDF.pdf) by BiGCaT Maastricht University + +* Wikipathway ontology [The WikiPathways WP Ontology](https://vocabularies.wikipathways.org/) + +* [Guide to WikiPathways SPARQL Queries](https://www.wikipathways.org/sparql.html) + +* [The WikiPathways Semantic Web Portal](https://classic.wikipathways.org/index.php/Portal:Semantic_Web) + +## PlantMetWiki architecture +--------- + +PlantMetWiki is built as a modular Linked Open Data ecosystem. +The following repositories together form the data pipeline, infrastructure, +user interfaces, and documentation of the project: + +* **Cyc_to_wiki** – Data extraction and preparation from pathway databases + + +* **gpml-to-rdf** – Conversion of GPML pathway files into RDF + + +* **map-to-rdf** – Generation of RDF crosslinks with MIBiG and plantiSMASH + + +* **virtuoso-httpd-docker** – Triple store setup and Dockerized deployment of the PlantMetWiki SPARQL endpoint + + +* **Snorql-UI** – Web-based SPARQL query interface for PlantMetWiki + + +* **SPARQLQueries** – Curated example SPARQL queries for PlantMetWiki and federated endpoints + + +* **SPARQLTutorials** – Documentation and tutorial pages for learning how to query PlantMetWiki (this website) + + + +## Data availability +--------- - +Data related to PlantMetWiki is available at [Zenodo PlantMetWiki Community](https://zenodo.org/communities/plantmetwiki/). \ No newline at end of file