From 765d230fb597c0b0d17c76b4592f43c34469e4dd Mon Sep 17 00:00:00 2001
From: Neclow <neilclow24@gmail.com>
Date: Mon, 26 Jan 2026 14:40:27 +0000
Subject: [PATCH 1/2] docs(py): update demo notebook with new features and add
 CI

- Add get_common_ancestor and get_node_depth/get_node_depths examples
- Add robinson_foulds distance example
- Add tip about using get_node_depths for efficiency
- Add CI job to test demo notebook execution
- Add nbstripout pre-commit hook (keep outputs, strip execution counts)
- Update optimisation section to reference demo_opt notebook

Closes #84
---
 .github/workflows/ci-python.yml |  16 +++++
 .pre-commit-config.yaml         |   7 +++
 docs/demo.ipynb                 | 107 ++++++++++++++++++++++++++++----
 3 files changed, 119 insertions(+), 11 deletions(-)

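The expected values hard-coded in the new traversal cell's asserts (MRCA of leaves 1 and 6 is node 7 at depth 3; MRCA of leaves 2 and 5 is the root, node 12, at depth 0) can be re-derived without phylo2vec. Start from the tree drawn in that cell's comment, written as the integer-labelled Newick string `(((0,(1,6)7)10,(4,5)8)11,(2,3)9)12;`, which is also one of the notebook's recorded outputs. The sketch below does this in plain Python. The helper names (`parse_parents`, `mrca`, `depth`, `rf_rooted`) are hypothetical, and the code does not reflect how phylo2vec computes these values. `rf_rooted` counts rooted clusters, which is an assumption. phylo2vec's `robinson_foulds` may count unrooted bipartitions instead, so absolute values can differ, although every RF variant returns 0 when a tree is compared with itself.

```python
# Standalone check of the values asserted in the new traversal cell.
# Plain Python only; this does NOT call or reimplement phylo2vec's API.
# Assumption: every node in the Newick string has an integer label.

V_FIXED_NEWICK = "(((0,(1,6)7)10,(4,5)8)11,(2,3)9)12;"


def parse_parents(newick):
    """Return {node: parent} for an integer-labelled Newick (root maps to None)."""
    parents = {}
    stack = []      # child labels collected for each open clade
    label = ""
    pending = None  # children of the clade closed by the last ")"

    def finish_node():
        nonlocal label, pending
        node = int(label)
        for child in pending or []:
            parents[child] = node
        if stack:
            stack[-1].append(node)
        else:
            parents[node] = None  # no enclosing clade: this is the root
        label, pending = "", None

    for char in newick:
        if char == "(":
            stack.append([])
        elif char.isdigit():
            label += char
        elif char in ",;":
            finish_node()
        elif char == ")":
            finish_node()
            pending = stack.pop()
    return parents


def ancestors(parents, node):
    """Path from node up to the root, both included."""
    path = [node]
    while parents[node] is not None:
        node = parents[node]
        path.append(node)
    return path


def mrca(parents, a, b):
    """First node on b's path to the root that is also on a's path."""
    above_a = set(ancestors(parents, a))
    return next(node for node in ancestors(parents, b) if node in above_a)


def depth(parents, node):
    """Topological depth: number of edges between node and the root."""
    return len(ancestors(parents, node)) - 1


def clusters(parents):
    """Leaf sets below each non-root internal node (rooted clusters)."""
    internal = {p for p in parents.values() if p is not None}
    below = {}
    for leaf in (n for n in parents if n not in internal):
        for node in ancestors(parents, leaf)[1:-1]:  # skip the leaf and the root
            below.setdefault(node, set()).add(leaf)
    return {frozenset(leaves) for leaves in below.values()}


def rf_rooted(newick_a, newick_b):
    """Rooted RF: number of clusters found in exactly one of the two trees."""
    return len(clusters(parse_parents(newick_a)) ^ clusters(parse_parents(newick_b)))


parents = parse_parents(V_FIXED_NEWICK)
print(mrca(parents, 1, 6), depth(parents, 7))     # expected by the cell: 7 3
print(mrca(parents, 2, 5), depth(parents, 12))    # expected by the cell: 12 0
print(rf_rooted(V_FIXED_NEWICK, V_FIXED_NEWICK))  # identical trees: 0
```

Comparing against `(((0,1)7,(2,3)8)9,((4,5)10,6)11)12;` gives a rooted RF of 6. The two trees share only the clusters {2, 3} and {4, 5}, and each has three clusters the other lacks.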
diff --git a/.github/workflows/ci-python.yml b/.github/workflows/ci-python.yml
index 1e6bb062..be2b1849 100644
--- a/.github/workflows/ci-python.yml
+++ b/.github/workflows/ci-python.yml
@@ -36,3 +36,19 @@ jobs:
         with:
           environments: py-phylo2vec
       - run: pixi run -e py-phylo2vec benchmark
+  notebooks:
+    name: Test main demo notebook
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - uses: prefix-dev/setup-pixi@v0.9.3
+        with:
+          environments: py-phylo2vec
+      - name: Install package and dependencies
+        run: |
+          pixi run -e py-phylo2vec install-python
+          pixi run -e py-phylo2vec pip install nbconvert
+      - name: Execute demo notebook
+        run: |
+          cd docs
+          pixi run -e py-phylo2vec jupyter nbconvert --to notebook --execute demo.ipynb --output demo_executed.ipynb
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index c48681d1..04ec4d43 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -52,6 +52,13 @@ repos:
         entry: cargo fmt
         pass_filenames: false
 
+  - repo: https://github.com/kynan/nbstripout
+    rev: "0.8.1"
+    hooks:
+      - id: nbstripout
+        args: [--keep-output]
+        files: ^docs/
+
   - repo: https://github.com/python-jsonschema/check-jsonschema
     rev: "0.36.0"
     hooks:
diff --git a/docs/demo.ipynb b/docs/demo.ipynb
index fe0d2372..f0b3db06 100644
--- a/docs/demo.ipynb
+++ b/docs/demo.ipynb
@@ -11,6 +11,8 @@
     "* How to convert Phylo2Vec vectors to Newick format and vice versa\n",
     "* How to sample random trees with branch lengths (phylograms) as Phylo2Vec matrices\n",
     "* How to convert these matrices to Newick format and vice versa\n",
+    "* Tree traversal utilities: common ancestors and node depths\n",
+    "* Tree comparison metrics: Robinson-Foulds distance\n",
     "* Other useful operations on Phylo2Vec vectors\n",
     "\n",
     "Note that the current version of Phylo2Vec (1.x) relies on a core written in Rust, with bindings to Python and R. This comes with significant speed-ups, allowing manipulation large trees (up to ~100,000 to 1 million leaves). To become more familiar with Rust, we recommend this [interactive book](https://rust-book.cs.brown.edu/experiment-intro.html)."
@@ -731,25 +733,108 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 3.2 Optimisation\n",
+    "Another important tree distance metric is the [Robinson-Foulds (RF) distance](https://en.wikipedia.org/wiki/Robinson%E2%80%93Foulds_metric), which counts the bipartitions (splits) present in one tree topology but not in the other.\n",
     "\n",
-    "In the Phylo2Vec paper, we showcased a hill-climbing optimisation scheme to demonstrate the potential of phylo2vec for maximum likelihood-based phylogenetic inference.\n",
+    "Use `robinson_foulds` to compute the RF distance between two trees."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from phylo2vec.stats import robinson_foulds\n",
+    "\n",
+    "# Sample two random trees with 10 leaves\n",
+    "v1 = p2v.sample_vector(10)\n",
+    "v2 = p2v.sample_vector(10)\n",
+    "\n",
+    "print(f\"Tree 1: {repr(v1)}\")\n",
+    "print(f\"Tree 2: {repr(v2)}\")\n",
     "\n",
-    "These optimisation schemes (to be written in ```opt```) are not thoroughly maintained as difficult to test. One notable goal is to integrate [GradME](https://github.com/Neclow/GradME) into phylo2vec"
+    "# Compute RF distance\n",
+    "rf_dist = robinson_foulds(v1, v2)\n",
+    "print(f\"Robinson-Foulds distance: {rf_dist}\")\n",
+    "\n",
+    "# Normalized RF distance (range [0, 1])\n",
+    "rf_norm = robinson_foulds(v1, v2, normalize=True)\n",
+    "print(f\"Normalized RF distance: {rf_norm:.3f}\")\n",
+    "\n",
+    "# RF distance of a tree with itself is always 0\n",
+    "rf_same = robinson_foulds(v1, v1)\n",
+    "print(f\"RF distance (same tree): {rf_same}\")"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
+   "source": "### 3.2. Tree traversal: common ancestors and node depths\n\nPhylo2Vec provides functions for tree traversal, such as finding the most recent common ancestor (MRCA) between two nodes and computing node depths.\n\nUse `get_common_ancestor` to find the MRCA between two nodes (similar to ape's `getMRCA` in R or ETE's `get_common_ancestor` in Python), and `get_node_depth` / `get_node_depths` to compute depths.\n\n- For **vectors**: topological depth is returned (all branch lengths = 1)\n- For **matrices**: actual branch lengths are used\n\n**Tip:** Use `get_node_depths` (plural) to compute all node depths at once, which is more efficient than calling `get_node_depth` repeatedly."
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": [
-    "### 3.3 Other utility functions"
+    "from phylo2vec.utils.vector import get_common_ancestor, get_node_depth, get_node_depths\n",
+    "\n",
+    "# Using v_fixed from earlier: [0, 2, 2, 5, 4, 1]\n",
+    "# Tree structure (7 leaves, so internal nodes are 7-12, root is 12):\n",
+    "#    ╭─┬╴0\n",
+    "#  ╭─┤ ╰─┬╴1\n",
+    "#  │ │   ╰╴6\n",
+    "# ─┤ ╰─┬╴4\n",
+    "#  │   ╰╴5\n",
+    "#  ╰─┬╴2\n",
+    "#    ╰╴3\n",
+    "\n",
+    "# Find MRCA of leaves 1 and 6 (they form a cherry, so MRCA is node 7)\n",
+    "mrca_1_6 = get_common_ancestor(v_fixed, 1, 6)\n",
+    "print(f\"MRCA of leaves 1 and 6: node {mrca_1_6}\")\n",
+    "assert mrca_1_6 == 7  # node 7 is the parent of leaves 1 and 6\n",
+    "\n",
+    "# Get the depth of this MRCA (topological, since v_fixed is a vector)\n",
+    "mrca_depth = get_node_depth(v_fixed, mrca_1_6)\n",
+    "print(f\"Depth of MRCA (node {mrca_1_6}): {mrca_depth}\")\n",
+    "assert mrca_depth == 3  # root(12) → 11 → 10 → 7, so depth is 3\n",
+    "\n",
+    "# Find MRCA of leaves 2 and 5 (MRCA is root, node 12)\n",
+    "mrca_2_5 = get_common_ancestor(v_fixed, 2, 5)\n",
+    "mrca_2_5_depth = get_node_depth(v_fixed, mrca_2_5)\n",
+    "print(f\"MRCA of leaves 2 and 5: node {mrca_2_5}, depth: {mrca_2_5_depth}\")\n",
+    "assert mrca_2_5 == 12  # root node\n",
+    "assert mrca_2_5_depth == 0  # root has depth 0\n",
+    "\n",
+    "# Get all node depths at once\n",
+    "all_depths = get_node_depths(v_fixed)\n",
+    "print(f\"All node depths: {all_depths}\")\n",
+    "assert all_depths[12] == 0  # root depth is 0"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### 3.3.1 Finding the number of leaves in a Newick"
+    "### 3.3 Optimisation\n",
+    "\n",
+    "In the Phylo2Vec paper, we showcased a hill-climbing optimisation scheme to demonstrate the potential of phylo2vec for maximum likelihood-based phylogenetic inference. We also contributed to [GradME](https://github.com/Neclow/GradME), a continuous extension of phylo2vec for gradient-based minimum evolution.\n",
+    "\n",
+    "These optimisation schemes (implemented in `phylo2vec.opt`) are demonstrated in the [demo_opt notebook](demo_opt.ipynb)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3.4 Other utility functions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### 3.4.1 Finding the number of leaves in a Newick"
    ]
   },
   {
@@ -767,7 +852,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### 3.3.2 Removing and adding a leaf in a tree\n",
+    "#### 3.4.2 Removing and adding a leaf in a tree\n",
     "\n",
     "One might want to prune or add nodes in an existing tree (a common example is the subtree-prune-and-regraft operation).\n",
     "\n",
@@ -863,7 +948,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### 3.3.3 Applying and create an integer mapping from a Newick string\n",
+    "#### 3.4.3 Applying and creating an integer mapping from a Newick string\n",
     "\n",
     "* Newick strings usually do not contain integers but real-life taxa (e.g., animal species, languages...). So it is important to provide another layer of conversion, where we can take in a Newick with string taxa, and convert it to a Newick with integer taxa, with a unique integer → taxon mapping."
    ]
@@ -984,7 +1069,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 41,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -997,10 +1082,10 @@
     }
    ],
    "source": [
-    "from glob import glob\n",
-    "\n",
     "import tempfile\n",
     "\n",
+    "from glob import glob\n",
+    "\n",
     "from phylo2vec.io._validation import FILE_EXTENSIONS\n",
     "\n",
     "\n",
@@ -1191,4 +1276,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 2
-}
+}
\ No newline at end of file

From 1c033506f8fef85f2445020e3c245835a2b23a86 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]"
 <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Mon, 26 Jan 2026 14:42:13 +0000
Subject: [PATCH 2/2] style: pre-commit fixes

---
 docs/demo.ipynb     | 73 ++++++++++++++++++++++++++-------------------
 docs/demo_opt.ipynb | 62 +++++++++++++++++++-------------------
 2 files changed, 73 insertions(+), 62 deletions(-)

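This patch is the mechanical output of the nbstripout hook run with `--keep-output`. Execution counts become `null`, both on code cells and on `execute_result` outputs, and outputs are kept. In `demo_opt.ipynb`, the random cell ids are replaced by sequential ones. The `demo.ipynb` hunks show no cell ids, so only its counts change. The following is a rough, stdlib-only sketch of that transformation, intended to help read the diff. It is not nbstripout's implementation, which also cleans other metadata. `strip_execution_counts` and `strip_file` are hypothetical names.

```python
import json


def strip_execution_counts(nb, renumber_ids=True):
    """Null execution counts but keep outputs; optionally renumber cell ids.

    Hypothetical helper approximating what this diff shows nbstripout
    --keep-output doing; the real tool also cleans other metadata.
    """
    for index, cell in enumerate(nb.get("cells", [])):
        if renumber_ids and "id" in cell:
            cell["id"] = str(index)  # e.g. "2ee97f3a" -> "0", "a9382e6e" -> "1"
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        for output in cell.get("outputs", []):
            if "execution_count" in output:  # only execute_result outputs carry one
                output["execution_count"] = None
    return nb


def strip_file(path):
    """Rewrite a notebook in place, matching the layout seen in this diff
    (1-space indent, sorted keys, trailing newline)."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    strip_execution_counts(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, sort_keys=True, ensure_ascii=False)
        f.write("\n")
```

Leaving outputs in place keeps the rendered docs readable. Nulling the counts removes the churn seen in the hunks above, where only the numbers changed between runs.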
diff --git a/docs/demo.ipynb b/docs/demo.ipynb
index f0b3db06..b2476276 100644
--- a/docs/demo.ipynb
+++ b/docs/demo.ipynb
@@ -36,7 +36,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -177,7 +177,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -186,7 +186,7 @@
        "'((((0,2)9,4)10,(1,3)8)11,(5,6)7)12;'"
       ]
      },
-     "execution_count": 4,
+     "execution_count": null,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -216,7 +216,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -225,7 +225,7 @@
        "[(1, 6), (4, 5), (2, 3), (0, 1), (0, 4), (0, 2)]"
       ]
      },
-     "execution_count": 7,
+     "execution_count": null,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -247,7 +247,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -256,7 +256,7 @@
        "'(((0,(1,6)7)10,(4,5)8)11,(2,3)9)12;'"
       ]
      },
-     "execution_count": 8,
+     "execution_count": null,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -285,7 +285,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -356,7 +356,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -396,7 +396,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -422,7 +422,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -446,7 +446,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -477,7 +477,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -503,7 +503,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 18,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -529,7 +529,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -555,7 +555,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -565,7 +565,7 @@
        "<IPython.core.display.Image object>"
       ]
      },
-     "execution_count": 2,
+     "execution_count": null,
      "metadata": {
       "image/png": {
        "width": 600
@@ -589,7 +589,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -616,7 +616,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 25,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -635,7 +635,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 26,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -769,7 +769,18 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "### 3.2. Tree traversal: common ancestors and node depths\n\nPhylo2Vec provides functions for tree traversal, such as finding the most recent common ancestor (MRCA) between two nodes and computing node depths.\n\nUse `get_common_ancestor` to find the MRCA between two nodes (similar to ape's `getMRCA` in R or ETE's `get_common_ancestor` in Python), and `get_node_depth` / `get_node_depths` to compute depths.\n\n- For **vectors**: topological depth is returned (all branch lengths = 1)\n- For **matrices**: actual branch lengths are used\n\n**Tip:** Use `get_node_depths` (plural) to compute all node depths at once, which is more efficient than calling `get_node_depth` repeatedly."
+   "source": [
+    "### 3.2. Tree traversal: common ancestors and node depths\n",
+    "\n",
+    "Phylo2Vec provides functions for tree traversal, such as finding the most recent common ancestor (MRCA) between two nodes and computing node depths.\n",
+    "\n",
+    "Use `get_common_ancestor` to find the MRCA between two nodes (similar to ape's `getMRCA` in R or ETE's `get_common_ancestor` in Python), and `get_node_depth` / `get_node_depths` to compute depths.\n",
+    "\n",
+    "- For **vectors**: topological depth is returned (all branch lengths = 1)\n",
+    "- For **matrices**: actual branch lengths are used\n",
+    "\n",
+    "**Tip:** Use `get_node_depths` (plural) to compute all node depths at once, which is more efficient than calling `get_node_depth` repeatedly."
+   ]
   },
   {
    "cell_type": "code",
@@ -839,7 +850,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -861,7 +872,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 34,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -875,7 +886,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 35,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -918,7 +929,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 36,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -927,7 +938,7 @@
        "True"
       ]
      },
-     "execution_count": 36,
+     "execution_count": null,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -955,7 +966,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -988,7 +999,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -1027,7 +1038,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -1036,7 +1047,7 @@
        "True"
       ]
      },
-     "execution_count": 6,
+     "execution_count": null,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -1276,4 +1287,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 2
-}
\ No newline at end of file
+}
diff --git a/docs/demo_opt.ipynb b/docs/demo_opt.ipynb
index 5467aa07..360c1486 100644
--- a/docs/demo_opt.ipynb
+++ b/docs/demo_opt.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "2ee97f3a",
+   "id": "0",
    "metadata": {},
    "source": [
     "# Phylogenetic tree inference using Phylo2Vec\n",
@@ -14,7 +14,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a9382e6e",
+   "id": "1",
    "metadata": {},
    "source": [
     "## 0. Imports & data"
@@ -22,8 +22,8 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
-   "id": "a6126425",
+   "execution_count": null,
+   "id": "2",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -46,7 +46,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "00d58991",
+   "id": "3",
    "metadata": {},
    "source": [
     "All available optimisation schemes can be found using `list_models`"
@@ -54,8 +54,8 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
-   "id": "8f92bcd5",
+   "execution_count": null,
+   "id": "4",
    "metadata": {},
    "outputs": [
     {
@@ -64,7 +64,7 @@
        "['GradME', 'HillClimbing']"
       ]
      },
-     "execution_count": 2,
+     "execution_count": null,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -77,7 +77,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7247f975",
+   "id": "5",
    "metadata": {},
    "source": [
     "\n",
@@ -86,8 +86,8 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
-   "id": "f36acbfa",
+   "execution_count": null,
+   "id": "6",
    "metadata": {},
    "outputs": [
     {
@@ -97,7 +97,7 @@
        " <http.client.HTTPMessage at 0x7fce2453b940>)"
       ]
      },
-     "execution_count": 3,
+     "execution_count": null,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -114,7 +114,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "60667e96",
+   "id": "7",
    "metadata": {},
    "source": [
     "## 1. Hill-climbing\n",
@@ -126,8 +126,8 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
-   "id": "c6460eae",
+   "execution_count": null,
+   "id": "8",
    "metadata": {},
    "outputs": [
     {
@@ -242,7 +242,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a9e3e499",
+   "id": "9",
    "metadata": {},
    "source": [
     "`hc_result` contains several objects:\n",
@@ -254,8 +254,8 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
-   "id": "657dc4e3",
+   "execution_count": null,
+   "id": "10",
    "metadata": {},
    "outputs": [
     {
@@ -279,8 +279,8 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
-   "id": "94412310",
+   "execution_count": null,
+   "id": "11",
    "metadata": {},
    "outputs": [
     {
@@ -319,7 +319,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c7de1d99",
+   "id": "12",
    "metadata": {},
    "source": [
     "## 2. GradME\n",
@@ -339,8 +339,8 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
-   "id": "8aeab120",
+   "execution_count": null,
+   "id": "13",
    "metadata": {},
    "outputs": [
     {
@@ -372,7 +372,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "039922d6",
+   "id": "14",
    "metadata": {},
    "source": [
     "`gradme_result` contains several objects:\n",
@@ -385,8 +385,8 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
-   "id": "256caad2",
+   "execution_count": null,
+   "id": "15",
    "metadata": {},
    "outputs": [
     {
@@ -410,8 +410,8 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
-   "id": "28072223",
+   "execution_count": null,
+   "id": "16",
    "metadata": {},
    "outputs": [
     {
@@ -450,8 +450,8 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
-   "id": "74fc53c7",
+   "execution_count": null,
+   "id": "17",
    "metadata": {},
    "outputs": [
     {
@@ -460,7 +460,7 @@
        "<Axes: >"
       ]
      },
-     "execution_count": 7,
+     "execution_count": null,
      "metadata": {},
      "output_type": "execute_result"
     },