feat: parse wfs for meta and layers infos #385

Open
abulte wants to merge 12 commits into datagouv:main from ecolabdata:feat/wfs

@abulte abulte commented Feb 3, 2026

This adds support for WFS services 🗺️ to hydra 🐙.

If a URL is detected as a WFS, hydra fetches general info (WFS version, output formats) and layer info (names and supported projections). It also computes detected_layer_name if a valid layer name is found in the URL or the resource title (this requires syncing the title to the catalog table).
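
To illustrate the detection step, here is a minimal sketch that looks for an advertised layer name in the URL query string or in the resource title. The function name `detect_layer_name` and its signature are hypothetical, not taken from this PR, and the sketch only handles exact matches.

```python
from urllib.parse import parse_qsl, urlsplit


# Hypothetical helper: name and signature are illustrative, not the PR's code.
def detect_layer_name(url, title, layer_names):
    """Return the advertised layer named in the URL or equal to the title, else None."""
    # Compare query parameter names case-insensitively (typeName / TYPENAMES / ...).
    params = {key.lower(): value for key, value in parse_qsl(urlsplit(url).query)}
    candidates = []
    for key in ("typename", "typenames"):
        if key in params:
            # The parameter may carry a comma-separated list of layers.
            candidates.extend(part.strip() for part in params[key].split(","))
    if title:
        candidates.append(title.strip())
    known = set(layer_names)
    for candidate in candidates:
        if candidate in known:
            return candidate
    return None
```

URL hints are checked before the title here; that ordering is an assumption of the sketch.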

The general workflow is respected (this can be challenged): WFS is handled like any other resource type, then augmented with the scraped info.

The scraped info is stored in checks.ogc_metadata as JSONB. It is sent to udata in analysis:parsing:ogc_metadata for every resource. I went with a full object since the layer structure has to be complex anyway (i.e. more than a list of strings).

There's a bit of future-proofing here: ogc_metadata is ready for other formats if needed (e.g. WMS).

owslib is used to parse the WFS XML: the library seems mature, and it's always nice to avoid parsing XML manually 😬.
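
As a rough sketch of what the owslib-based extraction could look like (not the PR's actual code): the function names are hypothetical, and the sketch assumes the first entry of `crsOptions` is the layer's default CRS. The extraction is kept separate from the network call so it can be exercised with a stand-in object.

```python
# Hypothetical helpers: names are illustrative, not the PR's code.
def extract_wfs_metadata(wfs):
    """Build a metadata dict from an object shaped like owslib's WebFeatureService."""
    metadata = {"format": "wfs", "version": wfs.version, "layers": [], "output_formats": []}

    # Output formats are advertised as a parameter of the GetFeature operation.
    operation = wfs.getOperationByName("GetFeature")
    output_formats = operation.parameters.get("outputFormat") or {}
    metadata["output_formats"] = list(output_formats.get("values") or [])

    # One entry per advertised layer (feature type).
    for name, layer in wfs.contents.items():
        crs_options = getattr(layer, "crsOptions", None) or []
        metadata["layers"].append(
            {
                "name": name,
                # Assumption: the first CRS option is the default one.
                "default_crs": crs_options[0].getcode() if crs_options else None,
            }
        )
    return metadata


def fetch_wfs_metadata(url, version="2.0.0"):
    """Fetch GetCapabilities with owslib and extract the metadata."""
    # Imported lazily so the extraction above works without owslib or network access.
    from owslib.wfs import WebFeatureService

    return extract_wfs_metadata(WebFeatureService(url, version=version))
```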

I have tested end-to-end with a local udata instance.

Bonus: a bit of refactoring in cli.py with a _find_check helper.

❯ uv run udata-hydra analyse-ogc --url 'https://geobretagne.fr/geoserver/hlc/wfs?REQUEST=GetCapabilities&SERVICE=WFS&typename=qualite_eaux_baignade_station'
2026-02-10 15:59:12 dev.local owslib.feature.wfs100[39210] WARNING pyproj not installed
2026-02-10 15:59:13 dev.local udata-hydra[39210] ERROR Could not find a check linked to the specified URL
2026-02-10 15:59:13 dev.local udata-hydra[39210] DEBUG Starting OGC analysis for https://geobretagne.fr/geoserver/hlc/wfs?REQUEST=GetCapabilities&SERVICE=WFS&typename=qualite_eaux_baignade_station
2026-02-10 15:59:13 dev.local udata-hydra[39210] DEBUG OGC analysis complete for https://geobretagne.fr/geoserver/hlc/wfs?REQUEST=GetCapabilities&SERVICE=WFS&typename=qualite_eaux_baignade_station: 10 layers found
2026-02-10 15:59:13 dev.local udata-hydra[39210] INFO OGC analysis completed successfully.
2026-02-10 15:59:13 dev.local udata-hydra[39210] DEBUG {
  "format": "wfs",
  "version": "2.0.0",
  "layers": [
    {
      "name": "hlc:classement_voirie_po",
      "default_crs": "EPSG:3948"
    },
    {
      "name": "hlc:classement_voirie_li",
      "default_crs": "EPSG:3948"
    },
    {
      "name": "hlc:dechetteries_pt",
      "default_crs": "EPSG:3948"
    },
    {
      "name": "hlc:lieuxcovoiturage_hlc_pt",
      "default_crs": "EPSG:2154"
    },
    {
      "name": "hlc:mediatheque_hlc_pt",
      "default_crs": "EPSG:3948"
    },
    {
      "name": "hlc:pav_zoneinfluence_parcelle_po",
      "default_crs": "EPSG:3948"
    },
    {
      "name": "hlc:dechetpav_pt",
      "default_crs": "EPSG:3948"
    },
    {
      "name": "hlc:qualite_eaux_baignade_station",
      "default_crs": "EPSG:3948"
    },
    {
      "name": "hlc:voirie_statistiques_par_ville",
      "default_crs": "EPSG:3948"
    },
    {
      "name": "hlc:sd_observation_terrain_flore_rare_hlc_pt",
      "default_crs": "EPSG:3948"
    }
  ],
  "output_formats": [
    "application/gml+xml; version=3.2",
    "GML2",
    "KML",
    "SHAPE-ZIP",
    "application/geopackage+sqlite3",
    "application/json",
    "application/vnd.google-earth.kml xml",
    "application/vnd.google-earth.kml+xml",
    "application/vnd.ogc.fg+json",
    "application/x-gpkg",
    "csv",
    "dxf",
    "geopackage",
    "geopkg",
    "gml3",
    "gml32",
    "gpkg",
    "gpx",
    "json",
    "mif",
    "ods",
    "tab",
    "text/csv",
    "text/xml; subtype=gml/2.1.2",
    "text/xml; subtype=gml/3.1.1",
    "text/xml; subtype=gml/3.2",
    "xlsx"
  ],
  "detected_layer_name": "hlc:qualite_eaux_baignade_station"
}

Fix ecolabdata/ecospheres#892
Related ecolabdata/ecospheres#846

abulte commented Feb 3, 2026

TODO

  • final review of tests
  • should we keep urn:ogc:def:crs:EPSG::3948 for projections or use the more standard short code EPSG:3948? Short codes are probably better for our use cases (frontend libs and QGIS), and they are supported by owslib.
  • use more hints for detection (resource.format)
  • future-proof naming (ogc vs wfs)
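
For the projection question above, a small sketch of how an OGC URN could be reduced to a short code with plain string handling. The helper name `to_short_crs` is hypothetical; the PR may rely on owslib for this instead.

```python
# Hypothetical helper: illustrative only, not the PR's code.
def to_short_crs(identifier):
    """Convert 'urn:ogc:def:crs:EPSG::3948' to 'EPSG:3948'; return other inputs unchanged."""
    parts = identifier.split(":")
    # URN layout: urn:ogc:def:crs:<authority>:<version>:<code> (version is often empty).
    if len(parts) == 7 and [p.lower() for p in parts[:4]] == ["urn", "ogc", "def", "crs"]:
        authority, code = parts[4], parts[6]
        if authority and code:
            return f"{authority.upper()}:{code}"
    return identifier
```

Inputs that do not follow the URN layout are returned as-is, so identifiers that are already short codes pass through untouched.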

@abulte abulte marked this pull request as ready for review February 4, 2026 15:57

@maudetes maudetes left a comment

Nice! It works nicely 🎉
My main comment is about the scope of the hydra detection 😛

@maudetes maudetes left a comment

metadata["output_formats"] = list(output_formats.get("values") or [])

# Extract feature type information
for name, layer in wfs.contents.items():
Contributor

Somehow, the feature type names get prefixed with a namespace? For example, for this resource, the layer name parsed here is sa:ZRE_FXX, which doesn't match the title.
But the layer name isn't prefixed in the WMS GetCapabilities.

Contributor Author

Interesting! I added a match on the detected layer name without its namespace (only when exactly one layer matches, to avoid namespace collisions) in 0052167.
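
The namespace fallback described in this reply can be sketched roughly as follows. The function name `match_layer` is hypothetical and this is not the code from the referenced commit.

```python
# Hypothetical helper: illustrative only, not the code from the commit.
def match_layer(candidate, layer_names):
    """Match a candidate against advertised layers, tolerating a missing namespace."""
    # An exact match (namespace included) always wins.
    if candidate in layer_names:
        return candidate
    # Otherwise compare against local names, i.e. the part after the last ':'.
    matches = [name for name in layer_names if name.rsplit(":", 1)[-1] == candidate]
    # Accept the fallback only when it is unambiguous.
    return matches[0] if len(matches) == 1 else None
```

The function returns the full advertised name, so the namespace is preserved in the result.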

Contributor Author

The detected layer name includes the namespace; it will probably work better when queried from the frontend.

Contributor

Can you add title in _insert_resource_into_catalog?

abulte commented Feb 10, 2026

@maudetes I changed the wording from "feature" to "layer", since "layer" is more generic if we ever add support for other OGC services (e30c416).
