feat: parse wfs for meta and layers infos #385
maudetes
left a comment
Nice! It works nicely 🎉
My main comment is on the scope of the hydra detection 😛
So cool!
For future reference, working examples on layer detection:
- https://www.data.gouv.fr/datasets/abondance-de-griset-spondyliosoma-cantharus-observee-lors-des-campagnes-scientifiques-sur-la-periode-2005-2009-1?resource_id=933757a9-0b8e-4acf-bb23-0efe24078ee4
- https://www.data.gouv.fr/datasets/disponibilite-temps-reel-des-parkings-mel-3?resource_id=2e250524-c8b0-4e3d-b0d8-049a966105f3
- https://www.data.gouv.fr/datasets/i4-etablissement-des-canalisations-electriques?resource_id=f66a84ba-759f-42f3-8abc-b00336a79b92
non-working examples:
- https://www.data.gouv.fr/datasets/zones-de-repartition-des-eaux-zre-metropole?resource_id=3b1ab5f0-0fb0-4fe5-be77-2eaa6ecbe3ba
- almost every WFS on the first page of https://www.data.gouv.fr/datasets/search?format=wfs (that don't have any layer information)
metadata["output_formats"] = list(output_formats.get("values") or [])

# Extract feature type information
for name, layer in wfs.contents.items():
Somehow, the feature type names get prefixed with a namespace? For example, for this resource, the layer name parsed here is `sa:ZRE_FXX`, which doesn't match the title.
But it isn't prefixed in the layer name in WMS GetCapabilities.
Interesting! I added a match on the detected layer without its namespace (only when exactly one layer matches, to avoid namespace collisions) in 0052167.
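For readers following the thread, the namespace-tolerant match described above could look roughly like the sketch below. This is not the code from the commit: the function name `match_layer` and its signature are hypothetical, and it assumes layer names are plain strings of the form `prefix:local_name`.

```python
from __future__ import annotations


def match_layer(candidate: str, layer_names: list[str]) -> str | None:
    """Return the advertised layer name matching `candidate`, or None.

    Hypothetical helper, not hydra's actual implementation.
    """
    # Exact match first: the candidate already carries the namespace prefix.
    if candidate in layer_names:
        return candidate
    # Fallback: compare against the local part (text after the last colon).
    matches = [name for name in layer_names if name.rsplit(":", 1)[-1] == candidate]
    # Accept only an unambiguous match, to avoid namespace collisions.
    return matches[0] if len(matches) == 1 else None
```

With this sketch, `match_layer("ZRE_FXX", ["sa:ZRE_FXX"])` resolves to the prefixed name, while a local name advertised under two different namespaces is rejected.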
The detected layer name includes the namespace, so it will probably work better when queried from the frontend.
Can you add `title` in `_insert_resource_into_catalog`?
This adds support for WFS services 🗺️ to hydra 🐙.
If a WFS is url-detected, hydra fetches general info (WFS version, output formats) and layer info (names and supported projections). It also computes `detected_layer_name` if a valid layer name is found in the URL or the resource title (syncing the title to the catalog table is needed for that).
The general workflow is respected (this can be challenged): a WFS is handled like any other resource type, and then augmented with the scraped info.
The scraped info is stored in `checks.ogc_metadata` as JSONB. It is sent to udata in `analysis:parsing:ogc_metadata` for every resource. I went with a full object since the layer structure has to be complex anyway (i.e. more than a list of strings).
There's a bit of future-proofing here: `ogc_metadata` is ready for other formats if needed (e.g. WMS).
`owslib` is used to parse the WFS XML: the lib seems mature and it's always nice to avoid parsing XML manually 😬.
I have tested end-to-end with a local udata instance.
Bonus: a bit of refactoring in `cli.py` with a `_find_check` helper.
Fix ecolabdata/ecospheres#892
Related ecolabdata/ecospheres#846
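To make the "full object" idea from the description concrete, here is a minimal sketch of what an `ogc_metadata` payload might look like before being stored as JSONB. Only `output_formats` and `detected_layer_name` are names taken from this PR; the helper `build_ogc_metadata`, the other keys (`format`, `version`, `layers`, `name`, `crs`) and the sample values are assumptions for illustration, not the actual schema.

```python
import json


def build_ogc_metadata(version, output_formats, layers, detected_layer_name=None):
    """Assemble a JSON-serializable metadata object (hypothetical shape)."""
    return {
        "format": "wfs",  # assumed discriminator, leaving room for e.g. WMS
        "version": version,
        "output_formats": list(output_formats or []),
        # Each layer is an object rather than a bare string, so it can
        # carry its supported projections alongside its name.
        "layers": [{"name": name, "crs": list(crs)} for name, crs in layers],
        "detected_layer_name": detected_layer_name,
    }


# Illustrative values only; they are not taken from a real GetCapabilities response.
payload = build_ogc_metadata(
    version="2.0.0",
    output_formats=["application/json"],
    layers=[("sa:ZRE_FXX", ["EPSG:4326"])],
    detected_layer_name="sa:ZRE_FXX",
)
serialized = json.dumps(payload)
```

Keeping layers as objects is what makes the structure "more than a list of strings": extra per-layer fields can be added later without changing the top-level shape.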