Description
In #193, a set of traffic_* streams were added to the tap, with a customised metadata property, which deselects them if no catalog was passed as input to the tap.
Unfortunately, when running the tap with
poetry run tap-github --config /tmp/tmpmt8fq0pn/tmp7896kkwh.json --test=schema
and this config (whose contents do not seem to matter much; the main thing is the --test=schema CLI option):
{"metrics_log_level": "error", "auth_token": "<mytoken>", "additional_auth_tokens": [], "rate_limit_buffer": 1000, "start_date": "2021-05-24 13:44:42.693145", "skip_parent_streams": true, "repositories": []}
the tap issues invalid SCHEMA messages like:
{
"type": "SCHEMA",
"stream": "traffic_pageviews",
"schema": {"properties": {}, "type": "object"},
"key_properties": ["repo", "org", "timestamp"]
}
Specifically, properties is empty, so downstream targets cannot look up the key_properties.
The line that causes the problem is here: https://github.com/MeltanoLabs/tap-github/pull/193/files#diff-06dc9c6115cbc069ce355913de0c101fedf6956d6f6b4873c5112434596934d3R2260
I have not dug into the details yet, but it looks like the schema production does not correctly take the selection metadata into account.
Pinging @edgarrmondragon as you suggested that code, and you might have a fix for it :)
I also think the SDK should not allow a tap to produce invalid records like this. Is there a way to test against it without causing too much overhead? Obviously, we could validate each record before sending it out, but that might be a bit heavy ;)
Interestingly, there is a test for this, _test_replication_keys_in_schema, but it does not validate the SCHEMA messages that are actually sent.
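One cheap option, as a rough sketch: SCHEMA messages are far less frequent than RECORD messages, so a test could check only those instead of validating every record. The helper below, find_invalid_schema_messages, is a hypothetical name and not part of the SDK. It assumes the tap's output is available as newline-delimited JSON, and it only checks that every entry in key_properties appears in the schema's properties.

```python
import json
from typing import Iterable, List


def find_invalid_schema_messages(lines: Iterable[str]) -> List[str]:
    """Return one error per SCHEMA message whose key_properties are absent from its schema.

    Hypothetical helper for illustration only; not an SDK function.
    """
    errors = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not JSON, e.g. stray log output.
            continue
        if not isinstance(message, dict) or message.get("type") != "SCHEMA":
            continue
        properties = message.get("schema", {}).get("properties", {})
        key_properties = message.get("key_properties") or []
        missing = [key for key in key_properties if key not in properties]
        if missing:
            errors.append(
                f"{message.get('stream')}: key_properties missing from schema: {missing}"
            )
    return errors


# The SCHEMA message from this issue, as a single line of tap output.
sample = (
    '{"type": "SCHEMA", "stream": "traffic_pageviews", '
    '"schema": {"properties": {}, "type": "object"}, '
    '"key_properties": ["repo", "org", "timestamp"]}'
)
print(find_invalid_schema_messages([sample]))
```

Run against the captured output of --test=schema, this would flag the traffic_pageviews message above, because none of its three key properties is present in the empty properties object.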