Invalid SCHEMA messages are produced for deselected streams  #212

@laurentS

In #193, a set of traffic_* streams was added to the tap, with a customised metadata property that deselects them if no catalog is passed as input to the tap.

Unfortunately, when running the tap with

poetry run tap-github --config /tmp/tmpmt8fq0pn/tmp7896kkwh.json --test=schema

and this config (which does not seem to matter much, the main thing being the --test=schema CLI option):

{"metrics_log_level": "error", "auth_token": "<mytoken>", "additional_auth_tokens": [], "rate_limit_buffer": 1000, "start_date": "2021-05-24 13:44:42.693145", "skip_parent_streams": true, "repositories": []}

the tap issues invalid SCHEMA messages like:

{
"type": "SCHEMA", 
"stream": "traffic_pageviews",
"schema": {"properties": {}, "type": "object"},
"key_properties": ["repo", "org", "timestamp"]
}

Specifically, properties is empty, so downstream targets cannot look up the key_properties in the schema.
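To make the failure concrete, here is a minimal sketch of the lookup that fails downstream. The helper name `missing_key_properties` is hypothetical (it is not part of the SDK or of any target) and it assumes nothing beyond the message shape shown above.

```python
import json


def missing_key_properties(schema_message: dict) -> list[str]:
    """Return the key_properties that are absent from schema.properties.

    Hypothetical helper for illustration only.
    """
    properties = schema_message.get("schema", {}).get("properties", {})
    key_properties = schema_message.get("key_properties") or []
    return [key for key in key_properties if key not in properties]


# The SCHEMA message from this issue, copied verbatim.
message = json.loads("""
{
  "type": "SCHEMA",
  "stream": "traffic_pageviews",
  "schema": {"properties": {}, "type": "object"},
  "key_properties": ["repo", "org", "timestamp"]
}
""")

print(missing_key_properties(message))  # ['repo', 'org', 'timestamp']
```

A well-formed message would return an empty list here, since every key property would have a matching entry under properties.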

The line that causes the problem is here: https://github.com/MeltanoLabs/tap-github/pull/193/files#diff-06dc9c6115cbc069ce355913de0c101fedf6956d6f6b4873c5112434596934d3R2260

I have not dug into the details yet, but it looks like the schema production does not correctly take the selection metadata into account.

Pinging @edgarrmondragon as you suggested that code, and you might have a fix for it :)

I also think the SDK should not allow a tap to produce invalid messages like this. Is there a way to test against it without causing too much overhead? Obviously, we could validate each record before sending it out, but that might be a bit heavy ;)
Interestingly, there's a test for this, _test_replication_keys_in_schema, but it does not validate against the SCHEMA messages that are actually sent.
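One lower-overhead option might be to check only the SCHEMA messages rather than every record, since a tap emits far fewer of them. The following is a rough sketch of that idea, not an existing SDK test: `invalid_schema_messages` is a hypothetical name, the RECORD line in the sample is made up for illustration, and the code assumes the tap's stdout contains one JSON message per line.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def invalid_schema_messages(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (stream, missing_keys) for SCHEMA messages whose key_properties
    are not all declared in schema.properties.

    Hypothetical helper; RECORD and STATE messages are skipped untouched.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("type") != "SCHEMA":
            continue
        properties = message.get("schema", {}).get("properties", {})
        key_properties = message.get("key_properties") or []
        missing = [key for key in key_properties if key not in properties]
        if missing:
            yield message["stream"], missing


# First line is the message from this issue; second line is an invented RECORD.
sample_output = [
    '{"type": "SCHEMA", "stream": "traffic_pageviews", '
    '"schema": {"properties": {}, "type": "object"}, '
    '"key_properties": ["repo", "org", "timestamp"]}',
    '{"type": "RECORD", "stream": "traffic_pageviews", "record": {}}',
]

for stream, missing in invalid_schema_messages(sample_output):
    print(f"{stream}: key_properties missing from schema: {missing}")
```

In a test, the captured output of a --test=schema run could be fed to this function and the result asserted to be empty.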

Labels: bug