Inconsistent results caused by nondeterministic sort order #1

@cyrill-k

Description

First of all, thank you for running the GRIP service and providing it to the public for free, which helps researchers such as myself analyze BGP behavior in the wild!

I recently tried to build a script that fetches suspicious BGP events from the GRIP API (https://api.grip.inetintel.cc.gatech.edu/json/events). If I understand correctly, the service running the API is based on this repository.

The issue I encountered is that the returned results are not consistent across multiple identical requests. Looking at this repository, my assumption is that the results are not deterministically sorted: sorting is based on view_ts, but if multiple events have the same view_ts value, their relative order in the query result is undefined, so different invocations can return them in different orders.
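
To illustrate the suspected failure mode, here is a minimal sketch in plain Python. It does not use the GRIP code or Elasticsearch. It assumes only that the backend may hand back tied events in a different arrival order on each request. The two event ids are the ones from the example in this issue; the function names are hypothetical.

```python
# Two events that share the same view_ts (ids taken from this issue).
EVENTS = [
    {"id": "submoas-1653825300-11351=42960", "view_ts": 1653825300},
    {"id": "submoas-1653825300-132721=58678", "view_ts": 1653825300},
]


def top_hit_by_ts(events):
    """Sort by view_ts descending only; tied events keep their arrival order."""
    return sorted(events, key=lambda e: -e["view_ts"])[0]["id"]


def top_hit_with_tiebreaker(events):
    """Sort by view_ts descending, then by id ascending as a tiebreaker."""
    return sorted(events, key=lambda e: (-e["view_ts"], e["id"]))[0]["id"]


# Assumption: two requests see the tied events in different arrival orders.
arrival_a = EVENTS
arrival_b = list(reversed(EVENTS))

# Without a tiebreaker, the first hit depends on the arrival order.
print(top_hit_by_ts(arrival_a), top_hit_by_ts(arrival_b))

# With a tiebreaker, the first hit is the same for both arrival orders.
print(top_hit_with_tiebreaker(arrival_a), top_hit_with_tiebreaker(arrival_b))
```

With `length=1`, only the first hit is returned, which would explain why the id flips between requests while view_ts stays the same.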

  • Concrete Example of the issue:
    Running the following query 10 times always returns an event with view_ts = 1653825300, but the event id varies between submoas-1653825300-11351=42960 and submoas-1653825300-132721=58678.

    for x in $(seq 1 10); do wget --quiet -O - "https://api.grip.inetintel.cc.gatech.edu/json/events?event_type=submoas&start=1999&full=true&min_duration=300&max_duration=300&min_susp=80&max_susp=80&length=1&ts_start=2024-11-04T11:30:00&ts_end=2024-11-04T11:30:00" | python3 -m json.tool - | grep '"id"\|"view_ts"'; done

    The script returns the following output for me:

            "id": "submoas-1653825300-11351=42960",
                    "view_ts": 1653825300
            "view_ts": 1653825300
            "id": "submoas-1653825300-11351=42960",
                    "view_ts": 1653825300
            "view_ts": 1653825300
            "id": "submoas-1653825300-11351=42960",
                    "view_ts": 1653825300
            "view_ts": 1653825300
            "id": "submoas-1653825300-11351=42960",
                    "view_ts": 1653825300
            "view_ts": 1653825300
            "id": "submoas-1653825300-132721=58678",
                    "view_ts": 1653825300
            "view_ts": 1653825300
            "id": "submoas-1653825300-11351=42960",
                    "view_ts": 1653825300
            "view_ts": 1653825300
            "id": "submoas-1653825300-132721=58678",
                    "view_ts": 1653825300
            "view_ts": 1653825300
            "id": "submoas-1653825300-11351=42960",
                    "view_ts": 1653825300
            "view_ts": 1653825300
            "id": "submoas-1653825300-132721=58678",
                    "view_ts": 1653825300
            "view_ts": 1653825300
            "id": "submoas-1653825300-132721=58678",
                    "view_ts": 1653825300
            "view_ts": 1653825300
  • Proposed solution:
    When defining the sort order for the Elasticsearch backend, also define a tiebreaker (the event id) as a second sort key, so that events with identical view_ts are ordered consistently, as suggested in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after):

    "..., we recommend that you include a tiebreaker field in your sort. This tiebreaker field should contain a unique value for each document. If you don’t include a tiebreaker field, your paged results could miss or duplicate hits."

    The following change at https://github.com/InetIntel/grip-api-v2/blob/main/app/elastic.py#L339 should solve the issue:
    Replace 'sort': "view_ts:desc" with 'sort': ["view_ts:desc", "id:asc"]

    Unfortunately, since I do not have access to the database or the Elasticsearch backend, I cannot test whether this change actually solves the issue.
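
For reference, the same two-key sort can also be written as a request-body sort clause. This is a minimal sketch, not the code in app/elastic.py: the helper names `build_sort` and `build_search_body` are hypothetical, and it assumes that `id` is indexed as a sortable field with a unique value per event, which I cannot verify without access to the index mapping.

```python
import json


def build_sort(tiebreaker_field="id"):
    """Sort clause: view_ts descending, then a unique field ascending.

    Assumption: tiebreaker_field is sortable and unique per document.
    """
    return [
        {"view_ts": {"order": "desc"}},
        {tiebreaker_field: {"order": "asc"}},
    ]


def build_search_body(size=1, offset=0):
    """Search body with paging ("from"/"size") and the deterministic sort."""
    return {"from": offset, "size": size, "sort": build_sort()}


# Mirrors start=1999 and length=1 from the example query in this issue.
print(json.dumps(build_search_body(size=1, offset=1999), indent=2))
```

The second key only affects events whose view_ts is identical, so the existing newest-first ordering is preserved.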
