
Optimized Graph Caching #43

@bridgesign

Description


The current graph implementation relies heavily on iterating through lists. Given that the graph is not on the order of $10^6$ nodes, this should in principle be acceptable. However, Python handles for-loops poorly and is notoriously slow when iterating through lists. Since we need to do sweeps in a local region for partial information, which is a function of the geometry of the area and not the connectivity of the graph, it makes sense to create a cache that keeps limited information about the surrounding geography.

Another possible improvement is to write the actual implementation in C and dispatch to it from Python. Given that it is not clear how batched queries would be made, and that there are no benchmarks for simple hash-map based optimizations yet, this might turn out to be a premature optimization.

The issue will be divided into:

  • Developing a benchmark method that reproduces realistically observed run times for queries coming from sensors
  • Creating simple hash-map based optimizations in Python itself and checking their effectiveness
  • Analyzing the requirement for a compiled graph implementation

Developing the benchmark

  • Neighbor access benchmark: For a given graph, reduce the access time for the neighbors of a given node
  • Edge access benchmark: Given a node, minimize the time required to obtain its incoming and outgoing edges
  • Area sweep benchmark: Given a node or point (x, y), a direction, an angle, and a distance, find all entities in that sector of the graph (a sketch of the membership test follows this list)
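
As a point of reference for the area sweep benchmark, the membership test for a circular sector can be sketched as below; the parameter names and the use of radians are assumptions for illustration, not the actual graph API.

```python
import math

def in_sector(px, py, ox, oy, direction, angle, distance):
    """Check whether point (px, py) lies inside the circular sector
    centred at (ox, oy), opening `angle` radians around `direction`,
    with radius `distance`. Hypothetical helper, not part of the graph API."""
    dx, dy = px - ox, py - oy
    if dx * dx + dy * dy > distance * distance:
        return False  # outside the sweep radius
    # Angle between the sector direction and the vector to the point
    delta = math.atan2(dy, dx) - direction
    delta = (delta + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi]
    return abs(delta) <= angle / 2
```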

For developing the benchmark, we use rich-bench. A copy of the original graph implementation is added alongside the new implementation, and individual benchmark functions then simulate the final query workload.

To ensure that the results are not affected by graph structure, the tests will run on multiple randomly generated graphs, with the final result being the mean over all runs. Only the mean time will be considered.
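
A minimal sketch of what one benchmark file could look like, assuming rich-bench's convention of a `__benchmarks__` list of (baseline, candidate, label) tuples; the module paths (`gamms_old.graph`, `gamms.graph`), class names, and graph methods used here are placeholders, not the actual code layout.

```python
# benchmarks/bench_neighbors.py -- hypothetical layout for richbench
import random

from gamms_old.graph import OldGraph   # assumed: copy of the original implementation
from gamms.graph import CachedGraph    # assumed: new cached implementation

def random_graph(cls, nodes=1000, edges=5000, seed=0):
    """Build a random graph; the constructor/add_node/add_edge API is assumed."""
    rng = random.Random(seed)
    g = cls()
    for u in range(nodes):
        g.add_node(u)
    for _ in range(edges):
        g.add_edge(rng.randrange(nodes), rng.randrange(nodes))
    return g

OLD = random_graph(OldGraph)
NEW = random_graph(CachedGraph)

def neighbors_old():
    for node in range(1000):
        list(OLD.neighbors(node))

def neighbors_new():
    for node in range(1000):
        list(NEW.neighbors(node))

__benchmarks__ = [
    (neighbors_old, neighbors_new, "Neighbor access: list scan vs. hash-map cache"),
]
```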

Preliminary Graph Specific Optimization

Neighbor access and Edge access are specific to graphs, but the Area Sweep is a more general problem of data chunking.

Initially, neighbor access can be solved directly once edge access is solved, and edge access itself can be handled with a hash map. The extra memory overhead will be minimal for moderately sized graphs (< 1 million edges) because we store IDs in place of the actual objects. For very large graphs, however, this runs into issues, as it becomes important to chunk the hash map itself.
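
A minimal sketch of the hash-map idea, keyed purely by IDs; `EdgeCache` and its method names are illustrative and do not reflect the actual graph interface.

```python
from collections import defaultdict

class EdgeCache:
    """Illustrative hash-map cache keyed by node ID.

    Keeps only IDs, so the memory overhead stays small for
    moderately sized graphs (< 1 million edges)."""

    def __init__(self):
        self._out = defaultdict(set)   # node id -> outgoing edge ids
        self._in = defaultdict(set)    # node id -> incoming edge ids
        self._edges = {}               # edge id -> (source id, target id)

    def add_edge(self, edge_id, source_id, target_id):
        self._edges[edge_id] = (source_id, target_id)
        self._out[source_id].add(edge_id)
        self._in[target_id].add(edge_id)

    def out_edges(self, node_id):
        return self._out[node_id]

    def in_edges(self, node_id):
        return self._in[node_id]

    def neighbors(self, node_id):
        # Neighbor access falls out of edge access: follow the cached edge ids.
        return ({self._edges[e][1] for e in self._out[node_id]} |
                {self._edges[e][0] for e in self._in[node_id]})
```

With this layout, both edge access and neighbor access become dictionary lookups instead of list scans.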

For this reason, a chunking system that is somewhat general but still tailored to the Area Sweep benchmark will be developed and integrated with the create_iter methods of the graph to provide some initial structure.
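
One way the chunking could look is a uniform grid keyed by cell coordinates, sketched below; the cell size, the `create_iter` signature, and the entity interface are all assumptions at this point.

```python
from collections import defaultdict
import math

class GridChunks:
    """Illustrative uniform-grid chunking for area sweeps.

    Entities are bucketed by their (x, y) position so a sweep only
    touches the cells that can intersect the queried sector."""

    def __init__(self, cell_size=50.0):
        self.cell_size = cell_size
        self._cells = defaultdict(list)  # (cx, cy) -> list of entity ids

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, entity_id, x, y):
        self._cells[self._cell(x, y)].append(entity_id)

    def create_iter(self, x, y, distance):
        """Yield candidate entity ids whose cells overlap the sweep radius."""
        cx, cy = self._cell(x, y)
        span = int(math.ceil(distance / self.cell_size))
        for dx in range(-span, span + 1):
            for dy in range(-span, span + 1):
                yield from self._cells.get((cx + dx, cy + dy), ())
```

The exact sector filter (direction and angle) would then run only over the candidates yielded by the grid, e.g. with a predicate like `in_sector` above.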

Future of the Optimization

The initial chunking method will become the base for developing the memory engine and its custom stores for Gamms-specific data.

Metadata

Labels: CoreDevelopment (related to core functionality), High (high priority)
