Skip to content

processCollision calls defineAndReserveAlias but can't cancel previously-called (concurrent) call #62

@Poikilos

Description

@Poikilos

:edit: There is an underlying issue where if there is a used alias, the collision is not processed before we finalize our reservation, so this is a much larger issue with regard to threading (defineAndReserveAlias method requires send, delay, and receieve in the same method, so this should probably be changed to a set of queued tasks triggered by a runlevel) and has been renamed. For more details see "Tasks" below and #62 (comment).

:edit: This issue was renamed, from "...collision occurs..." to "no response to alias reservation..." after the root issue of it not working (race condition) was identified, as explained in the following comments:

If I use a wired LAN connection to use internet, and Wi-Fi to connect to a Command Station, the result is collisions ("alias collision in CanFrame header").

  • If I unplug the ethernet cable, the problem goes away.
  • This makes development somewhat awkward. For example, I unplug ethernet to test the program, then have to plug it back in to look up issues or documentation.
  • If this is possible to avoid, we should change the examples and/or openlcb module to help people be able to be online while using LCC in general:
    • For example, so they can download a firmware update from the internet then send it, using the OpenLCB proposed standard firmware update.
    • Users may not know disconnecting from internet will solve collisions and may get stuck.

OS: Windows 11

Steps to reproduce issue:

  • Connect computer to Command Station's Wi-Fi.
  • Connect computer's LAN port to router with internet.
  • Run examples_gui (Optional)
  • Choose IP address/hostname, Far Node ID
    • Choose one on the Wi-Fi connection.
      • I only tested with a Command Station in AP mode.
  • Run example_memory_length_query.
    • collision loop occurs (See log below)
    • issue also affects example_frame_interface, example_node_impletmentation and possibly others
    • Unplugging ethernet cable doesn't immediately halt the loop, but unplugging it before starting the example does solve the problem.
log SR: :X107037B2N050101010301; SR: :X17050951N; SR: :X16101951N; SR: :X15010951N; SR: :X14301951N; SR: :X10700951N; SR: :X10701951N050101010301; SR: :X10702951N; RR: :X10702DE0N; RL: CanFrame header: 0x10702DE0 [] SR: :X10701951N050101010301; RR: :X10703DE0N050101010301; RL: CanFrame header: 0x10703DE0 [5, 1, 1, 1, 3, 1] RR: :X1705032EN; RL: CanFrame header: 0x1705032E [] RR: :X1610132EN; RL: CanFrame header: 0x1610132E [] RR: :X1501032EN; RL: CanFrame header: 0x1501032E [] RR: :X1430132EN; RL: CanFrame header: 0x1430132E [] RR: :X1705032EN; RL: CanFrame header: 0x1705032E [] RR: :X1610132EN; RL: CanFrame header: 0x1610132E [] RR: :X1501032EN; RL: CanFrame header: 0x1501032E [] RR: :X1430132EN; RL: CanFrame header: 0x1430132E [] RR: :X1070032EN; RL: CanFrame header: 0x1070032E [] RR: :X1070132EN050101010301; RL: CanFrame header: 0x1070132E [5, 1, 1, 1, 3, 1] collide WARNING:root:alias collision in CanFrame header: 0x1070132E [5, 1, 1, 1, 3, 1], we restart with AMR and attempt to get new alias SR: :X10703842N050101010301; SR: :X170507B2N; SR: :X161017B2N; SR: :X150107B2N; SR: :X143017B2N; SR: :X107007B2N; SR: :X107017B2N050101010301; SR: :X107027B2N; RR: :X1070232EN; RL: CanFrame header: 0x1070232E [] SR: :X107017B2N050101010301; RR: :X1070332EN050101010301; RL: CanFrame header: 0x1070332E [5, 1, 1, 1, 3, 1] RR: :X17050071N; RL: CanFrame header: 0x17050071 [] RR: :X16101071N; RL: CanFrame header: 0x16101071 [] RR: :X1070100EN06010000C358; RL: CanFrame header: 0x1070100E [6, 1, 0, 0, 195, 88] RR: :X1070032EN; RL: CanFrame header: 0x1070032E [] RR: :X1070132EN050101010301; RL: CanFrame header: 0x1070132E [5, 1, 1, 1, 3, 1] collide WARNING:root:alias collision in CanFrame header: 0x1070132E [5, 1, 1, 1, 3, 1], we restart with AMR and attempt to get new alias SR: :X10703951N050101010301; SR: :X17050DC3N; SR: :X16101DC3N; SR: :X15010DC3N; SR: :X14301DC3N; SR: :X10700DC3N; SR: :X10701DC3N050101010301; SR: :X10702DC3N; RR: :X1070232EN; RL: CanFrame header: 0x1070232E [] SR: :X10701DC3N050101010301; RR: :X1070132EN050101010301; RL: CanFrame header: 0x1070132E [5, 1, 1, 1, 3, 1] collide WARNING:root:alias collision in CanFrame header: 0x1070132E [5, 1, 1, 1, 3, 1], we restart with AMR and attempt to get new alias

Tasks

  • There is now a Dispatcher with a socket handler loop (this can be refactored slightly to not even require threading as I described, and example scripts could be reduced to utilize it, or separate threaded and unthreaded example scripts could be made, which I can make at that point)
  • The callback provided to CanPhysicalLayerGridConnect is now equeues a CanFrame rather than writing a string to the port/socket directly, so that the actual writing can be done in the same loop as recv to prevent overlap.
    • Which also makes it symmetrical to the frame received handler, which already accepted a CanFrame rather than a string
  • There is now a PortInterface, so we can eventually make a serial port example of Dispatcher.
  • recv is now non-blocking, and before it is called, and queued _sends take priority (since there may be a good amount of traffic on some LCC networks and we don't want to miss sends except ones marked as duplicate aliases).
  • The encoding is still done by the Physical layer, as an "encoder" attribute is set to the cpl (CanFramePhysicalLayerGridConnect instance) and its interface superclass now has an encodeAsString method
    • the socket handler loop uses that encoder (PhysicalLayer)'s encodeAsString method to get a string to send, after ensuring the alias was not marked as a duplicate.
    • alias is now stored redundantly as a public property so there is no need for decoding at the time of sending (property instead of attribute, so we could potentially have code in the getter that unpacks the alias from the bytearray)
  • If a collision occurs, the alias is added to a duplicate alias list so that frames using the old alias are ignored by the socket handler loop.
  • renamed receiveString and receiveChars to pushString and pushChars since they are actually low-level on-receive handlers, and to distinguish them from actual port/socket receive callstacks.
  • Make sure we wait for the sendFrames queue to be empty last CID of the 4-cid sequence to be sent (socket send and flush) before the delay, so that we can be sure that responses will actually be sent by remote nodes in the expected timespan.
  • [-] move stack wiring to constructors
    • moved linkPhysicalLayer construction code (required to operate, does not require OS calls/etc) to constructor
    • moved some initialization from constructor to connection method(s)
    • [-] remaining aspects will have to be decided in:
  • if bytearray is used to construct a CanFrame, decode it to set own alias #68

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions