laudares commented May 24, 2021

This changes the script so that a connection directed to a master reconnects upon failure (due to failover/switchover). The problem, as I wrote in a comment in the code, is that 'a recently promoted/demoted server may still be transitioning and initially reply with the previous role', and that holds in both directions. For example, note how it switches to "read mode" just after failing over:

$ ./HAtester.py 5000
 Working with:   MASTER - 192.168.1.13
     Inserted: 2021-05-24 22:09:43.613525

 Working with:   MASTER - 192.168.1.13
     Inserted: 2021-05-24 22:09:44.624923
(...)
Trying to connect
Unable to connect to database :
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
(...)
Trying to connect
 Working with:    REPLICA - 192.168.1.13
     Retrieved: 2021-05-24 22:10:01.804913

 Working with:    REPLICA - 192.168.1.13
     Retrieved: 2021-05-24 22:10:01.804913

 Working with:    REPLICA - 192.168.1.13
     Retrieved: 2021-05-24 22:10:01.804913

Trying to connect
 Working with:   MASTER - 192.168.1.11
     Inserted: 2021-05-24 22:10:10.888396

It is more critical when the script is doing reads and the replica is then promoted to master, because the script then does a write (!):

$ ./HAtester.py 5001
 Working with:    REPLICA - 192.168.1.11
     Retrieved: 2021-05-24 22:20:50.672920

 Working with:    REPLICA - 192.168.1.11
     Retrieved: 2021-05-24 22:20:50.672920
(...)
Working with:    REPLICA - 192.168.1.11
     Retrieved: 2021-05-24 22:20:50.672920

 Working with:   MASTER - 192.168.1.11
Trying to connect
 Working with:    REPLICA - 192.168.1.12
     Retrieved: 2021-05-24 22:20:50.672920

 Working with:    REPLICA - 192.168.1.12
     Retrieved: 2021-05-24 22:20:50.672920

Trying to connect
 Working with:   MASTER - 192.168.1.11
     Inserted: 2021-05-24 22:21:53.403129

 Working with:   MASTER - 192.168.1.11
     Inserted: 2021-05-24 22:21:54.414794

 Working with:   MASTER - 192.168.1.11
     Inserted: 2021-05-24 22:21:55.426355

 Working with:   MASTER - 192.168.1.11
     Inserted: 2021-05-24 22:21:56.437823

 Working with:   MASTER - 192.168.1.11
     Inserted: 2021-05-24 22:21:57.447859

 Working with:   MASTER - 192.168.1.11
     Inserted: 2021-05-24 22:21:58.458605

 Working with:   MASTER - 192.168.1.11
     Inserted: 2021-05-24 22:21:59.468775

Trying to connect
 Working with:    REPLICA - 192.168.1.12
     Retrieved: 2021-05-24 22:21:59.468775
(...)
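
Both runs above depend on the script knowing which role the server behind the connection currently has. The script's own code is not shown in this conversation, so the following is only a minimal sketch of a per-operation role check, assuming a DB-API style cursor. The names `RoleChanged`, `current_role` and `ensure_role` are hypothetical; `pg_is_in_recovery()` is the standard PostgreSQL function that returns true on a standby.

```python
class RoleChanged(Exception):
    """Raised when the server no longer reports the role the caller expects."""


def current_role(cursor):
    """Return 'REPLICA' if the server is in recovery, else 'MASTER'."""
    cursor.execute("SELECT pg_is_in_recovery()")
    (in_recovery,) = cursor.fetchone()
    return "REPLICA" if in_recovery else "MASTER"


def ensure_role(cursor, expected):
    """Check the role on the open connection before doing any work on it."""
    role = current_role(cursor)
    if role != expected:
        # The caller is expected to close the connection and reconnect.
        raise RoleChanged(f"expected {expected}, server reports {role}")
    return role
```

Since a server in transition may still reply with its previous role, a check like this narrows the window but does not close it.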

I'm sending the pull request, but I'm not 100% confident this is the right approach.
Alternatively, we could add a "sleep" of a few seconds to avoid this, unless there is a different, more transparent way to prevent writes on a read-only connection (to port 5001).
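
As a variation on the fixed "sleep", the script could wait until the server reports the expected role several times in a row before using the connection. This is only a sketch under assumptions: `wait_for_stable_role` and its parameters are hypothetical names, and `get_role` stands for whatever function the script uses to ask the server for its role.

```python
import time


def wait_for_stable_role(get_role, expected, checks=3, delay=1.0,
                         timeout=30.0, sleep=time.sleep, clock=time.monotonic):
    """Return True once get_role() reports `expected` `checks` times in a row.

    Returns False if that does not happen within `timeout` seconds.
    """
    deadline = clock() + timeout
    streak = 0
    while True:
        # Any mismatch resets the streak, so a stale reply restarts the count.
        streak = streak + 1 if get_role() == expected else 0
        if streak >= checks:
            return True
        if clock() >= deadline:
            return False
        sleep(delay)
```

For the read side, if the script uses psycopg2 (an assumption, since the driver is not shown here), calling `conn.set_session(readonly=True)` on connections opened through port 5001 should make PostgreSQL reject writes in that session even when the server behind it has just been promoted.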
