The agent `Kontena::WebsocketClient#connect_client` uses a separate `defer` thread to call into `ws.read(...)`, and forwards calls to the actor thread proper using `actor.on_*`. These are sync calls, which means that they block websocket reads until the actor call returns. The actor thread is also responsible for sending messages: the actor's `send_message` tasks call `Kontena::Websocket::Client#send`. If the actor thread is blocked on a call to `send`, then the read thread's sync actor calls will also block.
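To make the blocking relationship concrete, here is a minimal sketch that models an actor as a single thread draining a `Queue`, using only Ruby's standard library. `ToyActor`, `sync` and `async` are hypothetical names, not Celluloid's or Kontena's API; the point is only that a sync call made from the read thread cannot return until the actor has finished whatever it is already doing, such as a slow `send`.

```ruby
# Hypothetical model (not the real Kontena or Celluloid classes): an actor is one
# thread draining a mailbox, so it handles exactly one message at a time.
class ToyActor
  def initialize
    @mailbox = Queue.new
    @thread = Thread.new do
      while (msg = @mailbox.pop) != :stop
        handler, reply = msg
        result = handler.call
        reply << result if reply # only synchronous calls get a reply
      end
    end
  end

  # Synchronous call, like actor.on_* from the read thread: blocks until handled.
  def sync(&handler)
    reply = Queue.new
    @mailbox << [handler, reply]
    reply.pop
  end

  # Fire-and-forget call: returns as soon as the message is queued.
  def async(&handler)
    @mailbox << [handler, nil]
    nil
  end

  def stop
    @mailbox << :stop
    @thread.join
  end
end

actor = ToyActor.new
actor.async { sleep 0.3 }    # stands in for a send_message task stuck in Client#send
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
reply = actor.sync { :pong } # stands in for the read thread's sync actor.on_pong call
waited = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
actor.stop
puts "#{reply} after #{waited.round(2)}s"
```

The sync call only returns after the earlier, slow message has been processed, which is exactly how a blocked `send` on the actor thread stalls websocket reads.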
These sync calls cause a deadlock when the `ws.on_pong` block is called with the client mutex held (bug: kontena/kontena-websocket-client#17) and the block makes a sync `actor.on_pong` call. This can race with a simultaneous actor `send_message` call to `Kontena::Websocket::Client#send`, which blocks the actor thread while it waits to acquire the websocket client mutex held by the read thread. Neither thread can then proceed: the read thread waits for the actor to handle `on_pong`, and the actor waits for the read thread to release the mutex.
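The lock cycle itself can be reproduced with plain Ruby threads. In this hypothetical sketch, `client_mutex` stands in for the websocket client mutex, a `Queue` stands in for the actor mailbox, and the read thread holds the mutex while it runs the pong callback. With `sync_pong: true` the callback waits for the actor, which is itself waiting for the mutex, so the check times out; `sync_pong: false` models an asynchronous callback that returns without waiting. Both are illustrations under these assumptions, not the fix applied upstream.

```ruby
# Hypothetical model of the read-thread/actor-thread lock cycle; the names are
# illustrative and do not come from the Kontena code base.
# Returns true if the read thread is still stuck after `timeout` seconds.
def deadlocks?(sync_pong:, timeout: 0.5)
  client_mutex = Mutex.new # stands in for the websocket client mutex
  mailbox = Queue.new      # the actor's mailbox
  pong_reply = Queue.new   # reply channel for a sync on_pong call
  in_send = Queue.new      # signals that the actor has started send_message

  actor = Thread.new do
    loop do
      case mailbox.pop
      when :send_message
        in_send << true
        client_mutex.synchronize { :write_frame } # Client#send needs the client mutex
      when :on_pong
        pong_reply << :ok
      when :stop
        break
      end
    end
  end

  reader = Thread.new do
    # The pong callback runs while the read thread holds the client mutex.
    client_mutex.synchronize do
      mailbox << :send_message    # force the race: the actor starts a send first
      in_send.pop                 # wait until the actor is inside send_message
      mailbox << :on_pong
      pong_reply.pop if sync_pong # sync call: wait for the actor to reply
    end
  end

  stuck = reader.join(timeout).nil?
  if stuck
    # Break the deadlock by killing both threads so the process can continue.
    actor.kill
    reader.kill
    [actor, reader].each(&:join)
  else
    mailbox << :stop
    actor.join
  end
  stuck
end

puts deadlocks?(sync_pong: true)  # => true
puts deadlocks?(sync_pong: false) # => false
```

Running the pong callback only after the client mutex has been released, as the referenced client bug implies, would break the same cycle from the other side.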
Testing with a very short `ping_interval` makes this easy to repro:
This results in an indefinite deadlock where the agent stops responding to server pings, and the server disconnects the node. The agent is deadlocked on the websocket client driver mutex, and never sees this:
```
W, [2017-07-31T06:45:53.268525 #21] WARN -- WebsocketBackend: Close node XI4K:NPOL:EQJ4:S4V7:EN3B:DHC5:KZJD:F3U2:PCAN:46EV:IO4A:63S5 connection after 5.00s timeout
I, [2017-07-31T06:45:53.283937 #21] INFO -- Agent::NodeUnplugger: Disconnected node development/core-01 connected at 2017-07-31 06:45:09 UTC
```