Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.4] Agent websocket client deadlock on pong #2649

Closed
SpComb opened this issue Jul 31, 2017 · 0 comments
Closed

[1.4] Agent websocket client deadlock on pong #2649

SpComb opened this issue Jul 31, 2017 · 0 comments

Comments

@SpComb
Copy link
Contributor

SpComb commented Jul 31, 2017

The agent Kontena::WebsocketClient#connect_client uses a separate defer thread to call into ws.read(...), and forward calls to the actor thread proper using actor.on_*. These are sync calls, which means that they will block websocket reads until the actor call returns. The actor thread is responsible for sending messages, such that the actor send_message tasks will call Kontena::Websocket::Client#send . If the actor thread is blocked on a call to send, then the read thread's sync actor calls will also block.

This causes a deadlock when the ws.on_pong block is called with the client mutex held (bug: kontena/kontena-websocket-client#17), and the block does a sync actor.on_pong call. This can race with a simultaneous actor send_message calls to Kontena::Websocket::Client#send, which blocks the actor thread while acquiring the websocket client mutex held by the read thread.

Testing with an very short ping_interval makes this easy to repro:

D, [2017-07-31T06:45:21.278613 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.279037 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.444282 #1] DEBUG -- Kontena::WebsocketClient: -> on_pong
D, [2017-07-31T06:45:21.444439 #1] DEBUG -- Kontena::WebsocketClient: server ping 0.00s of 5.00s timeout
D, [2017-07-31T06:45:21.444508 #1] DEBUG -- Kontena::WebsocketClient: <- on_pong
D, [2017-07-31T06:45:21.487942 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.488942 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.744240 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.754605 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.831239 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.827051 #1] DEBUG -- Kontena::Workers::WeaveWorker: skip start for container=/kontena-etcd without overlay_cidr
D, [2017-07-31T06:45:21.826360 #1] DEBUG -- Kontena::Workers::LogWorker: /kontena-etcd/{}
D, [2017-07-31T06:45:21.831440 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.831747 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.836063 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.836793 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.837544 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.838410 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.838601 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.839547 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.846389 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.853654 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.877812 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.880655 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.886907 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.903893 #1] DEBUG -- Kontena::RpcClient: waited 0.1s of 30.0s until: request /containers/event has response wth id=524267646 yielded Array
D, [2017-07-31T06:45:21.910731 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.911128 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.911606 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send
D, [2017-07-31T06:45:21.911811 #1] DEBUG -- Kontena::WebsocketClient: <- ws.send
D, [2017-07-31T06:45:21.923233 #1] DEBUG -- Kontena::RpcClient: waited 0.1s of 30.0s until: request /containers/save has response wth id=1666634018 yielded Array
D, [2017-07-31T06:45:21.953551 #1] DEBUG -- Kontena::WebsocketClient: -> on_pong
D, [2017-07-31T06:45:21.956256 #1] DEBUG -- Kontena::WebsocketClient: -> ws.send

This results in an indefinite deadlock where the agent stops responding to server pings, and the server disconnects the node. The agent is deadlocked on the websocket client driver mutex, and never sees this:

W, [2017-07-31T06:45:53.268525 #21]  WARN -- WebsocketBackend: Close node XI4K:NPOL:EQJ4:S4V7:EN3B:DHC5:KZJD:F3U2:PCAN:46EV:IO4A:63S5 connection after 5.00s timeout
I, [2017-07-31T06:45:53.283937 #21]  INFO -- Agent::NodeUnplugger: Disconnected node development/core-01 connected at 2017-07-31 06:45:09 UTC
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant