Momentary disconnections can lock streams in a zombie state #9
If a disconnection is short enough, the middleware can process the reconnection before the disconnection, resulting in the status endpoint not reporting the stream is live until the streamer disconnects and reconnects again.
This is likely exacerbated by setting BlockDuplicateStreamName to false in OvenMediaEngine, as a stream key re-use will process a disconnect and a reconnect at the same time, possibly out of order. Need to confirm.
One potential fix for this is to ask the API for a canonical stream list on every state change, but that has presented problems in the past too: OvenMediaEngine won't actually report a stream as live until it receives the first video frames, so a handshake that's slow enough can result in the middleware asking for the stream list and getting an incorrect one.
Another possibility is a regular heartbeat to keep the stream list correct. This has the bonus of being able to detect in the middleware if the stream server dies, and that can be used for alerting/status endpoint stuff.
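A rough sketch of what one heartbeat tick could look like, assuming the cache is a plain dict and that the caller supplies a function returning the current stream names. Both `heartbeat_once` and `fetch_stream_list` are hypothetical names for illustration, not part of the middleware or of the OvenMediaEngine API:

```python
import time


def heartbeat_once(cache, fetch_stream_list):
    """Run one heartbeat tick against the stream server.

    `fetch_stream_list` is a hypothetical callable that returns an iterable
    of live stream names and raises if the server cannot be reached.
    """
    try:
        streams = set(fetch_stream_list())
    except Exception:
        # Server unreachable: keep the last known list, flag it for alerting.
        cache["server_alive"] = False
        return cache
    cache["streams"] = streams
    cache["server_alive"] = True
    cache["last_ok"] = time.monotonic()
    return cache
```

Calling this on a timer would overwrite any stale entry on the next tick, so a stream that was queried mid-handshake would be corrected shortly after rather than staying wrong until the next reconnect.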
A third option, if this is caused by stream key re-use, is to detect a stream trying to be opened while it's already open, then apply some logic to anticipate the immediate disconnection after.
Part of this might be an upstream issue.
Going to grab 0.18.0 and try to replicate against that.
Still seeing this in 0.18.0
So I need to dive into this.
Found the issue. The admission webhook that manages the stream list in cache has a bug here:

    # Remove or add the changed stream, as appropriate
    if cherrypy.request.update_opening and cherrypy.request.update_stream not in stream_list:
        stream_list.append(cherrypy.request.update_stream)
    if not cherrypy.request.update_opening and cherrypy.request.update_stream in stream_list:
        stream_list.remove(cherrypy.request.update_stream)
We check for dupes on add, but not remove. As a result we'll only ever have one of a stream in the list at a time, and will remove it on any closure.
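To make the failure concrete, here is a small standalone reproduction. It mirrors the same two conditions but takes plain arguments instead of reading from `cherrypy.request`; the `apply_update` helper and the `"demo"` stream name are made up for illustration:

```python
def apply_update(stream_list, stream, opening):
    """Same add/remove logic as the webhook, without the cherrypy request."""
    if opening and stream not in stream_list:
        stream_list.append(stream)
    if not opening and stream in stream_list:
        stream_list.remove(stream)
    return stream_list


streams = []
apply_update(streams, "demo", opening=True)   # original connection opens
apply_update(streams, "demo", opening=True)   # reconnect arrives first and is deduplicated
apply_update(streams, "demo", opening=False)  # late close from the old connection
print(streams)  # [] even though the reconnected stream is still live
```

The second open is silently dropped, so the single close that follows removes the only entry for a stream that is still connected.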
Any connection attempt to OME causes an open, then a subsequent close if it fails. So we need to store multiple opens and closes, then collapse this list to a set when evaluating it.
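One way this could look, as a sketch rather than the actual patch: keep a per-stream count of outstanding opens and treat a stream as live while its count is above zero. `StreamTracker` and its method names are hypothetical, and ignoring a close that has no matching open is an assumption about the desired behaviour:

```python
from collections import Counter


class StreamTracker:
    """Counts opens and closes per stream instead of deduplicating on add."""

    def __init__(self):
        self._open_counts = Counter()

    def update(self, stream, opening):
        if opening:
            self._open_counts[stream] += 1
        elif self._open_counts[stream] > 0:
            self._open_counts[stream] -= 1
            if self._open_counts[stream] == 0:
                # Drop the key so it no longer shows up as live.
                del self._open_counts[stream]
        # Assumption: a close with no recorded open is ignored.

    def live_streams(self):
        """Collapse the counts to the set of streams with at least one open."""
        return set(self._open_counts)


tracker = StreamTracker()
tracker.update("demo", opening=True)   # original connection
tracker.update("demo", opening=True)   # reconnect processed first
tracker.update("demo", opening=False)  # late close from the old connection
print(tracker.live_streams())          # {'demo'}
```

With counts, the out-of-order close only cancels one of the two opens, so the stream stays reported as live until the reconnected session closes as well.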