For everyone here who seems to be working with websockets, how do you deal with ...

jhgg · on Feb 13, 2017

At work, we dispatch messages to clients using websockets (on a pretty massive scale).

>How do you deal with the potential that a browser goes offline and misses messages?

It depends on how long the browser is offline: is it a transient disconnect (< 2 minutes), or an extended period offline? We handle those two cases differently.

In the first case, we use a sequence-based system in which each message carries an incrementing counter. The server also holds a buffer of the last N messages sent to the client (N is calculated from the expected rate of messages sent to that client). If the client disconnects, the server keeps that buffer alive for a few minutes, accumulating new messages in it as well. The client can then reconnect and tell the server the sequence number of the last message it received, and the server replays the messages it missed.

In the case of an extended disconnect, we treat it as a fresh connection and do a full re-sync of state. The client then relies on the real-time event stream to keep its state updated.
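A minimal sketch of the transient-disconnect case, assuming a single server process with in-memory state. `ReplayBuffer`, `send` and `resume` are hypothetical names, not the poster's code, and the "keep the buffer alive for a few minutes" expiry is left out:

```python
from __future__ import annotations

from collections import deque
from typing import Optional


class ReplayBuffer:
    """Hypothetical per-client buffer of the last N messages sent."""

    def __init__(self, capacity: int) -> None:
        # capacity plays the role of N; deque(maxlen=...) evicts the oldest entry.
        self.buffer: deque[tuple[int, str]] = deque(maxlen=capacity)
        self.next_seq = 1

    def send(self, payload: str) -> tuple[int, str]:
        # Stamp every outgoing message with an incrementing sequence number.
        msg = (self.next_seq, payload)
        self.next_seq += 1
        self.buffer.append(msg)
        return msg

    def resume(self, last_seen: int) -> Optional[list[tuple[int, str]]]:
        # Called when a client reconnects with the last sequence number it got.
        if last_seen >= self.next_seq - 1:
            return []  # nothing was missed
        if not self.buffer or last_seen + 1 < self.buffer[0][0]:
            return None  # some missed messages were already evicted
        return [m for m in self.buffer if m[0] > last_seen]
```

`resume()` returning `None` is the signal to fall back to the full re-sync path used for extended disconnects.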

>Do you have a way to cluster websocket servers so that events are propagated to all clients?

Yes. Our cluster is built in Erlang/Elixir and consists of several components for fanning out messages. For example, we're able to fan out one message to ~25,000 clients in <0.1ms. (The use case here is a massive chat room: we're able to fan out a message to all the connected users quickly.)
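The comment doesn't describe how that Erlang/Elixir cluster works internally, so this is only a sketch of the general topic fan-out idea, in Python, for a single process: each connected client gets a mailbox (a `deque` standing in for an Erlang process mailbox), and publishing appends the message to every mailbox subscribed to the topic. `FanOut` and its methods are hypothetical names, and nothing here says anything about the latency figure above:

```python
from __future__ import annotations

from collections import defaultdict, deque


class FanOut:
    """Hypothetical topic -> {client_id: mailbox} registry."""

    def __init__(self) -> None:
        self._topics: defaultdict[str, dict[str, deque[str]]] = defaultdict(dict)

    def subscribe(self, topic: str, client_id: str) -> deque[str]:
        mailbox: deque[str] = deque()
        self._topics[topic][client_id] = mailbox
        return mailbox

    def unsubscribe(self, topic: str, client_id: str) -> None:
        self._topics[topic].pop(client_id, None)

    def publish(self, topic: str, message: str) -> int:
        # One append per subscriber; returns how many clients received it.
        subscribers = self._topics.get(topic, {})
        for mailbox in subscribers.values():
            mailbox.append(message)
        return len(subscribers)
```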

AwesomeBean · on Feb 13, 2017

I'm working on a browser game that does all communication with the game server through WebSockets.

Here I have the clients ping the server every 2 seconds, and if I haven't received any message from a client within 10 seconds (including messages other than pings), I consider the client dead.
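A sketch of the server side of that liveness check, assuming the caller passes in a monotonic timestamp (e.g. from `time.monotonic()`) so the logic is easy to test. `LivenessTracker` is a hypothetical name; any inbound message refreshes a client's timestamp, not only pings:

```python
from __future__ import annotations


class LivenessTracker:
    """Hypothetical tracker: a client is dead after `timeout` seconds of silence."""

    def __init__(self, timeout: float = 10.0) -> None:
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def on_message(self, client_id: str, now: float) -> None:
        # Pings and ordinary game messages both count as proof of life.
        self.last_seen[client_id] = now

    def reap(self, now: float) -> list[str]:
        # Return (and forget) every client silent for longer than the timeout.
        dead = [c for c, t in self.last_seen.items() if now - t > self.timeout]
        for c in dead:
            del self.last_seen[c]
        return dead
```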

Each socket is assigned to an individual player, and if that player opens a second connection, the old one dies.
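A sketch of that "one socket per player" rule. `Connection` is a hypothetical stand-in for whatever socket object the server library provides, and `PlayerSessions` is likewise a made-up name:

```python
from __future__ import annotations

from typing import Optional


class Connection:
    """Hypothetical stand-in for a real WebSocket connection."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True


class PlayerSessions:
    def __init__(self) -> None:
        self._by_player: dict[str, Connection] = {}

    def attach(self, player_id: str, conn: Connection) -> Optional[Connection]:
        # A new connection replaces (and closes) any existing one for the player.
        old = self._by_player.get(player_id)
        self._by_player[player_id] = conn
        if old is not None and old is not conn:
            old.close()
        return old

    def detach(self, player_id: str, conn: Connection) -> None:
        # Only unregister if this is still the current connection.
        if self._by_player.get(player_id) is conn:
            del self._by_player[player_id]

    def current(self, player_id: str) -> Optional[Connection]:
        return self._by_player.get(player_id)
```

The identity check in `detach()` matters because the old socket's close event can arrive after the new socket has attached; without it, cleaning up the old connection could unregister the new one.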

To be honest, with regard to missed messages and the order of events, I just cross my fingers and hope TCP handles that for me.

As for clustering, I have my map split into sectors, so most of the messages that need to be sent only go to the players in that sector, and I rarely send a message to all players.
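A sketch of picking recipients by sector, assuming a 2D map divided into square sectors. `SECTOR_SIZE`, `sector_of` and `SectorIndex` are hypothetical names; only players in the same sector as an event are returned:

```python
from __future__ import annotations

import math
from collections import defaultdict

SECTOR_SIZE = 100.0  # hypothetical sector width in world units


def sector_of(x: float, y: float, size: float = SECTOR_SIZE) -> tuple[int, int]:
    # floor() keeps negative coordinates in the right sector (e.g. -1 -> -1).
    return (math.floor(x / size), math.floor(y / size))


class SectorIndex:
    def __init__(self, size: float = SECTOR_SIZE) -> None:
        self.size = size
        self.players_in: defaultdict[tuple[int, int], set[str]] = defaultdict(set)
        self.sector_of_player: dict[str, tuple[int, int]] = {}

    def move(self, player: str, x: float, y: float) -> None:
        # Re-file the player only when they cross a sector boundary.
        new = sector_of(x, y, self.size)
        old = self.sector_of_player.get(player)
        if old != new:
            if old is not None:
                self.players_in[old].discard(player)
            self.players_in[new].add(player)
            self.sector_of_player[player] = new

    def recipients(self, x: float, y: float) -> set[str]:
        # Players who should hear about an event at (x, y).
        return set(self.players_in.get(sector_of(x, y, self.size), ()))
```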

I'm still experimenting, so there are probably still a lot of edge cases I'll have to cover. But for now it seems to work well.

jasonl99 · on Feb 13, 2017

I answered this on a different thread, but I've specifically designed things for this scenario. The framework has a class named WebObject. Once instantiated, it persists on the server and can take on additional subscribers.

At any time, an "updates-applied" browser version of the object's current state can be created by calling the #content method.

So when an "updatable" event occurs in the browser, you call the update method with the changes. The changes are packaged into messages that are sent to each subscriber, where they modify the DOM.
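A rough sketch of the pattern described here, in Python rather than the framework's own language. `WebObject`, `content`, `subscribe` and `update` are hypothetical stand-ins, not the framework's real API, and sending a full rendering to each new subscriber is an assumption:

```python
from __future__ import annotations

from html import escape
from typing import Any, Callable

Send = Callable[[dict], None]


class WebObject:
    """Hypothetical server-side object; the server copy is the only source of truth."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: dict[str, Any] = {}
        self.subscribers: list[Send] = []

    def content(self) -> str:
        # Render the current, updates-applied state (a stand-in for #content).
        items = "".join(
            f"<li>{escape(k)}: {escape(str(v))}</li>" for k, v in sorted(self.state.items())
        )
        return f'<ul id="{escape(self.name)}">{items}</ul>'

    def subscribe(self, send: Send) -> None:
        # Assumption: a new subscriber first receives a full rendering.
        self.subscribers.append(send)
        send({"object": self.name, "html": self.content()})

    def update(self, changes: dict[str, Any]) -> None:
        # Apply on the server first, then push only the changes to every subscriber.
        self.state.update(changes)
        message = {"object": self.name, "changes": changes}
        for send in self.subscribers:
            send(message)
```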

It's probably a bit of a reversal of how people usually use frameworks that dynamically change content. I make the assumption that the server's instance is the only keeper of object state. Changes made on clients either change object state or they don't; if a client isn't sure, it sends an event back to the server, which handles it.

A WebObject can also have properties that are themselves WebObjects; the current card game actually has three: ChatRoom, GameStats, and GameObserver.

This pushes the responsibility for rendering content directly where it belongs: on the object where the rendering occurred in the first place.

It also has the side benefit of keeping object state separate. The CardGame doesn't have to worry about the state of the ChatRoom (though it can observe it and send events to and from it).

The bottom line is that there are only two ways the browser rendering could be wrong: 1) Some packets were missed, or 2) The developer didn't send the correct events, or sent them in the wrong sequence.

The first case could be solved by creating a delivery confirmation layer over the objects.
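One way such a delivery confirmation layer could look, as a sketch with hypothetical names (`ReliableChannel`, `deliver`, `ack`, `retry`): every message gets an id, the server keeps it until the browser acknowledges it, and anything unacknowledged for `retry_after` seconds is sent again. Timestamps are passed in to keep the logic testable:

```python
from __future__ import annotations

from typing import Any, Callable


class ReliableChannel:
    """Hypothetical ack/redelivery layer on top of a raw send function."""

    def __init__(self, send: Callable[[dict], None], retry_after: float = 5.0) -> None:
        self.send = send
        self.retry_after = retry_after
        self.next_id = 1
        self.pending: dict[int, tuple[Any, float]] = {}  # id -> (payload, last sent)

    def deliver(self, payload: Any, now: float) -> int:
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = (payload, now)
        self.send({"id": msg_id, "payload": payload})
        return msg_id

    def ack(self, msg_id: int) -> None:
        # The browser confirmed receipt; stop tracking the message.
        self.pending.pop(msg_id, None)

    def retry(self, now: float) -> list[int]:
        # Resend anything unacknowledged for at least retry_after seconds.
        resent = []
        for msg_id, (payload, sent_at) in list(self.pending.items()):
            if now - sent_at >= self.retry_after:
                self.pending[msg_id] = (payload, now)
                self.send({"id": msg_id, "payload": payload})
                resent.append(msg_id)
        return resent
```

This gives at-least-once delivery, so the browser side would need to ignore ids it has already applied.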

The second case is probably a little more dicey. The more data contained in a single updatable chunk, the harder it is to determine when state changed. That's helped tremendously by the idea of nesting WebObjects (a Room has CardGames, which have Players and Games, which have Decks and Cards... each of which takes care of updating itself).
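A sketch of that nesting with hypothetical names: each object renders its own fragment and composes its children's `content()`, while an update is recorded only on the object whose state changed (the `outbox` list stands in for that object's subscribers). A change to one Player therefore produces one small message and doesn't touch the Room or CardGame:

```python
from __future__ import annotations

from typing import Any, Iterable


class Node:
    """Hypothetical nested object that renders and updates itself."""

    def __init__(self, name: str, children: Iterable[Node] = ()) -> None:
        self.name = name
        self.state: dict[str, Any] = {}
        self.children = list(children)
        self.outbox: list[dict] = []  # messages for this object's subscribers

    def content(self) -> str:
        # Own fragment first, then each child's self-rendered fragment.
        own = "".join(f"<b>{k}={v}</b>" for k, v in sorted(self.state.items()))
        inner = "".join(child.content() for child in self.children)
        return f'<div id="{self.name}">{own}{inner}</div>'

    def update(self, changes: dict[str, Any]) -> None:
        # Only this object's subscribers hear about the change.
        self.state.update(changes)
        self.outbox.append({"target": self.name, "changes": changes})
```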

Matthias247 · on Feb 13, 2017

That's a good and important question. I think the solution depends on your application requirements. If all you need is to get the current state into the browser, then it's enough for the browser to resubscribe after each connect; it then gets pushed the new current state and can display it. If you are interested not only in the current state but in all state transitions (or the events that describe them), it gets harder, since you would need to store all the events somewhere on the server side and only remove them once a browser has acknowledged that they have been consumed. In such a model, the difference from fetching the events via HTTP would probably not be that big.
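A sketch of the simpler "resubscribe and get the current state" case, with hypothetical names (`StateTopic`, `subscribe`, `set`). Every (re)connect triggers a fresh subscription, and the snapshot it returns makes anything missed while offline irrelevant:

```python
from __future__ import annotations

from typing import Any, Callable


class StateTopic:
    """Hypothetical topic that only cares about current state, not history."""

    def __init__(self) -> None:
        self.state: dict[str, Any] = {}
        self.subscribers: dict[str, Callable[[dict], None]] = {}

    def subscribe(self, client_id: str, send: Callable[[dict], None]) -> None:
        # Called on every (re)connect: the snapshot replaces whatever was missed.
        self.subscribers[client_id] = send
        send({"type": "snapshot", "state": dict(self.state)})

    def unsubscribe(self, client_id: str) -> None:
        self.subscribers.pop(client_id, None)

    def set(self, key: str, value: Any) -> None:
        self.state[key] = value
        for send in self.subscribers.values():
            send({"type": "update", "key": key, "value": value})
```

A client that was offline while a value went from 10 to 12 only ever sees 12. That is fine when only the current state matters, and exactly where this approach stops being enough when the transitions themselves matter.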

mr_luc · on Feb 13, 2017

Yeah, if I had to handle requirements toward the painful end of the scale, I would probably go with Agent + Event Store. I.e., ephemeral process(es) representing the user on the server side, able to cache up to a certain number of outgoing messages, respond to the user's acks, and be wound down if the client hasn't consumed anything for too long. The Agent is a natural place for decisions about how, or whether, to attempt to get the user 'caught up.'
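A sketch of just the Agent half, with hypothetical names and policies: a bounded outbox trimmed by cumulative acks, a flag that switches catch-up from replay to a full re-sync once the user falls too far behind, and an idle check for winding the agent down. The Event Store half is not shown:

```python
from __future__ import annotations

from collections import deque


class UserAgent:
    """Hypothetical per-user agent: bounded outbox, ack trimming, idle wind-down."""

    def __init__(self, max_buffered: int = 100, idle_limit: float = 300.0, now: float = 0.0) -> None:
        self.max_buffered = max_buffered
        self.idle_limit = idle_limit  # seconds without an ack before winding down
        self.outbox: deque[tuple[int, str]] = deque()
        self.next_seq = 1
        self.last_ack_at = now
        self.needs_resync = False

    def push(self, message: str) -> None:
        if len(self.outbox) >= self.max_buffered:
            # Too far behind to replay: drop the backlog and plan a re-sync instead.
            self.outbox.clear()
            self.needs_resync = True
        self.outbox.append((self.next_seq, message))
        self.next_seq += 1

    def ack(self, seq: int, now: float) -> None:
        # Cumulative ack: everything up to seq has been consumed.
        while self.outbox and self.outbox[0][0] <= seq:
            self.outbox.popleft()
        self.last_ack_at = now

    def catch_up(self) -> tuple[str, list[tuple[int, str]]]:
        # The agent decides how (or whether) to get the user caught up.
        if self.needs_resync:
            self.needs_resync = False
            self.outbox.clear()
            return ("resync", [])
        return ("replay", list(self.outbox))

    def should_wind_down(self, now: float) -> bool:
        return now - self.last_ack_at > self.idle_limit
```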