Reliable UDP communication for merging events in a distributed system
I've just started working on a distributed multiplayer game. Having had very little experience in this area, I'm just going with designs that seem reasonable to me, which doesn't necessarily mean that they are. :)
At the moment I have a history-merging system: every event that occurs which affects the state of the system is logged and added to the end of a list. When a event comes in from the network which has happened, say at some point in the past (since the network has some delay), the log is used to roll back the system, apply this new event, and then the events that were "unapplied" are reapplied. I'm not sure how efficient this will end up being, but the idea seems to work.
I do realize that some events won't be mergeable: for example, if a later event arrives through the network, it may seem nonsensical until an earlier event arrives. My plan is to allow unmergeable events for a while, but if they stick around for too long, to abort with an error -- or eventually request a full system update from some other peer.
Anyway, having implemented at least the framework for this, it seemed like a bit of a waste to use TCP: as you can see, the design above by its nature handles out-of-order packets just fine. So I thought I would give UDP a try.
The trouble is that while the system can handle out-of-order packets, it can't deal with dropped ones. I need to have reliable UDP communication, so to speak. I've come up with an idea that I think will work, which seems to share some characteristics with the vector clock algorithm (Vector clock - Wikipedia, the free encyclopedia). Thoughts about its viability would be appreciated.
Remember that I have a distributed network here, so there might be a lot of peers connected to other peers: -- but in general, each peer is only connected to a small number of peers relative to the total number. Otherwise the number of connections would quickly grow out of hand. Anyway, the idea is this:
- Each peer (interconnected game client) will generate events, and send out these events with an increasing ID associated with each, to the peers it has a direct connection to.
- When a peer receives a packet from a neighbour, it records that this neighbour knows this packet for sure (for this must surely be the case, if it sent it). Then any neighbours which may not know about the packet for sure are forwarded this packet as well.
- Every once in a while (the delay is somewhat randomized to prevent network congestion), each peer sends to its neighbours the "solid history count" that it knows about for every peer in the network. This is the highest ID, for each peer, for which every packet numbered this ID or lower has been received. [I'm under the understanding that TCP uses something similar.]
- When these solid history updates are received, if some time has passed and a peer doesn't appear to have received a packet, the packet will be resent.
I think this will deal with the reliability issue. New packets are immediately sent on to all neighbours (this could lead to a fair amount of redundancy, but there's not much to be done about that); if an update from a neighbour reveals it hasn't received the packet after a while, it will be resent.
Hope that made sense. Just wondering if there are any thoughts about this design. (I've started implementing it, but made the mistake the first time around of unconsciously assuming a global packet numbering. Silly me.)