RELP - The Reliable Event Logging Protocol

This is the specification for the reliable event logging protocol, called "RELP".

Version: 0.0.1
Date:  2008-03-19
Author: Rainer Gerhards

Copyright (C) 2008 by Rainer Gerhards and Adiscon GmbH. Released under the terms of the GNU FDL.

Use Cases

These use cases are not really part of the specification. I have added them to provide some insight into implementation specifics. They may be removed (or moved to another part of the doc set) some time in the future. In the meantime, they serve a valuable purpose when thinking about protocol features.

High Latency Environment

We have a high-latency environment that is otherwise quite reliable (satellite link). We have a high traffic load. To cover this, a large transmission window has been selected (let's say 1000 messages). Now the client sends a burst of messages but then has nothing to do and falls asleep. Then, the server starts sending acks. These acks are not taken off the wire by the client (as it is inactive - rsyslog case). Eventually, the TCP window fills. Thus the server can no longer send acks. This does not immediately pose any problems, as the server adds the ack frames to its internal send queue.

When the client comes out of hibernation, it receives all previously sent frames, freeing the server's TCP send window. Thus, the server can continue to send data from its send queue (and drain it). The client, having received the server's acks, will not stall because its own RELP window has been cleared by the acks.
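The interplay described above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not part of the protocol: the class names, the integer frame numbers, and the fixed "tcp_capacity" (standing in for a full TCP window) are all hypothetical.

```python
from collections import deque


class WindowedClient:
    """Toy client with a fixed transmission window (hypothetical model)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.unacked = set()   # frames sent but not yet acknowledged
        self.next_frame = 1    # hypothetical frame numbering

    def can_send(self):
        return len(self.unacked) < self.window_size

    def send(self):
        if not self.can_send():
            raise RuntimeError("transmission window is full")
        frame = self.next_frame
        self.next_frame += 1
        self.unacked.add(frame)
        return frame

    def receive_ack(self, frame):
        self.unacked.discard(frame)


class QueueingServer:
    """Toy server that queues acks it cannot put on the wire."""

    def __init__(self, tcp_capacity):
        self.tcp_capacity = tcp_capacity  # stand-in for the TCP window
        self.wire = deque()               # acks written but not yet read
        self.send_queue = deque()         # acks waiting for room on the wire

    def ack(self, frame):
        self.send_queue.append(frame)
        self.flush()

    def flush(self):
        # Move queued acks onto the wire while there is room.
        while self.send_queue and len(self.wire) < self.tcp_capacity:
            self.wire.append(self.send_queue.popleft())


def client_wakes_up(client, server):
    """Client reads everything; each read frees room for queued acks."""
    while server.wire:
        client.receive_ack(server.wire.popleft())
        server.flush()
```

With a window of 10 and a wire capacity of 4, a sleeping client leaves 4 acks on the wire and 6 in the server's send queue. Once the client wakes up and reads, both drain and the client's window is clear again.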

The situation is more complicated if the server intends to shut down while the client is in hibernation. To do so, the server usually sends an "adviseclose" command followed by a "close" command. However, neither of these commands can be sent because the client does not take them off the wire. The close operation is now stalled. The only option left to the server is an unconditional termination of the session. While everything that has already been sent to the client will be picked up by it when it comes out of hibernation, anything left in the server's send queue will be lost in this case. As such, acks are potentially lost. This will lead to message duplication, as the client assumes that the frames unacked at the time of the force-close were not processed (there is no other safe assumption). Consequently, the client will re-send these frames in the next session.
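To make the consequence concrete, here is a minimal sketch of what happens across two sessions when acks are lost in a force-close. The function name and the message labels are made up for illustration. It assumes the server processed every frame in the first session and that the client re-sends exactly those frames for which it saw no ack.

```python
def replay_after_force_close(frames, acks_delivered):
    """Return (processed, resent) for a force-closed session plus a retry.

    frames:         frames sent by the client in session 1, in order
    acks_delivered: acks that actually reached the client before the close
    """
    processed = []

    # Session 1: the server processes every frame it received.
    processed.extend(frames)

    # Force-close: acks still in the server's send queue are discarded,
    # so the client must treat these frames as not processed.
    resent = [f for f in frames if f not in acks_delivered]

    # Session 2: the client re-sends them and the server processes them again.
    processed.extend(resent)
    return processed, resent
```

For frames m1 to m4 with only the acks for m1 and m2 delivered, the server ends up processing m3 and m4 twice. Every frame is processed at least once, so there is duplication but no loss.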

A cure to this situation is to have the client concurrently listen to server requests, e.g. by running a receiver on a separate thread. The RELP protocol does not demand this behaviour. However, it can be used to solve the above server stall. Let's assume this is being done. Do we now have a sufficient guarantee that the server does not stall? Under normal conditions, it is a safe assumption that the client will be able to receive all frames sent by the server. So a buildup of server queues should, if at all, happen only for a short instant. However, if something goes wrong on the transmission line (especially in high-latency cases), there may be a somewhat extended period of time in which the server cannot send acks (but only on the order of a few seconds at most). If the server intends to shut down during such a period, a short timeout may enable it to avoid a force-shutdown. However, there are still conceivable cases where a force-shutdown may be required. These are deemed to be highly unlikely.
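A background receiver of the kind mentioned above might look roughly like the following sketch. Note the assumptions: the newline-terminated text lines ("ack 1", "adviseclose", "close") are a made-up stand-in and not the RELP frame format, the function names are hypothetical, and a local socket pair replaces the real network connection.

```python
import socket
import threading


def start_receiver(sock, on_line):
    """Read newline-terminated lines from sock on a separate thread until EOF."""

    def run():
        buffer = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # peer closed the connection
                break
            buffer += chunk
            while b"\n" in buffer:
                line, buffer = buffer.split(b"\n", 1)
                on_line(line.decode("ascii"))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def demo(ack_count):
    """Server writes acks and a close sequence while the client's main thread idles."""
    client_sock, server_sock = socket.socketpair()
    received = []
    receiver = start_receiver(client_sock, received.append)

    # The "server" side: because the receiver keeps draining the socket,
    # these writes do not stall even though the client does nothing else.
    for n in range(1, ack_count + 1):
        server_sock.sendall(f"ack {n}\n".encode("ascii"))
    server_sock.sendall(b"adviseclose\nclose\n")
    server_sock.close()

    receiver.join(timeout=5)
    client_sock.close()
    return received
```

Calling demo(1000) returns all 1000 ack lines followed by "adviseclose" and "close". The only cost to the client is the extra thread, which is the trade-off under discussion here.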

It is the protocol implementer's choice whether the slightly lower chance of a server force-shutdown justifies the addition of a background thread to a program that otherwise doesn't need one. From a protocol point of view, a force-shutdown is a valid operation. Also, while it causes potential message duplication, it cannot cause message loss.