Low-level protocol
Every project I have come across has invented its own proprietary
protocol to transport data over a serial line or something similar.
I have yet to find the optimal protocol, which I guess is why
developers keep inventing new ones, but I have some hints for people
who wish to invent their own.
If you are looking for more official guidance, have a look at the OSI
model or the SLIP protocol.
This article contains a lot of ideas to play with and some
recommendations.
Problem
Generalizing, we have a client symbol set: all the byte values we want
to transfer, plus perhaps a few extras such as start and end of
message. These client symbols need to be transferred over a
communication channel that can carry a different set of symbols, e.g.
bytes only but no meta-symbols, or just a subset of byte values such as
printable ASCII.
An example would be a serial line (RS232C or similar) able to transfer
8-bit bytes. Or maybe 7-bit bytes excluding NUL, which is consumed by
the device driver, and XON and XOFF, which are used for software
handshake. The channel may also be a TCP/IP socket or a parallel port.
From a software point of view we see a stream of bytes, though
theoretically our bytes could be 16-bit, 9-bit or whatever.
Over the channel, two or more computer systems need to exchange data in
some kind of messages, e.g. commands plus parameters, or responses
containing data.
In some cases not all byte codes can be transmitted (e.g. NULs are
consumed, or the most significant bit is used by the underlying
channel). In other cases not all codes need to be transferred.
I assume the channel is bidirectional, but a unidirectional protocol
can be implemented in a similar way, only with less opportunity to
recover from transmission errors.
Usually the communication is a bottleneck, either in bulk transfer
speed or in responsiveness. Thus we want the protocol to be efficient,
with little communication overhead. The protocol will also be used on
embedded systems, possibly with limited computing power, so it
shouldn't require much number crunching.
Well-defined responsibilities
In order to make a good protocol design it is important to look at the
requirements one at a time and separate the responsibilities. It is
tempting to want a simple protocol and start coding immediately, but
then you risk implementing the same function in many places and ending
up with unmanageable code.
I recommend layering the communication structure. Separate at least the
message packaging from the data in the messages. The layer that sends
and receives the messages should not be concerned with the data in the
messages, and if messages are acknowledged in the send/receive layer,
that should be handled within that layer without having to inspect the
message contents.
I have chosen to divide the problem into the following areas:
Message separation
Somehow the messages need to be extracted from the stream of bytes.
There are two basic ways to do this:
- Encoded length of message
One or more bytes at the beginning of the message are defined as a
length specifier. This way the receiver knows how many bytes to read.
len=3 | byte 0 | byte 1 | byte 2 |
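The layout above can be sketched in a few lines of Python. This is only an illustration, assuming a single length byte (so at most 255 bytes per message); the function names are made up for this example.

```python
def encode_length_prefixed(payload: bytes) -> bytes:
    """Prefix the payload with a one-byte length (assumed field size)."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    return bytes([len(payload)]) + payload


def decode_length_prefixed(stream: bytes) -> list:
    """Split a byte stream into messages using the length prefix.

    An incomplete trailing message is left out of the result.
    """
    messages = []
    pos = 0
    while pos < len(stream):
        length = stream[pos]
        end = pos + 1 + length
        if end > len(stream):
            break  # not enough bytes yet for the announced length
        messages.append(stream[pos + 1:end])
        pos = end
    return messages
```

Note that the decoder trusts the length byte completely, which is exactly the weakness discussed next.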
Encoded length requires error-free transmission to be useful. A single
bit error may cause the length to be badly misinterpreted, so that the
receiver swallows several following messages and finds it very hard to
re-sync. Imagine having a 32-bit message length where the most
significant bit is flipped. One obvious solution is to put an error
detection code in the (fixed length) header containing the length, like
the header checksum used in the IP protocol.
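As a rough sketch of protecting the length field, the header below carries a 16-bit length followed by a 16-bit CRC over that length. The header layout and the function names are assumptions made for this example; the CRC is the CRC-CCITT variant provided by Python's `binascii.crc_hqx`.

```python
import binascii
import struct

HEADER_SIZE = 4  # assumed layout: 2-byte length + 2-byte CRC of the length


def build_header(length: int) -> bytes:
    """Pack the length (big endian) and append a CRC computed over it."""
    length_field = struct.pack(">H", length)
    crc = binascii.crc_hqx(length_field, 0xFFFF)
    return length_field + struct.pack(">H", crc)


def parse_header(header: bytes):
    """Return the message length, or None if the header check fails."""
    if len(header) != HEADER_SIZE:
        return None
    length_field, crc_field = header[:2], header[2:]
    (crc,) = struct.unpack(">H", crc_field)
    if binascii.crc_hqx(length_field, 0xFFFF) != crc:
        return None  # corrupted header: do not trust the length
    return struct.unpack(">H", length_field)[0]
```

With this check a flipped bit in the length is detected instead of silently turning a 300 byte message into a 33068 byte one. The receiver still has to find the start of the next header afterwards.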
If sync is lost for some reason, it is difficult to re-sync the
communication stream. There is no way to identify where the next
message starts if the count is the only indicator of message length.
Possibly a timeout can serve as framing to identify the next message.
Or sending a large number of NULs could do: they fill up the current
message with zeros and are then read as a number of zero-length
messages. Either way, we are approaching an inter-message framing
protocol.
A special case of encoded length is when the message type implicitly
defines the length of the message, e.g. when the parameters to command
'A' are always 4 bytes.
CmdA | 4 bytes data | CmdB | 6 bytes data |
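A minimal sketch of such a receiver, assuming the command codes are the ASCII letters 'A' and 'B' with the parameter sizes shown above. The table and function name are hypothetical.

```python
# Hypothetical command table: command byte -> number of parameter bytes.
PARAM_LENGTH = {ord("A"): 4, ord("B"): 6}


def split_commands(stream: bytes) -> list:
    """Split a stream into (command, parameters) pairs using the table."""
    result = []
    pos = 0
    while pos < len(stream):
        cmd = stream[pos]
        if cmd not in PARAM_LENGTH:
            # The framing depends on knowing every command; an unknown or
            # corrupted command byte leaves no way to find the next message.
            raise ValueError(f"unknown command byte 0x{cmd:02x} at offset {pos}")
        end = pos + 1 + PARAM_LENGTH[cmd]
        if end > len(stream):
            break  # parameters not complete yet
        result.append((chr(cmd), stream[pos + 1:end]))
        pos = end
    return result
```

The sketch shows the coupling: the code that splits the stream has to know about every command.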
This is simple to implement but does not separate messages from
contents, and each message must handle transmission errors itself. It
is better to separate the two and handle transmission errors in
general at one level and command/parameter errors at a different level.
- Inter-message framing
Here we use a special code meaning that we are not in a message.
Detecting this framing code indicates the end of a message, but it is
beneficial to allow more than one framing code in a row, e.g. to
mark both the start and the end of a message (this filters out noise on
the line).
framing | message1 | framing | message2 | framing |
Inter-message framing has the drawback that the symbol used for framing
cannot be used in the message but has to be replaced by some other
code. The SLIP protocol has a minimalistic implementation of this.
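As an illustration, here is a sketch of SLIP-style framing. The four byte values are the ones SLIP defines; the function names and the choice to send a framing byte both before and after each message are assumptions of this example.

```python
SLIP_END = 0xC0      # framing byte
SLIP_ESC = 0xDB      # escape byte
SLIP_ESC_END = 0xDC  # ESC + ESC_END stands for a literal 0xC0 in the data
SLIP_ESC_ESC = 0xDD  # ESC + ESC_ESC stands for a literal 0xDB in the data


def slip_encode(payload: bytes) -> bytes:
    """Frame one message, replacing framing and escape bytes in the data."""
    out = bytearray([SLIP_END])  # leading framing byte flushes line noise
    for b in payload:
        if b == SLIP_END:
            out += bytes([SLIP_ESC, SLIP_ESC_END])
        elif b == SLIP_ESC:
            out += bytes([SLIP_ESC, SLIP_ESC_ESC])
        else:
            out.append(b)
    out.append(SLIP_END)
    return bytes(out)


def slip_decode(stream: bytes) -> list:
    """Extract all complete messages from a byte stream."""
    messages = []
    current = bytearray()
    escaped = False
    for b in stream:
        if escaped:
            # Map the two escape codes back; any other byte after ESC is a
            # protocol violation and is kept unchanged here.
            current.append({SLIP_ESC_END: SLIP_END, SLIP_ESC_ESC: SLIP_ESC}.get(b, b))
            escaped = False
        elif b == SLIP_ESC:
            escaped = True
        elif b == SLIP_END:
            if current:  # several framing bytes in a row are ignored
                messages.append(bytes(current))
                current = bytearray()
        else:
            current.append(b)
    return messages
```

The cost is one extra byte for every framing or escape byte in the data, so the worst case doubles the message size.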
Using both methods together is usually not a good idea: if they
disagree, one method must take priority over the other, making the
other just a waste of bandwidth.
In general the inter-message framing approach is preferred. A variant
is to use a start marker and a different end marker.
The benefit of this is that any symbols outside START/END may be
transmitted with a different protocol, e.g. as text. This way you can
have normal ASCII communication with e.g. a terminal or a printer,
while the symbols between START and END belong to a special protocol.
Ideally the terminal or printer should be insensitive to the symbols
between the markers, but with a terminal a human will usually be good
at sorting it out. Or you can have a software driver that intercepts
the messages and passes on the text.
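Such an intercepting driver could look roughly like the sketch below. The marker values (ASCII STX and ETX) and the function name are assumptions for this example, and message bytes equal to the markers would still need to be escaped.

```python
START_MARK = 0x02  # assumed start marker (ASCII STX)
END_MARK = 0x03    # assumed end marker (ASCII ETX)


def demux(stream: bytes):
    """Return (text, messages): bytes outside the markers, framed messages."""
    text = bytearray()
    messages = []
    current = None  # None means we are outside a message
    for b in stream:
        if b == START_MARK:
            current = bytearray()  # a new START restarts the message
        elif b == END_MARK:
            if current is not None:
                messages.append(bytes(current))
                current = None
            # a stray END outside a message is dropped
        elif current is not None:
            current.append(b)
        else:
            text.append(b)
    return bytes(text), messages
```

For example, `demux(b"Hello \x02\x10\x20\x03world")` passes on the text `b"Hello world"` and hands the message `b"\x10\x20"` to the protocol layer.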
Another kind of framing is a period of idle communication. This wastes
no byte codes but may cost performance. Most channels are buffered,
often in hardware, so timeouts may need to be long for idle time to be
detected reliably. An embedded system may be able to handle the timing,
but a PC with an operating system cannot unless it is a hard real-time
OS.
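The idea can be sketched on bytes that have already been timestamped on reception. The function name, the input format and the gap value are assumptions for this example; a real receiver would take its timestamps from a timer or from the driver.

```python
def split_on_idle(timed_bytes, idle_gap: float) -> list:
    """Group (timestamp_seconds, byte) pairs into messages.

    A gap of at least idle_gap seconds between two bytes ends a message.
    """
    messages = []
    current = bytearray()
    last_time = None
    for timestamp, byte in timed_bytes:
        if last_time is not None and timestamp - last_time >= idle_gap and current:
            messages.append(bytes(current))
            current = bytearray()
        current.append(byte)
        last_time = timestamp
    if current:
        # End of input is treated as end of message in this sketch; a live
        # receiver would wait for the idle gap to elapse first.
        messages.append(bytes(current))
    return messages
```

If buffering in the channel smears the timestamps, the gaps disappear and messages merge, which is the performance and reliability trade-off described above.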
Actually with RS232C communications there are some options to send
symbols outside the normal 8-bit byte range:
- Idle time
- Break character
- Parity used as 9th bit
In general these are difficult to use in the PC world, so this article
assumes we have only a set of 8-bit bytes at hand. If you are working
with embedded-to-embedded communications, feel free to experiment with
these.