4. Error Correcting and Detecting Codes

4.1 Introduction

Information stored in a computer memory must eventually be processed in a program or communicated externally. During these operations it is possible that the information may be altered and hence corrupted. Binary information is distinct from analogue information in that it is generally more resistant to such degradation. Because of the ever present statistical signals termed noise, addressed fully in the companion course, all analogue information is in principle in error whenever it is propagated. In contrast, if the noise level is significantly less than the difference between the levels distinguishing the binary states, i.e. a 0 from a 1, then no error in the digital information occurs at all. On the other hand, if because of a high noise transient or any other cause a bit of information is altered, the effect may be catastrophic, since the significance of the bit in general depends upon the code. The general theory of errors is formulated for communication, with a symmetrically noisy channel defined as one for which the probability that a 1 is converted to a 0 equals the probability that a 0 is converted to a 1, designated p. An error of order r is one in which r bits have been altered. In a word of N bits, the probability of an error of order r is

    P_r = C(N,r) p^r (1-p)^(N-r)                                    (4.1)

Clearly in practice systems are not used unless p<<1, and the probability of an error of order r decreases rapidly with r. For this reason, the case of one-bit errors is of prime importance.
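The probability of an error of order r in an N-bit word is the binomial expression C(N,r) p^r (1-p)^(N-r); a minimal Python check (the function name is illustrative) shows how rapidly the probability falls with r when p << 1:

```python
from math import comb

def error_probability(N, p, r):
    """Probability that exactly r of the N bits of a word are altered,
    on a symmetric channel with per-bit error probability p."""
    return comb(N, r) * p**r * (1 - p)**(N - r)

# With p << 1 the probability falls off rapidly with the order r.
probs = [error_probability(8, 1e-3, r) for r in range(4)]
```

Summing over all orders r = 0..N recovers 1, since some number of errors (possibly zero) must occur.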

4.2 General Strategy

Consider a word X = {x_i, i=1,N}. Then define the word Y = E_j(X) to be {y_i, i=1,N} where

    y_i = x_i (i ≠ j),    y_j = 1 ⊕ x_j                             (4.2)

Thus E_j is an operator complementing the jth bit of the word, inducing an error of order one. For convenience, the operator when designated without a subscript will be taken as a generic error of order one operator. Note that the Hamming distance between the two words X and Y is one, since by necessity only one bit is different.
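The error operator and the Hamming distance can be sketched in a few lines of Python, treating words as strings of '0' and '1' characters (the names E and hamming_distance are illustrative):

```python
def E(j, X):
    """The operator E_j: complement the j-th bit (1-indexed, MSB first)
    of a word given as a '0'/'1' string -- an error of order one."""
    bits = list(X)
    bits[j - 1] = '1' if bits[j - 1] == '0' else '0'
    return ''.join(bits)

def hamming_distance(X, Y):
    """Number of corresponding bit positions in which X and Y differ."""
    return sum(a != b for a, b in zip(X, Y))
```

Note that E is its own inverse: applying the same first order error twice restores the word.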

The general strategy is to decompose R_N into two sets according to

    R_N = L ∪ L̄                                                    (4.3)
    L ∩ L̄ = ∅                                                      (4.4)

The first subset, L, comprises the words of the code. Note that it is not possible to use the entire register set for the code. A word X belonging to L is referred to as a legal code word. All words not belonging to this subset comprise the rest of the register set and form the set L̄ of illegal code words. For successful error detection, operation on any member of L by E must produce a member of the illegal set. This is accomplished by a criterion by which set membership is easily determined. An error correction code is more complex, requiring not only a signature for an illegal code word, but a method of establishing the legal code word from which it was generated.


4.3 Fundamental criteria

The minimum Hamming distance between pairs of words in L is referred to as the code distance d_min. Consider a code designed to detect errors of order one. Since the Hamming distance between a word X and the word E(X), generated by the occurrence of a first order error, is one, then clearly the code distance must exceed one. This guarantees that for any X ∈ L, E(X) belongs to the complementary set and hence is identifiable as an illegal code word. This can easily be generalized. The Hamming distance between a word and one generated from it by an error of order n is n, since n bits have been altered. Thus if X is a legal code word, there exists a set of illegal code words at a Hamming distance of n. The criterion which must be satisfied for a code designed to detect errors of order n is d_min ≥ n+1. Formally, if L is an error detecting code of order n, and X ∈ L, Y ∈ L, then d(X,Y) ≥ n+1.

A distance criterion is also in effect for error correcting codes. Consider two legal code words X and Y. At a distance n from each is a set of associated words generated by the occurrence of an error of order n. If sufficient space is reserved between X and Y, then the set S(X), consisting of X and the words E^(n)(X) generated from it, will not overlap the corresponding set S(Y) generated from Y. In this case an unambiguous identification can be made: any word not in L but in S(X) is an illegal code word arising from the legal code word X. Since the members of each set lie within a distance n of the generating word, the total distance between X and Y must exceed 2n. The criterion for the design of an nth order correcting code is d_min ≥ 2n+1.

4.4 Relation between Hamming distance and weight

The Hamming distance between two words counts the number of corresponding bits which differ between them. The distance is independent of the order in which the corresponding bits occur. The weight of a word is the number of bits which are in the 1 state and is also independent of the position of the bits. Consider two words X and Y with w(X) = w_1 and w(Y) = w_2, where w_1 > w_2. Two new words with the same weights and distance may be generated by permuting corresponding bits. In particular, consider a set of permutations which alter X to a word with ones in the w_1 most significant bits.

The two limiting cases, for the situation in which the sum of the weights does not exceed the word length, are illustrated in the diagram. In (a) to each of the w_2 ones in Y there is a corresponding one in X. Therefore the remaining ones in X correspond to zeros in Y and hence each of these contributes one to the Hamming distance. Thus the minimum distance is w_1 - w_2.

In (b) each of the ones in either word corresponds to a zero in the other and the maximum possible distance is w_1 + w_2. If the sum of the weights exceeds the word length N, this is no longer possible, since there can be no more than N bits which differ between the words. In this case there must always be corresponding ones between the two words. This case is most easily resolved by recognizing that the argument can be applied to the zeros: for the maximum distance each of the zeros in either word corresponds to a one in the other, so the maximum distance is 2N - (w_1 + w_2).

Consider two words with distance d. The smallest change that can be made to one of the words which retains the weight is a change in two bits, a one to a zero and a zero to a one. Hence this will result in a change in the distance by two. Summarizing then, the possible distances between words of weights w_1 and w_2 satisfy

    w_1 - w_2 ≤ d ≤ min(w_1 + w_2, 2N - (w_1 + w_2)),  d = w_1 - w_2 + 2k    (4.5)

A special case requiring further consideration arises when the words have the same weight, so that w_1 = w_2. While it is certainly true that as indicated by Eqn(4.5) the minimum possible distance is zero, this case is a trivial one, for it corresponds to the distance between a word and itself. A more meaningful result then is that the minimum distance between distinct words of the same weight is two.
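The bounds above, and the step of two between allowed distances, can be verified by brute-force enumeration; in the sketch below (names illustrative) a word is represented by the set of its one-positions, so the distance is the size of the symmetric difference:

```python
from itertools import combinations

def words_of_weight(N, w):
    """All N-bit words of weight w, each represented by the set of
    bit positions holding a 1."""
    return [frozenset(c) for c in combinations(range(N), w)]

def distance(X, Y):
    # Differing positions are exactly the symmetric difference of the 1-sets.
    return len(X ^ Y)

N, w1, w2 = 5, 3, 2
dists = {distance(X, Y)
         for X in words_of_weight(N, w1)
         for Y in words_of_weight(N, w2)}
# Expect distances from w1 - w2 = 1 up to min(w1 + w2, 2N - w1 - w2) = 5,
# in steps of 2.
```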

4.5 Constant weight codes

From the above discussion, the subset of R_N consisting of all words of a given weight comprises a code with minimum distance 2. This automatically guarantees that such a code will be able to detect an error of order one, based upon the criterion for d_min. This result is also intuitively obvious since complementing a single bit must alter the weight by one. Thus the illegal words generated from members of the code subset automatically do not belong to the subset. The situation with errors of order two is ambiguous however. If the error results from complementing two bits in the same state, i.e. two zeros become ones or vice versa, then the weight of the word is altered by two and the error will be detected. On the other hand, if the bits are originally in opposite states, the weight will not be changed, and the error will generate a new word which is an acceptable legal code word belonging to L. An example is the biquinary code used at one time by IBM, employing 7-bit words of weight two. The code is shown in the following table.

Integer      X
0            01 00001
1            01 00010
2            01 00100
3            01 01000
4            01 10000
5            10 00001
6            10 00010
7            10 00100
8            10 01000
9            10 10000

An error of order 1 would change the weight to 1 or 3. An error of order two could change the weight to 0 or 4 or leave the weight unchanged. The latter case would not be detected. Such weight-preserving errors complement one of the two ones together with one of the five zeros, accounting for 10 of the 21 possible bit pairs, so nearly half of all second order errors go undetected.
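The fraction of second-order errors that escape weight-based detection can be found by enumeration; the sketch below (names illustrative, code words written without the field gap) flips every pair of bits in one code word and counts the weight-preserving cases:

```python
from itertools import combinations

biquinary = ["0100001", "0100010", "0100100", "0101000", "0110000",
             "1000001", "1000010", "1000100", "1001000", "1010000"]

def flip(word, positions):
    """Complement the bits of word at the given 0-based positions."""
    bits = list(word)
    for k in positions:
        bits[k] = '1' if bits[k] == '0' else '0'
    return ''.join(bits)

# Classify every second-order error on one code word by whether it
# changes the weight (detected) or not (undetected).
word = biquinary[0]
undetected = sum(flip(word, pair).count('1') == word.count('1')
                 for pair in combinations(range(7), 2))
```

By symmetry the count is the same for every code word, since each has exactly two ones and five zeros.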

This code is very inefficient in the sense that it uses only 10 of the 128 members of R_7, and only 10 of the 21 members of the subset of words of weight 2. The subsets for weights 3 and 4 each contain 35 members, so even the most efficient constant weight code would use under 30% of the register set.

4.6 Constant parity codes

Parity, p(X), is an attribute of a word X derived from its weight w(X) as p(X) = w(X) mod 2. Thus all words of even weight have p = 0 and all words with odd weight have p = 1. From the definition of weight, the parity of X = {x_i, i=1,N} is

    p(X) = (x_1 + x_2 + … + x_N) mod 2                              (4.6)

Note that this means that the parity of a word is the XOR of all the bits in the word.

If E_N and O_N are the sets of all N-bit words with even and odd parity respectively, then R_N = E_N ∪ O_N. Each of the subsets contains half the register set. Notice that each subset is comprised of words differing in weight by multiples of 2, i.e. 0, 2, 4, etc. Thus the minimum distance between words in a subset is 2, and either becomes suitable as a code for detecting errors of order 1.

Two possible codes thus exist, even parity with L = E_N, or odd parity with L = O_N. Each of these has the same coding capacity as R_{N-1}. The requirement of fixed parity introduces a constraint so that only N-1 of the N bits may be varied independently. By convention these are taken to be the N-1 least significant bits and comprise the message bits. The most significant bit is referred to as the parity bit, x_1 = p. For the even parity code the value of p is determined by

    p = x_2 ⊕ x_3 ⊕ … ⊕ x_N                                         (4.7)

Thus if the number of ones in the N-1 LSBs is odd, p = 1, guaranteeing an even weight. Similarly, if the number of ones in the message is even, p = 0 and the weight is even. Thus the code word for even parity is constructed by forming the code for the desired information using an N-1 bit word. To this is concatenated the parity bit p as MSB to form the N-bit word of even parity. The even parity code includes the word of weight 0, F. In some systems, such as punched paper tape, a zero is indistinguishable from no record. In these cases the odd parity code is preferred, since the corresponding word is the word of weight 1 with MSB = 1. In general, construction of odd parity code words follows the same scheme as for even parity with the sole exception that p is determined by the complement of the expression on the right side of Eqn(4.7).
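The construction and checking of even parity words can be sketched as follows (function names illustrative; words are '0'/'1' strings with the parity bit leftmost):

```python
def encode_even_parity(message):
    """Concatenate a parity bit as MSB so the resulting N-bit word has
    even weight; the N-1 LSBs carry the message."""
    p = message.count('1') % 2
    return str(p) + message

def parity_check(word):
    """True when the word has even weight, i.e. it is a legal code word."""
    return word.count('1') % 2 == 0
```

Any single-bit error changes the weight by one and so fails the check, in keeping with the code distance of 2.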

4.7 Block Data

Since the code distance for parity based codes is 2, error correction is not possible on a word by word basis. Such correction may be extremely important in a control setting where each word might initiate an action. However, if a large amount of data in the form of a sequence of words is being transferred, to be interpreted following completion of the transfer, then a block data code can be constructed with some error correcting capability. In this case the data is transferred in blocks with the structure X_j = {x_ij, i=1,N}, j=1,M, where M is the number of words in the block. The MSB of each word, x_1j, j=1,M, shown shaded in the diagram, is the parity bit. The word X_M is a special word called the longitudinal parity. For an even parity code, each bit of this word satisfies

    x_iM = x_i1 ⊕ x_i2 ⊕ … ⊕ x_i,M-1,    i = 1,N                    (4.8)

The array may now be considered as forming words in either direction. Thus a row is an N-bit word with a parity bit while a column is an M-bit word with a parity bit. The message bits form an array. If the bit x_kl is in error, then the parities of the kth column and the lth row will both be odd. This uniquely identifies the array address of the culprit, which may then be complemented to recover the correct message.
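The location procedure can be sketched as follows, using 0-based indices and '0'/'1' strings. The toy block below carries two message words with their parity bits plus the longitudinal parity word; all names and data are illustrative:

```python
def locate_error(block):
    """Given a block of even-parity rows whose columns also have even
    parity, return (row, column) of a single-bit error, or None if
    every parity checks."""
    bad_rows = [j for j, w in enumerate(block) if w.count('1') % 2]
    bad_cols = [i for i in range(len(block[0]))
                if sum(w[i] == '1' for w in block) % 2]
    if not bad_rows and not bad_cols:
        return None
    return bad_rows[0], bad_cols[0]

# Rows: two even-parity words, then the longitudinal (column) parity word.
clean = ['0101', '0110', '0011']
corrupt = ['0101', '0100', '0011']   # bit 2 of word 1 complemented
```

Complementing the bit at the reported address recovers the original block.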

4.8 The Hamming Error Correcting Code

The Hamming code is capable of correction of errors of order one, so the minimum distance of the subset must be 3 or greater. The code is based upon an extension of the parity concept to sets of bits of the word, the check bits. The code is designed so that the binary coded integer formed from the n check bits is either 0, if the word received is correct, or the position of the bit in error. Thus the check bits can address 2^n - 1 bit positions. If the word length is N, then the number of check bits must satisfy

    2^n - 1 ≥ N                                                     (4.9)

For a word X = {x_i, i=1,N}, those bits for which i = 2^r, r = 0,1,2,..., are check bits while the remaining bits are message bits. Note that unless N = 2^n - 1, there will be bits which cannot play a role in the code. For example, for n = 3 the Hamming code word involves 7 bits, 3 check bits and 4 message bits. In a byte oriented system the extra bit cannot be used in the code arrangement. In order to accommodate such cases the word must be divided into fields, and the Hamming code applied to the appropriate field. In this case the byte is decomposed into a 7-bit Hamming code field and a 1-bit field consisting of the LSB.

Consider first the trivial case of N = 3, which requires n = 2 check bits. Thus there is only one message bit, x_3. The only two words in R_3 with the required Hamming distance of 3 between them are F = 000 and U = 111, so these comprise L. Operation with E_i, i = 1,2,3, on F or U produces a word of weight 1 or 2 respectively. The results are given in the table.

000     111
001     110
010     101
100     011
Clearly it is thus possible by inspection to correct for a one bit error since any word of weight 1 arose from the legal code word 000 and any word of weight 2 from 111. However, it is necessary to devise a generic algorithm which can be extended to larger n and N.

The check bits are designed to provide the address of the bit in error, 1, 2 or 3 in this example. In constructing the legal code word they are predetermined in value by the nature of the message bits in general, although there is only one in this case. Consider the 2-bit word A = a_1a_2 where a_1 = x_2 ⊕ x_3 and a_2 = x_1 ⊕ x_3. For the legal code words A = 00. For each other word in the table, A forms the binary code for the address of the incorrect bit. For example, consider 101. Then a_1 = 0 ⊕ 1 = 1 and a_2 = 1 ⊕ 1 = 0, giving A = 10 and I_10(A) = 2, the address of the incorrect bit. The two check bits act as parity bits for words formed from sections of the entire word. A 0 address is a signature for a legal word.

Consider next the case of n = 3, N = 7. The check bits are now x_1, x_2 and x_4. These are shown shaded in the figure. Also shown are those bits connected to a given check bit, because the binary expansion of their address contains the address of that check bit. For example, address 5 is composed of 4+1, so x_5 is connected to x_4 and x_1, while address 6 is 4+2, so x_6 is connected to x_4 and x_2.


The address is now a three bit word A = a_1a_2a_3 formed according to

    a_1 = x_4 ⊕ x_5 ⊕ x_6 ⊕ x_7
    a_2 = x_2 ⊕ x_3 ⊕ x_6 ⊕ x_7                                     (4.10)
    a_3 = x_1 ⊕ x_3 ⊕ x_5 ⊕ x_7

In constructing the word the check bits are set equal to the XOR of the bits to which they are connected guaranteeing an address of 0 for the correct word.

For example consider that the message 1101 is to be sent. The structure is shown in the figure.

Since bits 3, 5 and 7 are all 1, their XOR or mod-2 sum is 1, so x_1 = 1, guaranteeing a_3 = 0 for the word sent. Similarly, the sequences of bits 3,6,7 and 5,6,7 each have an even number of ones and hence 0 mod-2 sums, so check bits x_2 and x_4 will be 0. The code word sent is thus 1010101. Now suppose in transmission bit 3 is altered, so the word received is 1000101. From Eqn(4.10) this bit appears only in the expressions for a_2 and a_3, each of which will now be 1, while the MSB address bit is unaltered since x_3 does not control its value. The word A will thus be 011, indicating the error address of 3. The same logic applies to any one bit error.

In the 7-bit case, 4 bits vary independently and 3 are then determined. Thus L is isomorphic with R_4, with 16 code words out of the 128 7-bit words. Note that an alteration of one bit in the message automatically alters at least 2 check bits, so that the code distance is 3, in keeping with the general requirement. If two bits of the message are altered, a non-zero address will be generated, so that the code can be used for detecting errors of order up to 2. Of course correction is not possible in this circumstance.
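The 7-bit construction and correction procedure can be sketched directly from the address equations; the example below reproduces the message 1101, the code word 1010101 and the error address 3 (function names are illustrative):

```python
def hamming_encode(m):
    """Place the four message bits m = [x3, x5, x6, x7] and compute the
    check bits x1, x2, x4 so that every address bit is zero."""
    x3, x5, x6, x7 = m
    x1 = x3 ^ x5 ^ x7
    x2 = x3 ^ x6 ^ x7
    x4 = x5 ^ x6 ^ x7
    return [x1, x2, x3, x4, x5, x6, x7]

def error_address(word):
    """Address A = 4*a1 + 2*a2 + a3 of the bit in error (0 if legal)."""
    x1, x2, x3, x4, x5, x6, x7 = word
    a1 = x4 ^ x5 ^ x6 ^ x7
    a2 = x2 ^ x3 ^ x6 ^ x7
    a3 = x1 ^ x3 ^ x5 ^ x7
    return 4 * a1 + 2 * a2 + a3
```

A non-zero address both flags the error and names the bit to complement.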

4.9 Cyclic codes

4.9.1 Polynomial Representation

Cyclic codes make use of a procedure, also employed in other applications, in which binary words or streams of bits in general are represented by polynomials. Let X = {x_i, i=1,N}. Then a polynomial in the dummy variable z, p(z|X), may be defined as

    p(z|X) = Σ_{i=1}^{N} x_i z^(i-1)                                (4.11)

For example, if X = 10101101, p(z|X) = 1 + z^2 + z^4 + z^5 + z^7. The algebra used is based upon mod-2 arithmetic. Thus if Y = {y_i, i=1,N} then

    p(z|X) + p(z|Y) = Σ_{i=1}^{N} (x_i ⊕ y_i) z^(i-1)               (4.12)

Note that in particular

    p(z|X) + p(z|X) = 0

Multiplication by z^r is easily seen both to lengthen the word to N+r bits and to shift the word right by r bits, so that the product represents a word created by concatenation of an r-bit word of all zeros to the left of the original word. Using the above example, if p(z|A) = z^3 p(z|X) = z^3 + z^5 + z^7 + z^8 + z^10 then A = 00010101101.
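The polynomial representation is conveniently sketched by storing only the exponents of the non-zero terms (an illustrative choice); mod-2 addition is then the symmetric difference of the exponent sets:

```python
def poly(X):
    """p(z|X) as the set of exponents with non-zero coefficient:
    bit x_i of the '0'/'1' string contributes the term z^(i-1)."""
    return {i for i, b in enumerate(X) if b == '1'}

def poly_add(P, Q):
    """Mod-2 addition: equal terms cancel in pairs, so the sum is the
    symmetric difference of the two exponent sets."""
    return P ^ Q

def poly_shift(P, r):
    """Multiplication by z^r, i.e. r zeros concatenated on the left."""
    return {e + r for e in P}
```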

4.9.2 Cyclic Codes

Consider as always the subdivision of the set R_N into an N-bit code L_N and its complement L̄_N, the set of illegal words. Cyclic codes use a rather esoteric definition of L_N. A generating polynomial G(z) is first defined. Then if p(z|X) = F(z|X)G(z), by definition X ∈ L_N. The code thus comprises all words represented by polynomials which have the generating polynomial as a common factor.

The following typifies the usual procedure. Assume the N-bit word is decomposed into a check bit field comprising the K most significant bits and a message field consisting of the N-K least significant bits. A word Y is formed by concatenation of the K-bit word of weight zero with the message field. This word is represented by the polynomial

    p(z|Y) = Σ_{i=K+1}^{N} x_i z^(i-1)

where the message bits are x_{K+1} to x_N.

A generating polynomial G(z) of order K is chosen. Then in general G(z) will not be a factor of the polynomial, so

    p(z|Y) = F(z)G(z) + R(z)                                        (4.13)

where X is defined as the word represented by the polynomial

    p(z|X) = p(z|Y) + R(z) = F(z)G(z)

where the second equality follows from Eqn(4.13), since in mod-2 arithmetic R(z) + R(z) = 0. In other words, division of the polynomial representing Y will in general leave a remainder polynomial. Adding this remainder to the polynomial produces a new polynomial for which G(z) is automatically a factor, and hence a code word. Since the order of G(z) is K, the order of R(z) must be ≤ K-1, and hence the word it represents has non-zero bits only in the check bit field.

Consider the example of N = 6 and K = 2 with G(z) = 1 + z + z^2. The message field consists of the 4 least significant bits. Consider the message 1011. Then Y = 001011 and p(z|Y) = z^2 + z^4 + z^5. Long division gives p(z|Y) = (z + z^3)G(z) + z, so p(z|X) = z + z^2 + z^4 + z^5 and the code word X is 011011.
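The construction can be sketched with a small mod-2 polynomial division routine over exponent sets (an illustrative representation); the example reproduces the remainder z and the code word 011011:

```python
def poly_mod(P, G):
    """Remainder of mod-2 polynomial division of P by G, both given as
    sets of exponents of the non-zero terms."""
    P = set(P)
    k = max(G)
    while P and max(P) >= k:
        shift = max(P) - k
        P ^= {e + shift for e in G}   # cancel the current leading term
    return P

G = {0, 1, 2}                 # G(z) = 1 + z + z^2
Y = {2, 4, 5}                 # Y = 001011, p(z|Y) = z^2 + z^4 + z^5
R = poly_mod(Y, G)            # remainder z
X = Y ^ R                     # p(z|X) = z + z^2 + z^4 + z^5 -> X = 011011
```

Because mod-2 addition and subtraction coincide, adding the remainder back makes the result exactly divisible by G(z).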

4.9.3 Cyclic Redundancy Check

Consider the transmission of a word X represented by p(z|X). If an error in transmission occurs, then the word received will be represented by the polynomial

    q(z) = p(z|X) + e(z)

where e(z) is the error polynomial. Thus unless e(z) is a multiple of G(z), q(z)/G(z) will exhibit a non-zero remainder, and q(z) will be detected as an illegal word.
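The receiver's check can be sketched in the same exponent-set representation (the division helper is repeated here so the sketch stands alone; names illustrative):

```python
def poly_mod(P, G):
    """Remainder of mod-2 polynomial division of P by G (exponent sets)."""
    P = set(P)
    k = max(G)
    while P and max(P) >= k:
        shift = max(P) - k
        P ^= {e + shift for e in G}
    return P

def crc_accept(Q, G):
    """A received word passes the check iff G(z) divides its polynomial."""
    return poly_mod(Q, G) == set()

G = {0, 1, 2}                 # G(z) = 1 + z + z^2
X = {1, 2, 4, 5}              # the code word 011011 constructed above
```

An error polynomial that happens to be a multiple of G(z) would escape detection, which is why generating polynomials are chosen to make such multiples rare.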