
Algorithm for obtaining a self-synchronising M-ary code enabling data compression

M.D. Mirkovic, DSc, and Prof. I.S. Stojanovic, DSc

Indexing terms: Algorithms, Codes and decoding, Mathematical techniques

Abstract: An algorithm for obtaining a self-synchronising M-ary code (M ≥ 2) enabling the compression of data from a stationary discrete memoryless source is proposed. After presenting the code algorithm, its properties are analysed and the implementation of the code is described. The proposed code is compared to the Huffman code with regard to the average code-word length, the possibility of self-synchronisation and the complexity of hardware realisation. Although for certain sources the proposed code is equal or nearly equal to the Huffman code regarding data compression, in general it is less efficient. However, its property of being self-synchronising, and its relatively simple hardware realisation, make this code valuable for practical applications.

1 Introduction

Numerous schemes have been proposed for the reduction of redundancy at the source output, thus decreasing the average number of code symbols needed to represent a message. Variable-length codes are generally superior in this sense to codes with fixed-length code words [1, 2]. Among them, the most efficient is the Huffman code [1, 2].

However, some other essential performance aspects of this and other variable-length codes, such as the complexity of their implementation or the problem of synchronisation, including the reduced immunity to errors, are certainly disadvantageous when compared to fixed-length codes.

In an attempt to overcome this problem, an M-ary code is proposed. Although this code is for certain sources equal or nearly equal to the Huffman code regarding average code-word length, it is in general less efficient. However, it is self-synchronising and, in addition, its hardware realisation is remarkably simpler. These facts, together with the achieved data compression as argued in Section 3, present the valuable practical advantages of the proposed code.

It is necessary to add that although most communication channels are nowadays intended for binary signal transmission, M-ary systems are encountered more and more in practice. As an example, different types of modems can be mentioned [3, 4, 5]. Therefore, the proposed M-ary code appears to be useful.

Paper 5184E (C1, C2), first received 5th February and in revised form 12th November 1986
Dr. Mirkovic is with the Elektronska Industrija Institut, ul. Batajnicki put 23, 11000 Belgrade, Yugoslavia
Prof. Stojanovic is with the Faculty of Electrical Engineering, Belgrade University, Belgrade, Yugoslavia

2 Construction of the code

The proposed code is applied to the encoding of the output from a stationary discrete memoryless source of messages. All messages form a finite message ensemble containing D different messages.

The algorithm used for the construction of the code, illustrated by an example in Appendix 8, can be described as follows:

(i) All source messages u_1, u_2, ..., u_D are arranged in the list L in order of the decreasing probabilities P(u_d) of their occurrence, so that for the list L:

L = {u_1, u_2, ..., u_D}    (1)

the following relation holds:

P(u_1) ≥ P(u_2) ≥ ... ≥ P(u_D)    (2)

(ii) The list L is then divided into the sublists L_1, L_2, ..., L_z, the number of which is

z = M − 1    (3)

where M is the size of the coding alphabet. All messages from eqn. 1 are now classified into the z sublists according to the rule described under step (iv) of this Section, so that the sublists can be written as follows:

L_1 = {u_11, u_21, ..., u_g1, ..., u_p1}
...
L_j = {u_1j, u_2j, ..., u_gj, ..., u_qj}    (4)
...
L_z = {u_1z, u_2z, ..., u_gz, ..., u_qz}

In this transcription, u_gi is the gth message in the ith sublist, and u_p1, u_p2, ..., u_pi represent the end messages in the first group of sublists G_p = {L_1, ..., L_i}, while u_qj, u_q(j+1), ..., u_qz are the end messages in the second group of sublists G_q = {L_j, L_(j+1), ..., L_z}. As can be seen, in the first group G_p each sublist contains an equal number p of messages, while in the second group G_q this number is q.

(iii) The number of messages in each sublist, p in the first group and q in the second, depends upon whether the ratio D/z is an integer or not. Two different cases can arise:


(a) If the ratio D/z is an integer, then

p = q = D/z    (5)

thus yielding only one group of sublists.

(b) If the ratio D/z is not an integer, then two groups of sublists, G_p and G_q, appear. In the first group there will be i sublists, each containing p = ⌈D/z⌉ messages, while in the second there will be (z − i) sublists, each containing q = ⌊D/z⌋ messages, i being given as

i = D − ⌊D/z⌋ · z    (6)

In this consideration, the assignment of integral values to p and q is ensured by the use of the 'floor of x' function ⌊x⌋, defined as the greatest integer less than or equal to x. Similarly, ⌈x⌉ is the smallest integer greater than or equal to x.

(iv) The rule governing the construction of the rth sublist from the ensemble of messages, eqn. 1, is given by

u_(kz+r) → u_((k+1)r)    (7)

This means that the (kz + r)th message from the ensemble, eqn. 1, becomes the (k + 1)th message in the sublist L_r from eqn. 4. In eqn. 7, k represents a variable parameter, which in each sublist takes all integral values between the following limits:

0 ≤ k ≤ p − 1 for 1 ≤ r ≤ i
0 ≤ k ≤ q − 1 for j ≤ r ≤ z    (8)

This procedure of constructing the sublists, as described by eqns. 7 and 8, ensures that all messages in each sublist are arranged in order of the decreasing probabilities of their occurrence.

When the sublists, eqn. 4, are constructed as explained in the four steps above, the messages are encoded using the prescribed coding alphabet consisting of as many different symbols a_r (r = 1, 2, ..., z) as the number of sublists, plus the symbol a_0 common to all sublists. In this way, according to eqn. 3, M different symbols form the coding alphabet:

A = {a_0, a_1, a_2, ..., a_z}    (9)

The construction of code words is very simple: the symbols a_1, a_2, ..., a_r, ..., a_z are assigned to the messages at the first positions in the sublists L_1, L_2, ..., L_r, ..., L_z, respectively; the code words for the messages at the second positions in the sublists are obtained by adding the symbol a_0 in front of the symbol assigned to the corresponding sublist, i.e. a_0a_1, a_0a_2, ..., a_0a_z; the code words for the messages at the third positions are obtained by adding two symbols a_0, i.e. a_0a_0a_1, a_0a_0a_2, ..., a_0a_0a_z, and so on.

In this way, it is seen that the last symbol a_r in the code word determines the sublist L_r to which the message belongs, and the position of the message in this sublist is specified by the number of symbols a_0 preceding the symbol a_r.

Hence, it can be concluded that each sublist of messages L_r has its own subset of code words:

K_r = {a_r, a_0a_r, a_0a_0a_r, ...}    (10)

Evidently, any two subsets, say K_r and K_s for r ≠ s, are mutually exclusive.
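To make the construction concrete, here is a minimal Python sketch (an illustrative aside, not from the paper; build_codebook is our own name). It implements the classification rule of eqn. 7 directly: message u_d with d = kz + r receives k copies of a_0 followed by a_r. Running it with D = 7 and M = 4 reproduces the code words derived in Appendix 8.

```python
def build_codebook(D, M):
    """Code word for each message u_1 ... u_D (sorted by decreasing
    probability), for alphabet size M.  Symbols as integers: 0 is a_0,
    r in 1..z is a_r."""
    z = M - 1                               # number of sublists, eqn. 3
    codebook = {}
    for d in range(1, D + 1):
        k, r = divmod(d - 1, z)             # eqn. 7: u_{kz+r} -> position k+1 of L_r
        codebook[d] = (0,) * k + (r + 1,)   # k copies of a_0, then a_r
    return codebook

print(build_codebook(7, 4))
# {1: (1,), 2: (2,), 3: (3,), 4: (0, 1), 5: (0, 2), 6: (0, 3), 7: (0, 0, 1)}
```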

3 Properties of the code

To assess the quality of the proposed code, its relevant properties are discussed in the following.

First, an inspection of the subsets of code words given by eqn. 10 indicates that this code is a prefix condition code, for in any of these subsets no code word is a prefix of another code word [6]. Such a code has the property of being uniquely decodable, since for each source sequence of finite length, the sequence of code symbols corresponding to that source sequence is different from the sequence of code symbols corresponding to any other source sequence.

In addition, it is necessary to say that prefix condition codes are distinguished from other uniquely decodable codes by the end of a code word always being recognisable, so that decoding can be accomplished without the delay of observing subsequent code words. For this reason, prefix condition codes are called instantaneous codes.

To prove that the proposed code is uniquely decodable, it is necessary that the Kraft inequality [6, 7]:

Σ_(k=1)^(p, q) M^(−n_kr) ≤ 1    (11)

applied to the subset K_r in eqn. 10, be satisfied. In this inequality, n_kr is the length of the kth code word in the subset K_r and M is the alphabet size. For the subset K_r from eqn. 10, M = 2 (only the two symbols a_0 and a_r occur within K_r) and the upper summation limit, whether p or q, is determined according to step (iv) from Section 2.

Now, if eqn. 11 is applied to eqn. 10, where the kth code word has length n_kr = k, it becomes

Σ_(k=1)^(p, q) 2^(−k) = 1 − 2^(−(p, q)) < 1    (12)

thus showing that the proposed code is uniquely decodable. Evidently, this has to be verified for all subsets, eqn. 10, i.e. for 1 ≤ r ≤ z.
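The sum in eqn. 12 is easy to check numerically; the following snippet (an illustrative aside, not from the paper) evaluates it for several subset sizes:

```python
# Left-hand side of eqn. 12 for a subset K_r whose kth code word has
# length n_kr = k; M = 2 because only a_0 and a_r occur within K_r.
def kraft_sum(p):
    return sum(2.0 ** -k for k in range(1, p + 1))

for p in (2, 3, 8):
    print(p, kraft_sum(p))    # equals 1 - 2**-p, always strictly below 1
```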

To determine how efficient the proposed code is regarding data compression, it is appropriate to compare it to the Huffman code as the most efficient one. For this purpose, the average code-word length of the proposed code is easily found from the consideration in Section 2 [8] as

n̄_k = Σ_(s=1)^(q) s Σ_(r=z(s−1)+1)^(zs) P(u_r) + (p − q) p Σ_(r=zq+1)^(D) P(u_r)    (13)

Since an analytical relation between n̄_k and the average code-word length of the Huffman code, n̄_H, is not possible to formulate in general, different sources have been analysed and the results obtained are presented in Tables 1, 2 and 3. The first parameter in the first two Tables, the number of source messages D, is 8 and 13, respectively, i.e. the same as in Huffman's classical paper [2]. In Table 3, this parameter is much greater and amounts to D = 24. The size of the coding alphabet is M = 2, 3 and 4. To characterise the message sources, the measure of roughness MR proposed by A. Sinkov [9] has been applied. This third parameter is defined as

MR = Σ_(d=1)^(D) (P(u_d) − 1/D)²    (14)

where P(u_d) is the probability of occurrence of the message u_d. From eqn. 14 it is easy to see that MR = 0 for a source whose message probabilities are uniformly distributed. In other words, MR is a measure of how much a given distribution diverges from the uniform distribution.
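Eqns. 13 and 14 are equally simple to evaluate. In the sketch below (illustrative, not from the paper), eqn. 13 is computed in the equivalent per-message form n̄_k = Σ_d P(u_d) ⌈d/z⌉, using the source P_2(u_d) from Table 1; up to rounding, it reproduces the tabulated values n̄_k = 3.25, 1.90, 1.47 and MR = 0.037.

```python
from math import ceil

P2 = [0.22, 0.20, 0.18, 0.15, 0.10, 0.08, 0.05, 0.02]   # Table 1, column P_2

def avg_length(P, M):
    """n_k: message u_d (sorted, 1-indexed) gets ceil(d/z) code symbols."""
    z = M - 1
    return sum(p * ceil(d / z) for d, p in enumerate(P, start=1))

def roughness(P):
    """Sinkov's measure of roughness, eqn. 14."""
    D = len(P)
    return sum((p - 1 / D) ** 2 for p in P)

for M in (2, 3, 4):
    print(M, round(avg_length(P2, M), 2))    # 3.25, 1.9, 1.47
print(round(roughness(P2), 3))               # 0.038 (Table 1 lists 0.037)
```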

On the basis of the results presented in Tables 1 and 2, it can be concluded that, for constant D, the larger MR and M are, the closer n̄_k is to n̄_H.

Table 1: n̄_k and n̄_H for five different sources (D = 8) and different sizes of the alphabet M

            P_1(u_d)  P_2(u_d)  P_3(u_d)  P_4(u_d)  P_5(u_d)
            0.25      0.22      0.35      0.40      0.45
            0.18      0.20      0.22      0.22      0.22
            0.15      0.18      0.17      0.16      0.12
            0.12      0.15      0.11      0.10      0.08
            0.10      0.10      0.06      0.06      0.06
            0.09      0.08      0.04      0.03      0.04
            0.08      0.05      0.03      0.02      0.025
            0.03      0.02      0.02      0.01      0.005
MR          0.033     0.037     0.093     0.124     0.153
n̄_H M = 2   2.86      2.80      2.55      2.41      2.31
    M = 3   1.86      1.85      1.53      1.50      1.49
    M = 4   1.53      1.47      1.31      1.25      1.24
n̄_k M = 2   3.58      3.25      2.65      2.44      2.32
    M = 3   1.98      1.90      1.63      1.53      1.49
    M = 4   1.53      1.47      1.31      1.25      1.24

Table 2: n̄_k and n̄_H for two different sources (D = 13) and different sizes of the alphabet M

            P_1(u_d)  P_2(u_d)
            0.22      0.32
            0.19      0.17
            0.12      0.10
            0.08      0.09
            0.06      0.08
            0.06      0.06
            0.06      0.04
            0.05      0.04
            0.05      0.03
            0.04      0.02
            0.03      0.02
            0.02      0.02
            0.02      0.01
MR          0.044     0.088
n̄_H M = 2   3.42      3.03
    M = 3   2.17      1.99
    M = 4   1.70      1.58
n̄_k M = 2   4.44      3.64
    M = 3   2.50      2.17
    M = 4   1.87      1.67

On the other hand, if D increases (Table 3), n̄_k approaches n̄_H more slowly for the same M. From this Table it can be concluded that the efficiency of the proposed code also increases for large sets (D = 24), though significantly only when the alphabet size is larger than 2, i.e. for M-ary systems. For binary systems, M = 2, the average code-word lengths for the majority of sources are greater than in the case of the Huffman code. However, it is possible to find sources for which the average code-word lengths of the proposed code and the Huffman code are equal. As an example, the source with the probabilities of occurrence of messages satisfying the relation

P(u_d) = (1/2)^d    (15)

can be quoted.

In conclusion, it can be said that the proposed code is in principle less efficient than the Huffman code; to what extent depends upon the source itself.

Another property of the code that deserves consideration is its behaviour regarding synchronisation.

Bearing in mind the general form of a code word a_0a_0 ... a_r, for 1 ≤ r ≤ z, it is immediately clear that this code is self-synchronising, because each code word is always recognisable by its end.

Table 3: n̄_k and n̄_H for the source (D = 24) and different sizes of the alphabet M

P(u_d), d = 1, ..., 24:
0.110  0.101  0.094  0.090  0.076  0.076  0.068  0.061
0.055  0.048  0.041  0.036  0.030  0.025  0.021  0.016
0.016  0.015  0.015  0.002  0.001  0.001  0.001  0.001

MR = 0.0287
n̄_H:  M = 2: 4.157   M = 3: 2.594   M = 4: 2.653
n̄_k:  M = 2: 6.880   M = 3: 3.983   M = 4: 2.653

This important feature also means that the code has an increased immunity to transmission errors.

Thus, it could happen that, due to transmission impairments, the symbol a_(r±e) is received instead of the symbol a_r. Then the decoder output will be an incorrect code word a_0a_0 ... a_(r±e) instead of the correct one a_0a_0 ... a_r. However, the code-word length remains unchanged, only one code word is wrong, and synchronisation is not lost.

Another type of error appears when, instead of the symbol a_0, some other symbol a_j is received or when, instead of the end symbol a_r, the symbol a_0 is received. In the first case, the decoder output will be two incorrect code words a_0a_0 ... a_j, a_0 ... a_r instead of the correct one a_0a_0 ... a_r. In the latter, instead of two adjacent code words a_0a_0 ... a_r, a_0a_0 ... a_s, only one longer, incorrect code word a_0a_0 ... a_0a_0 ... a_s will appear. However, in both these examples, as soon as an end symbol is received, the transmission goes on regularly.
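This error containment is easy to demonstrate. In the sketch below (illustrative only; the integer 0 plays the role of a_0), a stream of three code words is decoded by the rule 'a word ends at the first symbol other than a_0'; corrupting a single symbol damages only the word or word pair it touches, after which decoding realigns by itself:

```python
def split_words(stream):
    """Cut a symbol stream into code words: a word is a run of the common
    symbol a_0 (here 0) terminated by the first non-zero symbol."""
    words, current = [], []
    for s in stream:
        current.append(s)
        if s != 0:                     # any symbol other than a_0 ends a word
            words.append(tuple(current))
            current = []
    return words

stream = [0, 0, 1, 2, 0, 3]            # a0 a0 a1 | a2 | a0 a3
print(split_words(stream))             # [(0, 0, 1), (2,), (0, 3)]

corrupted = list(stream)
corrupted[3] = 0                       # the end symbol a_2 arrives as a_0
print(split_words(corrupted))          # [(0, 0, 1), (0, 0, 3)]: two words
                                       # merge into one; later words realign
```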

The problem of code synchronisation and the assessment of expected synchronisation delay have been discussed in detail in References 10 and 11. It has been shown that a large number of Huffman codes are self-synchronising, although some of them do not possess this property. The proposed code can be considered better than the Huffman codes because, the moment a disturbance ends, only the disturbed code word is lost and immediately afterwards synchronisation is restored.

Finally, the hardware realisation of the code should be mentioned. From the consideration in Section 4, it is seen that standard units are applied in the implementation of the coder. No problems similar to those encountered in the implementation of the Huffman code, and in dealing with the buffering between source and channel, can arise [12].

If microprocessors were applied, the hardware realisation would be easier for every code, including the Huffman code. However, the fact that the hardware realisation of the Huffman code for M = 2^n is particularly difficult [13, 14] still remains.

When considering the hardware realisation of variable-length codes, the most delicate functions to be performed are certainly the translation of the fixed-length code words into variable-length ones, and vice versa, in the coder and the decoder respectively, as well as the separation of the group of symbols forming a code word in the decoder. Since the proposed code has self-fixing boundaries in the code words, i.e. whenever a symbol other than a_0 is seen it is the end of a code word, it is clear that there is no problem in carrying out the above functions.

Obviously, this is not the case with the Huffman code.

4 Implementation of the code

The implementation of the proposed code can be explained by describing the realisation of the coder and the decoder. Their block diagrams are shown in Figs. 1 and 3. The essential unit in both is the basic memory BM, identical for the two schemes.

All possible contents of messages, represented in binary form with fixed code-word length, are stored in the basic memory BM. This memory is a modular memory in which the number of memory modules is equal to the number of sublists z, as given by eqn. 4. The number of registers in each module M_r is the same as the number of messages in the corresponding sublist L_r. The length of each register is n, the total number of messages being D = 2^n. The contents of the messages from every sublist are stored in the corresponding module in order of their nonincreasing probabilities. Thus, the most probable message is found at the first address of the module and the least probable at the last. In other words, the contents of each memory module are a faithful replica of the corresponding sublist.

4.1 Coder

The encoding procedure begins by comparing the given random output from the message source BSS (Fig. 1) simultaneously with the contents of the first addresses of all memory modules of the BM. If the identity is confirmed for some address, say for example the gth of M_r, then the signal S_ar of duration T_s, representing the corresponding symbol a_r, is sent to the line from the generator GS_ar. If the identity is not found, then the signal S_a0 of duration T_s, representing the common symbol a_0, is transmitted from the generator GS_a0 to the line. Further, the comparison is continued simultaneously with the second addresses of all memory modules during the second signalling interval of duration T_s. This procedure goes on until the moment when the identity is found.

Evidently, in this way each message is determined by two parameters: one is the symbol a_r designating the sublist L_r, or the memory module M_r, to which the message belongs; the other is the number of common symbols a_0 preceding the symbol a_r, designating the address in the module M_r at which the message is stored.
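A small software model (hypothetical, mirroring the description above rather than the actual hardware) makes this search concrete: one address of every module is compared per signalling interval, a_0 is emitted on every miss and a_r on the hit.

```python
def encode_message(msg, modules):
    """modules[r-1] lists the message contents stored in module M_r,
    ordered by decreasing probability.  Emitted symbols: 0 for a_0,
    r for a_r.  Mirrors the address-by-address search described above."""
    out = []
    depth = max(len(m) for m in modules)
    for address in range(depth):               # one address per interval T_s
        for r, module in enumerate(modules, start=1):
            if address < len(module) and module[address] == msg:
                out.append(r)                  # identity found: send a_r
                return out
        out.append(0)                          # no match: send a_0, go deeper
    raise ValueError("message not stored in any module")

# module layout from the appendix example: M_1 = {u1, u4, u7}, ...
modules = [["u1", "u4", "u7"], ["u2", "u5"], ["u3", "u6"]]
print(encode_message("u7", modules))           # [0, 0, 1]  i.e. a0 a0 a1
```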

In the signal space (Fig. 2), each message point is determined by its signal coordinates S_a0, S_a0, ..., S_a0 and S_ar, each of duration T_s.

The role and the functioning of the individual blocks in Fig. 1 are common and do not deserve particular explanation except, perhaps, the memory address register MAR. Every time the contents of the outputs of the input/output register plus decoder (IOR + D) and the BM are not the same, the state of the MAR is increased by one, thus enabling the comparison of the contents at the next address in all memory modules of the BM. On the other hand, when the outputs of the IOR + D and the BM coincide, the logic 'one' of the output register (OR) causes the erasure of the MAR, thus resetting the comparison so that the next message is compared with the contents at the first addresses of the BM.

4.2 Decoder

The essential units of the decoder (Fig. 3) are the signal detector SD, the binary counter of the number of symbols a_0 with decoder, BC_a0, the input/output register IOR and the basic memory BM. The latter is identical to the coder memory BM.

The signal detector has z + 1 outputs. When a signal of the prescribed waveform representing a corresponding code symbol is received, a logic 1 appears at the corresponding output. This is also present at the inputs to the AND_r (r = 1, 2, ..., z) circuits, thus indicating the memory module M_r to which the incoming message belongs.

The address of this message in M_r is determined by the binary counter with decoder, BC_a0. It counts the number of symbols a_0 between any two consecutive symbols a_r. When the next a_0 appears, the logic 1 at the output of the BC_a0 is shifted to the next output, thus specifying the position of the input/output register IOR at which the code word terminated by the symbol a_r is to be placed. In this way, the AND circuits determine the address of the message represented by the received code word.

The contents of the received message are stored, through the OR_2 circuit and the register R, in the buffer memory. From this, the messages are delivered at a rate corresponding to the source rate.
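The decoding rule can be modelled in the same spirit (again an illustrative sketch, not the hardware itself): a counter of a_0 symbols plays the role of BC_a0, and the terminating symbol a_r selects the module, as the AND circuits do above.

```python
def decode_stream(stream, modules):
    """Recover messages from a symbol stream: count the a_0 symbols (0)
    preceding each terminating symbol a_r; the count is the address in
    module M_r (cf. the BC_a0 counter and AND circuits described above)."""
    messages, zeros = [], 0
    for s in stream:
        if s == 0:
            zeros += 1                 # one more a_0: advance the address
        else:
            messages.append(modules[s - 1][zeros])
            zeros = 0                  # word complete: reset the counter
    return messages

modules = [["u1", "u4", "u7"], ["u2", "u5"], ["u3", "u6"]]
print(decode_stream([0, 0, 1, 2, 0, 3], modules))  # ['u7', 'u2', 'u6']
```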

5 Conclusion

An algorithm for obtaining a variable-length code is proposed. It is used for the encoding of the output from a stationary discrete memoryless source containing a finite set of messages. After analysing the properties of the code, the following comments can be made.

The comparison between this code and the minimum-redundancy Huffman code has been chosen for the evaluation of the efficiency of the code regarding data compression. Three groups of sources have been analysed: the first with 8, the second with 13, and the third with 24 messages per source. Several message probability distributions have been considered. For all sources the average code-word lengths, n̄_H for the Huffman code and n̄_k for the proposed code, have been calculated. From the results given in Tables 1 and 2, it can be concluded that the more the message probabilities differ from the uniform distribution, i.e. the larger MR, the closer n̄_k approaches n̄_H. The same effect appears when the alphabet size M is increased. Thus, for the source of the first group whose probabilities are given in the second column of Table 1, for the alphabet size M = 2, 3 and 4, the ratio of


n̄_H to n̄_k is 0.86, 0.97 and 1.0, respectively. A similar effect is noticed for the other examples in Table 1. However, this effect of approaching the efficiency of the Huffman code becomes weaker when the message set is larger, as is seen in Tables 2 and 3.

It is worth mentioning that the above holds for ideal transmission paths. However, when real transmission paths are considered, the difference n̄_k − n̄_H will surely be less than the theoretical value. The reason stems from the fact that the proposed code is self-synchronising, while not all Huffman codes possess this property.

Fig. 1 Block diagram of the coder [figure not reproduced]
Legend: IOR + D = input/output register plus decoder; GS_a = generator of the signal representing the symbol a; MAR = memory address register; RD_r = rth register with decoder; SPSR = serial/parallel shift register; BSS = binary symbol source; BFM = buffer memory; BM = basic memory; M_r = rth module; CL_r = contents of sublist L_r; BC = binary counter; RC = reversible counter; IG = impulse generator

Fig. 2 Signal space in which a message, say u_32, is represented by the signals S_a0, S_a0, S_a2, each of them having the duration T_s [figure not reproduced]


The second important point to be underlined is the fact that the proposed code is self-synchronising. In addition to the obvious advantages, such as the ease of coder synchronisation and the increased immunity to transmission errors, this fact also has a bearing when efficiency is considered. Namely, for non-self-synchronising codes, additional information for synchronisation has to be sent, thus prolonging the average code word. This must therefore be taken into account when making the above comparison.

Finally, the simplicity of the hardware realisation should be added to the advantages the code possesses.

Taking all this together, the final conclusion can be made that the proposed code is valuable for practical applications.

6 Acknowledgment

The authors wish to thank Professor Erik Dagless and an anonymous referee for their helpful suggestions for improving the paper.

7 References

1 BHARATH, K.K.: 'A comparison of two source encoding schemes', IEEE Trans., 1978, COM-26, pp. 887-892
2 HUFFMAN, D.A.: 'A method for the construction of minimum-redundancy codes', Proc. Inst. Radio Eng., 1952, 40, (9), pp. 1098-1101
3 FIELDING, R.M., BERGER, H.L., and LOCHEAD, D.L.: 'Performance characterization of a high data rate MSK and QPSK channel'. 3rd International Conference on Communications, Chicago, 1977, Vol. 1, pp. 42-46
4 TOSHIO, M., HORIHIKO, M., and TOSHIHIKO, N.: 'Transmission characteristics of an M-ary coherent PSK signal via a cascade of N bandpass hard limiters', IEEE Trans., 1976, 24, (5), pp. 540-545
5 HUANG, T.S.: 'Easily implementable suboptimum runlength codes'. IEEE International Conference on Communications, San Francisco, 1975, pp. 7.8.7-7.8.11
6 GALLAGER, R.G.: 'Information theory and reliable communication' (John Wiley and Sons, Inc., 1968)
7 FANO, R.M.: 'Transmission of information' (MIT Press, 1961)
8 MIRKOVIC, M.D.: 'An encoding method for reducing the necessary bandwidth'. Doctoral thesis, Faculty of Electrical Engineering, University of Belgrade, 1982 (in Serbian)
9 SINKOV, A.: 'Elementary cryptanalysis - a mathematical approach' (Random House, Inc., 1968)
10 FERGUSON, T.J.: 'Self-synchronizing Huffman coder', IEEE Trans., 1984, IT-30, (4), pp. 687-693
11 MAXTED, J.C., and ROBINSON, J.P.: 'Error recovery for variable length codes', ibid., 1985, IT-31, (6), pp. 794-801
12 GALLAGER, R.G.: 'Variations on a theme by Huffman', ibid., 1978, IT-24, (6), pp. 668-674
13 WELLS, M.: 'File compression using variable length encoding', Comput. J., 1972, 15, (4), pp. 308-313
14 ABRAHAM, P.: 'A universal coding scheme for sources with unknown statistics'. 7th Convention on Electrical and Electronic Engineering, Tel-Aviv, Israel, 1971, pp. 40-50

8 Appendix

The construction of the proposed code, as explained in Section 2, can be illustrated by the following example.

Suppose we have a finite set of D = 7 messages with the corresponding probabilities of their occurrence. The size of the coding alphabet is chosen to be M = 4.

According to step (i) from Section 2, all messages should be arranged in the list L in order of decreasing probabilities of their occurrence, so that the list L is

L = {u_1, u_2, ..., u_7}    (16)

for which the probabilities of occurrence satisfy the condition

P(u_1) ≥ P(u_2) ≥ ... ≥ P(u_7)    (17)

Following step (ii) from Section 2, the number of sublists z, according to eqn. 3, will be

z = M − 1 = 3    (18)

The number of messages in each sublist is determined according to step (iii) of Section 2. Since D/z = 7/3 is not an integer, two groups of sublists appear (case (b)): the first one containing p = ⌈D/z⌉ = ⌈7/3⌉ = 3 messages and the second one with q = ⌊D/z⌋ = ⌊7/3⌋ = 2 messages.

The number of sublists in the first group will be

i = D − ⌊D/z⌋ · z = 7 − 2 · 3 = 1    (19)

and in the second group

j = z − i = 3 − 1 = 2    (20)

According to eqn. 4 from Section 2, the following sublists will appear:

L_1 = {u_11, u_21, u_31}
L_2 = {u_12, u_22}    (21)
L_3 = {u_13, u_23}

Now, following the instructions given in Section 2, step (iv), it is possible to express any sublist from eqn. 21 in terms of the messages from the original set L given by eqn. 16.

According to eqns. 7 and 8 it will be: if 1 ≤ r ≤ i = 1, k is given by 0 ≤ k ≤ p − 1 = 2; consequently, for r = 1, k = 0, 1, 2. If j ≤ r ≤ z, i.e. if 2 ≤ r ≤ 3, k is given by 0 ≤ k ≤ q − 1 = 1; consequently, for r = 2, k = 0, 1, and for r = 3, k = 0, 1. Now, according to the rule, eqn. 7, the correspondence between the messages from list L given by eqn. 16 and those in eqn. 21 is

r = 1:  u_1 → u_11, u_4 → u_21, u_7 → u_31, i.e. L_1 = {u_1, u_4, u_7}    (22)
r = 2:  u_2 → u_12, u_5 → u_22, i.e. L_2 = {u_2, u_5}    (23)
r = 3:  u_3 → u_13, u_6 → u_23, i.e. L_3 = {u_3, u_6}    (24)

The messages in the obtained sublists L_1, L_2 and L_3 are now encoded using the prescribed alphabet consisting of as many different symbols a_r (r = 1, 2, 3) as there are sublists, plus one symbol a_0, as explained in Section 2. Following these instructions, the alphabet

A = {a_0, a_1, a_2, a_3}    (25)

is used to obtain finally the coded sublists of messages:

L_1 = {a_1, a_0a_1, a_0a_0a_1}
L_2 = {a_2, a_0a_2}    (26)
L_3 = {a_3, a_0a_3}
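As a final check, the sketch below (illustrative only) encodes a short message sequence with the coded sublists of eqn. 26 and decodes it back by cutting the symbol stream at each symbol other than a_0:

```python
# code book from eqn. 26: message -> code word (appendix example, D = 7, M = 4)
code = {"u1": "1", "u2": "2", "u3": "3",
        "u4": "01", "u5": "02", "u6": "03", "u7": "001"}  # digit r = symbol a_r

stream = "".join(code[m] for m in ["u3", "u7", "u1", "u5"])
print(stream)                          # "3001102"

# decode: a code word ends at the first symbol other than a_0 (here '0')
inverse = {w: m for m, w in code.items()}
decoded, word = [], ""
for symbol in stream:
    word += symbol
    if symbol != "0":                  # end of the current code word
        decoded.append(inverse[word])
        word = ""
print(decoded)                         # ['u3', 'u7', 'u1', 'u5']
```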

" 4

" 7

"2

"5

"3

"6

- • " 2 1

- • " 3 1 J

- • " 1 2

- " 2 2 ^

- • " 1 3

- • " 2 3 J

Li = {"1,

^ 2 = { " 2 ,

^3 = ("3,

U 4 , U7}

"5}

" 6 )
