A Single-Chip CMOS Speech Synthesis Chip



SESSION XIX: SPEECH PROCESSING

FAM 19.3: A Single-Chip CMOS Speech Synthesis Chip*

Kazuo Inoue, Kenji Wakabayashi, Yoshinobu Yoshikawa, Shigeaki Masuzawa, Kenji Sano and Seiji Kimura

Sharp Corp.

Nara, Japan

A CMOS SPEECH SYNTHESIZER LSI circuit, organized as a special-purpose microcomputer containing program ROM, RAM, 32 Kbit (4 KB) of speech data ROM and a D/A converter on a single chip, and supporting the speech synthesis techniques described below, will be reported. By using data compression techniques based on adaptive differential pulse code modulation (ADPCM), the chip generates high-quality speech that reproduces the natural inflection and intonation of the original speaker. It produces about 30 words of speech from the 32 Kbit of internal data ROM. Moreover, additional ROM, up to a maximum of 128 Kbit (16 KB), may be added without any interface circuits to increase the vocabulary.

Any voice, adult or child, male or female, can be synthesized, and the chip can also synthesize music. The system uses comprehensive data compression techniques, based on sampling and coding the speech signal at twice its highest frequency (the Nyquist rate).
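A quick storage estimate shows why compression matters. It takes the 8 kHz sampling frequency and the 8-bit D/A converter from Table 1, reads the 32K data ROM as 32 Kbit (4 KB, consistent with Table 1), and compares against plain 8-bit PCM and plain 4-bit differential coding as assumed baselines; neither baseline is what the chip actually stores.

```latex
\begin{aligned}
f_{\max} &= \frac{f_s}{2} = \frac{8\ \text{kHz}}{2} = 4\ \text{kHz} \\
R_{\text{PCM}} &= 8000\ \text{samples/s} \times 8\ \text{bit} = 64\,000\ \text{bit/s} \\
t_{\text{PCM}} &= \frac{32 \times 1024\ \text{bit}}{64\,000\ \text{bit/s}} = \frac{32\,768}{64\,000}\ \text{s} \approx 0.51\ \text{s} \\
t_{\text{4b}} &= \frac{32\,768\ \text{bit}}{8000\ \text{samples/s} \times 4\ \text{bit}} \approx 1.02\ \text{s}
\end{aligned}
```

About one second of speech at best is far short of roughly 30 words, which suggests that the pitch-period repetition and zero-crossing coding described below account for much of the saving.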

The waveform of unvoiced utterances is encoded using a zero-cross technique, with added amplitude information of two bits per word. This variable amplitude has been shown to provide valuable auditory information.
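The zero-cross idea can be sketched as storing, for each half-cycle between sign changes, its length in samples plus a 2-bit amplitude code. Figure 2 does list a "number of samples for crossing" field and amplitude information for unvoiced data, but the amplitude table AMP_LEVELS, the nearest-level rule and the function names here are hypothetical.

```python
from typing import List, Tuple

# Hypothetical 2-bit amplitude table; the actual levels are not given in the paper.
AMP_LEVELS = [16, 40, 80, 120]


def encode_zero_cross(samples: List[int]) -> Tuple[bool, List[Tuple[int, int]]]:
    """Encode a signal as (samples_to_crossing, amp_code) pairs, one per half-cycle.

    Returns the sign of the first half-cycle plus the list of pairs.
    """
    segments: List[Tuple[int, int]] = []
    if not samples:
        return True, segments
    first_positive = samples[0] >= 0
    start = 0
    for i in range(1, len(samples) + 1):
        # A half-cycle ends at the end of the signal or where the sign flips.
        if i == len(samples) or (samples[i] >= 0) != (samples[start] >= 0):
            peak = max(abs(s) for s in samples[start:i])
            # 2-bit code: the table level closest to this half-cycle's peak.
            code = min(range(len(AMP_LEVELS)), key=lambda c: abs(AMP_LEVELS[c] - peak))
            segments.append((i - start, code))
            start = i
    return first_positive, segments


def decode_zero_cross(first_positive: bool, segments: List[Tuple[int, int]]) -> List[int]:
    """Rebuild a rectangular wave that flips sign at every stored zero crossing."""
    out: List[int] = []
    positive = first_positive
    for length, code in segments:
        level = AMP_LEVELS[code]
        out.extend([level if positive else -level] * length)
        positive = not positive
    return out
```

The decoded wave keeps only crossing timing and a coarse level per half-cycle, which is plausibly adequate for noise-like unvoiced sounds.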

The compression technique for voiced utterances is based on the subjective discarding of redundant speech information.

Redundant pitch periods and redundant phonemes are removed, and representative pitch periods are extracted from successive waveforms to replace N similar periods.
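The paper selects representative periods subjectively; purely as an illustration, the sketch below replaces each run of consecutive, similar-shaped pitch periods with one representative and a repeat count N. The energy-normalised distance, the 0.2 threshold and the function names are hypothetical stand-ins for that judgement, and the periods are assumed to be already segmented.

```python
import math
from typing import List, Tuple

Period = List[float]


def shape_distance(a: Period, b: Period) -> float:
    """Distance between two periods after normalising each to unit energy,
    so that only the waveform shape (not its amplitude) is compared."""
    if len(a) != len(b):
        return math.inf  # periods of different length are treated as dissimilar
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return math.sqrt(sum((x / na - y / nb) ** 2 for x, y in zip(a, b)))


def group_periods(periods: List[Period], threshold: float = 0.2) -> List[Tuple[Period, int]]:
    """Replace runs of similar consecutive periods with (representative, N).

    Hypothetical rule: the first period of a run is its representative.
    """
    groups: List[Tuple[Period, int]] = []
    for p in periods:
        if groups and shape_distance(groups[-1][0], p) <= threshold:
            rep, n = groups[-1]
            groups[-1] = (rep, n + 1)
        else:
            groups.append((p, 1))
    return groups
```

Normalising away amplitude before comparing fits the next step, where amplitude is encoded separately from the repeated shape.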

To obtain large enough values of N while still maintaining the correct envelope, amplitude information and peak values are encoded separately and interpolated. This improves data compression; as a result, the average value of N may be as large as 13 for voiced waveforms.
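Regeneration of one voiced run can be sketched as repeating the representative period N times while scaling each copy along an interpolated envelope. Figure 2 stores a repeat count and an envelope slope sign for voiced data; the linear per-copy gain ramp and the parameter names start_gain and end_gain are assumptions.

```python
from typing import List


def regenerate_voiced(period: List[float], n_repeats: int,
                      start_gain: float, end_gain: float) -> List[float]:
    """Repeat one representative pitch period n_repeats times, scaling each
    copy by a gain linearly interpolated from start_gain to end_gain."""
    out: List[float] = []
    for k in range(n_repeats):
        # Position within the run: 0 for the first copy, 1 for the last.
        t = k / (n_repeats - 1) if n_repeats > 1 else 0.0
        gain = start_gain + (end_gain - start_gain) * t
        out.extend(gain * x for x in period)
    return out
```

Interpolating per sample instead of per copy would give a smoother envelope at the cost of more arithmetic per output sample.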

An adaptive differential pulse code modulation technique is employed for encoding the representative pitch periods. Differences in amplitude between successive samples are encoded according to a 4b ADPCM rule. The quantizing unit value for each pitch period, which is also encoded in 4b, is selected to maximize the signal-to-noise ratio between the original and the decoded signal. Using this method, a 25 dB S/N ratio was obtained. Figure 1 shows the original and the regenerated waveforms using these techniques.
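A minimal sketch of the per-period encoding follows. It quantises each sample-to-sample difference to a signed 4-bit code and tries all 16 quantizing units for a period, keeping the one with the highest S/N, which mirrors the selection rule described above. The step table STEP_TABLE, the fixed step within a period and the zero starting value are assumptions; Figure 2 also lists a flag for increasing the quantizing unit value, which this sketch leaves out.

```python
import math
from typing import List, Tuple

# Hypothetical table of 16 quantizing unit values (selected by a 4-bit index).
STEP_TABLE = [1, 2, 3, 4, 6, 8, 11, 16, 22, 32, 45, 64, 90, 128, 181, 256]


def encode_period(samples: List[int], step: int) -> Tuple[List[int], List[int]]:
    """Closed-loop differential coding with signed 4-bit codes (-8..7).

    Returns the codes and the reconstruction the decoder will produce.
    """
    codes: List[int] = []
    recon: List[int] = []
    prev = 0  # assumption: each period starts from zero
    for s in samples:
        code = max(-8, min(7, round((s - prev) / step)))
        prev += code * step  # track the decoder's value, not the input
        codes.append(code)
        recon.append(prev)
    return codes, recon


def decode_period(codes: List[int], step: int) -> List[int]:
    out: List[int] = []
    prev = 0
    for c in codes:
        prev += c * step
        out.append(prev)
    return out


def snr_db(original: List[int], decoded: List[int]) -> float:
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def best_step_index(samples: List[int]) -> int:
    """Try all 16 quantizing units and return the index with the best S/N."""
    return max(range(len(STEP_TABLE)),
               key=lambda i: snr_db(samples, encode_period(samples, STEP_TABLE[i])[1]))
```

With one step per period this behaves like block-adaptive DPCM; a decoder would also need the 4-bit step index stored alongside the codes.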

The speech data condensed by these compression techniques are stored in the 32 Kbit data ROM. A distinction is made between voiced and unvoiced utterances, and Figure 2 shows the ROM data format for both.

A special-purpose 8b microcomputer was adopted for the speech synthesizer; its architecture, instruction set, arithmetic and addressing capabilities were functionally optimized to perform the synthesis techniques described above.

The chip is self-contained and includes all of the circuits required to regenerate the voice signal on a single chip. A microcomputer design is more useful than a hard-wired chip because it can be tailored to handle a variety of time-domain data compression techniques, such as PCM, DPCM, DM or more complex schemes, by the relatively simple procedure of changing the program.

*Japanese Patent No. Tokukaisho 55-111995.
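The point that one programmable chip can switch between PCM, DPCM and DM by changing its program can be illustrated with a small dispatch table. The routine names, the step size of 4 and the code formats are hypothetical; each function stands in for a different control program run on the same hardware.

```python
from typing import Callable, Dict, List


def decode_pcm(codes: List[int]) -> List[int]:
    """PCM: each stored code is the sample value itself."""
    return list(codes)


def decode_dpcm(codes: List[int], step: int = 4) -> List[int]:
    """DPCM: each code is a scaled difference from the previous sample."""
    out: List[int] = []
    prev = 0
    for c in codes:
        prev += c * step
        out.append(prev)
    return out


def decode_dm(bits: List[int], step: int = 4) -> List[int]:
    """Delta modulation: each 1-bit code moves the output up or down one step."""
    out: List[int] = []
    prev = 0
    for b in bits:
        prev += step if b else -step
        out.append(prev)
    return out


# "Changing the program" amounts to selecting a different routine.
DECODERS: Dict[str, Callable[[List[int]], List[int]]] = {
    "PCM": decode_pcm,
    "DPCM": decode_dpcm,
    "DM": decode_dm,
}


def synthesize(scheme: str, data: List[int]) -> List[int]:
    return DECODERS[scheme](data)
```

A hard-wired decoder would fix one of these routines in logic, whereas the stored-program approach only needs a different control program in ROM.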

A block diagram of the synthesizer chip is shown in Figure 3.

The control program that recreates the speech according to the regenerating algorithm (that is, demodulation of the ADPCM data, repetition of the representative pitch period, amplitude interpolation and demodulation of the zero-cross data) occupies 1 KB of the control program ROM area. The compressed speech data are stored in the 32 Kbit data ROM.

The chip performs signal processing on the data in the 32 Kbit ROM under the control of the control program and regenerates the speech signal.

For this kind of data processing, it is important that the chip has the ability to perform fast data transfers and arithmetic operations. To achieve these goals, a 33-instruction set using long instruction words was adopted. As a result, speech sampled at 8 kHz could be generated with an 8 µs instruction cycle time. Long instructions, which offer fast operation rather than a fast cycle time, are better from the point of view of power dissipation and chip design.

Almost all of the circuits except the RAM, I/O latch, DAC, etc., have been designed as ratioless dynamic CMOS; these occupy approximately 90% of the total chip area. This design method reduces power dissipation and minimizes chip size.

The chip can be put into standby mode by a halt instruction while no speech is being generated. In this mode the oscillator and system clock are halted, and the only power dissipated is a very low leakage current. Because of their static design, however, the RAM and latch circuits retain their contents.

The chip has 8 terminals for control or key input and 6 terminals for output. Additionally, it can be connected directly to up to 128 Kbit of external ROM.

An 8b D/A converter consisting of a resistor ladder network builds the analog signal and feeds it to the preamplifier.

Table 1 summarizes the main features and performance of the chip.

The chip has been fabricated using metal-gate CMOS technology, and about 33,000 transistors are integrated in an area of 5.10 mm by 5.01 mm.

Figure 4 shows a microphotograph of the chip.
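The quoted timing can be checked with simple arithmetic: at 8 kHz each output sample must be produced within one sample period, so an 8 µs instruction cycle leaves a budget of about 15 instruction cycles per sample (ignoring any other overheads). The D/A line uses the textbook transfer function of an ideal 8-bit binary-weighted ladder; the paper gives neither the ladder topology nor the reference voltage, so V_ref is kept symbolic and the 3 V value is only an illustrative assumption.

```latex
\begin{aligned}
T_s &= \frac{1}{f_s} = \frac{1}{8\ \text{kHz}} = 125\ \mu\text{s} \\
\frac{T_s}{T_{\text{cycle}}} &= \frac{125\ \mu\text{s}}{8\ \mu\text{s}} = 15.625
  \;\Rightarrow\; \text{at most 15 instruction cycles per sample} \\
V_{\text{out}} &= V_{\text{ref}} \cdot \frac{D}{2^{8}}, \qquad D \in \{0, 1, \dots, 255\} \\
\Delta V &= \frac{V_{\text{ref}}}{256} \approx 11.7\ \text{mV} \quad (V_{\text{ref}} = 3\ \text{V, assumed})
\end{aligned}
```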

Acknowledgments

The authors would like to acknowledge and thank K. Yama of Osaka City University for his assistance in the development of the synthesis algorithm. They also wish to thank K. Okano, Corporate Director and Group General Manager, for his helpful advice on this project.



[See page 337 for Figures 2, 3.]

Control                         8 bits parallel
Instruction Set                 35 instructions
Instruction Cycle Time          8 µs (typ)
ROM Capacity                    4 KB for speech data
                                1 KB for control programming
RAM Capacity                    24 B
I/O        Input Port           8 terminals
           Output Port          6 terminals
           D I/O Port           8 terminals
Additional ROM                  16 KB max
D/A Converter                   8 bits (ladder network)
Sampling Frequency of Speech    8 kHz (typ)
Technology                      Metal-gate CMOS
Number of Transistors           approx. 33,000
Power Supply                    2.7 to 5.5 V
Power Dissipation  Standby      3 µW (at 3 V)
                   Operational  4.5 mW
Die Size                        5.10 x 5.01 mm
Package                         48-pin flat package

TABLE 1-Summary of hardware features and performance.

[Figure 1 panels: compressed waveform annotated with "4 repeats", "8 repeats", "9 repeats" and a segment coded using zero crossing; (b) Regenerated Waveform.]

FIGURE 1-An example of the original waveform and regenerated waveform using the speech synthesis techniques.

FIGURE 4-Microphotograph of the CMOS speech synthesis chip.



[Figure 2 panels: (a) Unvoiced data: amplitude information for zero crossing; number of samples for crossing; data for zero crossing. (b) Voiced data: number of repeats; envelope slope (1 for +, 0 for −); first quantizing unit value; quantizing-unit flag (1 = increase in quantizing unit value, 0 = not increased); ADPCM data (4 bits per sample).]

FIGURE 2-Condensed speech data format in data ROM.

FIGURE 3-Block diagram of the CMOS speech synthesis chip.
