Index-Based Selective Audio Encryption

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 12, NO. 3, APRIL 2010 215

Index-Based Selective Audio Encryption for Wireless Multimedia Sensor Networks

Honggang Wang, Member, IEEE, Michael Hempel, Member, IEEE, Dongming Peng, Member, IEEE, Wei Wang, Member, IEEE, Hamid Sharif, Senior Member, IEEE, and Hsiao-Hwa Chen, Fellow, IEEE

Abstract: Wireless multimedia sensor networks (WMSNs) support many acoustic applications, such as audio surveillance, animal tracking/vocalization and human health monitoring. However, resource constraints in sensor networks (such as limited battery power, bandwidth and computation capability) pose challenges for the quality and security of audio data transmission and processing. Security is a critical issue because audio information can be accessed or even manipulated in WMSNs. To ensure security, audio quality and energy efficiency, we propose an index-based selective audio encryption scheme for WMSNs. The scheme protects data transmissions by combining resource allocation with selective encryption based on the modified discrete cosine transform (MDCT). In the proposed scheme, the MDCT coefficient index determines the importance of the audio data, and wireless audio transmission proceeds with energy-efficient selective encryption. Simulation results show that the proposed approach offers a significant gain in energy efficiency, encryption performance and audio transmission quality.

Index Terms: Audio streaming, modified discrete cosine transform, security, wireless sensor network.

I. INTRODUCTION

THE advancements in wireless multimedia sensor networks (WMSNs) enable a large number of sensor-based audio applications, such as acoustic surveillance, animal tracking and health monitoring. Deploying acoustic sensors in a wireless multimedia sensor network involves several major challenges. Small sensors have limited memory, computation capability, bandwidth and battery power. A good strategy is to reduce the amount of streaming audio data with efficient audio coding schemes. Several audio codecs (such as MP3, MPEG-2 AAC, MPEG-4 AAC, TwinVQ and Dolby AC-3) have been widely used for digital audio encoding, and most of them are based on the modified discrete cosine transform (MDCT). However, none of these traditional audio codecs considers the two key requirements of WMSN applications: security and resource efficiency. In this paper, we propose an audio encoding scheme that meets both requirements. Using a resource-efficient cross-layer design, it identifies and encrypts the most important portions of the audio stream during compression.

Manuscript received May 23, 2009; revised December 01, 2009. First published January 26, 2010; current version published March 17, 2010. This work was supported in part by the U.S. National Science Foundation Grant for wireless sensor networks research (No. 0707944) and in part by a Taiwan National Science Council Grant (No. NSC98-2219-E-006-011). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Qian Zhang.

H. Wang is with the Department of Electrical and Computer Engineering, University of Massachusetts, Dartmouth, MA 02747 USA.

M. Hempel, D. Peng, and H. Sharif are with the Department of Computer and Electronics Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0417 USA.

W. Wang is with the Department of Electrical Engineering and Computer Science, South Dakota State University, Brookings, SD 57007 USA.

H.-H. Chen is with the Department of Engineering Science, National Cheng Kung University, Tainan City 701, Taiwan.

Digital Object Identifier 10.1109/TMM.2010.2041102

The error-prone wireless channel causes packet losses, which degrade the audio quality at the receiver. Our approach to network resource allocation uses the network resources as efficiently as possible while maximizing the quality and robustness of the received audio data. Security is another critical issue for audio transmission in WMSNs. It creates a further design challenge, because the constrained energy and computational resources must be balanced against the needs of secure transmission. Our proposed selective encryption with unequal resource allocation for real-time audio streaming is a key technology for addressing these challenges jointly: transmission quality, security and energy efficiency.

The selective encryption scheme presented in this paper encrypts only the important audio data. This achieves both real-time performance and energy-efficient transmission in WMSNs. The work was motivated mainly by the following observations.

First, a resource-limited WMSN does not need to protect unimportant audio data. Even if a third party intercepts the unencrypted portion, the audio information cannot be fully recovered from it.

Second, most traditional encryption algorithms, such as the Advanced Encryption Standard (AES), are too complex and may cause severe delays in small sensor nodes. For example, a recent study [22] reports that encrypting each 128-bit block with AES takes about 1.8 ms on a MicaZ platform. In contrast, our selective encryption approach encrypts only the data containing the most important information, which significantly reduces processing delay and energy consumption.

Because the encrypted data represent the most important portion of the audio stream, they deserve additional protection during wireless transmission, which we provide through unequal resource allocation. Note also that traditional information security schemes such as encryption were designed at the application layer, while quality-of-service (QoS) issues were handled at the link layer. This separation of functions causes problems such as excessive delay. To overcome them, the proposed approach merges information security and QoS scheduling into one unified algorithm, offering a cross-layer solution for secure audio transmission over WMSNs.

1520-9210/$26.00 © 2010 IEEE


TABLE I. NOTATIONS AND SYMBOLS USED IN THE ANALYSIS

The rest of this paper is organized as follows. Section II reviews the literature and highlights our contributions to the design of an energy-efficient selective encryption scheme for audio transmission over WMSNs. Section III establishes an optimization model for selective audio encryption with unequal network resource allocation. Section IV details the proposed approach, which achieves high energy efficiency while maintaining audio transmission quality and security. Section V presents simulation results showing the performance gain over existing approaches to wireless audio transmission. Section VI concludes the paper. Table I lists the major notations and symbols used in this paper.

II. LITERATURE REVIEW

All audio compression methods can be classified as lossy or lossless. “Shorten,” for instance, was an early lossless codec format. Newer lossless codecs include the Free Lossless Audio Codec (FLAC), Apple Lossless, MPEG-4 ALS, Monkey’s Audio and TTA. Some audio encoding algorithms combine a lossy encoding format with a lossless correction layer, such as MPEG-4 SLS (Scalable to Lossless), WavPack and OptimFROG DualStream.

Lossy compression is the most popular choice for streaming media over the Internet, satellite and wireless networks, and it can operate in either the time domain or the transform domain. In the time domain, linear predictive coding (LPC), which is used for speech, is a source-based encoding scheme. In the transform domain, MDCT-based codecs determine which information in an audio signal is perceptually irrelevant, i.e., contributes little to the human perception of sound. The audibility of each spectral component is determined by calculating a masking threshold. MDCT-based compression is a major approach to audio encoding and underlies many popular codecs, such as MP3 and MPEG-4.

MDCT codecs themselves have been studied extensively, but MDCT-based audio transmission over sensor networks has received little attention. Building on our study of MDCT-based codecs, this paper aims to provide an energy-efficient audio codec with security protection for wireless audio transmission over WMSNs.

Audio quality degradation due to packet loss is a challenging issue in secure audio transmission over error-prone wireless channels. Methods based on forward error correction (FEC) [1] are widely used to reduce packet loss and improve transmission quality. FEC adds redundant bits to each packet so that the receiver can identify and correct erroneous bits instead of discarding the packet. This improves overall stream quality, but at the cost of a significant increase in data and energy overhead. Retransmission [3] is an effective way to achieve good audio quality with lower complexity.

However, few studies have applied unequal error protection (UEP) to improve audio quality and energy efficiency for generic audio codecs such as MDCT-based codecs. In [11], a perceptually controlled UEP scheme was proposed for transmitting audio over IP networks. In [12], the authors proposed using Reed-Solomon (RS) FEC codes to provide UEP for audio streams. Their scheme has two components. The first is unequal frame protection, which gives more protection to the header of a frame. The second is unequal sample protection, which gives more protection to the most significant bits of the quantized data. In [1] and [13], the authors considered UEP or media-specific FEC schemes. Because digital audio streams have non-uniform perceptual importance, UEP is well suited to multimedia transmission over wireless channels. In [15], the authors proposed a collaborative transmission scheme based on multipath routing. It improves image transmission quality through unequal-quality path selection and UEP on important image regions (the overlap regions).

Encryption methods can address the security issues in wireless audio transmission. In this paper, we propose selective encryption with unequal resource allocation for audio streaming over WMSNs, which jointly achieves energy efficiency, data quality and security. Conventional schemes encrypt the entire audio stream. Our scheme instead encrypts only a perceptually relevant fraction of the stream’s transform coefficients and transmits the remaining portion unprotected. This makes encryption feasible for power-constrained, real-time multimedia applications while still maintaining robust security.

The essence of selective encryption is to protect only a subset of the bit stream. Several partial encryption techniques have been developed to encrypt MPEG-compressed images and video efficiently [5]–[8]. In [5], the author proposed a partial encryption method called “Aegis,” which encrypts only the “I” frames of each MPEG group of pictures (GOP) in a video stream. In [6], selectively encrypting the macroblocks of an MPEG video bit stream saved significant processing power compared with full encryption. In [7] and [8], partial encryption based on set partitioning in hierarchical trees (SPIHT) was proposed to reduce encryption and decryption delay in communication networks.

However, very few studies have applied selective encryption to compressed audio. In [9], the authors presented partial encryption of speech signals compressed with the G.729 codec, which uses the widely adopted algebraic code-excited linear-prediction (CELP) technique. To the best of our knowledge, selective encryption specific to audio streams over WMSNs, which are constrained by limited power and resources, has not been investigated extensively.

Fig. 1. Index-based importance of MDCT transform coefficients.

III. SELECTIVE AUDIO ENCRYPTION MODELING

A. MDCT-Based Audio Encryption With Index Importance

The MDCT [19] is a transform based on the DCT. It is applied to consecutive blocks of a larger dataset, and neighboring blocks overlap: the second half of each block coincides with the first half of the next block. This overlap avoids the block-boundary artifacts of the traditional DCT, which makes the MDCT especially attractive for audio compression. It is used in codecs such as MP3, AC-3, Ogg Vorbis and AAC. The MDCT is defined as

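$$X_k = \sum_{n=0}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad k = 0, 1, \ldots, N-1 \tag{1}$$

where $x_n$ is the $n$-th audio sample of a block and $X_k$ is the $k$-th transform coefficient. Qualitatively, the higher the index $k$, the less significant the information in the coefficient. The low-index coefficients therefore carry the most important audio information in the transform domain. As shown in Fig. 1, the index $k$ thus indicates the importance level of an MDCT coefficient. The value $N$ is set by the block length: each block of $2N$ input samples yields $N$ coefficients. Please refer to [19] for more details on MDCT computation.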
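As an illustration of (1), the following Python sketch evaluates the standard MDCT definition directly, in O(N^2) time per block. It also splits a signal into 50%-overlapping blocks of 2N samples. It is a reference for clarity only: it omits the analysis window and the FFT-based fast algorithm that practical codecs use, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def mdct(block: Sequence[float]) -> List[float]:
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients X_0..X_{N-1}.

    X_k = sum_{n=0}^{2N-1} x_n * cos[(pi/N) * (n + 1/2 + N/2) * (k + 1/2)]
    No analysis window is applied here (a real codec would apply one).
    """
    two_n = len(block)
    if two_n == 0 or two_n % 2:
        raise ValueError("block length must be a positive even number (2N)")
    n_coeffs = two_n // 2
    return [
        sum(
            x * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n, x in enumerate(block)
        )
        for k in range(n_coeffs)
    ]


def overlapping_blocks(signal: Sequence[float], n_coeffs: int) -> List[List[float]]:
    """Split a signal into blocks of 2N samples with 50% overlap (hop size N)."""
    size = 2 * n_coeffs
    return [list(signal[i:i + size]) for i in range(0, len(signal) - size + 1, n_coeffs)]


# A block built from the k = 1 basis function puts all of its energy in X_1.
N = 8
block = [math.cos(math.pi / N * (n + 0.5 + N / 2) * 1.5) for n in range(2 * N)]
coeffs = mdct(block)
print([round(c, 6) for c in coeffs])  # X_1 is 8.0, all other coefficients are ~0
```

This is a quick sanity check of the transform. Under the index-based ordering described above, such low-index coefficients would be the first candidates for encryption.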

B. Selective Encryption With Unequal Resource Allocation

The first step of the proposed selective encryption scheme is to identify the important bits within the MDCT audio stream that need to be encrypted. After the MDCT is applied to the audio samples, the transform coefficients can be sorted by index. They are then arranged into packets according to their importance levels. Given a fixed packet size and a number of audio samples, the transform coefficients are packetized into an ordered set of packets. The most important coefficients always go into the first-priority packet, and less important coefficients go into subsequent packets with lower priority. The transform coefficient index thus determines the importance order of each audio packet. The scheme always encrypts the leading packets, which hold the lowest-index coefficients, and allocates additional network resources to protect them.

Each packet header also contains control information, such as the start and end delimiters. Without this information, the decoder cannot reassemble the packets and recover the audio stream. The header of each packet therefore includes important bits that must be encrypted and protected in transmission.

Fig. 2. Resource allocation with selective encryption.

Fig. 2 shows a cross-layer framework for MDCT-based audio transmission over WMSNs. Because the encrypted packets contain only important coefficients, their transmission over error-prone wireless channels must be well protected. In the proposed architecture, we model energy consumption at the link layer. The model considers four resource factors: transmission rate, transmission power, maximum retransmission retry limit and frame size. They are related by the following equation:

(2)
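The index-ordered packetization and selection step described above can be sketched as follows. This minimal illustration assumes a fixed number of coefficients per packet and a number of packets to encrypt that is already known. The `Packet` class and `packetize` function are hypothetical names, and header encryption is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    priority: int              # 0 = highest priority (lowest MDCT indices)
    coeff_indices: List[int]   # MDCT coefficient indices carried in this packet
    encrypt: bool              # selected for encryption and extra link-layer protection


def packetize(num_coeffs: int, coeffs_per_packet: int, num_encrypted: int) -> List[Packet]:
    """Group coefficient indices 0..num_coeffs-1 into packets in index order
    and mark the first num_encrypted packets for encryption."""
    if coeffs_per_packet <= 0:
        raise ValueError("coeffs_per_packet must be positive")
    packets = []
    for priority, start in enumerate(range(0, num_coeffs, coeffs_per_packet)):
        indices = list(range(start, min(start + coeffs_per_packet, num_coeffs)))
        packets.append(Packet(priority, indices, priority < num_encrypted))
    return packets


pkts = packetize(num_coeffs=256, coeffs_per_packet=32, num_encrypted=2)
print(len(pkts), [p.encrypt for p in pkts[:3]])  # → 8 [True, True, False]
```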

A related study was presented in our previous work [9]. Equation (3) summarizes the detailed energy consumption model:

(3)

Here, the energy consumption is expressed in terms of the channel state, transmission power, receiver power, frame size, header size, symbol rate, constellation size, frame error rate and maximum retransmission retry limit.
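The exact form of (3) cannot be recovered here (see [9]), so the sketch below uses a common stand-in model as an assumption. Expected energy per frame equals sender-plus-receiver power, times airtime, times the expected number of attempts under a retry limit. Function names and parameter values are hypothetical. The values are illustrative: a 16-ary constellation at 62.5 ksymbols/s gives 250 kbps. The sketch shows how frame size, header size, symbol rate, constellation size, frame error rate and retry limit interact.

```python
import math


def expected_attempts(fer: float, max_retries: int) -> float:
    """Mean number of attempts when a frame is resent until it succeeds or
    1 + max_retries attempts have been made: sum_{i=0}^{R} fer^i."""
    return sum(fer ** i for i in range(max_retries + 1))


def residual_loss(fer: float, max_retries: int) -> float:
    """Probability that the frame is still lost after all attempts."""
    return fer ** (max_retries + 1)


def frame_energy_j(p_tx_w: float, p_rx_w: float, payload_bits: int, header_bits: int,
                   symbol_rate: float, constellation_size: int,
                   fer: float, max_retries: int) -> float:
    """Hypothetical expected energy (J) spent by sender and receiver per frame."""
    bits_per_symbol = math.log2(constellation_size)
    airtime_s = (payload_bits + header_bits) / (symbol_rate * bits_per_symbol)
    return (p_tx_w + p_rx_w) * airtime_s * expected_attempts(fer, max_retries)


# Illustrative values only: 64-byte payload, 11-byte header, 250 kbps link.
for retries in (0, 1, 3):
    e = frame_energy_j(0.05, 0.05, 512, 88, 62_500, 16, fer=0.1, max_retries=retries)
    print(retries, f"{e * 1e3:.4f} mJ", f"{residual_loss(0.1, retries):.0e}")
```

Raising the retry limit lowers the residual loss rate (0.1, 0.01 and 0.0001 here) at the cost of a slightly higher expected energy. Unequal resource allocation uses this lever to give the important packets more protection.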

The distortion reduction associated with each MDCT coefficient index gives a quality metric for the overall audio transmission. Equation (4) gives the expected distortion reduction of the audio stream at the receiver:

(4)

Here, the left-hand side is the expected audio distortion reduction under a given resource allocation strategy, which affects the packet error probabilities. Each term combines two quantities for the i-th packet: the distortion reduction contributed by its source bytes, and its error probability under the given channel conditions and the network resources allocated to protect it. The error probability of a virtual packet after the last one is set to one to mark the end of the bit stream. For related studies, please refer to [15] and [23].
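The following sketch evaluates one common form of (4) for an index-ordered stream. This form is an assumption, not taken from the paper: decoding stops at the first lost packet. If packets 1..i arrive and packet i+1 is lost, the receiver gains D_1 + ... + D_i, and the virtual end-of-stream packet has error probability one. The function name is hypothetical.

```python
from typing import Sequence


def expected_distortion_reduction(d: Sequence[float], p: Sequence[float]) -> float:
    """Expected distortion reduction for an index-ordered stream in which
    decoding stops at the first lost packet.

    d[i]: distortion reduction contributed by packet i+1
    p[i]: error probability of packet i+1 under the chosen resource allocation
    A virtual packet N+1 with error probability 1 marks the end of the stream.
    """
    if len(d) != len(p):
        raise ValueError("d and p must have the same length")
    total = 0.0
    survive = 1.0   # probability that every earlier packet arrived
    cum_d = 0.0     # distortion reduction accumulated from the earlier packets
    for d_next, p_next in zip(d, p):
        total += survive * p_next * cum_d   # earlier packets arrive, this one is lost
        survive *= 1.0 - p_next
        cum_d += d_next
    return total + survive * cum_d          # all N packets arrive; P_{N+1} = 1


d = [10.0, 1.0]
print(expected_distortion_reduction(d, [0.1, 0.5]))  # better protection on the important packet
print(expected_distortion_reduction(d, [0.5, 0.1]))  # better protection on the unimportant packet
```

With these numbers, protecting the first (important) packet yields an expected reduction of 9.45. Swapping the same error probabilities yields only 5.45. This illustrates why the encrypted low-index packets deserve extra resources.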

To reduce the encryption overhead, we must consider three factors that directly affect encryption performance: the number of bits to be encrypted, the unit-block encryption time and the encryption block size. They are related by the following equation:

(5)

Different encryption block sizes lead to different processing delays, so choosing an appropriate block size can improve real-time performance. The encryption delay also depends on the total number of bits to be encrypted. The proposed approach selects only a minimum amount of data for encryption, namely the data of highest importance in the MDCT audio stream. The entire stream is therefore properly protected even though the rest of it remains unencrypted.

In many networks, encryption time does not noticeably affect overall network delay or real-time performance. Wireless sensor networks are different. Sensors built on different embedded hardware platforms have diverse computational capabilities, so running different encryption algorithms on them can cause widely varying computational delays. For example, as described in [28], the traditional RC4 algorithm takes 344 μs to encrypt a block on the ATmega103 processor but only 10 μs on the StrongARM processor.

The total network delay includes coding/decoding time, encryption/decryption time, queuing/dequeuing time, MAC access time and so on. Encryption time becomes a non-trivial factor when it is of the same order as other major delays, such as the transmission delay. For example, Crossbow MicaZ [30] sensors support a maximum transmission rate of 250 kbps. Transmitting a 64-byte block therefore takes roughly 2.1 ms (512 bits at 250 kbps), so the encryption delay matters for network performance.

As noted in [29], a tradeoff between encryption time and transmission time arises when transmission cannot start until the time-consuming encryption has finished. This is the case for some advanced block ciphers (e.g., AES with 128-bit blocks). In particular, for selectively encrypted audio over WMSNs, the selected data blocks must be encrypted and transmitted sequentially, so the total delay is the sum of the delays of each process. In general, selective encryption achieves high audio security with lower encryption overhead. Because delay is not the major concern in WMSNs, this study does not include it in the quality optimization problem.

Fig. 3. Selective encryption performance for the animal audio class. (a) Encryption of coefficients 0–5 only; correlation: 0.9372 (time), 0.9393 (frequency). (b) Encryption of coefficients 0–10 only; correlation: 0.9233 (time), 0.4025 (frequency). (c) Encryption of coefficients 0–40 only; correlation: 0.2786 (time), 0.4025 (frequency). (d) Encryption of coefficients 0–100 only; correlation: 0.0544 (time), 0.1036 (frequency).
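The sequential encrypt-then-transmit delay discussed above can be sketched as follows. The sketch assumes that relation (5) takes the form "number of cipher blocks (rounded up) times the unit-block encryption time". Two figures come from the text: the AES time of about 1.8 ms per block on MicaZ [22] and the 250 kbps MicaZ rate [30]. Encrypting 16 bytes of a 64-byte block is a hypothetical choice, as are the function names.

```python
import math


def encryption_delay_s(bits_to_encrypt: int, block_bits: int, block_time_s: float) -> float:
    """Number of cipher blocks (rounded up) times the unit-block encryption time."""
    return math.ceil(bits_to_encrypt / block_bits) * block_time_s


def transmission_delay_s(num_bytes: int, rate_bps: float) -> float:
    """Time to put num_bytes on the air at rate_bps."""
    return num_bytes * 8 / rate_bps


def sequential_delay_s(encrypt_bytes: int, tx_bytes: int, block_bits: int,
                       block_time_s: float, rate_bps: float) -> float:
    """Encryption must finish before transmission starts, so the delays add."""
    return (encryption_delay_s(encrypt_bytes * 8, block_bits, block_time_s)
            + transmission_delay_s(tx_bytes, rate_bps))


AES_BLOCK_BITS, AES_BLOCK_TIME_S = 128, 1.8e-3   # ~1.8 ms per block on MicaZ [22]
MICAZ_RATE_BPS = 250_000                          # MicaZ maximum rate [30]

full = sequential_delay_s(64, 64, AES_BLOCK_BITS, AES_BLOCK_TIME_S, MICAZ_RATE_BPS)
selective = sequential_delay_s(16, 64, AES_BLOCK_BITS, AES_BLOCK_TIME_S, MICAZ_RATE_BPS)
print(f"full: {full * 1e3:.3f} ms, selective: {selective * 1e3:.3f} ms")
# prints "full: 9.248 ms, selective: 3.848 ms"
```

In this illustration, full AES encryption (7.2 ms) costs more than three times the 2.048 ms transmission time of the block. Encryption time therefore cannot be ignored on such nodes.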

IV. PROPOSED SELECTIVE ENCRYPTION AND OPTIMIZATION

A. Quality Driven Energy Efficient Audio Encryption

Fig. 3 shows the difference between selective and non-selective encryption for MDCT codecs. In the selective scheme, only the most important coefficients are encrypted, based on the proposed MDCT data partitioning. Additional communication resources are allocated to protect these critical encrypted portions of the audio stream. The problem of maximizing secure audio transmission quality can thus be formulated as follows:

(6)

subject to

in which the scheme takes the first m coefficients from each MDCT coefficient set and maps them into the packet set. The resource allocation strategy is a tuple covering transmission rate control, power control, the maximum retransmission limit and adaptive packet size. The packet size also affects the number of packets in the set, and therefore how many packets need protection. Once m is determined, the scheme encrypts only the packets that contain the first m coefficients of the MDCT stream and allocates extra resources to protect them. The value of m therefore affects both encryption energy and communication energy.

In the proposed approach, energy efficiency comes mainly from communication energy savings achieved through efficient unequal resource allocation. The key is to spend more of a limited resource budget on protecting the transmission of important audio components. Our previous studies [9], [15]–[17] showed that an effective unequal resource allocation strategy can significantly improve both energy efficiency and transmission quality for image/video transmission over WMSNs. That strategy combines transmission rate control, power adaptation, frame size and ARQ retransmission control. We use the coefficient importance identified for selective encryption as the basis for unequal resource allocation. More resources go to data of higher importance and fewer to the remaining information.

Traditional encryption approaches were developed for textual data. They consider neither the unequal importance of large multimedia data nor the requirements of real-time transmission, and both their software and hardware implementations can cause significant delays. In addition, audio data may include noise and other non-perceptual components that are irrelevant to accurately recreating the audio stream. Leaving these coefficients unencrypted does not compromise overall information security.

Our previous work [14] performed a similar study on image transmission. It distinguished the position (P) and value (V) information of an image and allocated more resources to protect the P information, because the V information is less important. Under the same energy consumption budget, this improved image quality by about 6 dB. It is therefore important to include both unequal resource allocation and selective encryption in the cross-layer design, so that both multimedia quality and security can be assured in resource-limited WMSNs.

In this study, the important components of the multimedia data are first determined through the optimization in (6). Selective encryption then encrypts only the first (lowest-index) coefficients of each audio frame rather than the entire coefficient set, which significantly reduces computation overhead and latency. The encrypted coefficients are then protected through unequal resource allocation.

The innovation of the proposed approach is to integrate selective encryption and unequal resource allocation (URA) into a unified cross-layer scheme. This jointly improves energy efficiency, security and quality. URA reduces packet errors for the important multimedia packets while saving resources on less important data. When the more important packets are delivered successfully, audio quality is assured even if some less important packets are lost. As a result, the same quality requirements can be met at a lower energy cost.

In sensor networks, communication energy cost is much more significant than computation energy cost. The proposed approach exploits this fact. It accepts a small increase in computation for selective compression and encryption in exchange for a significant reduction in communication energy. The fundamental contribution of this research is to exploit the interrelation among audio processing, resource allocation and security protection so that the overall energy efficiency, security and audio quality targets are met. Section V compares the proposed approach (selective encryption with unequal resource allocation) against the traditional approach (full encryption with equal resource allocation) on multiple audio streams. The comparison shows that the proposed approach achieves a significant gain in audio quality under the same energy consumption budget.

B. Proposed Selective Encryption Algorithm With Unequal Resource Allocation

To assure transmission quality with selectively encrypted streaming and energy efficiency in mind, we develop a low-complexity genetic optimization algorithm, given as pseudocode in Table II. The algorithm must determine how many important coefficients to encrypt. This number depends on the distortion reduction of each sample block and on the available network resources.

In the proposed algorithm, the distortion reduction of each audio frame is precalculated from the input audio stream, and an energy consumption budget constrains the cross-layer optimization. The algorithm outputs the optimal number of MDCT coefficients to encrypt and the optimal resource allocation parameters for each audio frame. It works in three phases:

1. Source coding, including the MDCT, is performed at the application layer. The 256 coefficients are arranged in sequential order by index.
2. Cross-layer optimization jointly determines selective encryption and unequal resource allocation. A specialized genetic algorithm solves optimization problem (6) efficiently. Its fitness function is the expected audio quality of each chromosome, which the algorithm maximizes. Genetic algorithms work effectively for constrained discrete optimization problems and help avoid the suboptimal results that methods such as gradient search [26] or stochastic local search [27] may yield. The optimization is frame-based and operates on a limited set of cross-layer parameters per frame, so its complexity is not an issue.
3. In the encryption and transmission phase, the outputs of phase 2 fix the indexed length of each major component. An encryption algorithm such as AES then encrypts only the first (lowest-index) MDCT coefficients after compression. The encryption block length can be fixed or adjusted dynamically based on measured or desired network performance parameters such as delay. For related studies on adapting between encryption and transmission delay, please refer to our previous work [14].

Cipher key exchange and distribution follow standard AES practice. URA allocates additional network resources to protect the transmission of the encrypted data. Control information is exchanged in a standard cross-layer manner. For example, the power and transmission rate are set through control bits in the frame header at the physical layer.

TABLE II. PSEUDOCODE OF SELECTIVE ENCRYPTION WITH UNEQUAL RESOURCE ALLOCATION FOR ENERGY-EFFICIENT AUDIO TRANSMISSION IN WMSNS
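The pseudocode of Table II is not reproduced here, so the following is a generic genetic-algorithm sketch of the cross-layer phase, not the algorithm of Table II. It rests on these assumptions:

- Each gene is a per-packet retry limit.
- The number m of encrypted packets is fixed beforehand, e.g., by a security threshold.
- A packet's error probability after r retries is fer^(r+1).
- The energy model is a hypothetical stand-in.

Fitness is the expected distortion reduction under a first-loss decoding model. Chromosomes that exceed the budget are penalized. All names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def expected_reduction(d: Sequence[float], p: Sequence[float]) -> float:
    """Expected distortion reduction when decoding stops at the first lost packet."""
    total, survive, cum = 0.0, 1.0, 0.0
    for d_next, p_next in zip(d, p):
        total += survive * p_next * cum   # earlier packets arrived, this one lost
        survive *= 1.0 - p_next
        cum += d_next
    return total + survive * cum          # every packet arrived (P_{N+1} = 1)


def energy(genes: Sequence[int], fer: float, e_tx: float, e_enc: float, m: int) -> float:
    """Hypothetical cost: per-attempt energy times expected attempts for each
    packet, plus a fixed encryption cost for each of the m encrypted packets."""
    comm = sum(e_tx * sum(fer ** i for i in range(r + 1)) for r in genes)
    return comm + m * e_enc


def genetic_search(d: Sequence[float], fer: float, e_tx: float, e_enc: float, m: int,
                   budget: float, r_max: int = 4, pop_size: int = 30,
                   generations: int = 60, mutation_rate: float = 0.1,
                   seed: int = 0) -> Tuple[List[int], float]:
    """Choose per-packet retry limits (genes) that maximize expected quality
    within an energy budget: truncation selection, one-point crossover,
    uniform mutation, and elitism."""
    rng = random.Random(seed)
    n = len(d)

    def fitness(genes: Sequence[int]) -> float:
        if energy(genes, fer, e_tx, e_enc, m) > budget:
            return -1.0                                   # infeasible chromosome
        return expected_reduction(d, [fer ** (r + 1) for r in genes])

    pop = [[rng.randint(0, r_max) for _ in range(n)] for _ in range(pop_size - 1)]
    pop.append([0] * n)                                   # cheapest allocation as a seed
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]                      # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = [rng.randint(0, r_max) if rng.random() < mutation_rate else g
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


d = [10.0, 6.0, 3.0, 1.0, 0.5]      # illustrative per-packet distortion reductions
best, quality = genetic_search(d, fer=0.3, e_tx=1.0, e_enc=0.5, m=1, budget=6.2)
print(best, round(quality, 3))
```

The population is seeded with the cheapest allocation, and the best half is kept in every generation (elitism). As long as that allocation fits the budget, the returned quality is therefore never worse than uniform no-retry transmission.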

V. SIMULATION AND EXPERIMENT RESULTS

We used well-known T-MAC [20] sensor network parameters for transmission energy optimization. In T-MAC, data packets in TinyOS have an 11-byte MAC header. Control packets such as RTS and ACK are 13 bytes long, and a CTS packet is 15 bytes. The preamble is 18 bytes long. Control packets are transmitted with the basic modulation scheme, while DATA packets use scaled modulation schemes. The optimal energy consumption under given channel conditions was calculated with an energy model from our previous research on wireless multimedia sensor networks [16]–[18].
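The sketch below shows how these T-MAC overheads weigh on frame-size selection. It computes the airtime of one RTS/CTS/DATA/ACK exchange and the fraction of transmitted bytes that is payload. It assumes that an 18-byte preamble precedes every packet, that the stated control-packet lengths already include their headers, and that preambles are sent at the basic rate. These are assumptions, not details taken from T-MAC [20]. Function names are hypothetical.

```python
PREAMBLE_BYTES = 18                       # assumed to precede every packet
RTS_BYTES, CTS_BYTES, ACK_BYTES = 13, 15, 13
DATA_HEADER_BYTES = 11                    # TinyOS MAC header on DATA packets
BASIC_BYTES = 4 * PREAMBLE_BYTES + RTS_BYTES + CTS_BYTES + ACK_BYTES


def exchange_airtime_s(payload_bytes: int, base_rate_bps: float, data_rate_bps: float) -> float:
    """Airtime of one RTS/CTS/DATA/ACK exchange. Preambles and control packets
    use the basic rate; the DATA packet (header + payload) uses the scaled rate."""
    data_bytes = DATA_HEADER_BYTES + payload_bytes
    return BASIC_BYTES * 8 / base_rate_bps + data_bytes * 8 / data_rate_bps


def payload_efficiency(payload_bytes: int) -> float:
    """Fraction of all transmitted bytes that is audio payload."""
    return payload_bytes / (BASIC_BYTES + DATA_HEADER_BYTES + payload_bytes)


for size in (32, 64, 128):
    print(size, round(payload_efficiency(size), 3))
```

With these numbers, a 64-byte payload is only about 34% of the bytes on the air. Larger frames amortize the overhead better but are more exposed to frame errors, which is the frame-size tradeoff the resource allocation must balance.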

We evaluated security performance on 120 typical audio recordings organized into six categories: animal, alarm, people, music, movies and chimes. Each category contains 20 files of different subtypes. The security level depends on the encryption algorithm actually employed. Selective encryption, however, encrypts only the major components of the audio data and leaves unimportant data unencrypted. An eavesdropper who intercepts the transmission can access only the unencrypted portion, which contains no significant content, so the information stays protected.

The security of selective encryption therefore depends on which portion of the content is encrypted. If the scheme encrypts the wrong or less important coefficients and leaves important components unencrypted, critical information in the audio stream is disclosed. To evaluate security, we therefore measure the content dissimilarity between the original and the encrypted audio. If the selectively encrypted audio differs significantly from the original, the selective method is considered to offer a security level similar to full encryption. Such a difference indicates that the original audio content has been hidden, so an attacker without the decryption key cannot infer the original audio from the selectively encrypted data. We use correlation in both the time and frequency domains as the metric for dissimilarity and security. The results show that selective encryption performs comparably to full encryption. The correlation between the encrypted audio and the original audio is calculated as

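$$\rho_{xy} = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}(x)\,\operatorname{Var}(y)}} \tag{7}$$

TABLE III. CORRELATION RESULTS FOR THE ORIGINAL AND ENCRYPTED AUDIO FILES

where $x$ is the original audio, $y$ is the encrypted audio, $\operatorname{Cov}(x, y)$ is their covariance, and $\operatorname{Var}(x)$ and $\operatorname{Var}(y)$ are their variances. The magnitude of $\rho_{xy}$ lies between zero and one and indicates how similar $x$ and $y$ are. A smaller correlation magnitude means less similarity between the two signals and therefore higher security.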
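The metric in (7) can be computed in both domains as shown below. The sketch assumes that the frequency-domain correlation is taken between magnitude spectra, which the text does not specify. It uses a naive DFT so that only the standard library is needed. Function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """|Cov(x, y)| / sqrt(Var(x) Var(y)); 0 means no linear similarity."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return 0.0   # a constant signal is treated as carrying no similarity
    return abs(cov) / math.sqrt(var_x * var_y)


def magnitude_spectrum(x: Sequence[float]) -> List[float]:
    """|DFT| for bins 0..n//2 via a naive O(n^2) DFT (adequate for short examples)."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x)))
            for k in range(n // 2 + 1)]


def time_and_frequency_correlation(original: Sequence[float],
                                   decoded: Sequence[float]) -> Tuple[float, float]:
    """Correlation of the waveforms and of their magnitude spectra."""
    return (correlation(original, decoded),
            correlation(magnitude_spectrum(original), magnitude_spectrum(decoded)))
```

The sketch uses the magnitude of the correlation, so a sign-inverted copy (correlation of -1) counts as fully similar. This is appropriate here because a polarity-inverted signal is just as recognizable as the original.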

We conducted simulations based on recorded audio files of the animal class. The results are presented in Fig. 3. Fig. 3(a) shows the correlation between the original audio and the decoded audio in both the time and frequency domains when the first six indexed coefficients are encrypted. In this case, the correlation is high in both domains. However, as the number of encrypted indexed coefficients increases, the correlation between the two signals drops significantly. As shown in Fig. 3(b), when the first 11 coefficients are encrypted, the correlation in the frequency domain is 0.4025, which is close to the threshold of 0.35; this threshold was obtained experimentally as the level that provides security comparable to full encryption, i.e., that renders the audio content unrecognizable. With the first 41 coefficients encrypted, the correlation is further reduced to 0.2786 in the time domain and 0.4025 in the frequency domain. When 101 of the 256 coefficients are encrypted, both the time-domain (0.0544) and frequency-domain (0.1036) correlations are well below the threshold. As a result, the audio information has been successfully protected. These results show the efficiency of selective encryption and the unequal importance of MDCT coefficients for audio streaming: encrypting only an important subset of the coefficients can achieve a level of security comparable to that of full encryption.

Fig. 4. Selective encryption performance results (top) in the time domain and (bottom) in the frequency domain.

To further demonstrate the effectiveness of the proposed approach, we conducted experiments on a large collection of audio files categorized into multiple types of audio (i.e., animal, alarm, people, music, movies, and chimes). The average correlation values in the time and frequency domains for these audio classes under varying degrees of selective encryption are reported in Table III. Fig. 4(a) shows that the time-domain correlation between the signals decreases as the number of encrypted coefficients increases. The characteristics of each audio class affect the shape of these curves because the classes differ in the percentage of important coefficients. When approximately the first 80 coefficients within each block of MDCT-transformed audio samples are selectively encrypted, the resulting average correlation for all types of audio falls below the threshold required for the desired security level. These results demonstrate that selective encryption can provide effective encryption and information security in all of the audio classes tested. However, the number of coefficients that must be encrypted to bring the correlation below the 0.35 threshold differs among audio types, as shown in Fig. 4. For example, encrypting about 40 coefficients is typically sufficient to secure human speech, whereas about 80 coefficients must be encrypted for the chimes audio type. This further demonstrates that the optimal number of important audio coefficients to be encrypted depends closely on both the audio type and the desired level of security.
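The index-based selection described above, encrypting only the first k indexed coefficients of each 256-coefficient MDCT block and choosing the smallest k that drives the correlation below the 0.35 threshold, can be sketched as follows. This is a hypothetical illustration, not the cipher or code used in the experiments: the coefficients are assumed to be quantized to signed 16-bit integers, and a SHA-256 counter-mode keystream stands in for whatever cipher a sensor node would actually run. `keystream`, `selective_crypt`, and `min_coefficients` are illustrative names.

```python
import hashlib
from typing import Callable, List, Optional


def keystream(key: bytes, block_id: int, n: int) -> List[int]:
    """Hypothetical keystream: SHA-256 in counter mode, one 16-bit word per
    coefficient. A stand-in cipher for illustration, not the paper's cipher.
    block_id must be unique per block under one key (keystream reuse leaks data)."""
    words: List[int] = []
    counter = 0
    while len(words) < n:
        seed = key + block_id.to_bytes(4, "big") + counter.to_bytes(4, "big")
        digest = hashlib.sha256(seed).digest()
        # A 32-byte digest yields sixteen 16-bit words.
        words.extend(int.from_bytes(digest[i:i + 2], "big") for i in range(0, 32, 2))
        counter += 1
    return words[:n]


def _to_signed16(v: int) -> int:
    """Reinterpret a 16-bit pattern as a two's-complement integer."""
    return v - 0x10000 if v & 0x8000 else v


def selective_crypt(coeffs: List[int], k: int, key: bytes, block_id: int = 0) -> List[int]:
    """XOR the first k (lowest-index) coefficients with the keystream and leave
    the rest in the clear. XOR is its own inverse, so the same call decrypts.
    Assumes every coefficient fits in the signed 16-bit range."""
    ks = keystream(key, block_id, k)
    head = [_to_signed16((c & 0xFFFF) ^ w) for c, w in zip(coeffs[:k], ks)]
    return head + list(coeffs[k:])


def min_coefficients(corr_at: Callable[[int], float], n_coeffs: int = 256,
                     threshold: float = 0.35) -> Optional[int]:
    """Smallest k whose measured correlation falls below the threshold, or None.
    A linear scan is used because correlation need not fall monotonically in k."""
    for k in range(n_coeffs + 1):
        if corr_at(k) < threshold:
            return k
    return None
```

In use, `corr_at(k)` would encrypt every block with `selective_crypt(..., k, ...)`, decode the audio without the key, and evaluate Eq. (7) against the original. Going by the results reported above, the returned k would be expected to land near 40 for speech (about 16% of a 256-coefficient block) and near 80 for chimes (about 31%).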

Fig. 5. Energy consumption versus audio quality (alarm class recorded file).

Fig. 6. Energy consumption versus audio quality (based on 120 recorded audio files).

We also compared the proposed approach with the traditional approach, which uses equal resource allocation (ERA) with full encryption of the entire coefficient set. First, we compared the performance of the two schemes for alarm-class audio files. The results are shown in Fig. 5. As we can see, the proposed approach achieves better audio quality than the traditional approach under the same energy consumption budget. Because the proposed approach correctly identifies the first 30 coefficients as the important information, a significant energy saving can be achieved; this saving can in turn be used to protect those important audio components and to improve the audio quality significantly within a given energy budget. Similarly, our scheme can offer significant energy savings under the same quality requirement and hence extend the sensor network lifetime considerably. For example, as shown in Fig. 5, with a 35 dB audio quality requirement, the proposed approach achieves an energy saving of 2.35 mJ per transmission, an improvement of over 17%.
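As a quick consistency check on the Fig. 5 figures, and assuming the 17% is measured relative to the ERA scheme's per-transmission energy at the 35 dB requirement (the text does not state the base), the reported saving bounds both schemes' consumption. The symbols $E_{\mathrm{ERA}}$, $E_{\mathrm{proposed}}$, and $\Delta E$ are introduced here for illustration only:

$$
\frac{\Delta E}{E_{\mathrm{ERA}}} > 0.17,\quad \Delta E = 2.35\ \text{mJ}
\;\Longrightarrow\;
E_{\mathrm{ERA}} < \frac{2.35}{0.17} \approx 13.8\ \text{mJ},
\qquad
E_{\mathrm{proposed}} = E_{\mathrm{ERA}} - \Delta E < 11.5\ \text{mJ}.
$$

These bounds follow only from the stated percentage; the actual operating points should be read from Fig. 5.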

To perform a more comprehensive performance study, we conducted experiments on all waveforms of the six audio classes. The average quality improvement for different energy consumption budgets is depicted in Fig. 6. Compared with the traditional approach, the new approach offers a significant quality improvement for the same per-packet energy consumption budget. As shown in Fig. 6, with an energy consumption budget of at least 12.5 mJ per transmission, a significant audio quality improvement can be achieved. Similarly, on average the proposed scheme saves energy for a given desired audio quality. For example, Fig. 6 shows that the proposed approach reduces energy consumption by at least 4.5 mJ per transmission when the audio quality lower bound is 25 dB.

VI. CONCLUSION

In this paper, we proposed a selective encryption approach with unequal network resource allocation for MDCT-based audio streaming in WMSNs. The proposed approach identifies and encrypts the important portions of the MDCT coefficient data, and it allocates more network resources to protect the transmission of the encrypted audio data, jointly optimizing energy efficiency, audio transmission quality, and security performance. The major contributions of this approach are summarized as follows. 1) The proposed selective encryption scheme differentiates important audio information from less significant audio information. The important information is encrypted so that the audio is protected against interceptors or eavesdroppers in the network. 2) Through unequal network resource allocation, the encrypted important MDCT audio information is well protected from packet losses. The analytical and simulation results demonstrated that the proposed selective encryption approach with unequal resource allocation not only improves the network real-time performance and computational efficiency but also reduces energy consumption for the same audio transmission quality.

REFERENCES

[1] C. A. Khalifeh and H. Yousefi’zadeh, “An optimal UEP scheme of audio transmission over MIMO wireless links,” in Proc. Wireless Communications and Networking Conf., Mar. 31–Apr. 3, 2008, pp. 3191–3196.

[2] Y. Wang, A. Ahmaniemi, D. Isherwood, and W. Huang, “Content-based UEP: A new scheme for packet loss recovery in music streaming,” in Proc. 11th ACM Int. Conf. Multimedia.

[3] C. Perkins, O. Hodson, and V. Hardman, “A survey of packet loss recovery techniques for streaming audio,” IEEE Netw., vol. 12, no. 5, pp. 40–48, Sep./Oct. 1998.

[4] B. W. Wah, X. Su, and D. Lin, “A survey of error concealment schemes for real-time audio and video transmissions over the Internet,” in Proc. IEEE Int. Symp. Multimedia Software Engineering, Taipei, Taiwan, Dec. 2000, pp. 17–24.

[5] G. A. Spanos and T. B. Maples, “Security for real-time MPEG compressed video in distributed multimedia applications,” in Proc. Computers Communications, Mar. 1996, pp. 72–78.

[6] A. M. Alattar and G. I. Al-Regib, “Evaluation of selective encryption techniques for secure transmission of MPEG-compressed bit-streams,” in Proc. Symp. Circuits Systems, Jun. 1999, vol. 4, pp. 340–343.

[7] H. C. H. Cheng, “Partial encryption for image and video communication,” M.S. thesis, Univ. Alberta, Edmonton, AB, Canada, 1998.

[8] H. Cheng and X. Li, “Partial encryption of compressed images and videos,” IEEE Trans. Signal Process., vol. 48, no. 8, pp. 2439–2451, Aug. 2000.

[9] W. Wang, D. Peng, H. Wang, H. Sharif, and H. H. Chen, “Energy-constrained distortion reduction optimization for wavelet-based coded image transmission in wireless sensor networks,” IEEE Trans. Multimedia, vol. 10, no. 6, pp. 1169–1180, Oct. 2008.

[10] A. Servetti and J. C. De Martin, “Perception-based partial encryption of compressed speech,” IEEE Trans. Speech Audio Process., vol. 10, no. 8, pp. 637–643, Nov. 2002.

[11] E. Hellerud, J. E. Voldhaug, and U. P. Svensson, “Perceptually controlled error protection for audio streaming over IP networks,” in Proc. IEEE ICDT, 2006, p. 30.

[12] C. W. Yung, H. F. Fu, C. Y. Tsui, R. S. Cheng, and D. George, “Unequal error protection for wireless transmission of MPEG audio,” in Proc. IEEE ISCAS, Jul. 1999, pp. 342–345.

[13] D. Sinha and C.-E. W. Sundberg, “Unequal error protection methods for perceptual audio coders,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP ’99), 1999, vol. 5, pp. 2423–2426.

[14] W. Wang, D. Peng, H. Wang, and H. Sharif, “An adaptive approach for image encryption and secure transmission over multirate wireless sensor networks,” in Special Issue on Distributed Systems of Sensors and Applications, Wireless Communications and Mobile Computing Journal. New York: Wiley, 2007.


[15] H. Wang, D. Peng, W. Wang, H. Sharif, and H. H. Chen, “Image transmission with security enhancement based on region and path diversity in wireless sensor networks,” IEEE Trans. Wireless Commun., vol. 8, no. 2, pp. 757–765, Feb. 2009.

[16] W. Wang, D. Peng, H. Wang, H. Sharif, and H. H. Chen, “Energy efficient multirate interaction in distributed source coding and wireless sensor network,” in Proc. IEEE Wireless Communications and Networking Conf. (WCNC 2007), Mar. 2007, pp. 4091–4095.

[17] W. Wang, D. Peng, H. Wang, H. Sharif, and H. H. Chen, “Optimal image component transmissions in multirate wireless sensor networks,” in Proc. IEEE Global Communications Conf. (GLOBECOM), Nov. 2007, pp. 976–980.

[18] H. Wang, D. Peng, W. Wang, H. Sharif, and H. H. Chen, “Collaborative image transmissions based on region and path diversity in wireless sensor network,” in Proc. IEEE Global Communications Conf., Nov. 2007, pp. 971–975.

[19] V. Britanak and K. R. Rao, “A new fast algorithm for the unified forward and inverse MDCT/MDST computation,” Signal Process., vol. 82, pp. 433–459, 2002.

[20] T. van Dam and K. Langendoen, “An adaptive energy-efficient MAC protocol for wireless sensor networks,” in Proc. ACM SenSys’03, Los Angeles, CA, Nov. 2003.

[21] [Online]. Available: http://www.tinyos.net/.

[22] A. Wood and J. Stankovic, “AMSecure: Secure link-layer communication in TinyOS for IEEE 802.15.4-based wireless sensor networks,” in Proc. Int. Conf. Embedded Networked Sensor Systems 2006, vol. 1, pp. 395–396.

[23] Z. Wu, A. Bilgin, and M. Marcellin, “Joint source/channel coding for multiple images,” IEEE Trans. Commun., vol. 53, no. 10, pp. 1648–1654, Oct. 2005.

[24] H. Cheng and X. Li, “Partial encryption of compressed images and video,” IEEE Trans. Signal Process., vol. 48, no. 8, pp. 2439–2451, Aug. 2000.

[25] A. M. Alattar, G. I. Al-Regib, and S. A. Al-Semari, “Improved selective encryption techniques for secure transmission of MPEG video bit-streams,” in Proc. 1999 Int. Conf. Image Processing (ICIP ’99), Kobe, Japan, Oct. 24–28, 1999, vol. 4, pp. 256–260.

[26] T. S. Procyck and E. H. Mamdani, “A linguistic self-organizing process controller,” Automatica, vol. 15, pp. 15–30, 1979.

[27] H. H. Hoos and T. Stützle, Stochastic Local Search: Foundations and Applications. San Francisco, CA: Morgan Kaufmann, 2005.

[28] P. Ganesan, R. Venugopalan, P. Peddabachagari, A. Dean, F. Mueller, and M. Sichitiu, “Analyzing and modeling encryption overhead for sensor network nodes,” in Proc. 2nd ACM Int. Conf. Wireless Sensor Networks and Applications, 2003, pp. 151–159.

[29] W. Wang, D. Peng, H. Wang, and H. Sharif, “An adaptive approach for image encryption and secure transmission over multirate wireless sensor networks,” Wireless Commun. Mobile Comput. J., vol. 9, pp. 383–393, 2009.

[30] Crossbow Co. [Online]. Available: http://www.xbow.com.

Honggang Wang (M’10) received the Ph.D. degree in computer engineering from the University of Nebraska-Lincoln in 2009.

He is currently an Assistant Professor at the University of Massachusetts, Dartmouth. His research interests include networking, wireless sensor networks, multimedia communication, network and information security, embedded sensory systems, biomedical computing, and pattern recognition.

Michael Hempel (M’07) received the Ph.D. degree in computer engineering from the University of Nebraska-Lincoln in 2007.

He is currently a Research Assistant Professor at the University of Nebraska-Lincoln. His research interests include wireless communications networks and multimedia communications.

Dongming Peng (M’03) received the B.A. and M.A. degrees in electrical engineering from Beijing University of Aeronautics and Astronautics, Beijing, China, in 1993 and 1996, respectively, and the Ph.D. degree in computer engineering from Texas A&M University, College Station, in 2003.

He is currently an Associate Professor at the University of Nebraska-Lincoln. His research interests include digital image processing and sensor networks.

Wei Wang (M’10) received the B.Sc. and M.Sc. degrees from Xian Jiaotong University, Xian, China, in 2002 and 2005, respectively, both in electrical engineering.

He is currently an Assistant Professor at South Dakota State University, Brookings. His research interests are wireless networks, wireless sensor networks, and multimedia security.

Hamid Sharif (SM’06) received the B.Sc. degree from the University of Iowa, Iowa City, the M.Sc. degree from the University of Missouri, Columbia, and the Ph.D. degree from the University of Nebraska-Lincoln, all in electrical engineering.

He is the Director of the Advanced Telecommunications Engineering Laboratory (TEL), University of Nebraska-Lincoln. His research areas include wireless communications networks, wireless sensor networks, and QoS in IP networks.

Hsiao-Hwa Chen (F’09) received the B.Sc. and M.Sc. degrees with the highest honor from Zhejiang University, Hangzhou, China, and the Ph.D. degree from the University of Oulu, Oulu, Finland, in 1982, 1985, and 1990, respectively, all in electrical engineering.

He is currently a full Professor in the Department of Engineering Science, National Cheng Kung University, Tainan City, Taiwan.

Dr. Chen served or is serving as an Associate Editor and/or a Guest Editor of numerous important technical journals in communications. He is serving as the Chief Editor (Asia and Pacific) for Wiley’s Wireless Communications and Mobile Computing (WCMC) journal and Wiley’s International Journal of Communication Systems, etc. He is the founding Editor-in-Chief of Wiley’s Security and Communication Networks journal (http://www.interscience.wiley.com/journal/security).
