A Free-Source Method (FrSM) for Calibrating a Large-Aperture Microphone Array



1632 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 8, AUGUST 2013


Sarthak Khanal, Student Member, IEEE, Harvey F. Silverman, Fellow, IEEE, and Rahul R. Shakya

Abstract—Large-aperture microphone arrays can be used to capture and enhance speech from individual talkers in noisy, multi-talker, and reverberant environments. However, they must be calibrated, often more than once, to obtain accurate 3-dimensional coordinates for all microphones. Direct-measurement techniques, such as using a measuring tape or a laser-based tool, are cumbersome and time-consuming. Some previous methods that used acoustic signals for array calibration required bulky hardware and/or fixed, known source locations. Others, which allowed more flexible source placement, often have issues with real data, have reported results for 2D only, or work only for small arrays. This paper describes a complete and robust method for automatic calibration using acoustic signals that is simple, repeatable, and accurate, and that has been shown to work for a real system. The method requires only a single transducer (speaker) with a microphone attached above its center. The unit is freely moved around the focal volume of the microphone array, generating a single long recording from all the microphones. After that, the system is completely automatic. We describe the free-source method (FrSM), validate its effectiveness, and present accuracy results against measured ground truth. The performance of FrSM is compared to that of several other methods for a real 128-microphone array.

Index Terms—Microphone arrays, calibration, microphone positions, acoustic measurements, array processing.

Manuscript received December 19, 2012; revised March 15, 2013; accepted March 15, 2013. Date of publication April 04, 2013; date of current version April 25, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Emmanuel Vincent.

The authors are with The Laboratory for Engineering Man/Machine Systems (LEMS), Brown University, Providence, RI 02912 USA (e-mail: [email protected]; [email protected]; rahul_shakya@brown.edu).

Digital Object Identifier 10.1109/TASL.2013.2256896

I. INTRODUCTION

Microphone arrays are used to extract and enhance a talker's speech for applications such as teleconferencing and speech recognition [1]–[3]. Almost all such algorithms determine the source positions, which in turn requires accurate knowledge of the microphone locations [4]. Thus, array calibration, the acquisition of the three-dimensional coordinates of each microphone, is an important issue.

Calibration of a large, perhaps randomly distributed, microphone array is cumbersome and tedious when using the most conventional means: a measuring tape, or a laser distance meter with or without a surveyor's transit. This makes it highly unlikely that the task will be undertaken a second time when the array invariably shifts after some time. Acoustic methods have the great advantage of obtaining positions for all the microphones simultaneously, so many have been proposed. For large-aperture, three-dimensional array geometries, we suggest that the following criteria need to be met:

• Given normal rooms and using electret microphones (0.5–1 cm diameter) or MEMS microphones (about 1 mm diameter), one would want positional accuracy on the microphones to be within 1 cm.

• Setting up or storing bulky hardware for mounting sources or having to precisely measure source positions a priori is not desirable.

• There should be no limit on the number of source locations that can be used for calibration.

• The frame of reference should be centered at the array itself and not at a temporary source system.

• Only a single transducer (speaker) should be used, with free positioning.

• After recording, the rest of the process should be completely automatic using a standard PC.

The method we introduce in this paper, called the free-source method (FrSM), satisfies the criteria listed above. However, only a few of the methods cited in the literature even approximately meet these criteria for large-aperture array calibration. Two of the earliest methods, proposed in [5], [6], used three source transducers but assumed a planar array. A method based on a maximum-likelihood (ML) estimator [7] required initial source and sensor estimates to be sufficiently close to the true source and sensor locations, which would likely require direct action by the user. Other methods used sources in known locations [8], time of arrival (TOA) [9], or energy [10] to estimate sensor locations, also using an ML estimator. Given a large number of unknown sensor and/or source locations, the functional for these exhibits a complex surface with many local minima. The ML estimator in [11] worked well for rather small two-dimensional arrays but was not shown to be extensible to a large three-dimensional array. Methods that used basis-point multidimensional scaling (MDS), introduced in [12], [13], require the pairwise distances between the sources to be measured precisely. Whether these distances are measured manually by mounting the sources on a rigid frame [14] or acoustically [15], [16], this puts a practical upper limit on the number of source locations that can be used for calibration. A method applicable to a linear array [17] used only a single source but is not easily extensible to the large-aperture system. A more recent method using diffuse noise fields [18] measured the inter-microphone distances by minimizing a cost function characterizing the difference between the measured diffuse noise and its theoretical model. However, it is only applicable to rather small arrays [19]. An extension of the method in [18], proposed by [19], calibrated multiple sub-arrays and then fused them together to find their relative positions and orientations. The error reported for this method on microphones separated by a relatively large distance, however, does not meet the accuracy cited above. The method of [20] discusses the problem of putting calibrated sub-arrays into a single frame of reference. The method of [21] used only a single source in unknown positions and is demonstrated in a YouTube video, but it required a fine grid search over the entire search space, was highly dependent on its initial conditions, and was not shown to be extensible to the large-aperture environment. The algorithm in [22] reduced the complexity of the problem by using matrix decomposition. The algorithm proposed by [23] required solving only linear equations and matrix factorization to simplify the problem. The latter two methods are perhaps the closest to satisfying all our criteria, and that of [22] will be compared to FrSM in Section III. Both methods seem to work well with synthetic data as reported; however, no results were published for real data on a real array, which we provide here. Real data have background noise that is not necessarily Gaussian, as well as large correlated-noise components (echoes). This makes experiments in a real environment essential for a calibration algorithm to be acceptable.

1558-7916/$31.00 © 2013 IEEE

A brief overview of the FrSM algorithm is given here.

Let us consider an M-element large-aperture microphone array. First, the user measures, with a tape or laser device, the inter-microphone distances among three microphones, establishing a convenient room coordinate system with one of the microphones acting as the point (0, 0, 0). A single speaker (tweeter) source device with a microphone attached above its center is used for the next, data-acquisition phase of the calibration procedure. Given an array whose microphone signals are synchronized, this arrangement allows one to estimate the source-to-microphone times of flight (TOFs) rather than using time differences of arrival (TDOAs). A recording is obtained by freely moving the tweeter source (Fig. 1) about the room, chirping at equispaced time intervals for as long as the user wants, generating N chirps. The interval time and the length of the recording are parameters that depend upon the room acoustics, the geometry of the array, and the desired accuracy. The rest of the calibration is automatic and proceeds as a bootstrap. It uses the recording and the three-microphone coordinate system first to determine initial estimates of the N source positions. These are then used to compute initial estimates of the remaining M − 3 microphone positions. While this procedure should ideally yield exact coordinates, in reality the microphone-location estimates are not sufficiently accurate because:

1) The source locations have been estimated rather than directly measured.

2) The speaker (tweeter dome) (Fig. 1) is not a perfect point source and has a voice coil about 3 cm wide.

3) The close microphone sits about 1 cm above the tweeter dome. Thus, the measured TDOAs are not exactly equal to the TOFs between the tweeter and each microphone.

4) The microphones are about 0.9 cm in diameter, i.e., not a point in space.

5) The speed of sound may vary slightly throughout the room and/or the temperature may not be measured perfectly.

Fig. 1. A single domed tweeter with a microphone attached above its center used for FrSM. Note that the microphone is surrounded by damping foam to reduce overload.

Note that the source-microphone offset mentioned in item 3) above is a bona fide factor in real systems. Our algorithm has been designed to handle this kind of error, which has not been considered in other calibration methods. Two refinement stages therefore follow: the first addresses the first two items above, and the second perturbs the distances obtained from the TOFs to overcome the remaining errors. It should be noted that the initial set of three measurements need never be repeated, as the refinement stages of the ensuing algorithm will adjust the system for any slight changes of these referents through time. The entire procedure can easily be done by one person in about ten minutes. Most importantly, the experiments shown yield an accuracy within 0.2% in the determination of microphone locations in a known coordinate system for a room that is 500 × 700 × 300 cm. If microphone positions change at all over time, the procedure may be easily and effectively repeated.
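Item 5) can be put in perspective against the 1 cm accuracy target with a quick calculation. The sketch below assumes the common dry-air approximation $c(T) \approx 331.3\sqrt{1 + T/273.15}$ m/s, with humidity ignored; the paper does not say which speed-of-sound model it uses, and the function names are illustrative only.

```python
import math


def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air, in m/s, at temp_c degrees Celsius.

    Assumption: ideal-gas approximation c = 331.3 * sqrt(1 + T / 273.15);
    humidity and pressure effects are ignored.
    """
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def range_error(path_m: float, assumed_c: float, true_c: float) -> float:
    """Error (m) in a distance computed as TOF * assumed_c when the sound
    actually travelled path_m metres at speed true_c."""
    tof = path_m / true_c              # time of flight the microphones observe
    return tof * assumed_c - path_m    # inferred distance minus true distance


if __name__ == "__main__":
    c_assumed = speed_of_sound(20.0)   # temperature we believe the room is at
    c_actual = speed_of_sound(21.0)    # temperature it actually is at
    err = range_error(7.0, c_assumed, c_actual)  # ~ longest room dimension
    print(f"c(20 C) = {c_assumed:.2f} m/s, c(21 C) = {c_actual:.2f} m/s")
    print(f"range error over 7 m: {err * 100:.2f} cm")
```

Under these assumptions, mistaking 21 °C for 20 °C over a 7 m path shortens the inferred distance by roughly 1.2 cm, already more than the 1 cm target.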

II. THE FREE-SOURCE METHOD (FRSM)

The FrSM consists of six stages: setting up a fixed coordinate frame, recording the microphone outputs while a source is moved about the room, obtaining estimates of the initial source locations, obtaining estimates of the initial microphone coordinates, a refinement procedure to reduce microphone position error, and a perturbation stage to further improve accuracy.

The first stage of the algorithm involves setting up a fixed coordinate frame. Three reference microphones are selected from the array that are close enough together to be measured easily by a single person and are well placed for setting up a coordinate system. One then measures, with a tape, ruler, or laser distance device, the three pairwise distances among them. While any three non-collinear microphones may be chosen, the best results are obtained with easy-to-measure microphones separated by at least 10% of the maximum dimension of the room. The positions of the microphones selected for our array are shown in Fig. 2.

Fig. 2. A fisheye view, facing upwards, of the large-aperture HMAII array. The room is about 500 × 700 × 300 cm. The numbers inside the parentheses are the numbers of microphones in each section.

Call the three microphones $m_1$, $m_2$, and $m_3$, with position vectors $\mathbf{m}_1$, $\mathbf{m}_2$, and $\mathbf{m}_3$, and let the pairwise distances be given by $d_{12} = \|\mathbf{m}_1 - \mathbf{m}_2\|$, $d_{13} = \|\mathbf{m}_1 - \mathbf{m}_3\|$, and $d_{23} = \|\mathbf{m}_2 - \mathbf{m}_3\|$. As typical electret microphones are only 0.9 cm across or smaller, and MEMS microphones smaller yet, they may be approximated as single points in space relative to most rooms. Microphone $m_1$ is conveniently chosen to be the origin, (0, 0, 0), of our fixed coordinate frame. Another microphone, $m_2$, is selected to be on the x-axis of our coordinate frame. Thus, the coordinates of microphone $m_2$ are simply $(d_{12}, 0, 0)$. The third microphone, $m_3$, lies on a circle that is perpendicular to the line joining microphones $m_1$ and $m_2$, i.e., the x-axis. We thus have

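$$\|\mathbf{m}_3 - \mathbf{m}_1\| = \|\mathbf{m}_3\| = d_{13} \qquad (1)$$

$$\|\mathbf{m}_3 - \mathbf{m}_2\| = d_{23} \qquad (2)$$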

If the coordinates of $\mathbf{m}_3$ are given by $(x_3, y_3, z_3)$, then expanding (1) and (2) and subtracting one from the other yields (3), after which (1) alone yields (4):

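$$x_3 = \frac{d_{12}^2 + d_{13}^2 - d_{23}^2}{2\,d_{12}} \qquad (3)$$

$$y_3^2 + z_3^2 = d_{13}^2 - x_3^2 \qquad (4)$$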

Equations (3) and (4) show that $\mathbf{m}_3$ lies on a circle that is $x_3$ units away from the origin along the x-axis and parallel to the yz-plane of our coordinate frame. The radius of this circle is $\sqrt{d_{13}^2 - x_3^2}$. The coordinates of $\mathbf{m}_3$ may be chosen as any point on the circumference of this circle. However, for simplicity, the z coordinate of $\mathbf{m}_3$ is chosen to be zero. Thus, from (4) above, the y coordinate of $\mathbf{m}_3$ is given by the radius of the circle, i.e., $y_3 = \sqrt{d_{13}^2 - x_3^2}$. Hence,

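$$\mathbf{m}_1 = (0,\ 0,\ 0) \qquad (5)$$

$$\mathbf{m}_2 = (d_{12},\ 0,\ 0) \qquad (6)$$

$$\mathbf{m}_3 = \left(x_3,\ \sqrt{d_{13}^2 - x_3^2},\ 0\right) \qquad (7)$$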
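Equations (3)–(7) map directly onto a few lines of code. The following minimal sketch (the function `reference_frame` and its `min_radius` guard are illustrative, not taken from the paper) builds the three reference coordinates from the measured distances and rejects nearly collinear choices, for which the circle radius in (4) collapses toward zero.

```python
import math


def reference_frame(d12: float, d13: float, d23: float, min_radius: float = 1e-6):
    """Return (m1, m2, m3) following (3)-(7): m1 at the origin, m2 on the
    x-axis, m3 in the xy-plane with z3 = 0 and y3 >= 0. Distances in metres."""
    if d12 <= 0.0:
        raise ValueError("m1 and m2 must be distinct")
    x3 = (d12**2 + d13**2 - d23**2) / (2.0 * d12)   # equation (3)
    r_sq = d13**2 - x3**2                            # y3^2 + z3^2, equation (4)
    if r_sq < min_radius**2:
        # Collinear microphones or inconsistent tape measurements
        raise ValueError("reference microphones are (nearly) collinear")
    m1 = (0.0, 0.0, 0.0)                             # equation (5)
    m2 = (d12, 0.0, 0.0)                             # equation (6)
    m3 = (x3, math.sqrt(r_sq), 0.0)                  # equation (7)
    return m1, m2, m3


if __name__ == "__main__":
    # Example: reference microphones forming a 3-4-5 right triangle (metres)
    print(reference_frame(3.0, 4.0, 5.0))
```

For the 3-4-5 example, $\mathbf{m}_3$ comes out as (0.0, 4.0, 0.0). Setting $z_3 = 0$ and taking $y_3 \ge 0$ simply fixes the remaining rotation and reflection freedom of the frame.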

In the second stage of the algorithm, we record a file of all the microphone outputs while a source is arbitrarily moved about the focal volume of the array. A key element of FrSM is that it uses minimal special hardware for calibration. Only the hand-held tweeter/microphone combination shown in Fig. 1 needs to be connected to the recording PC. Specifically, any small speaker capable of producing sufficiently loud, fast-attack "chirps" over the chirp frequency range will do. Our speaker is a domed tweeter. Because, in our room, all the microphones to be calibrated are mounted high on the walls or on the ceiling (Fig. 2), placing the tweeter at "chest" level facing upward yields nearly spherical-radiator performance, i.e., each microphone receives close to the same amplitude, attenuated only by its distance from the source. Thus we treat the sphere's center as a point source. Mounted exactly 1 cm above the center of the dome surface is an electret microphone with a small "muffler" on it to attenuate the output from the nearby tweeter. This microphone should be recorded at the same time and in the same manner as the array microphones, which ensures synchrony of all the recorded signals.

If a nominal value of $T_{60}$, the reverberation time of the room, is known, then, using the sound card output from the recording PC, the speaker may be chirped with short pulses separated by 2 to 5 times the room $T_{60}$. If $T_{60}$ is not known, it may either be estimated using Sabine's Law [24], or one can use a very conservative repetition interval, which would only lengthen the recording time. We recommend a chirp that sweeps from about 2 kHz to about 6 or 7 kHz in about 5–20 ms, with a repetition period set for the given room. We used a top chirp frequency of 6.5 kHz and repeated the chirp every 2 seconds in our room. The tweeter location remained fixed for each chirp, and the unit was arbitrarily moved between chirps throughout the focal volume of the microphone array for a user-controlled recording time, yielding N chirps. The close microphone, the one mounted as part of the tweeter system, offers the advantage that the TDOA between the close microphone and any other microphone is nearly the TOF from the source to that microphone for a given chirp. For our array, a 30–40 second recording of chirps from different locations within the focal volume was sufficient to attain the desired level of accuracy. Given that the recorded file has all microphones in synchrony, the high signal-to-noise signal at the close microphone may be autocorrelated to obtain a set of start times for each chirp event. While our hardware was designed to synchronize accurately, if hybrid, non-synchronous hardware were used as a microphone array, then a synchronization scheme would have to be available. A matrix of TOFs, $\boldsymbol{\tau} = [\tau_{ij}]$, where $i = 1, \ldots, N$ is the index on the chirp number and $j = 1, \ldots, M$ the index on the microphone number, is derived. Its values are the TDOAs between the close microphone and all other microphones for each recorded chirp. These TDOAs may be accurately determined using the generalized cross-correlation method with the phase transform weighting, GCC-PHAT [25].

In the third stage of the algorithm, source locations are estimated using the matrix $\boldsymbol{\tau}$ and the reference microphones. Given the reference microphones' coordinates, the source locations, $\mathbf{s}_i$, may be estimated using the distance between the source location for the ith chirp and the jth reference microphone, $j \in \{1, 2, 3\}$,

$$\|\mathbf{s}_i - \mathbf{m}_j\| = d_{ij} = c\,\tau_{ij}, \qquad j \in \{1, 2, 3\} \qquad (8)$$

Each source location is estimated by solving a minimization problem using the speed of sound, $c$, at a nominal, measured temperature, $T$:

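$$\hat{\mathbf{s}}_i = \arg\min_{\mathbf{s}} \sum_{j=1}^{3} \left( \|\mathbf{s} - \mathbf{m}_j\| - c\,\tau_{ij} \right)^2 \qquad (9)$$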
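Because the frame of (5)–(7) places all three reference microphones in the plane $z = 0$, the three-distance problem in (9) also has a closed-form solution that exposes the two mirror-image solutions as $\pm z$. The sketch below uses closed-form trilateration as a stand-in for the constrained fmincon search the authors used: with exactly consistent distances both give the same point, but with noisy distances the closed form is only approximate, and here a slightly negative $z^2$ is simply clamped to zero. The function `locate_source` and its `side` argument are hypothetical.

```python
import math


def locate_source(tofs, c, d12, m3, side=1):
    """Closed-form position of one chirp from its TOFs to the reference mics.

    tofs -- (tau_i1, tau_i2, tau_i3) in seconds
    c    -- speed of sound in m/s at the measured room temperature
    d12  -- distance from m1 (origin) to m2 (on the x-axis)
    m3   -- (x3, y3, 0), the third reference microphone from (7)
    side -- +1 or -1: side of the reference plane z = 0 holding the tweeter
    """
    r1, r2, r3 = (c * t for t in tofs)              # d_ij = c * tau_ij, as in (8)
    x3, y3, _ = m3
    x = (r1**2 - r2**2 + d12**2) / (2.0 * d12)
    y = (r1**2 - r3**2 + x3**2 + y3**2 - 2.0 * x3 * x) / (2.0 * y3)
    z_sq = r1**2 - x**2 - y**2
    z = side * math.sqrt(max(z_sq, 0.0))            # pick one mirror image
    return (x, y, z)


if __name__ == "__main__":
    c = 343.0
    m1, m2, m3 = (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)
    true_s = (1.0, 1.5, -2.0)                       # tweeter below the plane
    tofs = [math.dist(true_s, m) / c for m in (m1, m2, m3)]
    print(locate_source(tofs, c, 3.0, m3, side=-1))
```

In practice `side` would be chosen according to which side of the reference plane the tweeter was carried on, which is the constraint discussed next.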


In three dimensions, (9) has two solutions [26]. If a plane is drawn through the three reference microphones, the solutions are mirror images of each other about this plane. To determine the correct solution, the minimization must be constrained to the side of the plane on which the tweeter was chirped, i.e., find $\hat{\mathbf{s}}_i$ subject to the constraint $P(\hat{\mathbf{s}}_i) > 0$ or $P(\hat{\mathbf{s}}_i) < 0$, where $P(\mathbf{s}) = 0$ is the equation of the plane passing through the reference microphones (in the frame of (5)–(7), this is simply the plane $z = 0$). Since the cost surface for a single chirp with three known distances tends to be smooth, any standard optimization procedure may be used to find the solution. We used MATLAB's constrained optimization routine fmincon and ran it for all N chirps.

The fourth stage involves estimating the remainder of the microphone locations using $\hat{\mathbf{s}}_i$ and $\tau_{ij}$. There are $M - 3$ microphones with unknown positions given by $\mathbf{m}_j = (x_j, y_j, z_j)$, $j = 4, \ldots, M$. The distance between source $i$, with estimated position $\hat{\mathbf{s}}_i = (\hat{x}_i, \hat{y}_i, \hat{z}_i)$, and each unknown microphone can be written as

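$$\|\hat{\mathbf{s}}_i - \mathbf{m}_j\|^2 = (\hat{x}_i - x_j)^2 + (\hat{y}_i - y_j)^2 + (\hat{z}_i - z_j)^2 = d_{ij}^2 \qquad (10)$$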

where $d_{ij} = c\,\tau_{ij}$. Combining (10) for two estimated source positions, $\hat{\mathbf{s}}_k$ and $\hat{\mathbf{s}}_l$, yields

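$$\|\hat{\mathbf{s}}_k - \mathbf{m}_j\|^2 - \|\hat{\mathbf{s}}_l - \mathbf{m}_j\|^2 = d_{kj}^2 - d_{lj}^2 \qquad (11)$$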

This simplifies to,

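$$2\left[ (\hat{x}_l - \hat{x}_k)\,x_j + (\hat{y}_l - \hat{y}_k)\,y_j + (\hat{z}_l - \hat{z}_k)\,z_j \right] = d_{kj}^2 - d_{lj}^2 - \|\hat{\mathbf{s}}_k\|^2 + \|\hat{\mathbf{s}}_l\|^2 \qquad (12)$$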

which may be written using vectors as

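$$(\hat{\mathbf{s}}_l - \hat{\mathbf{s}}_k)^{T}\,\mathbf{m}_j = \tfrac{1}{2}\left( d_{kj}^2 - d_{lj}^2 - \|\hat{\mathbf{s}}_k\|^2 + \|\hat{\mathbf{s}}_l\|^2 \right) \qquad (13)$$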

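For N sources, there are $N(N-1)/2$ unique pairs. Thus we may form a matrix $\mathbf{A}$ of order $N(N-1)/2 \times 3$, each row of which is the coordinate difference $(\hat{\mathbf{s}}_l - \hat{\mathbf{s}}_k)^{T}$ for a unique pair. Furthermore, we define $\mathbf{b}_j$ as an $N(N-1)/2 \times 1$ vector, each element of which is given by $\tfrac{1}{2}\left( d_{kj}^2 - d_{lj}^2 - \|\hat{\mathbf{s}}_k\|^2 + \|\hat{\mathbf{s}}_l\|^2 \right)$ for the corresponding source pair. Hence,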

$$\mathbf{A}\,\mathbf{m}_j = \mathbf{b}_j \qquad (14)$$

Using the Moore–Penrose generalized matrix inverse [27], [28], $\mathbf{A}^{+}$ (which equals $(\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}$ when $\mathbf{A}$ has full column rank), to compute the minimum-norm least-squares solution to the set of linear equations (14), the microphone position estimates may be obtained from

$$\hat{\mathbf{m}}_j = \mathbf{A}^{+}\,\mathbf{b}_j, \qquad j = 4, \ldots, M \qquad (15)$$
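The pairwise-difference system (12)–(15) can be sketched in a few lines. The code below is a minimal standard-library illustration, not the authors' MATLAB code: it forms one row of $\mathbf{A}$ and one entry of $\mathbf{b}_j$ per source pair and solves the normal equations $\mathbf{A}^{T}\mathbf{A}\,\mathbf{m}_j = \mathbf{A}^{T}\mathbf{b}_j$, which gives the same answer as the pseudoinverse in (15) when $\mathbf{A}$ has full column rank, i.e., when the source positions are not all coplanar. The helper names are hypothetical.

```python
import math
from itertools import combinations


def _solve3(M, v):
    """Solve the 3x3 system M x = v by Gaussian elimination with partial pivoting."""
    A = [list(M[r]) + [v[r]] for r in range(3)]          # augmented matrix
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("A is rank deficient: sources may be coplanar")
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for k in range(col, 4):
                A[r][k] -= f * A[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                                  # back substitution
        x[r] = (A[r][3] - sum(A[r][k] * x[k] for k in range(r + 1, 3))) / A[r][r]
    return x


def locate_microphone(sources, dists):
    """Least-squares position of one microphone from source estimates s_i
    and distances d_ij = c * tau_ij, following (12)-(14)."""
    rows, rhs = [], []
    for k, l in combinations(range(len(sources)), 2):    # N(N-1)/2 unique pairs
        sk, sl = sources[k], sources[l]
        rows.append([sl[a] - sk[a] for a in range(3)])   # row of A: (s_l - s_k)^T
        nk = sum(x * x for x in sk)
        nl = sum(x * x for x in sl)
        rhs.append(0.5 * (dists[k]**2 - dists[l]**2 - nk + nl))  # entry of b_j
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return tuple(_solve3(AtA, Atb))


if __name__ == "__main__":
    sources = [(0.5, 0.5, 1.0), (2.0, 0.3, 1.2), (0.4, 3.0, 0.9),
               (2.5, 2.8, 1.5), (1.2, 1.6, 0.4)]
    mic = (1.1, 2.2, 2.9)
    print(locate_microphone(sources, [math.dist(s, mic) for s in sources]))
```

With well-spread synthetic sources the recovered microphone matches the true one to within rounding error. If all sources lie in one plane, $\mathbf{A}$ loses rank and this sketch raises an error, whereas the pseudoinverse in (15) would return the minimum-norm solution instead.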

This standard solution method has also been used to determine microphone locations, given known source locations, in [15]. Alternatively, the minimization problem could be solved directly, and more accurately, for the least-squares norm or for some other norm of the cost function. However, (15) is often much simpler to formulate and solve, and its results are sufficient for this stage of the algorithm.

A refinement procedure is applied in the fifth stage of the algorithm to reduce the error in the microphone position estimates due to inaccurate source-location estimates. In this stage we find a least-squares solution to the error functional between the current estimates of the source and microphone positions and the measured TOF-based distances. The functional is taken as

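$$E = \sum_{i=1}^{N} \sum_{j=1}^{M} \left\| \mathbf{s}_i - \mathbf{m}_j - d_{ij}\,\mathbf{u}_{ij} \right\|^2 \qquad (16)$$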

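where $\mathbf{u}_{ij} = (\mathbf{s}_i - \mathbf{m}_j)/\|\mathbf{s}_i - \mathbf{m}_j\|$ is the unit vector in the direction of the line joining $\mathbf{m}_j$ and $\mathbf{s}_i$. It is shown in the Appendix (Section V) that this is mathematically equivalent to the more commonly used maximum-likelihood formulation, (6) in [22]: when $\mathbf{u}_{ij}$ is evaluated at the current positions, $\mathbf{s}_i - \mathbf{m}_j - d_{ij}\mathbf{u}_{ij} = (\|\mathbf{s}_i - \mathbf{m}_j\| - d_{ij})\,\mathbf{u}_{ij}$, so each term of (16) equals $(\|\mathbf{s}_i - \mathbf{m}_j\| - d_{ij})^2$. This alternative form, however, eliminates dealing with square roots and yields good results more simply. This optimization stage allows all the variables, including the reference microphones, to be adjusted, thus handling any displacement of these referents through time. To minimize (16), the following iterative procedure is used:

a) Initialize $\mathbf{s}_i^{(0)}$ for $i = 1, \ldots, N$ and $\mathbf{m}_j^{(0)}$ for $j = 1, \ldots, M$ using the estimates from previous stages, and initialize the iteration index $k = 0$.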

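b) Compute the initial direction vectors, $\mathbf{u}_{ij}^{(0)} = (\mathbf{s}_i^{(0)} - \mathbf{m}_j^{(0)}) / \|\mathbf{s}_i^{(0)} - \mathbf{m}_j^{(0)}\|$.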

c) Compute the error in (16), $E^{(k)}$.

d) Update the variables by computing their respective descent vectors.

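e) Update the direction vectors, $\mathbf{u}_{ij}^{(k+1)}$.

f) Evaluate the new error in (16), $E^{(k+1)}$.

g) If $E^{(k)} - E^{(k+1)} < \epsilon$ for a small threshold $\epsilon$, stop iterating; else increment $k$ and go to step (c).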
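The sketch below mirrors steps (a)–(g) under one stated assumption: because the descent-vector formulas of step (d) are not reproduced here, it substitutes the closed-form block updates that minimize (16) exactly while the $\mathbf{u}_{ij}$ are held fixed, namely $\mathbf{s}_i \leftarrow \frac{1}{M}\sum_j (\mathbf{m}_j + d_{ij}\mathbf{u}_{ij})$ followed by $\mathbf{m}_j \leftarrow \frac{1}{N}\sum_i (\mathbf{s}_i - d_{ij}\mathbf{u}_{ij})$. Each of these updates, and the renormalization of $\mathbf{u}_{ij}$ in step (e) (for $d_{ij} \ge 0$), cannot increase (16), so the error sequence is non-increasing. `cost_ml` evaluates $\sum_{ij}(\|\mathbf{s}_i - \mathbf{m}_j\| - d_{ij})^2$ so the equivalence with (16) can be checked numerically. All names are hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _norm(a):
    return math.sqrt(sum(x * x for x in a))


def unit_vectors(S, Mics):
    """u_ij = (s_i - m_j) / ||s_i - m_j|| for every source/microphone pair."""
    U = []
    for s in S:
        row = []
        for m in Mics:
            v = _sub(s, m)
            n = _norm(v)
            row.append([x / n for x in v])
        U.append(row)
    return U


def cost16(S, Mics, D, U):
    """Error functional (16): sum_ij ||s_i - m_j - d_ij u_ij||^2."""
    total = 0.0
    for i, s in enumerate(S):
        for j, m in enumerate(Mics):
            r = [v - D[i][j] * u for v, u in zip(_sub(s, m), U[i][j])]
            total += sum(x * x for x in r)
    return total


def cost_ml(S, Mics, D):
    """Range-based least-squares cost: sum_ij (||s_i - m_j|| - d_ij)^2."""
    return sum((_norm(_sub(s, m)) - D[i][j]) ** 2
               for i, s in enumerate(S) for j, m in enumerate(Mics))


def refine(S, Mics, D, tol=1e-12, max_iter=500):
    """Steps (a)-(g); step (d) uses closed-form block updates (an assumption)."""
    S = [list(s) for s in S]                        # (a) initial estimates
    Mics = [list(m) for m in Mics]
    N, M = len(S), len(Mics)
    U = unit_vectors(S, Mics)                       # (b)
    history = [cost16(S, Mics, D, U)]               # (c)
    for _ in range(max_iter):
        # (d) minimize (16) over each block of variables with u_ij held fixed
        S = [[sum(Mics[j][a] + D[i][j] * U[i][j][a] for j in range(M)) / M
              for a in range(3)] for i in range(N)]
        Mics = [[sum(S[i][a] - D[i][j] * U[i][j][a] for i in range(N)) / N
                 for a in range(3)] for j in range(M)]
        U = unit_vectors(S, Mics)                   # (e)
        history.append(cost16(S, Mics, D, U))       # (f)
        if history[-2] - history[-1] < tol:         # (g) converged
            break
    return S, Mics, history
```

Because (16) depends only on distances, it is unchanged by a rigid motion of the whole configuration; in this sketch the coordinate frame is held in place only by the initial estimates, which is consistent with letting the reference microphones move.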

It should be noted that minimizing (16) from an arbitrary set of initial estimates does not guarantee that the microphone position error will be reduced. All the pairwise distances, including inter-microphone and inter-speaker pairs, are needed in order to completely specify a set of points in three-dimensional space [12], [29]. Thus, for a given subset of inter-point distance measurements, (16) will have multiple local minima, so the minimization routine must be initialized with estimates close to the true values.

Fig. 3. Top view of the large-aperture HMAII array, defining three regions 1, 2, and 3.

In the sixth and final stage of the algorithm, the accuracy of the method is further improved by perturbing the distance estimates derived from the recordings. We expect the majority of the remaining error in our estimates to be offset errors due to the small distance between the source transducer and the close microphone and, perhaps, to variations in the speed of sound. Although the offset in each source-to-microphone distance will be different, we hypothesize that it is largely a function of the source location, indexed by $i$. Thus, for a given source location, $\mathbf{s}_i$, its distance offset to all the microphones is taken to be a constant, $\delta_i$. To account for this error, (16) can be reformulated to include small correction factors, $\delta_i$, $i = 1, \ldots, N$:

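$$E = \sum_{i=1}^{N} \sum_{j=1}^{M} \left\| \mathbf{s}_i - \mathbf{m}_j - (d_{ij} + \delta_i)\,\mathbf{u}_{ij} \right\|^2 \qquad (17)$$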

Then, for this practical error model, finding the optimal values of $\delta_i$ will yield even better estimates of both the source and microphone positions. The new optimization is done in the same fashion as in the previous stage, except that the dimensionality of the internal vector goes from $3(N + M)$ to $3(N + M) + N$. The computational difference is that at steps (c) and (f) the error in (17) is computed, at step (d) $d_{ij}$ is replaced by $d_{ij} + \delta_i$, and at the end of step (d) $\delta_i$ is updated by setting $\partial E / \partial \delta_i = 0$, which, since $\mathbf{u}_{ij}^{T}\mathbf{u}_{ij} = 1$, implies the solution

$$\delta_i = \frac{1}{M} \sum_{j=1}^{M} \left( \mathbf{u}_{ij}^{T} (\mathbf{s}_i - \mathbf{m}_j) - d_{ij} \right).$$

It is necessary to do both stages 5 and 6 because stage 6 needs good initial estimates to converge consistently.

III. EXPERIMENTAL VALIDATION

To assess accuracy, precise and very reliable ground-truth measurements had to be established. Unfortunately, for a large system, using direct physical measurement to obtain accurate coordinates is a daunting task. Instead, we measured the distances between 100 different, easily accessible microphone pairs using a modern laser distance meter that advertises 1.5 mm accuracy.

While one would expect a priori that it is best to place sources throughout the room with a nearly uniform distribution, we wanted to make sure this was the case, given that the coordinate-determining reference microphones were all in one corner. Thus we used the ground-truth data to verify this expectation. Fig. 3 shows a top view of our room, defining three regions and indicating the reference microphones used to establish the coordinate system. Four methods for placement of sources were tried (Table I). As expected, the best results were obtained when the source locations were spread throughout the focal volume of the microphone array, as shown in two dimensions by the circles in Fig. 4. The estimates of the microphone positions are shown as stars. However, for an even larger array, one would expect the accuracy of the source-location estimates to deteriorate with distance from the reference microphones. The algorithm would then have to be modified to establish new referents automatically as a section of the array gets calibrated accurately.

Fig. 4. Top view of the room with the final microphone (star) and source (circle) location estimates. The blocks are the tables and cabinets in the room. As seen, the source locations are spread throughout the room where possible. The microphone location estimates shown in the middle of the room are actually on the ceiling.

TABLE I
THE ACCURACY OF THE ALGORITHM DEPENDING ON THE REGION WHERE THE CHIRPS ARE TAKEN. THE REGIONS ARE DEFINED IN FIG. 3

The next experiment was to determine the accuracy as a function of the number of source locations (Fig. 5). The error decreases asymptotically with an increasing number of source locations. We recorded as many as 60 source locations and saw that the accuracy did not improve much beyond about 16.

In Fig. 6 we show the distribution of the source-to-microphone distance offsets, $\delta_i$, using 133 different source locations. These are the optimal offsets that minimized the error functional (17). The resulting spread of the histogram justifies using an error perturbation that is a function of the source index $i$; a single constant would be too simple, and a perturbation that is also a function of the microphone would be overkill.

Fig. 5. Mean and standard deviation of absolute error as a function of the number of source locations used for calibration (FrSM). Note: normalized scale = mean or std. dev. of error / mean mic-mic distance.

Fig. 6. A histogram of the source-dependent offset errors using 133 source locations.

Finally, we compare the accuracy of FrSM to three other algorithms. Two of these have been previously proposed and used for large-aperture arrays at our own laboratory. The Fixed-Sources method has been described in [14]. The second method, the SRP-PHAT method, was never published, but used an SRP-PHAT locator [30] inversely, locating each microphone from about ten accurately known source positions. The third method was proposed by Crocco, Del Bue, and Murino [22] and was published in February 2012. This method is taken as the current state of the art. We duplicated [22] as best we could in MATLAB and tested our implementation of their algorithm using an exact replica of the synthetic data reported in the paper. We were able to replicate their results very accurately. Thus, assuming our implementation is correct, we used it for comparison with the other methods.

We first compared FrSM with the method in [22] using synthetic data. For Fig. 7 we generated 128 microphones and 29 sources randomly in a 1 m cube and added various levels of zero-mean Gaussian noise to the source-to-microphone distances. We see, as expected, that both methods, [22] and FrSM, track these data nearly perfectly when no offsets are present.

For the two other plots in Fig. 7, the data are simulated similarly, but offsets are added to the source-to-microphone distances in addition to zero-mean Gaussian noise. We consider a moderate and a high case, with mean source-dependent offsets of 0.002 m and 0.02 m, respectively. Note that, as our room is about seven times larger with about 2 cm of mean offset, the synthetic 0.002 m mean offset is close to our room conditions. The figure makes evident that FrSM handles this kind of error, for which it was designed, whereas the published method did not consider any offset errors. The synthetic data for the moderate case are also used in Fig. 8, which shows the mean microphone position error for 128 microphones as a function of the number of sources used. In all cases, FrSM yields a smaller error than the method of [22], with the improvement ranging from 15.3% when 10 sources are used to about 57.4% when 100 sources are used.

Fig. 7. Mean microphone position error for FrSM and the method in [22] as a function of the standard deviation of added Gaussian noise for various source offset errors.

Fig. 8. Mean microphone position error using 128 microphones for FrSM and the method in [22] as a function of the number of sources used (moderate source offset with additive zero-mean Gaussian noise of 0.002 m standard deviation).

Finally, we present real results for our 128-microphone array HMAII (Fig. 2) in Fig. 9. All computations were done in MATLAB and none took longer than 4 minutes, so we have not considered computation time for the algorithms. The first conclusion that may be drawn, and an important factor for this paper, is that both the refinement and perturbation stages are essential for good performance. The FrSM algorithm that uses the perturbation provides the smallest error by more

Page 7: A Free-Source Method (FrSM) for Calibrating a Large-Aperture Microphone Array

1638 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 8, AUGUST 2013

Fig. 9. Performance on the real HMAII system using 100 inter-microphonemeasured distances for various methods as listed. Note: Normalized ScaleMean or Std. Dev. of error/Mean mic-mic distance.

than a factor of two. Our older methods showed considerablyhigher error than did either the method of [22] or FrSM with orwithout perturbation. We suspect, however, that a perturbationstep added to the method of [22] would give comparable results.
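The synthetic experiments behind Figs. 7 and 8 corrupt each source-to-microphone distance with zero-mean Gaussian noise and, in the offset cases, with an offset that depends only on the source. The following is a minimal sketch of such data, plus a check of why a per-source perturbation is a natural model: if the geometry were known and the functional were a plain sum of squared residuals, the best offset for source j is simply its mean residual over all microphones. The additive sign of the offset, its uniform distribution on [0, 2 × mean], and all function names are assumptions for illustration; this is not the paper's functional (17) or its code.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_synthetic(n_mics=128, n_srcs=29, noise_std=0.002, mean_offset=0.002, seed=0):
    """Hypothetical generator of offset-corrupted source-to-microphone distances.

    Microphones and sources are drawn uniformly in a 1 m cube. Each measured
    distance is the true distance plus a source-dependent offset plus
    zero-mean Gaussian noise. Offsets are assumed uniform on
    [0, 2 * mean_offset], so their expected value is mean_offset.
    """
    rng = random.Random(seed)
    mics = [tuple(rng.random() for _ in range(3)) for _ in range(n_mics)]
    srcs = [tuple(rng.random() for _ in range(3)) for _ in range(n_srcs)]
    offsets = [rng.uniform(0.0, 2.0 * mean_offset) for _ in range(n_srcs)]
    d = [[dist(m, s) + offsets[j] + rng.gauss(0.0, noise_std)
          for j, s in enumerate(srcs)]
         for m in mics]
    return mics, srcs, offsets, d


def ls_source_offsets(mics, srcs, d):
    """Per-source offsets minimizing sum_i (d_ij - ||m_i - s_j|| - o_j)^2.

    With the geometry held fixed, the minimizer for each source j is the
    mean residual over all microphones.
    """
    n = len(mics)
    return [sum(d[i][j] - dist(mics[i], s) for i in range(n)) / n
            for j, s in enumerate(srcs)]


# Moderate case: 2 mm mean offset plus 2 mm Gaussian noise.
mics, srcs, true_off, d = make_synthetic()
est_off = ls_source_offsets(mics, srcs, d)
worst = max(abs(a - b) for a, b in zip(est_off, true_off))
print(f"largest offset recovery error: {worst * 1000:.2f} mm")
```

Because each offset is shared by all 128 microphones, averaging shrinks the noise on its estimate by roughly a factor of sqrt(128), which is one way to see why a source-dependent term is well determined while a per-microphone-and-source term would not be.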

IV. CONCLUSION

We have presented a new acoustic method, FrSM, for calibrating a large-aperture microphone array. FrSM relies on the free movement of a single source around an array’s focal volume during a single recording stage. The procedure is simple enough for one person to perform and requires minimal special hardware: a transducer source with a microphone attached above its center, with the source driven directly by the recording PC’s sound card and the microphone connected to one of the array inputs. A further improvement is that FrSM uses a source-to-microphone distance offset model. This model enables the subsequent refinement and perturbation stages, which increase accuracy significantly for FrSM and perhaps for other algorithms.

The real-array results for HMAII indicate that the accuracy of the algorithm increases with the number of source locations used (Fig. 5). This should, in general, hold for any calibration method that uses multiple source placements to determine microphone locations. However, methods that rely on sources at precisely known or measured locations, such as the Fixed-Sources or SRP-PHAT methods, have difficulty obtaining this benefit, because it becomes impractical to place and measure sources at more than a few known locations every time calibration is needed. With FrSM, 16 or more source locations can easily be generated by simply moving the speaker/microphone unit around the room and emitting chirps at multiple random locations.

FrSM also establishes its coordinate system on the microphones rather than on the sources, which makes it well suited to repeated calibrations. Because the apparatus is simple, minimal setup is needed, and the entire calibration cycle takes less than 10 minutes for a single operator: about 6 minutes for data collection and manipulation and about 4 minutes to run the automatic software in MATLAB. There is also enough redundancy to discard or adjust data, because the source can easily be placed at extra locations.
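Anchoring the frame to the microphones means that the results of repeated calibrations of the same array can be compared directly. One common way to fix such a frame is sketched below; it is a hypothetical convention, not necessarily the one FrSM uses. One reference microphone goes to the origin, a second onto the positive x-axis, and a third into the xy-plane with positive y. The reference indices and helper names are assumptions. Since the map is a proper rotation plus a translation, every inter-microphone distance is preserved.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def mic_frame(positions, ref=(0, 1, 2)):
    """Re-express positions in a frame fixed by three reference microphones.

    ref[0] becomes the origin, ref[1] lies on the positive x-axis, and ref[2]
    lies in the xy-plane with y > 0. The three must not be collinear.
    """
    o = positions[ref[0]]
    ex = unit(sub(positions[ref[1]], o))
    v = sub(positions[ref[2]], o)
    ey = unit(sub(v, tuple(dot(v, ex) * c for c in ex)))  # Gram-Schmidt step
    ez = cross(ex, ey)  # completes a right-handed orthonormal basis
    return [(dot(sub(p, o), ex), dot(sub(p, o), ey), dot(sub(p, o), ez))
            for p in positions]


pts = [(1.0, 2.0, 3.0), (1.0, 4.0, 3.0), (0.0, 2.0, 3.0), (1.0, 2.0, 5.0)]
framed = mic_frame(pts)
```

A proper rotation cannot undo a mirror-image estimate, so a reflection check (for example, the sign of a fourth reference microphone's z-coordinate) would still be needed in practice.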

The technique and the algorithm have been tested on a particular large-aperture microphone array, the HMAII, which contains 128 microphones spread on the walls and ceiling of a 500 × 700 × 300 cm room. The algorithm yielded microphone locations with an average error in the distance between microphone pairs of about 1.04 cm. Assuming that a pair-distance error is roughly the sum of the two microphones’ position errors, the error in each position estimate would be around 0.5 cm. This represents an error of less than 0.2% in this room. For most signal-processing applications that use a large-aperture microphone array, the accuracy achieved with this method is more than acceptable.
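The accuracy figures above are expressed through inter-microphone distances, which do not depend on the choice of coordinate frame. A minimal sketch of that style of evaluation follows. It averages the absolute difference between estimated and tape-measured pair distances, normalizes by the mean measured distance (as in the normalized scale of Figs. 5 and 9), and halves the mean pair error as a rough per-microphone error. The function name, the (i, j, measured_distance) input layout, and the halving rule (two equal position errors adding up) are illustrative assumptions.

```python
import math


def pair_distance_errors(est_positions, measured_pairs):
    """Score estimated mic positions against independently measured pair distances.

    est_positions: list of (x, y, z) estimates, same units as the measurements.
    measured_pairs: list of (i, j, measured_distance) tuples (hypothetical layout).
    Returns (mean absolute error, error normalized by the mean measured
    distance, rough per-microphone error).
    """
    errs = [abs(math.dist(est_positions[i], est_positions[j]) - dm)
            for i, j, dm in measured_pairs]
    mean_err = sum(errs) / len(errs)
    mean_pair = sum(dm for _, _, dm in measured_pairs) / len(measured_pairs)
    # Treat each pair error as the sum of two roughly equal position errors.
    return mean_err, mean_err / mean_pair, mean_err / 2.0


est = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
pairs = [(0, 1, 3.1), (0, 2, 3.9), (1, 2, 5.0)]
mean_err, norm_err, per_mic = pair_distance_errors(est, pairs)
```

With the reported 1.04 cm mean pair error, the halving rule gives 0.52 cm; relative to the 300 cm room height that is about 0.17%, consistent with the stated bound of less than 0.2%.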

APPENDIX

The maximum-likelihood estimator of the unknown source and microphone locations, given known TOFs, is the solution to the least-squares problem [22] (6).

(18)

Equation (18) can be manipulated as

(19)

(20)

(21)

(22)

Replacing by the unit vector in the direction of , we get

which is the expression in (16). The unit vector, , has been treated as an independent parameter during the descent. Its value is updated after each iteration.
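The idea of freezing the unit vector within a descent step and refreshing it afterwards can be illustrated with a minimal pure-Python sketch. It assumes a plain range cost, the sum over (d_ij − u_ij·(m_i − s_j))², where d_ij is the TOF-derived distance; any offset terms in the paper's expression (16) are omitted, and the step size, iteration count, and function names are hypothetical. Because u_ij·(m_i − s_j) equals ‖m_i − s_j‖ when u_ij has just been recomputed, each step follows the gradient of the original range cost.

```python
import math
import random


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _norm(a):
    return math.sqrt(sum(x * x for x in a))


def cost(mics, srcs, d):
    """Range least-squares cost: sum_ij (d_ij - ||m_i - s_j||)^2."""
    return sum((d[i][j] - _norm(_sub(m, s))) ** 2
               for i, m in enumerate(mics) for j, s in enumerate(srcs))


def descend(mics, srcs, d, step=0.01, iters=300):
    """Gradient descent in which the unit vectors are frozen within each step.

    Assumes no microphone coincides with a source (u_ij would be undefined).
    The fixed step must shrink as the numbers of mics and sources grow.
    """
    mics = [list(m) for m in mics]
    srcs = [list(s) for s in srcs]
    for _ in range(iters):
        # Refresh u_ij from the current estimates; held constant for this step.
        u = [[[c / _norm(_sub(m, s)) for c in _sub(m, s)] for s in srcs]
             for m in mics]
        gm = [[0.0] * 3 for _ in mics]
        gs = [[0.0] * 3 for _ in srcs]
        for i, m in enumerate(mics):
            for j, s in enumerate(srcs):
                uij = u[i][j]
                # Residual of the linearized range u_ij . (m_i - s_j).
                r = d[i][j] - sum(a * b for a, b in zip(uij, _sub(m, s)))
                for k in range(3):
                    gm[i][k] -= 2.0 * r * uij[k]  # dJ/dm_i
                    gs[j][k] += 2.0 * r * uij[k]  # dJ/ds_j
        for i in range(len(mics)):
            for k in range(3):
                mics[i][k] -= step * gm[i][k]
        for j in range(len(srcs)):
            for k in range(3):
                srcs[j][k] -= step * gs[j][k]
    return mics, srcs


# Toy check: exact distances, start from perturbed true positions.
rng = random.Random(1)
true_m = [[rng.random() for _ in range(3)] for _ in range(6)]
true_s = [[rng.random() for _ in range(3)] for _ in range(8)]
d = [[_norm(_sub(m, s)) for s in true_s] for m in true_m]
m0 = [[c + rng.uniform(-0.05, 0.05) for c in m] for m in true_m]
s0 = [[c + rng.uniform(-0.05, 0.05) for c in s] for s in true_s]
m1, s1 = descend(m0, s0, d)
print(cost(m0, s0, d), cost(m1, s1, d))
```

The solution is only defined up to a rigid motion, so the descent is not expected to return the original coordinates; a fixed step is used for brevity where a line search would be more robust on a 128-microphone problem.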

REFERENCES

[1] Q. Zou, X. Zou, M. Zhang, and Z. Lin, “A robust speech detection algorithm in a microphone array teleconferencing system,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP’01), 2001, vol. 5, pp. 3025–3028.

[2] M. L. Seltzer, B. Raj, and R. M. Stern, “Speech recognizer-based microphone array processing for robust hands-free speech recognition,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), May 2002, vol. 1, pp. I-897–I-900.

[3] J.-T. Chien, J.-R. Lai, and P.-Y. Lai, “Microphone array signal processing for far-talking speech recognition,” in IEEE 3rd Workshop Signal Process. Adv. Wireless Commun. (SPAWC’01), 2001, pp. 322–325.

[4] H. Do and H. Silverman, “A method for locating multiple sources from a frame of a large-aperture microphone array data without tracking,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP ’08), 2008, pp. 301–304.


[5] Y. Rockah and P. Schultheiss, “Array shape calibration using sources in unknown locations—Part I: Far-field sources,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-35, no. 3, pp. 286–299, Mar. 1987.

[6] Y. Rockah and P. Schultheiss, “Array shape calibration using sources in unknown locations—Part II: Near-field sources and estimator implementation,” IEEE Trans. Acoust., Speech, Signal Process., vol. 35, no. 6, pp. 724–735, Jun. 1987.

[7] A. Weiss and B. Friedlander, “Array shape calibration using sources in unknown locations-a maximum likelihood approach,” IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 12, pp. 1958–1966, Dec. 1989.

[8] B. C. Ng and C. M. S. See, “Sensor-array calibration using a maximum-likelihood approach,” IEEE Trans. Antennas Propag., vol. 44, no. 6, pp. 827–835, Jun. 1996.

[9] R. Biswas and S. Thrun, “A passive approach to sensor network localization,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS ’04), 2004, vol. 2, pp. 1544–1549.

[10] M. Chen, Z. Liu, L.-W. He, P. Chou, and Z. Zhang, “Energy-based position estimation of microphones and speakers for ad hoc microphone arrays,” in Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust., Oct. 2007, pp. 22–25.

[11] J. Chen, R. Hudson, and K. Yao, “Maximum-likelihood source localization and unknown sensor location estimation for wideband signals in the near-field,” IEEE Trans. Signal Process., vol. 50, no. 8, pp. 1843–1854, Aug. 2002.

[12] S. Birchfield, “Geometric microphone array calibration by multidimensional scaling,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP’03), Apr. 2003, vol. 5, pp. V-157–V-160.

[13] S. Birchfield and A. Subramanya, “Microphone array position calibration by basis-point classical multidimensional scaling,” IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 1025–1034, Sep. 2005.

[14] J. Sachar, H. Silverman, and W. Patterson, “Microphone position and gain calibration for a large-aperture microphone array,” IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 42–52, Jan. 2005.

[15] V. Raykar and R. Duraiswami, “Automatic position calibration of multiple microphones,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP’04), May 2004, vol. 4, pp. iv-69–iv-72.

[16] V. Raykar, I. Kozintsev, and R. Lienhart, “Position calibration of microphones and loudspeakers in distributed computing platforms,” IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 70–83, Jan. 2005.

[17] P. Jager, M. Trinkle, and A. Hashemi-Sakhtsari, “Automatic microphone array position calibration using an acoustic sounding source,” in Proc. 4th IEEE Conf. Ind. Electron. Applicat. (ICIEA ’09), May 2009, pp. 2110–2113.

[18] I. McCowan, M. Lincoln, and I. Himawan, “Microphone array shape calibration in diffuse noise fields,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 3, pp. 666–670, Mar. 2008.

[19] M. Hennecke, T. Plotz, G. Fink, J. Schmalenstroer, and R. Haeb-Umbach, “A hierarchical approach to unsupervised shape calibration of microphone array networks,” in Proc. IEEE/SP 15th Workshop Statist. Signal Process. (SSP’09), 2009, pp. 257–260.

[20] S. Valente, M. Tagliasacchi, F. Antonacci, P. Bestagini, A. Sarti, and S. Tubaro, “Geometric calibration of distributed microphone arrays from acoustic source correspondences,” in Proc. IEEE Int. Workshop Multimedia Signal Process. (MMSP), Oct. 2010, pp. 13–18.

[21] D. Dobler, G. Heilmann, and M. Ohm, “Automatic detection of microphone coordinates,” in Proc. Berlin Beamforming Conf., Berlin, Germany, 2010.

[22] M. Crocco, A. Del Bue, and V. Murino, “A bilinear approach to the position self-calibration of multiple sensors,” IEEE Trans. Signal Process., vol. 60, no. 2, pp. 660–673, Feb. 2012.

[23] M. Pollefeys and D. Nister, “Direct computation of sound and microphone locations from time-difference-of-arrival data,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP ’08), 2008, pp. 2445–2448.

[24] M. Rossi, Acoustics and Electroacoustics. Norwood, MA, USA: Artech House, 1988.

[25] C. Knapp and G. Carter, “The generalized correlation method for estimation of time delay,” IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 4, pp. 320–327, Aug. 1976.

[26] D. Manolakis, “Efficient solution and performance analysis of 3-D position estimation by trilateration,” IEEE Trans. Aerosp. Electron. Syst., vol. 32, no. 4, pp. 1239–1248, Oct. 1996.

[27] Q. Tong, S. Liu, Q. Lu, and J. Chai, “A fast algorithm of Moore-Penrose inverse for the symmetric Loewner-type matrix,” in Int. Conf. Inf. Eng. Comput. Sci. (ICIECS ’09), Dec. 2009, pp. 1–4.

[28] J. Parks-Gornet and I. Imam, “Using rank factorization in calculating the Moore-Penrose generalized inverse,” in Proc. IEEE Southeastcon ’89. Energy Inf. Technol. Southeast., Apr. 1989, vol. 2, pp. 427–431.

[29] T. Cox and M. Cox, Multidimensional Scaling. Boca Raton, FL, USA: CRC, 2001, vol. 1.

[30] H. Do, H. Silverman, and Y. Yu, “A real-time SRP-PHAT source location implementation using stochastic region contraction (SRC) on a large-aperture microphone array,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP ’07), Apr. 2007, vol. 1, pp. I-121–I-124.

Sarthak Khanal (M’12) was born in Kathmandu, Nepal, in 1987. He received the B.S. degree with departmental honors in engineering and physics from Trinity College, Hartford, CT, in 2011, and is currently pursuing the Ph.D. degree in electrical engineering at Brown University, Providence, RI.

From 2008 to 2010, he was a Research Assistant to Dr. David Branning at Trinity College. His research involved developing fast and compact electronic circuits for multi-photon coincidence counting. He was a member of the Robot Study Team at Trinity College from 2008 to 2011. His work focused on assistive robotics, including robo-waiter and fire-fighting robots. He received a grant from the Kathryn Wasserman Davis Foundation in 2010 to organize a robotics workshop in Haifa, Israel. His current research interests include microphone array processing, array calibration, source localization using microphone arrays, and speech processing.

Mr. Khanal’s awards and honors include Phi Beta Kappa, Sigma Pi Sigma, and several honors from Trinity College, including Optimae or Optimi, President’s Fellow in Physics, the Physics Senior Prize, and the Albert J. Howard Jr. Prize in Physics.

Harvey F. Silverman (F’99) was born in Hartford, Connecticut, on August 15, 1943. He received the B.S. and B.S.E.E. degrees from Trinity College, Hartford, CT, in 1965 and 1966, and the Sc.M. and Ph.D. degrees from Brown University, Providence, RI, in 1968 and 1971, respectively.

He worked with Joseph Gerber of Gerber Scientific Instruments from 1964 to 1966 and helped design the first Gerber plotter. He was at the IBM Thomas J. Watson Research Center from 1970 to 1980, working in the areas of digital image processing and computer performance analysis, and was an original member of the IBM Research speech recognition group that started in 1972. He was manager of the Speech Terminal project from 1976 until 1980. At IBM he received several outstanding innovation awards and patent awards. In 1980, he was appointed Professor of Engineering at Brown University and charged with the development of a program in computer engineering. His research interests currently include microphone-array research, array signal processing, speech processing, and embedded systems. He has been the Director of the Laboratory for Engineering Man/Machine Systems in the School of Engineering at Brown since its founding in 1981. From July 1991 to June 1998 he was the Dean of Engineering at Brown University.

Dr. Silverman was a member of the IEEE Acoustics, Speech and Signal Processing Technical Committee on Digital Signal Processing and was its Chairman from 1979 until 1983. He was the General Chairman of the 1977 ICASSP in Hartford. He received an IEEE Centennial Medal in 1984. He was a Trustee of Trinity College in Hartford, CT, from 1994 to 2003, and is a Lifetime Fellow of the IEEE.

Rahul Shakya was born in Kathmandu, Nepal, in 1987. He received the B.S. degree with departmental honors in engineering and mathematics from Trinity College, Hartford, CT, in 2011, and is currently pursuing the Ph.D. degree in electrical engineering at Brown University, Providence, RI.

He was a member of the Robot Study Team at Trinity College from 2008 to 2010 and its chair in 2011. His work focused on autonomous navigation, mainly involving the robot ‘Q’, the team’s entry for the Intelligent Ground Vehicle Competition. His current research interests include microphone array processing, wireless microphone array technology, and speech processing.

Mr. Shakya’s awards and honors include Phi Beta Kappa and several honors from Trinity College, including Optimae or Optimi, President’s Fellow in Engineering, and the Engineering Senior Prize.