
Binaural Sonification of Disparity Maps

Alfonso Alba, Carlos Zubieta, Edgar Arce

Facultad de Ciencias, Universidad Autónoma de San Luis Potosí

Contents

• Project description
• Estimation of disparity maps
• Segmentation of disparity maps
• Object sonification
• Test application
• Preliminary results
• Future work

Project description

• The goal of this project is to develop a scene sonification system for the visually impaired.

• Images from a stereo camera pair will be used to detect objects in the scene and estimate the distance between them and the subject.

• A binaural audio signal will be synthesized for each object, so that the subject can “hear” the objects in the scene in their corresponding locations.

Scene sonification system

• The system will consist of the following stages:
– Stereo image acquisition
– Disparity map estimation
– Disparity map segmentation (object detection)
– Binaural sonification of objects in the scene

• Here we focus only on the last two stages: segmentation of a given disparity map, and sonification.

Estimation of disparity maps

• Images from a pair of cameras, separated by a certain distance, form a stereo image pair.

• The position of a certain object in one of the images will be shifted in the other image by an amount inversely proportional to the distance between the object and the camera arrangement.

• This displacement is called disparity, and can be computed for each pixel to form a disparity map.

• We are currently working on a technique to compute disparity maps in real time.
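The inverse relation between disparity and distance can be made concrete with a short sketch. It assumes a rectified pair of identical pinhole cameras, for which depth is Z = f · B / d; the function name and the parameters `focal_length_px` and `baseline_m` are illustrative and do not come from the slides.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Return depth in metres for a disparity in pixels (Z = f * B / d).

    Assumes a rectified stereo pair of identical pinhole cameras.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity (or a failed match).
        return float("inf")
    return focal_length_px * baseline_m / disparity_px
```

With a hypothetical focal length of 500 pixels and a baseline of 0.1 m, a disparity of 10 pixels corresponds to a depth of about 5 m, and doubling the disparity halves the depth.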

Segmentation of disparity maps

• Given a disparity map D(x,y), we perform a seeded region-growing segmentation to detect the objects in the scene.

• To choose the seeds, the algorithm uses a fitness measure given by

where N(x,y) is the set of nearest-neighbors of (x,y), and q is a quality parameter (increases robustness to noise).

• This measure favors homogeneous regions (low dq) with the highest disparity (nearest objects).

Region-growing algorithm

• Take a pixel from a region’s border.

• For each unlabeled neighbor, compare its intensity to the region’s average intensity.

• If they are similar enough, include the neighbor in the region.
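The region-growing steps can be sketched as follows. This is a minimal single-region version, not the authors' implementation: it assumes 4-connected neighbors, treats "unlabeled" as "not yet in this region", and uses a hypothetical fixed `threshold` as the similarity test. The seed is passed in directly, since the seed fitness measure is not reproduced here.

```python
from collections import deque


def grow_region(disparity, seed, threshold=1.0):
    """Grow one region from `seed` over a 2-D list of disparity values.

    Returns the set of (row, col) pixels that belong to the region.
    """
    rows, cols = len(disparity), len(disparity[0])
    region = {seed}
    total = float(disparity[seed[0]][seed[1]])  # running sum for the average
    border = deque([seed])
    while border:
        r, c = border.popleft()  # take a pixel from the region's border
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # neighbor lies outside the map
            if (nr, nc) in region:
                continue  # neighbor is already labeled
            mean = total / len(region)  # region's current average value
            if abs(disparity[nr][nc] - mean) <= threshold:
                # Similar enough: include the neighbor and keep growing from it.
                region.add((nr, nc))
                total += disparity[nr][nc]
                border.append((nr, nc))
    return region
```

On a small map such as `[[1, 1, 0], [1, 1, 0], [0, 0, 0]]` with seed `(0, 0)` and `threshold=0.5`, the region covers the four pixels of value 1 and stops at the zeros.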

Object sonification

• Sound coming from a specific location will suffer a series of degradations before it reaches our ears.

• These degradations provide various cues that our brain uses to locate the sound source.

• Binaural spatialization attempts to model these cues, in order to allow the listener to hear a sound as if it were coming from a specific point in space, which is typically defined in spherical coordinates (see below).

Object sonification

• We represent each object in the scene with a ping-like sound whose frequency depends on the disparity, so that the sound becomes more alerting as the object gets closer.

• The audio signal corresponding to each object is fed through a binaural spatialization system whose parameters depend on the object’s position.

• Spatialization is performed by modeling azimuth and range cues. Elevation cues have not been implemented yet.
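A ping-like source signal can be sketched as an exponentially decaying sinusoid. The slides do not give the mapping from disparity to frequency, so the linear mapping, the frequency range, the decay time, and all function names below are assumptions made for illustration only.

```python
import math


def ping_frequency(disparity, max_disparity, f_low=300.0, f_high=1500.0):
    """Map disparity linearly to a frequency in Hz (assumed mapping).

    Larger disparity (nearer object) gives a higher, more alerting pitch.
    """
    t = min(max(disparity / max_disparity, 0.0), 1.0)  # clamp to [0, 1]
    return f_low + t * (f_high - f_low)


def ping(freq_hz, duration_s=0.2, decay_s=0.05, sample_rate=44100):
    """Return a mono ping as a list of samples in [-1, 1]."""
    n_samples = int(round(duration_s * sample_rate))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        envelope = math.exp(-t / decay_s)  # exponential decay of the ping
        samples.append(envelope * math.sin(2.0 * math.pi * freq_hz * t))
    return samples
```

The mono signal produced this way would then be passed to the spatialization stage, once per object.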

Azimuth cues

• Inter-aural Time Difference:
– The sound source is delayed by a different amount for each ear:

$T_n = a - a\sin(\theta)$, $T_f = a + a\theta$.

• Inter-aural Level Difference (head-shadow):
– The sound is attenuated when passing through the head.
– This cue can be modeled with a one-pole one-zero filter:

Brown et al., 1998
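Both azimuth cues can be sketched numerically. The sketch takes the two ear delays as a − a sin θ and a + aθ for azimuth θ (in radians, between 0 and π/2, measured toward the near ear), and assumes that a is the head radius and that dividing by the speed of sound converts the path lengths to seconds. For the head shadow it assumes the one-pole one-zero response commonly attributed to the cited 1998 model, H(ω) = (1 + jαω/(2ω₀)) / (1 + jω/(2ω₀)) with ω₀ = c/a and α = (1 + α_min/2) + (1 − α_min/2) cos(π · incidence / θ_min). The constants and function names are illustrative and are not taken from the slides.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
HEAD_RADIUS = 0.0875    # m (assumed average head radius)


def itd_delays(theta, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Near- and far-ear delays in seconds for azimuth theta in [0, pi/2]."""
    t_near = (a - a * math.sin(theta)) / c
    t_far = (a + a * theta) / c
    return t_near, t_far


def head_shadow_gain(freq_hz, incidence, a=HEAD_RADIUS, c=SPEED_OF_SOUND,
                     alpha_min=0.1, theta_min=math.radians(150.0)):
    """Magnitude of the assumed one-pole one-zero head-shadow response.

    `incidence` is the angle in radians between the source direction and
    the axis of the ear being filtered.
    """
    w0 = c / a
    alpha = (1.0 + alpha_min / 2.0) + (1.0 - alpha_min / 2.0) * math.cos(
        incidence / theta_min * math.pi)
    w = 2.0 * math.pi * freq_hz
    response = complex(1.0, alpha * w / (2.0 * w0)) / complex(1.0, w / (2.0 * w0))
    return abs(response)
```

Under these assumptions the difference between the two delays is a(θ + sin θ)/c, which is Woodworth's classical formula. The filter leaves low frequencies untouched and, at high frequencies, boosts the ear facing the source while attenuating the shadowed ear.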

Range cues

• Artificial Reverberation:
– Reverberation is the result of a large number of echoes that originate from the reflection of the sound on flat surfaces such as walls.
– The level of reverberation is roughly constant and independent of source location.
– We use a simple model composed of 4 parallel delay lines with feedback.

• Attenuation:
– The audio signal is attenuated according to the inverse-square law.
– The ratio between the signal and reverberation levels provides an additional cue for range.
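The two range cues can be sketched together. The delay lengths, feedback gain, reverberation level, and function names below are assumptions, since the slides only state that four parallel delay lines with feedback are used. The sketch also applies the inverse-square factor directly to the signal amplitude, following the wording of the slide; that choice is an assumption too.

```python
def feedback_delay_line(signal, delay, gain):
    """Delay line with feedback: y[n] = x[n - delay] + gain * y[n - delay]."""
    out = [0.0] * len(signal)
    for n in range(delay, len(signal)):
        out[n] = signal[n - delay] + gain * out[n - delay]
    return out


def reverb(signal, delays=(1116, 1188, 1277, 1356), gain=0.7):
    """Average the outputs of parallel feedback delay lines (4 by default)."""
    lines = [feedback_delay_line(signal, d, gain) for d in delays]
    return [sum(samples) / len(lines) for samples in zip(*lines)]


def range_mix(dry, distance, reverb_level=0.1,
              delays=(1116, 1188, 1277, 1356)):
    """Attenuate the direct sound with distance and add constant-level reverb."""
    # Inverse-square attenuation; distances below 1 are clamped to avoid boosting.
    direct_gain = 1.0 / max(distance, 1.0) ** 2
    wet = reverb(dry, delays)  # reverb is fed by the unattenuated source
    return [direct_gain * d + reverb_level * w for d, w in zip(dry, wet)]
```

Because the reverberation level does not change with distance, the direct-to-reverberant ratio drops as the object moves away, which is the additional range cue mentioned above.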

Test application

• We simulate a moving scene by taking a 160 x 100 sub-frame of a precomputed disparity map.

• The 10 most relevant objects are segmented but only objects that are near enough are sonified.
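The test loop can be sketched as a sliding crop followed by a selection step. The slides do not define "most relevant" or "near enough"; the sketch assumes that relevance is ranked by disparity and that nearness is a hypothetical threshold `min_disparity`. Objects are represented as plain dictionaries for illustration.

```python
def sub_frame(disparity, left, top, width=160, height=100):
    """Crop a width x height window from a precomputed disparity map."""
    return [row[left:left + width] for row in disparity[top:top + height]]


def objects_to_sonify(objects, min_disparity, max_objects=10):
    """Keep the top-ranked objects, then drop those that are too far away.

    Each object is a dict with a "disparity" key (assumed representation).
    """
    ranked = sorted(objects, key=lambda o: o["disparity"], reverse=True)
    return [o for o in ranked[:max_objects] if o["disparity"] >= min_disparity]
```

Moving `left` and `top` by a few pixels on each iteration simulates camera motion over the static map.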

Preliminary results

• Fast segmentation times:
– 5 ms per 160 x 100 frame on a 2.4 GHz dual-core CPU
– Over 100 frames per second including the sonification stage (but without disparity map estimation)
– Embedded implementation is viable

• Good azimuth representation: object direction is easily perceived.

• Object range is perceived in a relative manner (e.g., one object is nearer than another), but not in an absolute way.

• Between 3 and 5 objects can be sonified before too much clutter is heard.

Future work

• Camera setup and calibration
• Real-time estimation of disparity maps
• Elevation cues in binaural spatialization
• Optimization of sonification system
• Implementation in an embedded device

Thank you!