8th Annual CSIS Research Conference

Client Server Browsing of Sound Resources: Classification and Browsing

E. Brazil

Interaction Design Centre

University of Limerick

Ireland

Introduction

? - how to classify sound resources, and how to provide an interface for browsing them.

! - provide a browsable sound database for users over an intranet or the Internet.

Overview of Research Areas

• Sound Classification

• Sound Representation

• Sound Browsing

Sound Classification

• Two levels of classification:

• Coarse level – distinguish whether a sound belongs to the Speech, Music, Environmental, Silence or Other category

• Fine level – use human perceptual features

Coarse-level classification of audio (1)

– Audio signals are classified into basic types, including speech, music, several types of environmental sounds, and silence

– Apply morphological and statistical analysis to short-time feature curves (energy function, average zero-crossing rate, fundamental frequency), combined with a rule-based heuristic classification procedure
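A rule-based heuristic step of this kind can be sketched as follows, assuming the short-time energy and zero-crossing-rate (ZCR) curves have already been computed frame by frame. The thresholds and the rules themselves are illustrative assumptions, not the rules used in this work or in Zhang and Kuo (1998). The rules are: silence when mean energy is very low; speech when both curves fluctuate strongly, since speech alternates voiced and unvoiced segments as well as syllables and pauses; music when both curves are comparatively steady. Fundamental frequency is left out for brevity.

```python
from statistics import mean, pstdev
from typing import Sequence

# Hypothetical thresholds: real values would have to be tuned on labelled audio.
SILENCE_ENERGY = 1e-4  # mean short-time energy below this is treated as silence
ZCR_SPREAD = 0.05      # std. dev. of the ZCR curve above which it "fluctuates"
ENERGY_SPREAD = 0.5    # coefficient of variation of energy above which it "fluctuates"


def classify_coarse(energy: Sequence[float], zcr: Sequence[float]) -> str:
    """Assign a coarse label from per-frame energy and zero-crossing-rate curves."""
    if not energy or not zcr:
        return "other"  # nothing to analyse
    mean_energy = mean(energy)
    if mean_energy < SILENCE_ENERGY:
        return "silence"
    zcr_fluctuates = pstdev(zcr) > ZCR_SPREAD
    # Relative spread, so the rule does not depend on overall loudness.
    energy_fluctuates = pstdev(energy) / mean_energy > ENERGY_SPREAD
    if zcr_fluctuates and energy_fluctuates:
        return "speech"         # voiced/unvoiced alternation plus syllable/pause pattern
    if not zcr_fluctuates and not energy_fluctuates:
        return "music"          # comparatively steady curves
    return "environmental"      # mixed behaviour falls through to this catch-all
```

For example, `classify_coarse([0.01, 1.0] * 5, [0.05, 0.4] * 5)` is labelled speech because both curves swing widely. A rule cascade like this is easy to inspect and adjust, which suits a heuristic first pass before any finer classification.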

Coarse-level classification of audio (2)

• Short-time energy function – the short-time energy of an audio signal reflects its amplitude variations over time

• Short-time average zero-crossing rate (ZCR) – the number of times the signal passes through zero in a given time interval

• Spectral centroid – the magnitude-weighted mean frequency of the spectrum, which correlates with the perceived brightness of a sound
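The three frame-level features listed above can be computed frame by frame. The following is a minimal, dependency-free sketch, assuming a mono signal supplied as a list of floating-point samples. The function names, the per-sample normalisation of energy and ZCR, and the naive DFT used for the centroid are illustrative choices rather than details taken from this work.

```python
import math
from typing import List, Sequence


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into (possibly overlapping) fixed-length frames."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame: Sequence[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency (Hz) of the frame's spectrum.

    Uses a naive O(N^2) DFT to stay dependency-free; an FFT would be used in practice.
    """
    n = len(frame)
    freqs, mags = [], []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        freqs.append(k * sample_rate / n)
        mags.append(math.hypot(re, im))
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0
```

For a pure 1 kHz sine sampled at 8 kHz, the centroid of a 64-sample frame comes out at 1000 Hz and the energy at 0.5. Applying each function to every frame from `frame_signal` yields the feature curves that the coarse classifier analyses.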

Fine-level classification of audio

• Further classification will be conducted within each basic type:

– music: distinguish music played by different instruments, different types of music, singing and plainsong

– speech: distinguish the voices of men, women and children, and speech with a music background

– environmental sound: divide into classes such as applause, bell ring, footsteps, windstorm, laughter, bird sounds, and so on
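The two-level scheme can be sketched as a dispatch. The coarse label selects which fine-level model applies, and that model picks the closest subclass. Here the fine-level model is assumed to be a simple nearest-prototype classifier over a feature vector. The model layout, the function names and any prototype values are hypothetical; in practice the features would be the human perceptual features mentioned earlier.

```python
import math
from typing import Mapping, Sequence

# Hypothetical fine-level models: for each coarse type, one prototype (mean)
# feature vector per subclass, e.g. learned from labelled examples.
FineModels = Mapping[str, Mapping[str, Sequence[float]]]


def nearest_prototype(features: Sequence[float],
                      prototypes: Mapping[str, Sequence[float]]) -> str:
    """Return the subclass whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda name: math.dist(features, prototypes[name]))


def classify_fine(coarse_label: str, features: Sequence[float], models: FineModels) -> str:
    """Refine a coarse label into 'type/subclass' when a fine model exists for that type."""
    prototypes = models.get(coarse_label)
    if not prototypes:
        return coarse_label  # e.g. silence has no subclasses
    return f"{coarse_label}/{nearest_prototype(features, prototypes)}"
```

Keeping a separate model per coarse type means each fine classifier only has to separate, say, male, female and child voices, rather than every subclass of every type at once.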

Sound Representation

• Previous work has concentrated on:

– Visual star-field displays

• New visual representations:

– Visualisations on spheres (non-Euclidean spaces)

– Hyper tree

– Excentric labeling

Star-field Display

Virtual University - Uni. Vienna

Visualisations on Spheres

H3: Laying Out Large Directed Graphs in 3D Hyperbolic Space - Munzner

Hyper Tree

www.inxight.com

Excentric Labeling

HCIL – Uni. Maryland

Sound Browsing

• Iterative & interactive activity – opportunistic & serendipitous

• Enable users to explore a data set

• External & internal properties of objects – context & content

• Evaluate and revise understanding of relationships

The Sonic Browser Application

Audio: direct representation of tunes (exploiting the cocktail-party effect)

• Sounds are panned across the stereo field according to the visual positions of the tunes nearest to the cursor.

• Several tunes play concurrently; the volume of each depends on the visual distance between its object and the cursor, so that nearer objects sound louder.
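The per-sound gain and pan computation can be sketched under several assumptions: a circular aura of fixed radius around the cursor, a linear fall-off of volume with distance (so closer objects play louder) and equal-power stereo panning driven by each object's horizontal offset from the cursor. These laws and the names `SoundObject`, `aura_gains` and `aura_radius` are hypothetical and chosen for illustration; the Sonic Browser's actual mapping may differ.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SoundObject:
    """Hypothetical on-screen sound object at visual position (x, y)."""
    name: str
    x: float
    y: float


def aura_gains(cursor: Tuple[float, float], objects: List[SoundObject],
               aura_radius: float) -> Dict[str, Tuple[float, float]]:
    """Map each sound inside the aura to (left_gain, right_gain)."""
    cx, cy = cursor
    result = {}
    for obj in objects:
        dx, dy = obj.x - cx, obj.y - cy
        distance = math.hypot(dx, dy)
        if distance >= aura_radius:
            continue  # outside the aura: not played
        volume = 1.0 - distance / aura_radius  # 1 at the cursor, 0 at the aura edge
        pan = dx / aura_radius                 # in (-1, 1): left of cursor is negative
        angle = (pan + 1.0) * math.pi / 4      # equal-power pan law, 0..pi/2
        result[obj.name] = (volume * math.cos(angle), volume * math.sin(angle))
    return result
```

The equal-power law keeps an object's total power (left² + right²) equal to its volume squared wherever it sits in the stereo field, so moving the cursor sideways changes only the apparent direction, not the loudness.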

The Sonic Browser Application

Client – Server Issues

• Let the server do the mixing and spatialisation

• Perform analysis and classification on the server

• Lightweight client written in Java

• Different network topologies and protocols:

– Latency issues

– Use of a floating ‘Aura’
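If the server does the mixing, the client only has to receive and play one stereo stream. Below is a minimal sketch of that mixing step. It assumes each active sound is a mono floating-point buffer paired with (left, right) gains already derived from its position relative to the cursor. The function name, the interleaved output layout and the hard clipping are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def mix_to_stereo(sources: Sequence[Tuple[Sequence[float], float, float]],
                  n_frames: int) -> List[float]:
    """Mix mono sources with (left, right) gains into an interleaved stereo buffer.

    Each source is (samples, left_gain, right_gain); a source shorter than
    n_frames is treated as silent after it ends. The output is
    [L0, R0, L1, R1, ...], hard-clipped to [-1, 1].
    """
    out = [0.0] * (2 * n_frames)
    for samples, left, right in sources:
        for i, s in enumerate(samples[:n_frames]):
            out[2 * i] += s * left
            out[2 * i + 1] += s * right
    return [max(-1.0, min(1.0, v)) for v in out]
```

A real server would run this block by block and stream each buffer to the client. Hard clipping is the simplest overflow guard; scaling the gains down when many sounds overlap would avoid the distortion it introduces.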

Cue Points

• Use cue points as markers – mark a specific point or section of a sound

• Play only the significant portion of a sound while browsing

• Reduce the time needed to identify a sound by playing its characteristic or significant part

• Cue points are found in many common sound file formats*

* Technical Report UL-IDC-01-02
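As one example of where such markers live, the WAVE (RIFF) format stores them in a `cue ` chunk. Each 24-byte record holds an ID, a play-order position, a data chunk ID, a chunk start, a block start and a sample offset. Below is a minimal sketch that reads the sample offsets from a WAV file held in memory. It assumes a well-formed little-endian RIFF file and ignores associated labels (`LIST`/`adtl` chunks) and other file formats; the function name is illustrative.

```python
import struct
from typing import Dict


def read_cue_points(data: bytes) -> Dict[int, int]:
    """Return {cue_id: sample_offset} from the 'cue ' chunk of a WAV file's bytes."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    cues: Dict[int, int] = {}
    pos = 12  # first chunk follows the 12-byte RIFF/WAVE header
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"cue ":
            (count,) = struct.unpack("<I", body[:4])
            for i in range(count):
                record = body[4 + 24 * i:4 + 24 * (i + 1)]
                # dwName, dwPosition, fccChunk, dwChunkStart, dwBlockStart, dwSampleOffset
                cue_id, _position, _fcc, _chunk_start, _block_start, sample_offset = \
                    struct.unpack("<II4sIII", record)
                cues[cue_id] = sample_offset
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    return cues
```

A browser could then start playback at a cue point's sample offset rather than at the beginning of the file. With Python's standard `wave` module, for instance, `Wave_read.setpos()` moves to a given frame before `readframes()` is called. This sketch trusts the `dwSampleOffset` field for the position.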

Application Platform: HW & OS

• Standard multimedia PC (Pentium II/III with a Sound Blaster Live! card, etc.)

• Server – MS Windows 98/2000

• Client – any OS with a Java runtime

Conclusion

• Support different visualisation tools, e.g. for non-Euclidean spaces.

• Address payment and copyright issues

• Investigate other file types and description standards, e.g. MPEG-7.

References (1)

• Brazil, E. (2001). Cue Points: An Examination Of Common Sound File Formats. Limerick, University of Limerick.

• Fekete, J. D., Plaisant, C. (1999). Excentric Labeling: Dynamic Neighborhood Labeling for Data Visualization. Conference on Human Factors in Computing Systems (CHI '99), New York, ACM.

• Fernström, M., Brazil, E. (2001). Sonic Browsing: An Auditory Tool For Multimedia Asset Management. International Conference on Auditory Display, Espoo, Finland.

• Ó Maidín, D., Fernström, M. (2000). The Best of Two Worlds: Retrieving and Browsing. COST-G6 Conference on Digital Audio Effects (DAFx-00), Verona, Università degli Studi di Verona.

References (2)

• Shneiderman, B. (1996). The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. IEEE Symposium on Visual Languages, Boulder, CO, USA.

• Zhang, T., Kuo, C.C. (1998). Content-based Classification and Retrieval of Audio. SPIE's 43rd Annual Meeting - Conference on Advanced Signal Processing Algorithms, Architectures, and Implementations VIII, San Diego.

• Zhang, T., Kuo, C.C. (1998). Hierarchical System for Content-Based Audio Classification and Retrieval. SPIE's Conference on Multimedia Storage and Archiving Systems III, Boston.