2005 IEEE International
Conference on Acoustics, Speech, and Signal Processing
Proceedings
Volume I of V
Speech Processing
March 18-23, 2005
Pennsylvania Convention Center/Marriott Hotel
Philadelphia, Pennsylvania, USA
Sponsored by
The Institute of Electrical and Electronics Engineers
Signal Processing Society
© IEEE
TABLE OF CONTENTS
Volume I
SP-L1: VOICE MORPHING
SP-L1.1: POLYGLOT SYNTHESIS USING A MIXTURE OF MONOLINGUAL CORPORA I - 1
Javier Latorre, Koji Iwano, Sadaoki Furui, Tokyo Institute of Technology, Japan
SP-L1.2: INTRODUCING ROUGHNESS IN INDIVIDUALITY TRANSFORMATION I - 5
THROUGH JITTER MODELING AND MODIFICATION
Ashish Verma, IBM India Research Labs, India; Arun Kumar, CARE, Indian Institute of Technology Delhi, India
SP-L1.3: SPECTRAL CONVERSION BASED ON MAXIMUM LIKELIHOOD ESTIMATION I - 9
CONSIDERING GLOBAL VARIANCE OF CONVERTED PARAMETER
Tomoki Toda, Nagoya Institute of Technology, Japan; Alan W. Black, Carnegie Mellon University, United States; Keiichi Tokuda,
Nagoya Institute of Technology, Japan
SP-L1.4: A STUDY ON RESIDUAL PREDICTION TECHNIQUES FOR VOICE I -13
CONVERSION
David Suendermann, Antonio Bonafonte, Universitat Politecnica de Catalunya, Spain; Hermann Ney, RWTH Aachen University, Germany; Harald Hoege, Siemens AG, Germany
SP-L1.5: VOICE FORGERY USING ALISP: INDEXATION IN A CLIENT MEMORY I -17
Patrick Perrot, Guido Aversano, Raphael Blouet, Maurice Charbit, Gerard Chollet, Ecole Nationale Superieure des
Telecommunications, France
SP-L1.6: AN IMPROVED SPECTRAL AND PROSODIC TRANSFORMATION METHOD IN I - 21
STRAIGHT-BASED VOICE CONVERSION
Long Qin, Gaopeng Chen, Zhenhua Ling, Lirong Dai, University of Science and Technology of China, China
SP-L2: SPOKEN LANGUAGE UNDERSTANDING AND DIALOG
SP-L2.1: INCORPORATING DISCOURSE FEATURES INTO CONFIDENCE SCORING OF I - 25
INTENTION RECOGNITION RESULTS IN SPOKEN DIALOGUE SYSTEMS
Ryuichiro Higashinaka, Katsuhito Sudoh, Mikio Nakano, Nippon Telegraph and Telephone Corporation, Japan
SP-L2.2: SEMANTIC INTERPRETATION WITH ERROR CORRECTION I - 29
Christian Raymond, Frederic Bechet, Nathalie Camelin, Renato De Mori, University of Avignon, France; Geraldine Damnati,
France Telecom R&D, France
SP-L2.3: DIALOG ACT TAGGING USING GRAPHICAL MODELS I - 33
Gang Ji, Jeff Bilmes, University of Washington, Seattle, United States
SP-L2.4: A CLARIFICATION ALGORITHM FOR SPOKEN DIALOGUE SYSTEMS I - 37
Charles Lewis, Giuseppe Di Fabbrizio, AT&T Labs - Research, United States
SP-L2.5: MODEL ADAPTATION FOR SPOKEN LANGUAGE UNDERSTANDING I - 41
Gokhan Tur, AT&T Labs - Research, United States
SP-L2.6: UNSUPERVISED SEMANTIC INTENT DISCOVERY FROM CALL LOG I - 45
ACOUSTICS
Xiao Li, University of Washington, United States; Asela Gunawardana, Alex Acero, Microsoft Research, United States
SP-L3: SPEECH PERCEPTION AND PSYCHOACOUSTICS
SP-L3.1: PROPOSAL ON OBJECTIVE SPEECH QUALITY ASSESSMENT FOR WIDEBAND I - 49
IP TELEPHONY
Chiharu Morioka, Atsuko Kurashima, Akira Takahashi, NTT Service Integration Laboratories, Japan
SP-L3.2: NEURAL CELL TYPE RECOGNITION BETWEEN GLOBUS PALLIDUS I - 53
EXTERNUS AND GLOBUS PALLIDUS INTERNUS BY GAUSSIAN MIXTURE MODELING
Qiang Fu, Mark Clements, Georgia Institute of Technology, United States; Klaus Mewes, Emory University, United States
SP-L3.3: ANALYSIS OF RELATIONSHIP BETWEEN OVERALL QUALITY AND I - 57
PSYCHOLOGICAL FACTORS AFFECTING HIGH-QUALITY SPEECH COMMUNICATION
SERVICES
Hitoshi Aoki, Akira Takahashi, NTT, Japan
SP-L3.4: CAN YOU UNDERSTAND HIM? LET'S LOOK AT HIS WORD ACCURACY - I - 61
AUTOMATIC EVALUATION OF TRACHEOESOPHAGEAL SPEECH
Maria Schuster, Universitätsklinikum Erlangen, Germany; Elmar Nöth, Tino Haderlein, Stefan Steidl, Anton Batliner, Universität
Erlangen-Nürnberg, Germany; Frank Rosanowski, Universitätsklinikum Erlangen, Germany
SP-L3.5: A WARPED BANDWIDTH EXPANSION FILTER I - 65
Marc Boillot, University of Florida / Motorola, United States; John Harris, University of Florida, United States
SP-L3.6: RELATIVE ENERGY AND INTELLIGIBILITY OF TRANSIENT SPEECH I - 69
INFORMATION
Sungyub Yoo, J. Robert Boston, John Durrant, Kristie Kovacyk, Stacey Karn, Susan Shaiman, Amro El-Jaroudi, Ching-Chung Li,
University of Pittsburgh, United States
SP-L4: CONFIDENCE MEASURES AND REJECTION ALGORITHMS
SP-L4.1: REJECTION USING RANK STATISTICS BASED ON HMM STATE SHORTLISTS I - 73
Enrico Bocchieri, Sarangarajan Parthasarathy, AT&T Labs - Research, United States
SP-L4.2: SPEAKER ADAPTIVE CONFIDENCE SCORING USING BAYESIAN COMBINING I - 77
Tae-Yoon Kim, Hanseok Ko, Korea University, Republic of Korea
SP-L4.3: IMPROVING UTTERANCE VERIFICATION USING ADDITIONAL CONFIDENCE I - 81
MEASURES IN ISOLATED SPEECH RECOGNITION INTERFACES
Graham Greenland, Willy Wong, Hans Kunov, University of Toronto, Canada
SP-L4.4: GENERALIZED POSTERIOR PROBABILITY FOR MINIMUM ERROR I - 85
VERIFICATION OF RECOGNIZED SENTENCES
Wai Kit Lo, Frank K. Soong, Spoken Language Translation Research Labs, ATR, Japan
SP-L4.5: ROBUST SPEECH RECOGNITION BY INTEGRATING SPEECH SEPARATION I - 89
AND HYPOTHESIS TESTING
Soundararajan Srinivasan, DeLiang Wang, The Ohio State University, United States
SP-L4.6: COMBINATION OF MULTIPLE PREDICTORS TO IMPROVE CONFIDENCE I - 93
MEASURE BASED ON LOCAL POSTERIOR PROBABILITIES
Yuewen Fu, Limin Du, Chinese Academy of Sciences, China
SP-L5: DISCRIMINATIVE TRAINING
SP-L5.1: ADAPTATION OF PRECISION MATRIX MODELS ON LARGE VOCABULARY I - 97
CONTINUOUS SPEECH RECOGNITION
Khe Chai Sim, Mark J. F. Gales, Cambridge University, United Kingdom
SP-L5.2: DISCRIMINATIVE TRAINING OF CDHMMS FOR MAXIMUM RELATIVE I -101
SEPARATION MARGIN
Chaojun Liu, Hui Jiang, Xinwei Li, York University, Canada
SP-L5.3: STATISTICAL PERFORMANCE ANALYSIS OF MCE/GPD LEARNING IN GAUSSIAN I -105
CLASSIFIERS AND HIDDEN MARKOV MODELS
Mohamed Afify, BBN Technologies, United States; Xin-Wei Li, Hui Jiang, York University, Canada
SP-L5.4: DISCRIMINATIVE TRAINING OF ACOUSTIC MODELS APPLIED TO DOMAINS I -109
WITH UNRELIABLE TRANSCRIPTS
Lambert Mathias, Johns Hopkins University, United States; Girija Yegnanarayanan, Juergen Fritsch, Multimodal Technologies, Inc., United States
SP-L5.5: MINIMUM CLASSIFICATION ERROR FOR LARGE SCALE SPEECH I -113
RECOGNITION TASKS USING WEIGHTED FINITE STATE TRANSDUCERS
Erik McDermott, Shigeru Katagiri, NTT Corporation, Japan
SP-L5.6: DISCRIMINATIVE TRAINING BASED ON THE CRITERION OF LEAST PHONE I -117
COMPETING TOKENS FOR LARGE VOCABULARY SPEECH RECOGNITION
Bo Liu, University of Sci. & Tech. of China, China; Hui Jiang, York University, Canada; Jian-Lai Zhou, Microsoft Research Asia,
China; Ren-Hua Wang, University of Sci. & Tech. of China, China
SP-L6: QUANTIZATION AND QUALITY MEASUREMENT
SP-L6.1: MULTI-FRAME GMM-BASED BLOCK QUANTISATION OF LINE SPECTRAL I -121
FREQUENCIES FOR WIDEBAND SPEECH CODING
Stephen So, Kuldip K. Paliwal, Griffith University, Australia
SP-L6.2: NON-INTRUSIVE GMM-BASED SPEECH QUALITY MEASUREMENT I -125
Tiago Falk, Qingfeng Xu, Wai-Yip Chan, Queen's University, Canada
SP-L6.3: A MULTIPLE-DESCRIPTION PCM SPEECH CODER USING STRUCTURED I -129
DUAL VECTOR QUANTIZERS
Stephen Voran, Institute for Telecommunication Sciences, United States
SP-L6.4: A NEW SEGMENT QUANTIZER FOR LINE SPECTRAL FREQUENCIES USING I -133
LEMPEL-ZIV ALGORITHM
Minoru Kohata, Chiba Institute of Technology, Japan; Motoyuki Suzuki, Shozo Makino, Tohoku University, Japan
SP-L6.5: PREDICTIVE VQ FOR BANDWIDTH SCALABLE LSP QUANTIZATION I -137
Hiroyuki Ehara, Toshiyuki Morii, Masahiro Oshikiri, Koji Yoshida, Matsushita Electric Industrial Co., Ltd., Japan
SP-L6.6: CODING WITH SIDE INFORMATION TECHNIQUES FOR LSF I -141
RECONSTRUCTION IN VOICE OVER IP
Yannis Agiomyrgiannakis, Foundation of Research and Technology Hellas, Greece; Yannis Stylianou, University of Crete,
Greece
SP-L7: SPEECH ENHANCEMENT WITH NOISE REDUCTION
SP-L7.1: SIGNAL SUBSPACE SPEECH ENHANCEMENT FOR AUDIBLE NOISE I -145
REDUCTION
Changhuai You, Soo Ngee Koh, Nanyang Technological University, Singapore; Susanto Rahardja, Institute for Infocomm
Research, Singapore
SP-L7.2: A WAVELET KALMAN FILTER WITH PERCEPTUAL MASKING FOR SPEECH I -149
ENHANCEMENT IN COLORED NOISE
Ning Ma, Martin Bouchard, University of Ottawa, Canada; Rafik A. Goubran, Carleton University, Canada
SP-L7.3: ADAPTIVE TIME SEGMENTATION OF NOISY SPEECH FOR IMPROVED I -153
SPEECH ENHANCEMENT
Richard Christian Hendriks, Richard Heusdens, Jesper Jensen, Delft University of Technology, Netherlands
SP-L7.4: SPEECH ENHANCEMENT USING HARMONIC REGENERATION I -157
Cyril Plapous, Claude Marro, France Telecom, France; Pascal Scalart, ENSSAT, France
SP-L7.5: INSTANT NOISE ESTIMATION USING FOURIER TRANSFORM OF AMDF AND I -161
VARIABLE START MINIMA SEARCH
Zhong Lin, Rafik A. Goubran, Carleton University, Canada
SP-L7.6: SPEECH ENHANCEMENT BASED ON SPEECH SPECTRAL COMPLEX GAUSSIAN I -165
MIXTURE MODEL
Guo-Hong Ding, Xia Wang, Yang Cao, Feng Ding, Yuezhong Tang, Nokia Research Center, Beijing, China
SP-L8: SPEAKER RECOGNITION USING ACOUSTIC AND HIGHER LEVEL FEATURES
SP-L8.1: IMPROVED PHONETIC SPEAKER RECOGNITION USING LATTICE DECODING I -169
Andrew Hatch, Barbara Peskin, International Computer Science Institute, United States; Andreas Stolcke, SRI International,
United States
SP-L8.2: SRI'S 2004 NIST SPEAKER RECOGNITION EVALUATION SYSTEM I -173
Sachin Kajarekar, Luciano Ferrer, Elizabeth Shriberg, Kemal Sonmez, Andreas Stolcke, Anand Venkataraman, Jing Zheng, SRI
International, United States
SP-L8.3: THE 2004 MIT LINCOLN LABORATORY SPEAKER RECOGNITION SYSTEM I -177
Douglas Reynolds, William Campbell, Terry Gleason, Carl Quillen, Douglas Sturim, Pedro Torres-Carrasquillo, MIT Lincoln
Laboratory, United States; Andre Adami, Oregon Health & Science University, United States
SP-L8.4: SPEAKER VERIFICATION USING ADAPTED ARTICULATORY FEATURE-BASED I -181
CONDITIONAL PRONUNCIATION MODELING
Ka-Yee Leung, Man-Wai Mak, Hong Kong Polytechnic University, Hong Kong SAR of China; Manhung Siu, Hong Kong University of Science and Technology, Hong Kong SAR of China; Sun-Yuan Kung, Princeton University, United States
SP-L8.5: PROSODY MODELING AND EIGEN-PROSODY ANALYSIS FOR ROBUST I -185
SPEAKER RECOGNITION
Zi-He Chen, National Central University, Taiwan; Yuan-Fu Liao, National Taipei University of Technology, Taiwan; Yau-Tarng Juang, National Central University, Taiwan
SP-L8.6: PROSODIC MODELING FOR SPEAKER RECOGNITION BASED ON SUB-BAND I -189
ENERGY TEMPORAL TRAJECTORIES
Andre Adami, University of Caxias do Sul, Brazil
SP-L9: LARGE VOCABULARY ASR
SP-L9.1: SUB-PHONETIC POLYNOMIAL SEGMENT MODEL FOR LARGE VOCABULARY I -193
CONTINUOUS SPEECH RECOGNITION
Siu-Kei Au Yeung, Chak-Fai Li, Man-Hung Siu, Hong Kong University of Science and Technology, Hong Kong SAR of China
SP-L9.2: CONSTRUCTING ENSEMBLES OF ASR SYSTEMS USING RANDOMIZED I -197
DECISION TREES
Olivier Siohan, Bhuvana Ramabhadran, Brian Kingsbury, IBM T. J. Watson Research Center, United States
SP-L9.3: EFFICIENT GENERATION OF HIGH-ORDER CONTEXT-DEPENDENT I - 201
WEIGHTED FINITE STATE TRANSDUCERS FOR SPEECH RECOGNITION
Mike Schuster, Takaaki Hori, NTT Corporation, Japan
SP-L9.4: THE IBM 2004 CONVERSATIONAL TELEPHONY SYSTEM FOR RICH I - 205
TRANSCRIPTION
Hagen Soltau, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, Geoffrey Zweig, IBM, United States
SP-L9.5: TRAINING LVCSR SYSTEMS ON THOUSANDS OF HOURS OF DATA I - 209
Gunnar Evermann, Ho Yin Chan, Mark J. F. Gales, Bin Jia, David Mrva, Phil Woodland, Kai Yu, Cambridge University, United
Kingdom
SP-L9.6: LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 JOHNS I - 213
HOPKINS SUMMER WORKSHOP
Mark Hasegawa-Johnson, University of Illinois, United States; James Baker, Carnegie Mellon University, United States; Sarah
Borys, University of Illinois, United States; Ken Chen, University of California, San Diego, United States; Emily Coogan,
University of Illinois, United States; Steven Greenberg, University of California, Berkeley, United States; Amit Juneja, University
of Maryland, United States; Katrin Kirchhoff, University of Washington, United States; Karen Livescu, Massachusetts Institute of
Technology, United States; Srividya Mohan, Johns Hopkins University, United States; Jennifer Muller, Department of Defense, United States; Kemal Sonmez, SRI International, United States; Tianyu Wang, Georgia Institute of Technology, United States
SP-L10: NOVEL METHODS FOR SPEECH ANALYSIS
SP-L10.1: SPEECH ANALYSIS BY ESTIMATING PERCEPTUALLY RELEVANT POLE I - 217
LOCATIONS
Venkatraman Atti, Andreas Spanias, Arizona State University, United States
SP-L10.2: COHERENT ENVELOPE DETECTION FOR MODULATION FILTERING OF I - 221
SPEECH
Steven Schimmel, Les Atlas, University of Washington, United States
SP-L10.3: SPEECH SIGNAL ANALYSIS WITH EXPONENTIAL AUTOREGRESSIVE MODEL I - 225
Kentaro Ishizuka, Hiroko Kato, Tomohiro Nakatani, NTT Corporation, Japan
SP-L10.4: COMPARISON OF AUTOREGRESSIVE PARAMETER ESTIMATION ALGORITHMS I - 229
FOR SPEECH PROCESSING AND RECOGNITION
Robert Morris, Jon Arrowood, Nexidia Inc., United States; Mark Clements, Georgia Institute of Technology, United States
SP-L10.5: AN ALGORITHM FOR LOCATING FUNDAMENTAL FREQUENCY MARKERS IN I - 233
SPEECH SIGNALS
Princy Dikshit, Stephen Zahorian, Shivaram Nagulapati, Old Dominion University, United States
SP-L10.6: AN AUTOREGRESSIVE, NON-STATIONARY EXCITED SIGNAL PARAMETER I - 237
ESTIMATION METHOD AND AN EVALUATION OF A SINGING-VOICE RECOGNITION
Akira Sasou, Masataka Goto, Natl. Inst. of Adv. Ind. Sci. & Technology (AIST), Japan; Satoru Hayamizu, Gifu University, Japan;
Kazuyo Tanaka, University of Tsukuba, Japan
SP-L11: NOISE ROBUST SPEECH RECOGNITION
SP-L11.1: STATIC AND DYNAMIC SPECTRAL FEATURES: THEIR NOISE ROBUSTNESS I - 241
AND OPTIMAL WEIGHTS FOR ASR
Chen Yang, The Chinese University of Hong Kong, Hong Kong SAR of China; Frank K. Soong, Spoken Language Translation
Labs, ATR, Japan; Tan Lee, The Chinese University of Hong Kong, Hong Kong SAR of China
SP-L11.2: LOG-ENERGY DYNAMIC RANGE NORMALIZATION FOR ROBUST SPEECH I - 245
RECOGNITION
Weizhong Zhu, INRS-EMT, University of Quebec, Canada; Douglas O'Shaughnessy, University of Quebec, Canada
SP-L11.3: A COMPANDING FRONT END FOR NOISE-ROBUST AUTOMATIC SPEECH I - 249
RECOGNITION
Jethran Guinness, Bhiksha Raj, Bent Schmidt-Nielsen, Mitsubishi Electric Research Laboratories, United States; Lorenzo
Turicchia, Rahul Sarpeshkar, Massachusetts Institute of Technology, United States
SP-L11.4: MULTI-RESOLUTION SPECTRAL ENTROPY FEATURE FOR ROBUST ASR I - 253
Hemant Misra, Shajith Ikbal, Sunil Sivadas, Herve Bourlard, IDIAP Research Institute, Switzerland
SP-L11.5: PARTICLE FILTER BASED NON-STATIONARY NOISE TRACKING FOR ROBUST I - 257
SPEECH RECOGNITION
Masakiyo Fujimoto, Satoshi Nakamura, ATR Spoken Language Translation Research Labs, Japan
SP-L11.6: ONLINE CEPSTRAL FILTERING USING A SEQUENTIAL EM APPROACH WITH I - 261
POLYAK AVERAGING AND FEEDBACK
Tor Andre Myrvoll, Norwegian University of Science and Technology, Norway; Satoshi Nakamura, SLT Laboratory, ATR, Japan
SP-P1: PROSODY AND SPEECH SYNTHESIS
SP-P1.1: IMPROVING THE UNDERSTANDABILITY OF SPEECH SYNTHESIS BY I - 265
MODELING SPEECH IN NOISE
Brian Langner, Alan W. Black, Carnegie Mellon University, United States
SP-P1.2: AN AUTOMATIC PROSODY RECOGNIZER USING A COUPLED MULTI-STREAM I - 269
ACOUSTIC MODEL AND A SYNTACTIC-PROSODIC LANGUAGE MODEL
Sankaranarayanan Ananthakrishnan, Shrikanth Narayanan, University of Southern California, United States
SP-P1.3: F0 CONTROL CHARACTERIZATION BY PERCEPTUAL IMPRESSIONS ON I - 273
SPEAKING ATTITUDES USING MULTIPLE DIMENSIONAL SCALING ANALYSIS
Yoko Kokenawa, Waseda University, Japan; Minoru Tsuzaki, Kyoto City University of Arts, Japan; Hiroaki Kato, ATR Human
Information Science Labs, Japan; Yoshinori Sagisaka, Waseda University, Japan
SP-P1.4: ADDITIVE MODELING OF ENGLISH F0 CONTOUR FOR SPEECH SYNTHESIS I - 277
Shinsuke Sakai, Massachusetts Institute of Technology, United States
SP-P1.5: PROSODY ANALYSIS AND MODELING FOR EMOTIONAL SPEECH SYNTHESIS I - 281
Dan-ning Jiang, Tsinghua University, China; Wei Zhang, Li-qin Shen, IBM China Research Lab, China; Lian-hong Cai,
Tsinghua University, China
SP-P1.6: SLIDING WINDOW SMOOTHING FOR MAXIMUM ENTROPY BASED I - 285
INTONATIONAL PHRASE PREDICTION IN CHINESE
Jian-Feng Li, Guo-Ping Hu, Ren-Hua Wang, Li-Rong Dai, University of Science and Technology of China, China
SP-P1.7: IDENTIFICATION AND SYNTHESIS OF CANTONESE TONES BASED ON THE I - 289
COMMAND-RESPONSE MODEL FOR F0 CONTOUR GENERATION
Wentao Gu, Shanghai Jiaotong University, China; Keikichi Hirose, Hiroya Fujisaki, University of Tokyo, Japan
SP-P1.8: COMPRESSION OF EXCEPTION LEXICONS FOR SMALL FOOTPRINT I - 293
GRAPHEME-TO-PHONEME CONVERSION
Joram Meron, Peter Veprek, Panasonic Digital Networking Lab, United States
SP-P1.9: PREDICTION OF PRONUNCIATION VARIATIONS FOR SPEECH SYNTHESIS: A I - 297
DATA-DRIVEN APPROACH
Christina Bennett, Alan W. Black, Carnegie Mellon University, United States
SP-P1.10: RECORDING SCRIPT DESIGN FOR CORPUS-BASED TTS SYSTEM BASED ON I - 301
COVERAGE OF VARIOUS PHONETIC ELEMENTS
Mitsuaki Isogai, Hideyuki Mizuno, Kazunori Mano, NTT Cyber Space Laboratories, NTT Corporation, Japan
SP-P1.11: OPTIMAL SUBSET SELECTION FROM TEXT DATABASES I - 305
Jilei Tian, Jani Nurminen, Imre Kiss, Nokia Research Center, Finland
SP-P1.12: COMPARATIVE STUDY OF AUTOMATIC PHONE SEGMENTATION METHODS I - 309
FOR TTS
Jordi Adell, Antonio Bonafonte, Universitat Politècnica de Catalunya, Spain; Jon Ander Gómez, Maria Jose Castro, Universitat
Politecnica de Valencia, Spain
SP-P2: GENERAL TOPICS IN ASR
SP-P2.1: INCREASED ROBUSTNESS AGAINST BIT ERRORS FOR DISTRIBUTED SPEECH I - 313
RECOGNITION IN WIRELESS ENVIRONMENTS
Brian Delaney, Georgia Institute of Technology, United States
SP-P2.2: "OF ALL THINGS THE MEASURE IS MAN": AUTOMATIC CLASSIFICATION OF I - 317
EMOTIONS AND INTER-LABELER CONSISTENCY
Stefan Steidl, Michael Levit, Anton Batliner, Elmar Nöth, Heinrich Niemann, University of Erlangen, Germany
SP-P2.3: DISORDERED SPEECH EVALUATION USING OBJECTIVE QUALITY MEASURES I - 321
Lingyun Gu, John Harris, Rahul Shrivastav, Christine Sapienza, University of Florida, United States
SP-P2.4: META-CLASSIFIERS IN ACOUSTIC AND LINGUISTIC FEATURE FUSION-BASED I - 325
AFFECT RECOGNITION
Bjorn Schuller, Raquel Jimenez Villar, Gerhard Rigoll, Manfred Lang, Technische Universitat Munchen, Germany
SP-P2.5: PACKET LOSS CONCEALMENT BASED ON VQ REPLICAS AND MMSE I - 329
ESTIMATION APPLIED TO DISTRIBUTED SPEECH RECOGNITION
Antonio M. Peinado, Angel M. Gomez, Victoria E. Sanchez, Jose L. Perez-Cordoba, Antonio J. Rubio, Universidad de Granada,
Spain
SP-P2.6: A COMPARISON OF SOFT-FEATURE DISTRIBUTED SPEECH RECOGNITION I - 333
WITH CANDIDATE CODECS FOR SPEECH ENABLED MOBILE SERVICES
Valentin Ion, Reinhold Haeb-Umbach, University of Paderborn, Germany
SP-P2.7: A HIDDEN TRAJECTORY MODEL WITH BI-DIRECTIONAL TARGET-FILTERING: I - 337
CASCADED VS. INTEGRATED IMPLEMENTATION FOR PHONETIC RECOGNITION
Li Deng, Xiang Li, Dong Yu, Alex Acero, Microsoft Research, United States
SP-P2.8: A COMPARISON OF CLASSIFIERS FOR DETECTING EMOTION FROM SPEECH I - 341
Izhak Shafran, Johns Hopkins University, United States; Mehryar Mohri, New York University, United States
SP-P2.9: SOFT DECODING OF TEMPORAL DERIVATIVES FOR ROBUST DISTRIBUTED I - 345
SPEECH RECOGNITION IN PACKET LOSS
Alastair James, Ben Milner, University of East Anglia, United Kingdom
SP-P2.10: DBN-BASED MULTI-STREAM MODELS FOR MANDARIN TONEME I - 349
RECOGNITION
Xin Lei, Gang Ji, Tim Ng, Jeff Bilmes, Mari Ostendorf, University of Washington, United States
SP-P2.11: SPARSE KPCA FOR FEATURE EXTRACTION IN SPEECH RECOGNITION I - 353
Amaro Lima, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura, Nagoya Institute of Technology, Japan;
Fernando Gil Resende, Federal University of Rio de Janeiro, Brazil
SP-P2.12: EFFECTS OF PHONEME CHARACTERISTICS ON TEO FEATURE-BASED I - 357
AUTOMATIC STRESS DETECTION IN SPEECH
Evan Ruzanski, University of Colorado, United States; John H. L. Hansen, University of Colorado, Boulder, United States; James
L. Meyerhoff, George Saviolakis, Michael Koenig, Walter Reed Army Institute of Research, United States
SP-P3: SPEECH ANALYSIS AND SYNTHESIS
SP-P3.1: SCALABLE CONCATENATIVE SPEECH SYNTHESIS BASED ON THE PLURAL I - 361
UNIT SELECTION AND FUSION METHOD
Masatsune Tamura, Tatsuya Mizutani, Takehiko Kagoshima, Toshiba Corporation, Japan
SP-P3.2: ADAPTIVE TRAINING FOR HIDDEN SEMI-MARKOV MODEL I - 365
Junichi Yamagishi, Takao Kobayashi, Tokyo Institute of Technology, Japan
SP-P3.3: PERCEPTUALLY WEIGHTED LONG TERM MODELING OF SINUSOIDAL I - 369
SPEECH AMPLITUDE TRAJECTORIES
Mohammad Firouzmand, Laurent Girin, INPG, France
SP-P3.4: SPEECH RECOGNITION IN THE BLIND CONDITION BASED ON MULTIPLE I - 373
DIRECTIVITY PATTERNS USING A MICROPHONE ARRAY
Toshiyuki Sekiya, Tetsunori Kobayashi, Waseda University, Japan
SP-P3.5: AN UNSUPERVISED QUANTITATIVE MEASURE FOR WORD PROMINENCE IN I - 377
SPONTANEOUS SPEECH
Dagen Wang, Shrikanth Narayanan, USC Viterbi School of Engineering, United States
SP-P3.6: SPEECH MODELLING BASED ON GENERALIZED GAUSSIAN PROBABILITY I - 381
DENSITY FUNCTIONS
Kostas Kokkinakis, Asoke K. Nandi, University of Liverpool, United Kingdom
SP-P3.7: BAYESIAN MODEL BASED NON-INTRUSIVE SPEECH QUALITY EVALUATION I - 385
Guo Chen, Vijay Parsa, University of Western Ontario, Canada
SP-P3.8: ROBUST PITCH ESTIMATION AT VERY LOW SNR EXPLOITING TIME AND I - 389
FREQUENCY DOMAIN CUES
Celia Shahnaz, Wei-Ping Zhu, M. Omair Ahmad, Concordia University, Canada
SP-P3.9: FUNDAMENTAL FREQUENCY ESTIMATION AND VOCAL TREMOR ANALYSIS BY I - 393
MEANS OF MORLET WAVELET TRANSFORMS
Laurence Cnockaert, Francis Grenez, Jean Schoentgen, Université Libre de Bruxelles, Belgium
SP-P3.10: AUTOMATIC SPEECH SEGMENTATION USING AVERAGE LEVEL CROSSING I - 397
RATE INFORMATION
Anindya Sarkar, T. V. Sreenivas, Indian Institute of Science, India
SP-P3.11: DWT-BASED PHONETIC GROUPS CLASSIFICATION USING NEURAL I - 401
NETWORKS
Van Tuan Pham, Gernot Kubin, University of Technology, Graz, Austria
SP-P3.12: A NOVEL KLT ALGORITHM OPTIMIZED FOR SMALL SIGNAL SETS I - 405
Francesco Gianfelici, Giorgio Biagetti, Paolo Crippa, Claudio Turchetti, Universita Politecnica delle Marche, Italy
SP-P3.13: VOICING-STATE CLASSIFICATION OF CO-CHANNEL SPEECH USING I - 409
NONLINEAR STATE-SPACE RECONSTRUCTION
Yasser Mahgoub, Richard Dansereau, Carleton University, Canada
SP-P3.14: SPEECH RATE ESTIMATION VIA TEMPORAL CORRELATION AND SELECTED I - 413
SUB-BAND CORRELATION
Shrikanth Narayanan, Dagen Wang, USC Viterbi School of Engineering, United States
SP-P4: MODEL-BASED ROBUST SPEECH RECOGNITION
SP-P4.1: CLOSELY COUPLED ARRAY PROCESSING AND MODEL-BASED I - 417
COMPENSATION FOR MICROPHONE ARRAY SPEECH RECOGNITION
Xianyu Zhao, Zhijian Ou, Minhua Chen, Zuoying Wang, Tsinghua University, China
SP-P4.2: CONTEXT-DEPENDENT DURATION MODELING I - 421
Daniel Willett, Temic Speech Dialog Systems, Germany
SP-P4.3: RECOGNISING SPEECH IN THE PRESENCE OF A COMPETING SPEAKER I - 425
USING A SPEECH FRAGMENT DECODER
André Coy, Jon Barker, University of Sheffield, United Kingdom
SP-P4.4: AN ENVIRONMENT COMPENSATED MAXIMUM LIKELIHOOD TRAINING I - 429
APPROACH BASED ON STOCHASTIC VECTOR MAPPING
Jian Wu, Microsoft Corp., United States; Qiang Huo, Donglai Zhu, University of Hong Kong, Hong Kong SAR of China
SP-P4.5: EFFECT OF PHASE-SENSITIVE ENVIRONMENT MODEL AND HIGHER ORDER I - 433
VTS ON NOISY SPEECH FEATURE ENHANCEMENT
Veronique Stouten, Hugo Van hamme, Patrick Wambacq, Katholieke Universiteit Leuven, Belgium
SP-P4.6: TOWARDS SPEECH RECOGNITION ORIENTED DEREVERBERATION I - 437
Pamornpol Jinachitra, Stanford University, United States; Ramon Prieto, Toyota InfoTechnology Center U.S.A., United States
SP-P4.7: NOISY SPEECH RECOGNITION BASED ON ROBUST END-POINT DETECTION I - 441
AND MODEL ADAPTATION
Zhipeng Zhang, NTT DoCoMo, Japan; Sadaoki Furui, Tokyo Institute of Technology, Japan
SP-P4.8: ANALYSIS OF A LARGE IN-CAR SPEECH CORPUS AND ITS APPLICATION TO THE I - 445
MULTIMODEL ASR
Hiroshi Fujimura, Chiyomi Miyajima, Katsunobu Itou, Kazuya Takeda, Nagoya University, Japan; Fumitada Itakura, Meijo
University, Japan
SP-P4.9: BUILDING AN EFFECTIVE CORPUS BY USING ACOUSTIC SPACE I - 449
VISUALIZATION (COSMOS) METHOD
Goshu Nagino, Makoto Shozakai, Asahi Kasei Corporation, Japan
SP-P4.10: HMM/ANN BASED SPECTRAL PEAK LOCATION ESTIMATION FOR NOISE I - 453
ROBUST SPEECH RECOGNITION
Shajith Ikbal, Herve Bourlard, Mathew Magimai.-Doss, IDIAP Research Institute, Switzerland
SP-P4.11: ACOUSTIC FEATURE COMBINATION FOR ROBUST SPEECH RECOGNITION I - 457
András Zolnay, Ralf Schlueter, Hermann Ney, RWTH-Aachen Germany, Germany
SP-P4.12: ACOUSTIC TRAINING FROM HETEROGENEOUS DATA SOURCES: I - 461
EXPERIMENTS IN MANDARIN CONVERSATIONAL TELEPHONE SPEECH TRANSCRIPTION
Stavros Tsakalidis, Johns Hopkins University, United States; William Byrne, Cambridge University, United Kingdom
SP-P5: SPEECH MINING AND AUDIO-VISUAL INFORMATION PROCESSING
SP-P5.1: DYNAMIC MATCH PHONE-LATTICE SEARCHES FOR VERY FAST AND I - 465
ACCURATE UNRESTRICTED VOCABULARY KEYWORD SPOTTING
Kishan Thambiratnam, Sridha Sridharan, Queensland University of Technology, Australia
SP-P5.2: A STREAM-WEIGHT OPTIMIZATION METHOD FOR MULTI-STREAM HMMS I - 469
BASED ON LIKELIHOOD VALUE NORMALIZATION
Satoshi Tamura, Koji Iwano, Sadaoki Furui, Tokyo Institute of Technology, Japan
SP-P5.3: LIP READING FOR ROBUST SPEECH RECOGNITION ON EMBEDDED I - 473
DEVICES
Jesus Fernando Guitarte Perez, Siemens AG, Corporate Technology, Germany; Alejandro F. Frangi, Pompeu Fabra University,
Spain; Eduardo Lleida Solano, University of Zaragoza, Spain; Klaus Lukas, Siemens AG, Corporate Technology, Spain
SP-P5.4: NOVEL TECHNIQUES FOR TIME-COMPRESSING SPEECH: AN I - 477
EXPLORATORY STUDY
Simon Tucker, Steve Whittaker, University of Sheffield, United Kingdom
SP-P5.5: FAST TWO-STAGE VOCABULARY-INDEPENDENT SEARCH IN SPONTANEOUS I - 481
SPEECH
Peng Yu, Frank Seide, Microsoft Research Asia, China
SP-P5.6: AN HMM-BASED TEXT SEGMENTATION METHOD USING VARIATIONAL BAYES I - 485
APPROACH AND ITS APPLICATION TO LVCSR FOR BROADCAST NEWS
Takafumi Koshinaka, Ken-ichi Iso, Akitoshi Okumura, NEC Corporation, Japan
SP-P5.7: DETECTING GROUP INTEREST-LEVEL IN MEETINGS I - 489
Daniel Gatica-Perez, Iain McCowan, Dong Zhang, Samy Bengio, IDIAP Research Institute, Switzerland
SP-P5.8: SEMANTIC DATA MINING OF SHORT UTTERANCES I - 493
Lee Begeja, AT&T Labs - Research, United States; Harris Drucker, Monmouth University, United States; David Gibbon, Patrick
Haffner, Zhu Liu, Bernard Renger, Behzad Shahraray, AT&T Labs - Research, United States
SP-P5.9: AUTOMATIC PROCESSING OF AUDIO LECTURES FOR INFORMATION I - 497
RETRIEVAL: VOCABULARY SELECTION AND LANGUAGE MODELING
Alex Park, Timothy Hazen, James Glass, MIT CSAIL, United States
SP-P5.10: BLIND CHANGE DETECTION FOR AUDIO SEGMENTATION I - 501
Mohamed Omar, Upendra Chaudhari, Ganesh Ramaswamy, IBM, United States
SP-P5.11: COMBINING MULTIPLE SUBWORD REPRESENTATIONS FOR I - 505
OPEN-VOCABULARY SPOKEN DOCUMENT RETRIEVAL
Shi-wook Lee, National Institute of AIST, Japan; Kazuyo Tanaka, University of Tsukuba, Japan; Yoshiaki Itoh, Iwate Prefectural
University, Japan
SP-P5.12: ROBUST LIP-MOTION FEATURES FOR SPEAKER IDENTIFICATION I - 509
Hasan Ertan Cetingül, Yucel Yemez, Engin Erzin, A. Murat Tekalp, Koc University, Turkey
SP-P6: FEATURE-BASED ROBUST SPEECH RECOGNITION
SP-P6.1: VARIATIONAL BAYESIAN FEATURE SALIENCY FOR AUDIO TYPE I - 513
CLASSIFICATION
Fabio Valente, Christian Wellekens, Eurecom Institute, France
SP-P6.2: PITCH-SYNCHRONOUS ZCPA (PS-ZCPA)-BASED FEATURE EXTRACTION WITH I - 517
AUDITORY MASKING
Muhammad Ghulam, Takashi Fukuda, Junsei Horikawa, Tsuneo Nitta, Toyohashi University of Technology, Japan
SP-P6.3: MFCC COMPENSATION FOR IMPROVED RECOGNITION OF FILTERED AND I - 521
BAND-LIMITED SPEECH
Nicolas Morales, Universidad Autonoma de Madrid, Spain; John H. L. Hansen, University of Colorado, Boulder, United States;
Doroteo T. Toledano, Universidad Autonoma de Madrid, Spain
SP-P6.4: SPEECH FEATURE SMOOTHING FOR ROBUST ASR I - 525
Chia-Ping Chen, Jeff Bilmes, University of Washington, United States; Daniel Ellis, Columbia University, United States
SP-P6.5: ON DESENSITIZING THE MEL-CEPSTRUM TO SPURIOUS SPECTRAL I - 529
COMPONENTS FOR ROBUST SPEECH RECOGNITION
Vivek Tyagi, Christian Wellekens, Institute Eurecom, France
SP-P6.6: TWO-STAGE NOISE SPECTRA ESTIMATION AND REGRESSION BASED IN-CAR I - 533
SPEECH RECOGNITION USING SINGLE DISTANT MICROPHONE
Weifeng Li, Katsunobu Itou, Kazuya Takeda, Nagoya University, Japan; Fumitada Itakura, Meijo University, Japan
SP-P6.7: MASK ESTIMATION BASED ON SOUND LOCALISATION FOR MISSING DATA I - 537
SPEECH RECOGNITION
Sue Harding, Jon Barker, Guy J. Brown, University of Sheffield, United Kingdom
SP-P6.8: SPEECH PROCESSING USING JOINT FEATURES DERIVED FROM THE I - 541
MODIFIED GROUP DELAY FUNCTION
Rajesh Hegde, Hema Murthy, Indian Institute of Technology, India; Gadde V. Ramana Rao, SRI International, United States
SP-P6.9: INFLUENCE OF AUTOCORRELATION LAG RANGES ON ROBUST SPEECH I - 545
RECOGNITION
Benjamin J. Shannon, Kuldip K. Paliwal, Griffith University, Australia
SP-P6.10: SUBSPACE-BASED SPEAKER-INDEPENDENT VOWEL RECOGNITION I - 549
R. Muralishankar, Douglas O'Shaughnessy, University of Quebec, Canada
SP-P6.11: ROBUST SPEECH RECOGNITION BASED ON SPECTRAL ADJUSTING AND I - 553
WARPING
Rui Zhao, Zuoying Wang, Tsinghua University, China
SP-P6.12: ROBUST SPEECH ACTIVITY DETECTION USING LDA APPLIED TO FF I - 557
PARAMETERS
Jaume Padrell, Dusan Macho, Climent Nadeu, Universitat Politecnica de Catalunya, Spain
SP-P7: LANGUAGE MODELING AND IDENTIFICATION
SP-P7.1: JOINT DISCRIMINATIVE LANGUAGE MODELING AND UTTERANCE I - 561
CLASSIFICATION
Murat Saraclar, AT&T Labs - Research, United States; Brian Roark, OGI at Oregon Health & Science University, United States
SP-P7.2: LANGUAGE MODEL ESTIMATION FOR OPTIMIZING END-TO-END I - 565
PERFORMANCE OF A NATURAL LANGUAGE CALL ROUTING SYSTEM
Vaibhava Goel, IBM, United States; Hong-Kwang (Jeff) Kuo, IBM T. J. Watson Research Center, United States; Sabine Deligne,
Cheng Wu, IBM, United States
SP-P7.3: LANGUAGE IDENTIFICATION USING PHONETIC AND PROSODIC HMMS WITH I - 569
FEATURE NORMALIZATION
Yasunari Obuchi, Nobuo Sato, Hitachi Ltd., Japan
SP-P7.4: RAPID LANGUAGE MODEL DEVELOPMENT USING EXTERNAL RESOURCES I - 573
FOR NEW SPOKEN DIALOG DOMAINS
Ruhi Sarikaya, IBM T. J. Watson Research Center, United States; Agustin Gravano, Columbia University, United States; Yuqing Gao, IBM T. J. Watson Research Center, United States
SP-P7.5: USING LOCAL & GLOBAL PHONOTACTIC FEATURES IN CHINESE DIALECT I - 577
IDENTIFICATION
Boon Pang Lim, Haizhou Li, Bin Ma, Institute for Infocomm Research, Singapore
SP-P7.6: RANDOM CLUSTERINGS FOR LANGUAGE MODELING I - 581
Ahmad Emami, Frederick Jelinek, Johns Hopkins University, United States
SP-P7.7: DIALECT/ACCENT CLASSIFICATION VIA BOOSTED WORD MODELING I - 585
Rongqing Huang, University of Colorado at Boulder, United States; John H. L. Hansen, University of Colorado, Boulder, United
States
SP-P7.8: WEB-DATA AUGMENTED LANGUAGE MODELS FOR MANDARIN I - 589
CONVERSATIONAL SPEECH RECOGNITION
Tim Ng, Mari Ostendorf, Mei-Yuh Hwang, University of Washington, United States; Manhung Siu, Hong Kong University of
Science and Technology, Hong Kong SAR of China; Ivan Bulyko, Xin Lei, University of Washington, United States
SP-P7.9: AN EFFICIENT ALGORITHM FOR CLUSTERING SHORT SPOKEN I - 593
UTTERANCES
Zhu Liu, AT&T Labs - Research, United States
SP-P7.10: MAXIMUM ENTROPY BASED GENERIC FILTER FOR LANGUAGE MODEL I - 597
ADAPTATION
Dong Yu, Milind Mahajan, Peter Mau, Alex Acero, Microsoft Research, United States
SP-P7.11: LANGUAGE IDENTIFICATION USING PITCH CONTOUR INFORMATION I - 601
Chi-Yueh Lin, Hsiao-Chuan Wang, National Tsing Hua University, Taiwan
SP-P7.12: INTEGRATING MULTIPLE LAYERS OF CONCEPT INFORMATION INTO I - 605
N-GRAM MODELING FOR SPOKEN LANGUAGE UNDERSTANDING
Nick J.-C. Wang, Delta Electronics, Inc., Taiwan
SP-P7.13: AUTOMATIC LANGUAGE IDENTIFICATION USING ERGODIC HMM I - 609
S. A. Santosh Kumar, V. Ramasubramanian, Indian Institute of Science, India
SP-P8: TEXT-INDEPENDENT SPEAKER RECOGNITION
SP-P8.1: DISCRIMINATIVE POWER OF TRANSIENT FRAMES IN SPEAKER I - 613
RECOGNITION
Jérôme Louradour, Khalid Daoudi, Régine André-Obrecht, IRIT - University Paul Sabatier, France
SP-P8.2: SPEAKER IDENTIFICATION IN UNKNOWN NOISY CONDITIONS - A I - 617
UNIVERSAL COMPENSATION APPROACH
Ji Ming, Darryl Stewart, Queen's University Belfast, United Kingdom; Saeed Vaseghi, Brunel University, United Kingdom
SP-P8.3: EXTRACTING ADDITIONAL INFORMATION FROM GAUSSIAN MIXTURE MODEL I - 621
PROBABILITIES FOR IMPROVED TEXT-INDEPENDENT SPEAKER IDENTIFICATION
Balakrishnan Narayanaswamy, Carnegie Mellon University, United States; Rashmi Gangadharaiah, Indian Institute of Science, India
SP-P8.4: COMBINING SELECTION TREE WITH OBSERVATION REORDERING I - 625
PRUNING FOR EFFICIENT SPEAKER IDENTIFICATION USING GMM-UBM
Zhenyu Xiong, Thomas Zheng, Tsinghua University, China; Zhanjiang Song, Beijing d-Ear Technologies Co., Ltd., China; Wenhu
Wu, Tsinghua University, China
SP-P8.5: ADVANCES IN CHANNEL COMPENSATION FOR SVM SPEAKER RECOGNITION I - 629
Alex Solomonoff, William Campbell, Ian Boardman, MIT Lincoln Laboratory, United States
SP-P8.6: IMPROVED SPEAKER MODEL MIGRATION VIA STOCHASTIC SYNTHESIS I - 633
Jiří Navrátil, Ganesh Ramaswamy, IBM T. J. Watson Research Center, United States
SP-P8.7: FACTOR ANALYSIS SIMPLIFIED I - 637
Patrick Kenny, Gilles Boulianne, Pierre Ouellet, Pierre Dumouchel, CRIM, Canada
SP-P8.8: MINIMUM CLASSIFICATION ERROR INTERACTIVE TRAINING FOR SPEAKER I - 641
IDENTIFICATION
Yusuke Kida, Kyoto University, Japan; Hiroyoshi Yamamoto, Nagoya Institute of Technology, Japan; Chiyomi Miyajima, Nagoya University, Japan; Keiichi Tokuda, Tadashi Kitamura, Nagoya Institute of Technology, Japan
SP-P8.9: A NEW COMMON COMPONENT GMM-BASED SPEAKER RECOGNITION I - 645
METHOD
Yih-Ru Wang, Chen-Yu Chiang, National Chiao Tung University, Taiwan
SP-P8.10: GMM-BASED BHATTACHARYYA KERNEL FISHER DISCRIMINANT ANALYSIS I - 649
FOR SPEAKER RECOGNITION
Yi-Hsiang Chao, Hsin-Min Wang, Ruei-Chuan Chang, Academia Sinica, Taiwan
SP-P8.11: A STUDY OF THE RELATIVE IMPORTANCE OF TEMPORAL CHARACTERISTICS I - 653
IN TEXT-DEPENDENT AND TEXT-CONSTRAINED SPEAKER VERIFICATION
James Nealand, RMIT, Australia; Jason Pelecanos, Ran Zilca, Ganesh Ramaswamy, IBM T. J. Watson Research Center, United
States
SP-P8.12: NOISE ROBUST SPEAKER VERIFICATION USING MEL-FREQUENCY I - 657
DISCRETE WAVELET COEFFICIENTS AND PARALLEL MODEL COMPENSATION
Zekeriya Tufekci, Izmir Institute of Technology, Turkey; Sabri Gurbuz, Harran University, Turkey
SP-P9: ACOUSTIC MODELING AND CLUSTERING ALGORITHMS
SP-P9.1: INITIALIZING SUBSPACE CONSTRAINED GAUSSIAN MIXTURE MODELS I - 661
Peder Olsen, Karthik Visweswariah, Ramesh Gopinath, IBM, United States
SP-P9.2: MULTI-RATE AND VARIABLE-RATE MODELING OF SPEECH AT PHONE AND I - 665
SYLLABLE TIME SCALES
Özgür Çetin, Mari Ostendorf, University of Washington, United States
SP-P9.3: OPTIMAL CLUSTERING AND NON-UNIFORM ALLOCATION OF GAUSSIAN I - 669
KERNELS IN SCALAR DIMENSION FOR HMM COMPRESSION
Xiao-Bing Li, Frank K. Soong, ATR Spoken Language Translation Research Labs, Japan; Tor Andre Myrvoll, Norwegian University of Science and Technology, Norway; Ren-Hua Wang, University of Science and Technology of China, China
SP-P9.4: HIERARCHICAL CORRELATION COMPENSATION FOR HIDDEN MARKOV I - 673
MODELS
Hui Lin, Tsinghua University, China; Ye Tian, JianLai Zhou, Microsoft Research Asia, China; Hui Jiang, York University,
Canada
SP-P9.5: CLUSTER-DEPENDENT ACOUSTIC MODELING I - 677
Bing Xiang, Long Nguyen, Spyros Matsoukas, Richard Schwartz, BBN Technologies, United States
SP-P9.6: FUZZY PARAMETER CLUSTERING METHOD IN SPEECH RECOGNITION I - 681
Xianghua Xu, Jie Zhu, Shanghai Jiaotong University, China
SP-P9.7: AUTOMATIC TRAINING SET SEGMENTATION FOR MULTI-PASS SPEECH I - 685
RECOGNITION
Mark Mao, Stanford University, United States; Vincent Vanhoucke, Brian Strope, Nuance Communications, United States
SP-P9.8: GENERALIZED STATISTICAL MODELING OF PRONUNCIATION VARIATIONS I - 689
USING VARIABLE-LENGTH PHONE CONTEXT
Yuya Akita, Tatsuya Kawahara, Kyoto University, Japan
SP-P9.9: ON INITIALIZATION OF GAUSSIAN MIXTURES: A HYBRID GENETIC EM I - 693
ALGORITHM
Franz Pernkopf, Graz University of Technology, Austria
SP-P9.10: ACOUSTIC MODEL TRAINING USING GREEDY EM I - 697
Rusheng Hu, Xiaolong Li, Yunxin Zhao, University of Missouri-Columbia, United States
SP-P9.11: MODELING SUCCESSIVE FRAME DEPENDENCIES WITH HYBRID HMM/BN I - 701
ACOUSTIC MODEL
Konstantin Markov, Satoshi Nakamura, ATR Spoken Language Translation Research Labs, Japan
SP-P9.12: IMPROVED COVARIANCE MODELING FOR MAXIMUM LIKELIHOOD I - 705
MULTIPLE SUBSPACE TRANSFORMATIONS
Xi Zhou, University of Science and Technology of China, China; Ye Tian, JianLai Zhou, Microsoft Research Asia, China;
Bei-qian Dai, University of Science and Technology of China, China
SP-P10: TOPICS IN SPEAKER RECOGNITION
SP-P10.1: A PROBABILISTIC MEASURE OF MODALITY RELIABILITY IN SPEAKER I - 709
VERIFICATION
Jonas Richiardi, Plamen Prodanov, Andrzej Drygajlo, Swiss Federal Institute of Technology (EPFL), Switzerland
SP-P10.2: A CORRELATION METRIC FOR SPEAKER TRACKING USING ANCHOR I - 713
MODELS
Mikael Collet, Delphine Charlet, France Telecom R&D, France; Frederic Bimbot, IRISA (CNRS & INRIA), France
SP-P10.3: ESTIMATING AND EVALUATING CONFIDENCE FOR FORENSIC SPEAKER I - 717
RECOGNITION
William Campbell, Douglas Reynolds, Joseph Campbell, Kevin Brady, MIT Lincoln Laboratory, United States
SP-P10.4: F-RATIO CLIENT-DEPENDENT NORMALISATION FOR BIOMETRIC I - 721
AUTHENTICATION TASKS
Norman Poh, Samy Bengio, IDIAP Research Institute, Switzerland
SP-P10.5: CLUSTERING SPEECH UTTERANCES BY SPEAKER USING I - 725
EIGENVOICE-MOTIVATED VECTOR SPACE MODELS
Wei-Ho Tsai, Shih-Sian Cheng, Yi-Hsiang Chao, Hsin-Min Wang, Academia Sinica, Taiwan
SP-P10.6: T-NORM FOR TEXT-DEPENDENT COMMERCIAL SPEAKER VERIFICATION I - 729
APPLICATIONS: EFFECT OF LEXICAL MISMATCH
Matthieu Hebert, Daniel Boies, Nuance Communication, Canada
SP-P10.7: A SESSION-GMM GENERATIVE MODEL USING TEST UTTERANCE GAUSSIAN I - 733
MIXTURE MODELING FOR SPEAKER VERIFICATION
Hagai Aronowitz, Bar-Ilan University, Israel; David Burshtein, Tel-Aviv University, Israel; Amihood Amir, Bar-Ilan University, Israel
SP-P10.8: ALIZE, A FREE TOOLKIT FOR SPEAKER RECOGNITION I - 737
Jean-Francois Bonastre, Frederic Wils, University of Avignon, France; Sylvain Meignier, University of Maine, France
SP-P10.9: SPEAKER ADAPTIVE COHORT SELECTION FOR TNORM IN I - 741
TEXT-INDEPENDENT SPEAKER VERIFICATION
Douglas Sturim, Douglas Reynolds, MIT Lincoln Laboratory, United States
SP-P10.10: HYBRID SPEAKER-BASED SEGMENTATION SYSTEM USING MODEL-LEVEL I - 745
CLUSTERING
Hyoung-Gook Kim, Daniel Ertelt, Thomas Sikora, Technical University of Berlin, Germany
SP-P10.11: ROBUSTNESS OF BIT-STREAM BASED FEATURES FOR SPEAKER I - 749
VERIFICATION
Antonio Moreno-Daniel, Biing-Hwang (Fred) Juang, Georgia Institute of Technology, United States;
Juan Arturo Nolazco-Flores, Instituto Tecnológico de Monterrey (ITESM), Mexico
SP-P10.12: TWO-WAY CLUSTER VOTING TO IMPROVE SPEAKER DIARISATION I - 753
PERFORMANCE
Sue Tranter, Cambridge University, United Kingdom
SP-P10.13: SPEAKER DETECTION WITHOUT MODELS I - 757
Daniel Gillick, Stephen Stafford, Barbara Peskin, Berkeley, United States
SP-P11: TOPICS IN SPEECH CODING AND ENHANCEMENT
SP-P11.1: IMPROVING THE 2.4 KB/S MILITARY STANDARD MELP (MS-MELP) CODER I - 761
USING PITCH-SYNCHRONOUS ANALYSIS AND SYNTHESIS TECHNIQUES
Ali Erdem Ertan, Thomas P. Barnwell III, Georgia Institute of Technology, United States
SP-P11.2: ULTRA LOW BIT RATE SPEECH CODING USING AN ERGODIC HIDDEN I - 765
MARKOV MODEL
Matthew Lee, Adriane Durey, Elliot Moore, Mark Clements, Georgia Institute of Technology, United States
SP-P11.3: TOWARDS ILBC SPEECH CODING AT LOWER RATES THROUGH A NEW I - 769
FORMULATION OF THE START STATE SEARCH
Christopher M. Garrido, Manohar N. Murthi, University of Miami, United States; Søren Vang Andersen, Aalborg University, Denmark
SP-P11.4: A MISSING-DATA APPROACH TO NOISE-ROBUST LPC EXTRACTION FOR I
VOICED SPEECH USING AUXILIARY SENSORS
Cenk Demiroglu, Thomas P. Barnwell III, Georgia Institute of Technology, United States
SP-P11.5: A TECHNIQUE OF MULTI-TAP LONG TERM PREDICTOR (LTP) FILTER I
USING SUB-SAMPLE RESOLUTION DELAY
Mark Jasiuk, Tenkasi Ramabadran, Udar Mittal, James Ashley, Michael McLaughlin, Motorola Labs, United States
SP-P11.6: VOICE ACTIVITY DETECTION BASED ON GENERALIZED GAMMA I
DISTRIBUTION
Jong Won Shin, Seoul National University, Republic of Korea; Joon-Hyuk Chang, University of California, Santa Barbara,
United States; Hwan Sik Yun, Nam Soo Kim, Seoul National University, Republic of Korea
SP-P11.7: INCREASING THE ROBUSTNESS OF CELP-BASED CODERS BY I
CONSTRAINED OPTIMIZATION
Mohamed Chibani, Philippe Gournay, Roch Lefebvre, University of Sherbrooke, Canada
SP-P11.8: JOINT OPTIMIZATION OF EXCITATION PARAMETERS IN I
ANALYSIS-BY-SYNTHESIS SPEECH CODERS HAVING MULTI-TAP LONG TERM PREDICTOR
Udar Mittal, James Ashley, Edgardo Cruz-Zeno, Mark Jasiuk, Motorola Labs, United States
SP-P11.9: BLOCK-BASED BANDWIDTH EXTENSION OF NARROWBAND SPEECH SIGNAL I
BY USING CDHMM
Sheng Yao, Cheung-Fat Chan, City University of Hong Kong, Hong Kong SAR of China
SP-P11.10: SEGMENTATION-BASED SPEECH ENHANCEMENT FOR INTELLIGIBILITY I
IMPROVEMENT IN MELP CODERS USING AUXILIARY SENSORS
Cenk Demiroglu, Sunil Kamath, David Anderson, Georgia Institute of Technology, United States
SP-P11.11: STOCHASTIC INTEGRATION AND LONG TERM PREDICTOR ESTIMATION I
UNDER NOISY CONDITIONS FOR SPEECH ENHANCEMENT
Marcin Kuropatwinski, Bastiaan Kleijn, KTH (Royal Institute of Technology), Sweden
SP-P11.12: A ROBUST NARROWBAND TO WIDEBAND EXTENSION SYSTEM FEATURING I
ENHANCED CODEBOOK MAPPING
Takahiro Unno, Texas Instruments, United States; Alan McCree, MIT Lincoln Laboratory, United States
SP-P11.13: ARTIFICIAL BANDWIDTH EXPANSION METHOD TO IMPROVE I
INTELLIGIBILITY AND QUALITY OF AMR-CODED NARROWBAND SPEECH
Laura Laaksonen, Nokia Research Center, Finland; Juho Kontio, Paavo Alku, Helsinki University of Technology, Finland
SP-P11.14: A SOFT DECISION BASED NOISE CROSS POWER SPECTRAL DENSITY I
ESTIMATION FOR TWO-MICROPHONE SPEECH ENHANCEMENT SYSTEMS
Xuefeng Zhang, Ying Jia, Intel China Research Center, China
SP-P12: LARGE VOCABULARY ASR
SP-P12.1: LATTICE SEGMENTATION AND SUPPORT VECTOR MACHINES FOR LARGE I
VOCABULARY CONTINUOUS SPEECH RECOGNITION
Veera Venkataramani, Johns Hopkins University, United States; William Byrne, Cambridge University, United Kingdom
SP-P12.2: FIRST STEPS IN FAST ACOUSTIC MODELING FOR A NEW TARGET I
LANGUAGE: APPLICATION TO VIETNAMESE
Viet-Bac Le, Laurent Besacier, CLIPS/IMAG, France
SP-P12.3: CROSS DOMAIN AUTOMATIC TRANSCRIPTION ON THE TC-STAR EPPS I
CORPUS
Christian Gollan, Maximilian Bisani, Stephan Kanthak, Ralf Schlüter, Hermann Ney, RWTH-Aachen Germany, Germany
SP-P12.4: USING RULE-BASED KNOWLEDGE TO IMPROVE LVCSR I - 829
Rene Beutler, Tobias Kaufmann, Beat Pfister, ETH, Switzerland
SP-P12.5: ADAPTATION STRATEGIES FOR THE ACOUSTIC AND LANGUAGE MODELS IN I - 833
BILINGUAL SPEECH TRANSCRIPTION
Javier Dieguez-Tirado, Carmen Garcia-Mateo, Laura Docio-Fernandez, Antonio Cardenal-Lopez, ETSI Telecomunicacion,
Spain
SP-P12.6: A STUDY ON KNOWLEDGE SOURCE INTEGRATION FOR CANDIDATE I - 837
RESCORING IN AUTOMATIC SPEECH RECOGNITION
Jinyu Li, Yu Tsao, Chin-Hui Lee, Georgia Institute of Technology, United States
SP-P12.7: DEVELOPMENT OF THE CU-HTK 2004 MANDARIN CONVERSATIONAL I - 841
TELEPHONE SPEECH TRANSCRIPTION SYSTEM
Mark J. F. Gales, Bin Jia, Andrew Liu, Khe Chai Sim, Phil Woodland, Kai Yu, Cambridge University, United Kingdom
SP-P12.8: BAYESIAN MODEL COMBINATION (BAYCOM) FOR IMPROVED I - 845
RECOGNITION
Ananth Sankar, Nuance Communications, United States
SP-P12.9: INVESTIGATION OF ACOUSTIC MODELING TECHNIQUES FOR LVCSR I - 849
SYSTEMS
Xunying Liu, Mark J. F. Gales, Khe Chai Sim, Kai Yu, Cambridge University, United Kingdom
SP-P12.10: IMPROVED CONFUSION NETWORK ALGORITHM AND SHORTEST PATH I - 853
SEARCH FROM WORD LATTICE
Jian Xue, Yunxin Zhao, University of Missouri-Columbia, United States
SP-P12.11: THAI AUTOMATIC SPEECH RECOGNITION I - 857
Sinaporn Suebvisai, Paisarn Charoenpornsawat, Alan W. Black, Carnegie Mellon University, United States; Monika Woszczyna, Multimodal Technologies, Inc., United States; Tanja Schultz, Carnegie Mellon University, United States
SP-P12.12: DEVELOPMENT OF THE CU-HTK 2004 BROADCAST NEWS TRANSCRIPTION I - 861
SYSTEMS
Do Yeong Kim, Ho Yin Chan, Gunnar Evermann, Mark J. F. Gales, David Mrva, Khe Chai Sim, Phil Woodland, Cambridge University, United Kingdom
SP-P12.13: CROSS-LANGUAGE ACOUSTIC MODEL REFINEMENT FOR THE INDONESIAN I - 865
LANGUAGE
Terrence Martin, Sridha Sridharan, Queensland University of Technology, Australia
SP-P13: SPEECH ANALYSIS AND PRODUCTION
SP-P13.1: ANALYSIS OF SPECTRAL MEASURES FOR VOICED SPEECH WITH VARYING I - 869
NOISE AND PERTURBATION LEVELS
Eoin O'Leidhin, Peter Murphy, University of Limerick, Ireland
SP-P13.2: AUTOMATIC DYSPHONIA RECOGNITION USING BIOLOGICALLY-INSPIRED I - 873
AMPLITUDE-MODULATION FEATURES
Nicolas Malyska, Thomas Quatieri, Douglas Sturim, MIT Lincoln Laboratory, United States
SP-P13.3: VOICED/UNVOICED DETERMINATION OF SPEECH SIGNAL IN NOISY I - 877
ENVIRONMENT USING HARMONICITY MEASURE BASED ON INSTANTANEOUS FREQUENCY
Dhany Arifianto, Takao Kobayashi, Tokyo Institute of Technology, Japan
SP-P13.4: SNR AND LOCAL NOISE POWER ESTIMATIONS BASED ON GAUSSIAN I - 881
MIXTURE MODELING ON THE LOG-POWER DOMAIN
Kazuya Takeda, Tran Huy Dat, Hiroshi Fujimura, Fumitada Itakura, Nagoya University, Japan
SP-P13.5: DETECTION OF SYMBOLIC GESTURAL EVENTS IN ARTICULATORY DATA I - 885
FOR USE IN STRUCTURAL REPRESENTATIONS OF CONTINUOUS SPEECH
Alexander Gutkin, Simon King, University of Edinburgh, United Kingdom
SP-P13.6: MATHEMATICAL EVIDENCE OF THE ACOUSTIC UNIVERSAL STRUCTURE IN I - 889
SPEECH
Nobuaki Minematsu, University of Tokyo, Japan
SP-P13.7: MODELING OF THE FRONT CAVITY AND SUBLINGUAL SPACE IN AMERICAN I - 893
ENGLISH RHOTIC SOUNDS
Zhaoyan Zhang, Carol Espy-Wilson, University of Maryland, United States; Suzanne Boyce, University of Cincinnati, United
States; Mark Tiede, Haskins Laboratories, United States
SP-P13.8: OBJECTIVE QUALITY MEASURES FOR GLOTTAL INVERSE FILTERING OF I - 897
SPEECH PRESSURE SIGNALS
Tom Bäckström, Matti Airas, Laura Lehto, Paavo Alku, Helsinki University of Technology, Finland
SP-P13.9: EFFECTS OF GLOTTAL AND LIP BOUNDARY CONDITIONS ON VOCAL-TRACT I - 901
AREA FUNCTION ESTIMATES FROM SPEECH SIGNALS
Huiqun Deng, Rabab K. Ward, Michael Beddoes, Murray Hodgson, University of British Columbia, Canada
SP-P13.10: ADAPTIVE FILTERBANKS INSPIRED BY THE AUDITORY SYSTEM FOR I - 905
SPEECH FEATURE EXTRACTION
Ramdas Kumaresan, Gopi Krishna Allu, University of Rhode Island, United States; Peter Cariani, Tufts Medical School, United
States
SP-P13.11: MULTI-SPEAKER ARTICULATORY RECONSTRUCTION BASED ON AN EIGEN I - 909
ARTICULATORY HMM
Sadao Hiroya, Takemi Mochida, NTT Communication Science Laboratories, Japan
SP-P13.12: A GRAPHICAL MODEL FOR FORMANT TRACKING I - 913
Jonathan Malkin, Xiao Li, Jeff Bilmes, University of Washington, United States
SP-P13.13: DYSPHONIC SPEECH ANALYSIS USING GENERALIZED VARIOGRAM I - 917
Abdellah Kacha, Francis Grenez, Jean Schoentgen, Université Libre de Bruxelles, Belgium; Khier Benmahammed, Université de
Sétif, Algeria
SP-P14: FEATURE EXTRACTION AND MODELING
SP-P14.1: TRAINING WIDEBAND ACOUSTIC MODELS USING MIXED-BANDWIDTH I - 921
TRAINING DATA VIA FEATURE BANDWIDTH EXTENSION
Michael Seltzer, Alex Acero, Microsoft Research, United States
SP-P14.2: MINIMUM PHONEME ERROR BASED HETEROSCEDASTIC LINEAR I - 925
DISCRIMINANT ANALYSIS FOR SPEECH RECOGNITION
Bing Zhang, Northeastern University, United States; Spyros Matsoukas, BBN Technologies, United States
SP-P14.3: A STUDY OF AUDITORY MODELING AND PROCESSING FOR SPEECH I - 929
SIGNALS
Woojay Jeon, Biing-Hwang (Fred) Juang, Georgia Institute of Technology, United States
SP-P14.4: A WAVELET AND FILTER BANK FRAMEWORK FOR PHONETIC I - 933
CLASSIFICATION
Ghinwa Choueiter, James Glass, Massachusetts Institute of Technology, United States
SP-P14.5: AUTOMATIC SYLLABLE STRESS DETECTION USING PROSODIC FEATURES I - 937
FOR PRONUNCIATION EVALUATION OF LANGUAGE LEARNERS
Joseph Tepperman, Shrikanth Narayanan, University of Southern California, United States
SP-P14.6: PREDICTING FORMANT FREQUENCIES FROM MFCC VECTORS I - 941
Jonathan Darch, Ben Milner, Xu Shao, University of East Anglia, United Kingdom; Saeed Vaseghi, Qin Yan, Brunel University,
United Kingdom
SP-P14.7: TONOTOPIC MULTI-LAYERED PERCEPTRON: A NEURAL NETWORK FOR I - 945
LEARNING LONG-TERM TEMPORAL FEATURES FOR SPEECH RECOGNITION
Barry Chen, Qifeng Zhu, Nelson Morgan, University of California Berkeley, United States
SP-P14.8: TOWARDS AN INTELLIGENT ACOUSTIC FRONT-END FOR AUTOMATIC I - 949
SPEECH RECOGNITION: BUILT-IN SPEAKER NORMALIZATION (BISN)
Umit Yapanel, University of Colorado at Boulder, United States; John H. L. Hansen, University of Colorado, Boulder, United
States
SP-P14.9: QUASI-CONTINUOUS LOCAL CODEBOOK FEATURES FOR MULTILINGUAL I - 953
ACOUSTIC PHONETIC MODELLING
Frank Diehl, Asunción Moreno, Universitat Politècnica de Catalunya, Spain
SP-P14.10: GARCH COEFFICIENTS AS FEATURE FOR SPEECH RECOGNITION IN I - 957
PERSIAN ISOLATED DIGIT
Mohamad Abdolahi, Hamidreza Amindavar, Amirkabir University of Technology, Iran (Islamic Republic of)
SP-P14.11: FMPE: DISCRIMINATIVELY TRAINED FEATURES FOR SPEECH I - 961
RECOGNITION
Daniel Povey, Brian Kingsbury, Lidia Mangu, George Saon, Hagen Soltau, Geoffrey Zweig, IBM, United States
SP-P15: ADAPTATION AND NORMALIZATION
SP-P15.1: VARIATIONAL BAYESIAN ADAPTATION FOR SPEAKER CLUSTERING I - 965
Fabio Valente, Christian Wellekens, Institut Eurecom, France
SP-P15.2: AUTOMATIC DISFLUENCY REMOVAL ON RECOGNIZED SPONTANEOUS I - 969
SPEECH - RAPID ADAPTATION TO SPEAKER DEPENDENT DISFLUENCIES
Matthias Honal, Universität Karlsruhe, Germany; Tanja Schultz, Carnegie Mellon University, United States
SP-P15.3: AGGREGATE A POSTERIORI LINEAR REGRESSION FOR SPEAKER ADAPTATION I - 973
Chih-Hsien Huang, Jen-Tzung Chien, National Cheng Kung University, Taiwan
SP-P15.4: TWO-STAGE SPEAKER ADAPTATION OF HYBRID TIED-POSTERIOR ACOUSTIC I - 977
MODELS
Jan Stadermann, Gerhard Rigoll, Technische Universität München, Germany
SP-P15.5: VARIOUS REFERENCE SPEAKERS DETERMINATION METHODS FOR I - 981
EMBEDDED KERNEL EIGENVOICE SPEAKER ADAPTATION
Brian Mak, Simon Ho, Hong Kong University of Science and Technology, Hong Kong SAR of China
SP-P15.6: KERNEL EIGENSPACE-BASED MLLR ADAPTATION USING MULTIPLE I - 985
REGRESSION CLASSES
Roger Hsiao, Brian Mak, Hong Kong University of Science and Technology, Hong Kong SAR of China
SP-P15.7: AUTOMATICALLY TRANSCRIBING MEETINGS USING DISTANT I - 989
MICROPHONES
Florian Metze, Christian Fügen, Universität Karlsruhe (TH), Germany; Yue Pan, Waibel Alexander, Carnegie Mellon University, United States
SP-P15.8: A NOVEL METHOD FOR RAPID SPEAKER ADAPTATION BASED ON SUPPORT I - 993
SPEAKER WEIGHTING
Tie Cai, Jie Zhu, Shanghai Jiaotong University, China
SP-P15.9: ADAPTIVE TRAINING USING SIMPLE TARGET MODELS I - 997
Georg Stemmer, Fabio Brugnara, Diego Giuliani, ITC-irst, Italy
SP-P15.10: LEARNING PRONUNCIATION AND FORMULATION VARIANTS IN I -1001
CONTINUOUS SPEECH APPLICATIONS
Daniele Colibro, Luciano Fissore, Cosmin Popovici, Claudio Vair, Loquendo, Italy; Pietro Laface, Politecnico di Torino, Italy
SP-P15.11: ALTERNATE PHONE MODELS FOR CONVERSATIONAL SPEECH I -1005
Lori Lamel, Jean-Luc Gauvain, CNRS-LIMSI, France
SP-P15.12: WHISPERY SPEECH RECOGNITION USING ADAPTED ARTICULATORY I -1009
FEATURES
Szu-Chen Jou, Tanja Schultz, Alex Waibel, Carnegie Mellon University, United States
SP-P16: TOPICS IN SPEECH PROCESSING AND SYSTEMS
SP-P16.1: OPEN VOCABULARY ASR FOR AUDIOVISUAL DOCUMENT INDEXATION I -1013
Alexandre Allauzen, Jean-Luc Gauvain, LIMSI-CNRS, France
SP-P16.2: CONSTRAINED PHRASE-BASED TRANSLATION USING WEIGHTED FINITE I -1017
STATE TRANSDUCER
Bowen Zhou, Stanley Chen, Yuqing Gao, IBM T. J. Watson Research Center, United States
SP-P16.3: UNSUPERVISED VOCABULARY EXPANSION FOR AUTOMATIC I -1021
TRANSCRIPTION OF BROADCAST NEWS
Katsutoshi Ohtsuki, Nobuaki Hiroshima, Masahiro Oku, Akihiro Imamura, NTT Corporation, Japan
SP-P16.4: CLASSIFICATION OF STRUCTURED DESCRIPTIONS I -1025
Srinivas Bangalore, AT&T Labs - Research, United States; Owen Rambow, Columbia University, United States
SP-P16.5: MAXIMUM ENTROPY SEGMENTATION OF BROADCAST NEWS I -1029
Heidi Christensen, BalaKrishna Kolluru, Yoshihiko Gotoh, University of Sheffield, United Kingdom; Steve Renals, University of Edinburgh, United Kingdom
SP-P16.6: THE AT&T WATSON SPEECH RECOGNIZER I -1033
Vincent Goffin, Cyril Allauzen, Enrico Bocchieri, Dilek Hakkani-Tur, Andrej Ljolje, Sarangarajan Parthasarathy, Mazin Rahim,
Giuseppe Riccardi, Murat Saraclar, AT&T Labs - Research, United States
SP-P16.7: OPEN VOCABULARY CHINESE NAME RECOGNITION WITH THE HELP OF I -1037
CHARACTER DESCRIPTION AND SYLLABLE SPELLING RECOGNITION
Ching-Ho Tsai, Nick J.-C. Wang, Patrick Huang, Jia-Lin Shen, Delta Electronics, Inc., Taiwan
SP-P16.8: ERROR PREDICTION IN SPOKEN DIALOG: FROM SIGNAL-TO-NOISE RATIO I -1041
TO SEMANTIC CONFIDENCE SCORES
Dilek Hakkani-Tur, Gokhan Tur, Giuseppe Riccardi, AT&T Labs - Research, United States; Hong Kook Kim, Gwangju Institute
of Science and Technology, Republic of Korea
SP-P16.9: INCORPORATING DIALOGUE CONTEXT AND TOPIC CLUSTERING IN I -1045
OUT-OF-DOMAIN DETECTION
Ian Lane, Tatsuya Kawahara, Kyoto University, Japan
SP-P16.10: STRUCTURING BASEBALL LIVE GAMES BASED ON SPEECH RECOGNITION I -1049
USING TASK DEPENDENT KNOWLEDGE AND EMOTION STATE RECOGNITION
Atsushi Sako, Yasuo Ariki, Kobe University, Japan
SP-P16.11: A NEW ASR EVALUATION MEASURE AND MINIMUM BAYES-RISK DECODING I -1053
FOR OPEN-DOMAIN SPEECH UNDERSTANDING
Hiroaki Nanjo, Ryukoku University, Japan; Tatsuya Kawahara, Kyoto University, Japan
SP-P16.12: SPEECH RECOGNITION OF A NAMED ENTITY I -1057
Tatsuhiko Tomita, Waseda University, Japan; Yoshiyuki Okimoto, Matsushita Electric Industrial Co., Ltd., Japan; Hirofumi Yamamoto, ATR Spoken Language Translation Research Labs, Japan; Yoshinori Sagisaka, Waseda University, Japan
SP-P16.13: AUTOMATIC DIALOG ACT SEGMENTATION AND CLASSIFICATION IN I -1061
MULTIPARTY MEETINGS
Jeremy Ang, Yang Liu, Elizabeth Shriberg, International Computer Science Institute, United States
SP-P16.14: SENTENCE EXTRACTION-BASED PRESENTATION SUMMARIZATION I -1065
TECHNIQUES AND EVALUATION METRICS
Makoto Hirohata, Yousuke Shinnaka, Koji Iwano, Sadaoki Furui, Tokyo Institute of Technology, Japan
SP-P17: TOPICS IN SPEECH ENHANCEMENT, SEPARATION AND DEREVERBERATION
SP-P17.1: BLIND DEREVERBERATION BASED ON ESTIMATES OF SIGNAL I -1069
TRANSMISSION CHANNELS WITHOUT PRECISE INFORMATION OF CHANNEL ORDER
Takafumi Hikichi, Marc Delcroix, Masato Miyoshi, NTT Corporation, Japan
SP-P17.2: FAST ESTIMATION OF A PRECISE DEREVERBERATION FILTER BASED ON I -1073
SPEECH HARMONICITY
Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi, NTT Corporation, Japan
SP-P17.3: CODEBOOK-BASED BAYESIAN SPEECH ENHANCEMENT I -1077
Sriram Srinivasan, Jonas Samuelsson, Bastiaan Kleijn, KTH (Royal Institute of Technology), Sweden
SP-P17.4: OVERCOMING THE STATISTICAL INDEPENDENCE ASSUMPTION W.R.T I -1081
FREQUENCY IN SPEECH ENHANCEMENT
Tim Fingscheidt, Christophe Beaugeant, Suhadi Suhadi, Siemens AG, COM Mobile Phones, Germany
SP-P17.5: A TWO-STAGE ALGORITHM FOR ENHANCEMENT OF REVERBERANT SPEECH I -1085
Mingyang Wu, Fair Isaac Corporation, United States; DeLiang Wang, The Ohio State University, United States
SP-P17.6: MATRIX QUANTIZATION BASED TIME-VARYING FILTER SPEECH I -1089
ENHANCEMENT
Sharath Rao K, Boston University, United States; Sreenivas Thippur, Indian Institute of Science, India
SP-P17.7: LEAKAGE MODEL AND TEETH CLACK REMOVAL FOR AIR- AND I -1093
BONE-CONDUCTIVE INTEGRATED MICROPHONES
Zicheng Liu, Amar Subramanya, Zhengyou Zhang, Jasha Droppo, Alex Acero, Microsoft Research, United States
SP-P17.8: SPEECH ENHANCEMENT USING A MMSE SHORT TIME SPECTRAL I -1097
AMPLITUDE ESTIMATOR WITH LAPLACIAN SPEECH MODELING
Bin Chen, Philipos Loizou, University of Texas, Dallas, United States
SP-P17.9: SEPARATION OF FRICATIVES AND AFFRICATES I -1101
Guoning Hu, DeLiang Wang, The Ohio State University, United States
SP-P17.10: SPEECH ENHANCEMENT BASED ON FILTERING THE SPECTROTEMPORAL I -1105
MODULATIONS
Nima Mesgarani, Shihab Shamma, University of Maryland, United States
SP-P17.11: IMPROVED KALMAN FILTERING FOR SPEECH ENHANCEMENT I -1109
Volodya Grancharov, Jonas Samuelsson, Bastiaan Kleijn, KTH (Royal Institute of Technology), Sweden
SP-P17.12: ADAPTIVE DECORRELATION FILTERING ALGORITHM FOR SPEECH SOURCE I -1113
SEPARATION IN UNCORRELATED NOISES
Rong Hu, Yunxin Zhao, University of Missouri-Columbia, United States
SP-P17.13: AN IMPROVED ESTIMATION OF A PRIORI SPEECH ABSENCE PROBABILITY I -1117
FOR SPEECH ENHANCEMENT: IN PERSPECTIVE OF SPEECH PERCEPTION
Min Seok Choi, Hong-Goo Kang, Yonsei University, Republic of Korea
SP-P17.14: SPEECH ENHANCEMENT USING A SWITCHING KALMAN FILTER WITH A I -1121
PERCEPTUAL POST-FILTER
Jianping Deng, Martin Bouchard, Tet H. Yeap, University of Ottawa, Canada
Volume II
IMDSP-L1: WATERMARKING
IMDSP-L1.1: USING PERCEPTUAL MODELS TO IMPROVE FIDELITY AND PROVIDE II -1
INVARIANCE TO VALUMETRIC SCALING FOR QUANTIZATION INDEX MODULATION WATERMARKING
Qiao Li, Ingemar Cox, University College London, United Kingdom
IMDSP-L1.2: SCALAR SCHEME FOR MULTIPLE USER INFORMATION EMBEDDING II - 5
Abdellatif Zaidi, Pablo Piantanida, Pierre Duhamel, LSS/CNRS SUPELEC, France
IMDSP-L1.3: RANDOMIZED DETECTION FOR SPREAD-SPECTRUM WATERMARKING: II - 9
DEFENDING AGAINST SENSITIVITY AND OTHER ATTACKS
Ramarathnam Venkatesan, Mariusz Jakubowski, Microsoft Research, United States
IMDSP-L1.4: LINEAR COMBINATION COLLUSION ATTACK AND ITS APPLICATION ON AN II -13
ANTI-COLLUSION FINGERPRINTING
Yongdong Wu, Institute for Infocomm Research, Singapore
IMDSP-L1.5: PITCH AND DURATION MODIFICATION FOR SPEECH WATERMARKING II -17
Mehmet Celik, Gaurav Sharma, University of Rochester, United States; A. Murat Tekalp, University of Rochester, United States /
Koc University, Turkey
IMDSP-L1.6: MORPHOLOGICAL STEGANALYSIS OF AUDIO SIGNALS AND THE II - 21
PRINCIPLE OF DIMINISHING MARGINAL DISTORTIONS
Oktay Altun, Gaurav Sharma, Mehmet Celik, Mark Sterling, Edward Titlebaum, Mark Bocko, University of Rochester, United
States
IMDSP-L2: DENOISING
IMDSP-L2.1: IMAGE DENOISING BY NON-LOCAL AVERAGING II - 25
Antoni Buades, Bartomeu Coll, Universitat de les Illes Balears, Spain; Jean-Michel Morel, ENS Cachan, France
IMDSP-L2.2: IMAGE DENOISING FOR SIGNAL-DEPENDENT NOISE II - 29
Keigo Hirakawa, New England Conservatory of Music, United States; Thomas W. Parks, Cornell University, United States
IMDSP-L2.3: WAVELET DOMAIN PARTITION-BASED IMAGE DENOISING II - 33
Il Ryeol Kim, Kenneth E. Barner, University of Delaware, United States
IMDSP-L2.4: AN IMPROVED IMAGE DENOISING ALGORITHM BASED ON WEIGHTED II - 37
ADAPTIVE LOCAL BOUNDS
Qi Li, Tania Stathaki, Imperial College London, United Kingdom
IMDSP-L2.5: A SELF-CONSISTENT WAVELET METHOD FOR DENOISING IMAGES II - 41
WITH MISSING PIXELS
Thomas Lee, Colorado State University, United States; Xiao-Li Meng, Harvard University, United States