AN EXTENSIBLE TRANSCODER FOR HTML TO VOICEXML CONVERSION
by
Narayanan Annamalai B.E.
Master’s Thesis
Advisors: Dr. Gopal Gupta and Dr. B Prabhakaran
THE UNIVERSITY OF TEXAS AT DALLAS
May 2002
The Scenario
By 2003, one billion people will use wireless devices.
By 2005, half of them will have Internet connectivity.
Growth far surpasses that of wire-bound Internet users.
New technology is needed to support the masses of customers.
A medium is required for data transfer.
The medium should be easy to use and efficient.
The right choice is voice.
Motivation
Drawback of Existing Web Infrastructure – content
Users of WAP are not satisfied.
It is not feasible to maintain multiple versions of the same content.
[Figure: the client sends a request for format B; the web server holds content in format A; a format translator converts A to B, and B is returned to the client.]
Related Work
The visually impaired have used screen readers.
Frankie James proposed the Auditory HTML Access System (AHA), which used distinct tones.
Neither of the above two systems has an interactive feature.
Stuart Goose et al. proposed an HTML to VoXML converter. VoXML is the ancestor of VoiceXML.
Application of Transcoder
[Figure: components are Mobile User, PSTN, Voice Server, Transcoder, Internet, and Web Server; flows are labelled Req., http req., HTML, VoiceXML, and Audio.]
Application of Transcoder
[Figure: components are Client, Voice Browser, Internet, Transcoder, and Web Server; flows are labelled http req., HTML, VXML, and Audio.]
Application of Transcoder
[Figure: components are Client, Transcoder, Voice Browser, Internet, and Web Server; flows are labelled http req., HTML, VXML, and Audio.]
Objectives
Provide a means for the visually impaired to access the Web.
Strive to express the structure of HTML pages in voice form.
The application can be customized according to the user's wishes.
Make the transcoder extensible, so that it can accommodate new HTML tags in the future.
What is VoiceXML?
VoiceXML is a standard developed by the VoiceXML Forum (AT&T, Motorola, IBM, Lucent).
It is a markup language used for creating human-computer interfaces over the telephone.
The user can interact with a VoiceXML page through spoken or DTMF (telephone key press) inputs.
It plays audio files and speech synthesized by TTS (text-to-speech) converters.
VoiceXML Example
HTML file:
<html>
  <head>
    <title> Sample Page</title>
  </head>
  <body>
    <h3> The output is in the form of audio </h3>
  </body>
</html>
VoiceXML file:
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="f1">
    <block> starting of the vxml page </block>
    <block> Sample Page </block>
    <block> The output is in the form of audio</block>
  </form>
</vxml>
HTML vs VoiceXML
HTML:
1. Single unit, presented with full efficiency.
2. Displays several inputs at the same time.
3. Input does not need any grammar for validation.
VoiceXML:
1. Consists of forms and blocks alone.
2. Inputs are collected sequentially.
3. Every input needs a grammar for validation.
System Model
The application is realized in two phases
I. Parsing Phase
II. Translation Phase
Parsing Phase: The input HTML file is parsed and the HTML node tree is obtained as output. The parser used for this purpose is the Web-Wise Systems HTML parser.
Translation Phase: Each HTML node is converted into the corresponding VoiceXML node.
System Architecture
[Figure: components are Input Provider, Parser, Translator, Internal data sheet, External data sheet, and Output VoiceXML file.]
Parsing Phase
The structure of the HTML file should be carried over to the VoiceXML file.
The HTML file is parsed and the root node of the input file is obtained. The root node of any HTML file is the <html> node.
Input HTML file:
<html>
<head><title>
Example 1</title></head>
<body>
<h1> Hello World </h1>
</body>
</html>
Output parse tree:
<html>
  <head>
  <body>
Parsing Example
(htmlRoot = new RootNode())
  .addNode(new PageNode()
    .addNode(new HeadNode()
      .addNode(new TitleNode()
        .addNode(new StringNode().setHtmlData("Example 1"))
      ) // end TitleNode
    ) // end HeadNode
    .addNode(new BodyNode()
      .addNode(new H1Node().setAlign("center")
        .addNode(new StringNode().setHtmlData("Hello World"))
      ) // end H1Node
    ) // end BodyNode
  ) // end PageNode
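The parsing step can be sketched in a few lines. This is only an illustration, not the Web-Wise parser used in the thesis: it relies on Python's standard html.parser module, the Node, TreeBuilder, parse_html, and outline names are hypothetical, and it assumes every open tag in the input has a matching close tag.

```python
from html.parser import HTMLParser


class Node:
    """One parse-tree node: an element, or a text leaf with tag 'string'."""

    def __init__(self, tag, attrs=None, text=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes every open tag is explicitly closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]  # innermost open element is last

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop only when the close tag matches the innermost open element.
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace between tags
            self.stack[-1].children.append(Node("string", text=text))


def parse_html(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = node.tag if node.text is None else f"string: {node.text!r}"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


EXAMPLE = """<html>
<head><title>
Example 1</title></head>
<body>
<h1> Hello World </h1>
</body>
</html>"""

tree = parse_html(EXAMPLE)
print("\n".join(outline(tree)))
```

The printed outline mirrors the nesting of the node-building calls shown in the parsing example: root, html, head, title, and body each appear one level deeper than their parent.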
Translating Phase: Issues
Translating phase: the node tree is traversed recursively, depth first and from left to right.
Each HTML node is converted to the appropriate VoiceXML node.
Issues:
Inputs must be verified with the user before submission, which is different from HTML.
VoiceXML is highly structured and follows strict conventions. For example, <prompt> It is a beautiful city </prompt> is syntactically right, but it can be the child of only a field or a block.
One-to-one conversion is not always possible.
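A minimal sketch of the depth-first, left-to-right traversal follows. The tuple-based tree, the visit and translate names, and the choice to wrap every text leaf in a <block> are assumptions made for illustration; the thesis translator handles many more node types.

```python
from xml.sax.saxutils import escape

# A node is (tag, children); a text leaf is ("string", "text").
TREE = ("html", [
    ("head", [("title", [("string", "Sample Page")])]),
    ("body", [("h3", [("string", "The output is in the form of audio")])]),
])


def visit(node, emit):
    """Visit a node, then its children from left to right (depth first)."""
    tag, payload = node
    if tag == "string":
        # Text must sit inside a legal parent, so wrap it in <block>.
        emit(f"<block> {escape(payload)} </block>")
        return
    for child in payload:
        visit(child, emit)


def translate(tree):
    lines = ['<?xml version="1.0"?>', '<vxml version="2.0">', '<form id="f1">']
    visit(tree, lines.append)
    lines += ["</form>", "</vxml>"]
    return "\n".join(lines)


print(translate(TREE))
```

Because the walk is depth first, the title text is emitted before the heading text, which preserves the reading order of the HTML page.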
Forms: radio tag
Radio tags provide choices, and the user selects one choice.
When one choice is selected, the others become inactive.
In HTML, a radio tag does not have a closing tag.
The challenge is to identify the last radio button of the same type.
Example: input HTML section
<form>
<INPUT type="radio" name="sex" value="male"> Male <br>
<INPUT type="radio" name="sex" value="female"> Female <br>
<h1> End of Radio </h1>
</form>
Forms: radio tag (contd.)
Output VoiceXML section:
……
<field name="sex">
<prompt> Please select one, what sex <enumerate/></prompt>
<option dtmf="1" value="male"> Male </option>
<option dtmf="2" value="female"> Female </option>
</field>
…….
Node tree of the input HTML section:
Form node
  Radio: male, sex
  Radio: female, sex
  h1
    String: 'End of Radio'
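One way to find the last radio button of a group is to treat a run of consecutive radio nodes with the same name as one group, which ends at the first sibling of a different kind or name. The sketch below assumes a flattened list of form children in the hypothetical shape (kind, name, value, label); the prompt wording and the radio_fields name are also illustrative.

```python
from itertools import groupby
from xml.sax.saxutils import escape, quoteattr

# Form children in document order: (kind, name, value, label).
CHILDREN = [
    ("radio", "sex", "male", "Male"),
    ("radio", "sex", "female", "Female"),
    ("h1", None, None, "End of Radio"),
]


def radio_fields(children):
    """Turn each run of same-named radio buttons into one <field>."""
    fields = []
    # groupby ends a run at the first child whose (kind, name) differs,
    # which is how the last radio button of a group is detected.
    for (kind, name), run in groupby(children, key=lambda c: (c[0], c[1])):
        if kind != "radio":
            continue
        lines = [
            f"<field name={quoteattr(name)}>",
            f"<prompt> Please choose {escape(name)}: <enumerate/> </prompt>",
        ]
        for dtmf, (_, _, value, label) in enumerate(run, start=1):
            lines.append(
                f'<option dtmf="{dtmf}" value={quoteattr(value)}> '
                f"{escape(label)} </option>"
            )
        lines.append("</field>")
        fields.append("\n".join(lines))
    return fields


for field in radio_fields(CHILDREN):
    print(field)
```

The sketch numbers DTMF keys from 1, so it is only suitable for groups of up to nine choices.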
Form: Text Box
Text boxes and text areas are used to obtain string inputs from the user.
A free-form string has no sample space, e.g., the name of a person.
VoiceXML inputs always need a grammar, so the <record> element is used to solve the problem.
The user can specify the record time and other attributes.
<submit> needs a list of fields and a URL for submission.
The inputs should be verified with the user before submission.
Form: text box (contd.)
Sample HTML extract:
…….
<form action=WW method=XX>
<LABEL for="firstname"> Firstname </LABEL>
<INPUT type="text" id="firstname">
<INPUT type="submit" value="send">
</form>
……..
Corresponding VoiceXML extract:
……..
<form id="f2">
<record name="firstname" beep="true" maxtime="10s" finalsilence="4000ms" dtmfterm="true">
<prompt> At tone, speak First name: </prompt>
<noinput> I did not hear anything, please try again </noinput>
<filled> <prompt> Your input is <audio expr="firstname"/></prompt>
</filled>
…….
<submit next=WW method=XX namelist= …..>
</form>
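The text-box conversion can be sketched as a small template function. The function name text_box_to_record and its parameters are hypothetical; the attribute values are copied from the extract above, and max_seconds stands in for the user-supplied input duration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def text_box_to_record(name, label, max_seconds=10):
    """Build a <record> element that captures a free-form spoken input."""
    return "\n".join([
        f'<record name={quoteattr(name)} beep="true" '
        f'maxtime="{max_seconds}s" finalsilence="4000ms" dtmfterm="true">',
        f"<prompt> At tone, speak {escape(label)}: </prompt>",
        "<noinput> I did not hear anything, please try again </noinput>",
        # Play the recording back so the user can verify the input.
        f"<filled> <prompt> Your input is <audio expr={quoteattr(name)}/> "
        "</prompt> </filled>",
        "</record>",
    ])


snippet = text_box_to_record("firstname", "First name")
ET.fromstring(snippet)  # raises ParseError if the snippet is not well-formed
print(snippet)
```

Parsing the result with ElementTree is a cheap check that the generated fragment is well-formed XML; it does not validate the fragment against the VoiceXML schema.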
Links
In HTML, links are given by the <a href..> tag in two ways:
• To a different part of the same document.
• To a different document altogether.
In VoiceXML, links are provided by the <goto next ..> element.
To a part of the same document: sub-dialogs are created. A sub-dialog is like a function call. <goto next= sub-dialog name>
To external documents: <goto next=URL>. The target HTML URL is converted to a VoiceXML page, so a VoiceXML URL is provided.
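The two link cases can be sketched as follows. The slides do not say how the VoiceXML URL for an external page is formed, so TRANSCODER_URL is a purely hypothetical endpoint that would transcode the target page on request; the link_to_goto name and the use of a "#" fragment for internal targets are likewise assumptions.

```python
from urllib.parse import quote
from xml.sax.saxutils import quoteattr

# Hypothetical endpoint that returns the VoiceXML version of a page.
TRANSCODER_URL = "http://transcoder.example/convert?url="


def link_to_goto(href):
    """Map an HTML href to a VoiceXML <goto> element."""
    if href.startswith("#"):
        # Same-document link: jump to the dialog created for that anchor.
        target = href
    else:
        # External link: point at the transcoded version of the page.
        target = TRANSCODER_URL + quote(href, safe="")
    return f"<goto next={quoteattr(target)}/>"


print(link_to_goto("#section2"))
print(link_to_goto("http://example.com/a.html"))
```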
Text Display Tags
Tags used only for display do not make much sense in VoiceXML.
The function of some display tags can be spoken out.
<block>…….</block> and <prompt>…….</prompt> are the tags used to speak out the text enclosed between them.
The content to be spoken can be tailored using the Interface sheet.
The Interface sheet is also used to add new HTML tags, making the system extensible.
Extensible Feature of Transcoder
A. Input Attributes
   Input duration in seconds for Text-box :
   Input duration in seconds for Text-Area :
   ………….
B. HTML Tags        Corresponding Text spoken
   <blockquote>     Starting of text quoted from elsewhere
   </blockquote>    Ignore
   …………             …………..
Row A – Input Attributes can be supplied by the user
Row B – Treatment of HTML tags can be altered, ignored. New tags can be added in this section.
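The interface sheet can be pictured as a lookup table that the translator consults for each tag. The dictionary layout, the key names, the duration values, and the added <hr> entry below are all illustrative assumptions; only the two blockquote rows come from the sheet shown above.

```python
from xml.sax.saxutils import escape

IGNORE = None  # marker for tags that should produce no speech

INTERFACE_SHEET = {
    # Row A: input attributes supplied by the user (values are made up).
    "input_attributes": {"text_box_seconds": 10, "text_area_seconds": 30},
    # Row B: HTML tag -> text to speak, or IGNORE.
    "tags": {
        "<blockquote>": "Starting of text quoted from elsewhere",
        "</blockquote>": IGNORE,
    },
}


def spoken_block(tag, sheet=INTERFACE_SHEET):
    """Return the <block> to speak for a tag, or '' if ignored or unknown."""
    text = sheet["tags"].get(tag)
    if text is None:
        return ""
    return f"<block> {escape(text)} </block>"


# Extending the transcoder: add a row for a tag it did not know before.
INTERFACE_SHEET["tags"]["<hr>"] = "Section break"

print(spoken_block("<blockquote>"))
print(spoken_block("<hr>"))
```

Keeping the mapping in data rather than in code is what lets a new tag be supported without touching the translator itself.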
Conclusion
Our transcoder is capable of converting any HTML file (version 4.0 or lower) to a corresponding VoiceXML file.
The prominent features of the transcoder are extensibility and user interactivity.
HTML to VoiceXML conversion paves the way for anytime, anywhere Internet access for mobile clients.
Future Work
Remove the current restriction that all open tags in the input HTML file must have close tags.
Try to process applets and scripts that may be present in the input HTML page.
Analyze the feasibility of implementing the transcoder in proxy servers.