Pfa 10.0 Beta (Ang)
8/2/2019 Pfa 10.0 Beta (Ang)
1/62
Acknowledgment
We would like to thank our supervisors Mr. Kamel KHENISSI for his valuable support to
make this work without forgetting to thank Ms. Wiem FRADI for linguistic revision of
our report.
Ahmed BAHRI
Moemen MANSOURI
Abstract
"The voice is the technology of tomorrow" is the affirmation of such specialists of the
giants Microsoft and IBM. In North America, where it is ahead of decades compared to
the rest of the world, speech technology is becoming the most natural mode of interactionwith the machine: Windows7, the flagship product of Microsoft, is an excellent example.
The maturity of the technology of synthesis and voice recognition has led researchers to
the realization of an old dream: "the understanding of spontaneous speech in the
machine".
The work developed during this project consists in the creation of an application to
manipulate vocally MySQL data base utility.
The project focused on finding vocabularies which allow the manipulation of MySQL
data base including the vocal manipulation of SKYPE an Mozilla Firefox. In automatic
speech recognition, we used a specific configuration of the framework used in this project
with the aim of having the best result.
Table of contents
List of figures
List of tables
Glossary
GENERAL INTRODUCTION
Speech recognition is a technology that allows computer software to interpret natural human language in order to control a well-defined system.
Early research in automatic speech recognition began about forty years ago in the U.S., during the Cold War, with early attempts to create a machine capable of understanding human speech in order to interpret intercepted Russian messages. Since then, the development of speech recognition has continued to evolve, taking on great importance as it became widely used by:
Large firms, for some of their internal applications or for commercial applications based on speech recognition (Dragon NaturallySpeaking, etc.). These applications generally use their own speech engine, and there are also companies that specialize in creating and selling such engines, Voxalead for example (still experimental).
People with disabilities, by allowing them greater autonomy.
Speech recognition is also linked to many fields of science (natural language processing, linguistics, formal language theory, information theory, signal processing, neural networks, artificial intelligence, etc.).
In fact, this technology today represents a potential market in the world of software, because speech recognition combined with the PC has become an indispensable means of intellectual and social development.
As part of our End of Year Project, we decided to create a speech recognition system for controlling the MySQL database management system, which will facilitate its manipulation in order to develop other projects.
To design a system of automatic speech recognition (ASR) that is as correct as possible, one should:
Firstly, understand how complex the speech signal really is, i.e. know the object or observation given as input;
Secondly, define properly the task of the system, i.e. the constraints and the expected performance.
We will briefly present the project, then expose the problems through a study of the existing systems, then present the different needs and the improvements required of the current system
and define the various specifications (platform, tools,...). Finally we propose a system that
we deem appropriate.
CHAPTER I: PRESENTATION OF THE PROJECT
I- Context of the project
Nowadays voice technology is spreading across different operating systems, and the need for this technology grows continually.
Within the context of our project, we applied this technology to the MySQL database utility with the aim of understanding how vocal applications work.
This project was created using the local resources of the Private High School of Engineering and Technology "ESPRIT".
II- The choice of methodology
To carry out the project well, it is essential to establish a process that helps to formalize the preliminary stages of developing a system, so as to make this development more faithful to the client's needs.
Given the number of available methods (2TUP, RUP, AGILE methods, ...), the choice becomes difficult; at project startup, a project manager is faced with the following questions:
How will I organize the development teams?
Which tasks are assigned to whom?
How long will it take to deliver the product?
How do we involve the client in the development so as to capture his needs?
The following table shows the advantages and disadvantages of each methodology.
Table 1: Comparative table of design methodologies
Justification of our choice:
Given that our project is based on a well-defined development process that determines the functional needs expected of the system up to the final design and coding, 2TUP appeared the most appropriate to lead and plan the sequence of stages of this project. The Two Tracks Unified Process responds to the constraints imposed by the continual change of the company's information systems.
III- Introduction to the 2TUP methodology:
2TUP is the abbreviation of "Two Track Unified Process". It is a process that follows the principles of the Unified Process. The 2TUP process responds to the constraints imposed by the continual change of a company's information systems. In this sense, it strengthens control over the evolution and correction of such systems. "Two Track" literally means that the process follows two paths, or branches. These are the "functional" and "technical architecture" branches, which correspond to the two axes of change imposed on the information system.
Figure 1: Two types of constraints imposed on the information system
1- The functional branch:
This branch capitalizes the knowledge of the company's business. It generally constitutes an investment in the medium and long term, since the functions of the information system are in fact independent of the technologies used. This branch includes the following steps:
1 - The capture of functional needs, producing a model focused on the needs of the business users.
2 - Functional analysis.
2- The technical branch:
This branch capitalizes the technical know-how. It is an investment for the short and medium term, since the techniques developed for the system can in effect be independent of the functions to be performed. This branch includes the following steps:
1 - Capture of technical needs.
2 - Generic design.
3- The middle branch:
Following the developments of the functional model and of the technical architecture, the implementation of the system consists in merging the results of the two branches. This merger gives the development process its Y shape. This branch includes the following steps:
1. Preliminary design.
2. Detailed design.
3. Coding.
4. Integration.
Figure 2: Development Process in Y
CHAPTER II-THE FUNCTIONAL PART
I-Preliminary study
Figure 3: Preliminary Study Schema
As the diagram above shows, the preliminary study is the first step of 2TUP. It consists in performing an initial identification of the functional and operational needs, mainly using text. It prepares the more formal activities of capturing the functional and technical needs.
For our project this study was achieved through the development of a specification. It examined the various systems already on the market and tried to identify their positive and negative sides through a critical review, in order to fix our main objectives, articulate the needs and identify the modules to maintain or improve thereafter.
The last stage of this study is the modeling of a context diagram.
Figure 4: Functional Schema
Description of the schema
1. The speaker emits a sentence; the sound is captured by a microphone.
2. The voice signal is then digitized using an analog-to-digital converter. The parameterization of the signal provides an acoustic fingerprint.
3. The decoding consists in describing the acoustic signal in terms of linguistic units. It aims to segment the signal; the identification of the different segments is based on phonetic and linguistic constraints.
Once the analysis process is completed, the recognition phase begins. In fact, all the spoken words are separated by silences of duration greater than a few tenths of a second. The recognition process consists mainly of two phases:
1) The learning phase: the speaker pronounces the whole vocabulary, often several times, to create a reference dictionary.
2) The recognition phase: the speaker pronounces a word. To recognize the words emitted by the speaker, three parts are involved:
- First, the sensor, to capture the physical phenomenon; in our case it is the microphone. A signal is transmitted by the microphone when the speaker speaks.
- Second, the parameterization of forms, which gives us a fingerprint, that is to say, the characteristics of the sound (time / frequency / intensity). And finally, the identification of
forms. A second schema is needed to better grasp the different use cases that should be treated.
Figure 5: Operating principle of speech recognition
This diagram shows the operating principle of recognition: a speaker pronounces a word of the vocabulary; word recognition is then a typical problem of pattern recognition. Any system of pattern recognition always involves the following three parts:
- A sensor for capturing the physical phenomenon under consideration (in our case a microphone).
- A parameterization stage for the shapes (e.g. a spectrum analyzer).
- A decision stage charged with classifying an unknown form into one of the possible categories.
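The three stages above (sensor, parameterization, decision) can be sketched as a toy pattern-recognition pipeline. This is only an illustration: the function names, frame size and feature values are assumptions, not taken from the report's implementation.

```python
import math

# Toy sketch of the three stages of a pattern-recognition system
# (names and feature values are illustrative, not from the report).

def parameterize(samples, frame_size=4):
    """Parameterization stage: turn a raw sample list into
    fixed-size feature values (here, mean absolute energy per frame)."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    return [sum(abs(s) for s in f) / frame_size for f in frames]

def classify(features, references):
    """Decision stage: assign the unknown form to the closest
    reference category (nearest neighbour on feature distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(references, key=lambda label: dist(features, references[label]))

# The 'sensor' stage is simulated here by a precomputed sample buffer.
samples = [0.1, 0.9, 0.8, 0.2, 0.1, 0.7, 0.9, 0.3]
features = parameterize(samples)
references = {"yes": [0.5, 0.5], "no": [0.1, 0.1]}
print(classify(features, references))  # closest reference category
```

A real recognizer replaces the energy features with acoustic vectors and the nearest-neighbour decision with HMM decoding, as described later in the report.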
II- Capture of the functional needs
Once the preliminary study is done, we move to the next step, in which we determine the functional needs on the left branch and, in parallel, the technical needs on the right branch.
1- Functional needs:
This project consists in designing and implementing a tool for voice manipulation of the database. First we will start by defining the actors who will interact with the system. Considering the needs of our application, it appears that the main actors are reduced to an administrator and a user. The administrator is responsible for database creation and for the maintenance of the accounts of the individual users of the database; all these tasks must be completed using his voice only. The user can access the database through his natural voice and, after authentication, can execute DDL (LDD) requests.
Figure 6: Capture of the functional needs
1.1- The use case diagram:
The use case diagram reflects the principle of the overall functioning of our application and the various actions of the actors. The study of the needs of the actors who interact with our system requires the development of the following use cases:
Figure 7: Use case diagram
We consider that in our system two users are possible: the administrator and the normal user. The administrator has access to all the existing use cases, including that of manipulating the database, whereas the user can only manipulate the database once it is created.
a) Voice Authentication:
The user pronounces his login and password in order to authenticate and access the main interface of MySQL.
The following table details the process of authentication.
Title: Authentication
Intention: Authentication of the users.
Actors: Users.
Preconditions: MySQL available.
Start when: The application is launched.
Definition of transitions: Pronounce the login and password.
Finish when: The administrator or the user validates the session and connects.
Exception(s): Invalid user name. Invalid password. MySQL not found.
Postconditions: MySQL menu.
Table 2: nominal scenario of the use case "Voice Authentication"
b) Manipulate the database vocally:
Title: Manipulating the database.
Summary: The user can create, modify or delete one or more databases, and create and execute DML (LMD) queries vocally.
Actors: The application user.
The following table details the process of vocal manipulation of the database.
Title: Vocal manipulation of the database
Intention: Creating and manipulating a database in MySQL
Actors: User of the application
Preconditions: Authentication succeeded
Start when: The main window of MySQL opens.
Definition of transitions:
CASE 1: The user wants to create a database:
- Pronounce "create new schema".
- Say the name of the database to create.
- Confirm the selection.
CASE 2: The user wants to delete a database:
- Say the name of the database to drop.
- Pronounce "drop database".
- Confirm the selection.
CASE 3: The user wants to create a new table in the database:
- Select the database.
- Say "create new table".
- Say the name of the table to create.
- Confirm the selection.
CASE 4: The user wants to create a DML query:
- Pronounce the name of the table.
- Dictate the query to create.
- Confirm the selection.
CASE 5: The user wants to execute a DML query:
- Say the name of the table.
- Say the word "execute".
NB: for this case the user must write the DML query.
Finish when: The user confirms his choice.
Exception(s): The name of the database or table already exists. Syntax error in SQL.
Table 3: nominal scenario of the use case "Vocal Manipulation"
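The spoken commands in the cases above could be dispatched through a small command table once a phrase is recognized. The sketch below is a hypothetical illustration of that mapping; the dispatch mechanism and the generated SQL are assumptions, not the report's actual implementation.

```python
# Hypothetical sketch: map recognized phrases (CASE 1-5 above) to the
# SQL statements they would trigger. Phrase strings mirror the use case;
# everything else is illustrative.

def handle_command(phrase, argument=None):
    """Return the SQL statement a recognized phrase would trigger,
    or None when the phrase is not in the vocabulary."""
    commands = {
        "create new schema": lambda name: f"CREATE DATABASE {name}",
        "drop database": lambda name: f"DROP DATABASE {name}",
        "create new table": lambda name: f"CREATE TABLE {name} (id INT)",
        "execute": lambda sql: sql,  # run a DML query the user wrote
    }
    action = commands.get(phrase)
    if action is None:
        return None  # unknown phrase: ask the user to pronounce it again
    return action(argument)

print(handle_command("create new schema", "sales"))  # CREATE DATABASE sales
print(handle_command("drop database", "sales"))      # DROP DATABASE sales
```

Returning None for an unrecognized phrase corresponds to the "pronounce the command again" exception path in the tables above.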
c) Manage users vocally:
Title: Managing users.
Summary: The administrator can create, modify or delete a user account.
Actors: Administrator of the application.
The following table details the process of managing users.
Title: Manage the user
Intention: Creation, modification or deletion of a user account
Actors: Administrator
Preconditions: Authentication as administrator succeeded.
Start when: The MySQL Administrator interface opens.
Definition of transitions:
CASE 1: The administrator wants to create a new user:
- Say "user administration".
- Say "add new user".
- Say the name and password.
- Say "apply changes".
CASE 2: The administrator wants to delete a user:
-Say "user administration ".
- pronounce the name of the user.
- pronounced "drop user".
- pronounced "ok"
CASE 3: The user wants to create a clone user:
-Say "user administration ".- pronounce the name of the user.
- pronounced "clone user".
- Give the name and password of the user decision
-ok.
Finish when The administrator completes
The session disconnects
Exception(s) Invalid user name.
Invalid password.Table4: nominal scenario of the use case "Manage user"
1.2- The activity diagram
1. The user pronounces a sentence through a microphone.
2. The voice signal is then analyzed using the model to obtain an acoustic signal.
3. The decoding consists in describing the acoustic signal in terms of linguistic units. It aims to segment the signal.
4. The segmented signal is compared with the database (dictionary) thanks to the search graph.
5. The action is projected on screen.
Figure 8: Activity Diagram
2- Non-functional needs:
Besides the functional needs developed above, we must consider the following constraints:
- The service quality of the application must be ensured.
- The application must be ergonomic: the interfaces of our application must be clear.
-The response time of the application should be minimal.
III-Functional Analysis
Figure 9: The functional analysis
1- Cutting into categories
It consists of:
1) Dividing the classes into candidate categories.
2) Elaborating preliminary class diagrams per category.
3) Deciding the dependencies between categories.
Figure 10: Cutting into categories
1.1- The packages diagram:
The package diagram is a graphical representation of the relationships between the packages of the speech recognition system.
Figure 11: Packaging Diagram
The most general package is "general media treatment", decomposed into two packages: "Sound treatment" and "Speech treatment".
- Sound is the wave which is audible to the human ear.
- Speech is the process of stretching and relaxing the vocal cords to produce sound.
The package we are interested in is "Speech treatment", which in turn is divided into two sub-packages, "Speech synthesis" and "Speech recognition".
Our development is based on "Speech recognition".
2 - Development of the static model
Figure 12: nominal scenario of the use case "Voice Authentication"
2.1- Diagram of classes:
The different classes are:
Caller (Appelant)
Instruction
Language Instruction
Recording (Enregistrement)
Speech Recognizer (Reconnaissance Vocale)
Feature (Caractéristique)
Feature Extraction (Extraction de Caractéristique)
Feature Classification (Classification de Caractéristique)
Feature Matching (Correspondance de Caractéristique)
Code Book (Dictionnaire)
Action
The various relationships are:
Listen
Record
Send Speech Signal
Perform
Search And Match
Contain
Figure 13: Model participating Class Diagram
2.2- Description of the class diagram
The class "Caller" has a relationship "Listen" with the class "Instruction". The caller can listen to a type of instruction, which is "Language Instruction".
The class "Caller" is then associated by "Record" with the class "Recording". This class is in turn associated by "Send Speech Signal" with the class "Speech Recognizer".
The class "Speech Recognizer" is associated with the class "Feature" through the relationship "Perform", which means that the class "Speech Recognizer" calls on the class "Feature" for feature extraction, feature classification and feature matching.
Finally, the class "Feature Matching" is associated through the relationship "Search And Match" with the class "Code Book" in order to match the input speech, and the latter is associated with the class "Action" through the relation "Contain".
Note: to ensure the clarity of the diagram we preferred not to show the attributes and methods of the classes.
We defined the physical model, which consists of 3 classes that facilitate the implementation of our application.
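The participating classes and relationships described above can be sketched as a minimal class skeleton. Since the report deliberately omits attributes and methods, everything inside these classes (the signal representation, the code-book lookup) is an assumption made purely for illustration.

```python
# Minimal sketch of the participating classes of the class diagram;
# attribute types and method bodies are illustrative assumptions,
# since the report omits them on purpose.

class Instruction:
    def __init__(self, text):
        self.text = text  # e.g. a language instruction such as "create new schema"

class Recording:
    def __init__(self, signal):
        self.signal = signal  # speech signal captured by the microphone

class SpeechRecognizer:
    def __init__(self, code_book):
        self.code_book = code_book  # code book: feature tuple -> action

    def perform(self, recording):
        """Feature extraction / classification / matching, collapsed
        into a single lookup for this sketch ('Search And Match')."""
        feature = tuple(recording.signal)
        return self.code_book.get(feature)

class Caller:
    def listen(self, instruction):
        # 'Listen' + 'Record': fake a signal from the word lengths
        return Recording(signal=[len(w) for w in instruction.text.split()])

recognizer = SpeechRecognizer(code_book={(6, 3, 6): "CREATE DATABASE"})
recording = Caller().listen(Instruction("create new schema"))
print(recognizer.perform(recording))  # CREATE DATABASE
```

Each relation of the diagram (Listen, Record, Send Speech Signal, Perform, Search And Match) appears here as a method call or dictionary lookup.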
3- Development of the dynamic model:
The development of the dynamic model is the third activity of the analysis stage. It is situated on the left branch of the Y cycle. This is an iterative activity, strongly coupled with the activity of static modeling described above. The development of the dynamic model precedes the preliminary design.
Figure 13: The dynamic model
3.1- Sequence diagram:
The sequence diagram is mainly used to show the interactions between the classes/objects listed in the previous section, in the sequential order in which those interactions take place. The figure shows the sequence diagram of the system, including the classes/objects, lifelines, processes and interactions. The interactions between the classes/objects are numbered 1 through 11 sequentially, which indicates which process should be done first in order to carry out the following process.
Figure 14: object sequence diagram of speech recognition
Note:
The nominal scenario of voice recognition has been represented in detail in the sequence diagram above (see Figure 14). In the following sequence diagrams we chose to group the class instances related to recognition into a single participant called "Recognition System", according to the package diagram described earlier (see Figure 11).
Figure 15: sequence diagram representing the connection
(Lifelines: Administrator, Recognition system, Database utility, MySQL database. The administrator pronounces the login and password; the grammar containing the login and password is verified; if the grammar is valid, the login and password are inserted into the text fields and verified, and the MySQL administrator interface is displayed; if the grammar or the connection parameters are invalid, the connection parameters are requested again.)
Figure 16: sequence diagram representing the assignment of a privilege to a user
(Lifelines: Administrator, Recognition system, Database utility, MySQL database. The administrator pronounces the grant-privilege command; the grammar is verified; if it is valid, the mouse is manipulated to insert the privilege and the privilege is added; otherwise the administrator is asked to pronounce the command again.)
Figure 17: sequence diagram for the addition of a user
(Lifelines: Administrator, Recognition system, Database utility, MySQL database. The administrator pronounces the create-user command; the grammar is verified; if valid, the mouse is manipulated to insert the new user and the new-user information interface opens; the administrator then pronounces the login information, which, if its grammar is valid, is inserted into the text fields and attributed to the user; if a grammar is invalid, the administrator is asked to pronounce the command again.)
3.2- Diagram of state transitions
Now that the scenarios have been formalized, the knowledge of all the interactions between objects allows us to represent the business rules of the system dynamics. However, in order to develop some of these dynamic rules, we should focus on the classes with the richest behavior. We use the concept of a finite state machine, which involves tracking the life cycle of a generic object of a particular class over its interactions with the rest of the world, in all possible cases. The local view of an object, describing how it reacts to events based on its current state and moves into a new state, is plotted as a state diagram.
Figure 18: diagram of state transitions
4- Confrontation between the static and the dynamic models:
Various relationships exist between the main concepts of the static model (object, class, association, attribute and operation) and the main dynamic concepts (message, event, state and activity).
The matches are far from trivial, because these are indeed complementary, not redundant, points of view. Let us try to synthesize the most important ones, without being exhaustive:
- A message can be the invocation of an operation on an object (the receiver) by another object (the sender);
- An event or an effect on a transition may correspond to the call of an operation;
- An activity in a state may correspond to the performance of a complex operation or a series of operations;
- An interaction diagram involves objects (or roles);
- An operation can be described by an interaction or activity diagram;
- A guard condition and a change event can consult attributes or static links;
- An effect on a transition can handle attributes or static links;
- The parameter of a message can be an attribute or an entire object.
Chapter III- The technical part
I- Capture of the technical requirements:
The capture of technical requirements identifies all the constraints weighing on the choice and dimensioning of the system design: the tools and equipment selected, as well as the constraints of integration with the existing environment (the pre-required technical architecture).
Figure 19: Capture of the technical requirements
Part of the work consisted in studying the functioning of speech recognition systems, and then in developing an acoustic model allowing words to be recognized.
This is why we propose, as a first step, to introduce the Hidden Markov Models, the mathematical concept that will allow us to discuss the layout of automatic speech recognition (ASR) systems. In a second step, we will apply this model to our project.
1- The Hidden Markov Models:
Definition:
A Markov process is a discrete-time system which is always in one state taken from N distinct states. The transitions between states occur between two consecutive discrete instants, according to some probability law. The probability of each state depends only on the state that immediately precedes it.
A hidden Markov model (HMM) represents, in the same way as a Markov chain, a whole sequence of observations, but the state of each observation is not observed; it is instead associated with a probability density function. It is therefore a stochastic process in which the observations are a random function of the state, and in which the state changes at every instant according to the probabilities of transition from the previous state.
Figure 20: The Markov model
More formally, a hidden Markov state machine is characterized by the quadruplet described below:
- S_i: the state i;
- π_i: the probability that S_i is the initial state;
- a_ij: the probability of the transition from state S_i to state S_j;
- b_i(k): the probability of emitting the symbol k while in state S_i.
On condition that:
- the sum of the probabilities of the initial states is equal to 1: Σ_i π_i = 1;
- the sum of the probabilities of the transitions leaving a state is equal to 1;
- the sum of the probabilities of the outputs from a state is equal to 1.
We can describe a hidden Markov model by the parameter set
λ = (π, A, B)
with:
- π: the set of the initial probabilities;
- A: the set of transition probabilities between states;
- B: the set of laws (or densities) of probabilities associated with each state.
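As a toy illustration of λ = (π, A, B), the sketch below encodes a two-state HMM and computes the likelihood of an observation sequence with the forward algorithm. The model and its numbers are invented for the example; only the structure (π, A, B and the forward recursion) reflects the definition above.

```python
# A two-state HMM λ = (π, A, B) over two symbols; all numbers invented.
pi = [0.6, 0.4]                    # initial state probabilities (sum to 1)
A  = [[0.7, 0.3], [0.4, 0.6]]      # A[i][j]: transition from state i to state j
B  = [[0.5, 0.5], [0.1, 0.9]]      # B[i][k]: probability of emitting symbol k in state i

def forward(obs, pi, A, B):
    """Forward algorithm: probability of the observation sequence
    given the model, summing over all hidden state paths."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

print(forward([0, 1], pi, A, B))  # likelihood of observing symbols 0 then 1
```

This quantity, generalized to acoustic vectors, is exactly the P(A/M) that the acoustic model described later must estimate.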
2- The voice recognition theory:
Whether we use speech recognition to dial a phone number, browse through the windows on our computer, enter data into a software application or dictate a letter in a word processor, the basic problem remains the same: identify the meaning of a flow of words uttered, often over a more or less important background noise.
This task is made difficult not only by the deformations induced by the use of a microphone, but also by a number of factors inherent to human language:
- homonyms, where the same sequence of sounds can correspond to several words (like the French sound in "cent" (hundred), "sans" (without) and "sang" (blood));
- local accents;
- patterns of language, such as some elisions that make it difficult to separate the words ("j'vais l'chercher" instead of "je vais le chercher");
- the speed differences between users;
- the imperfections of the microphone, etc.
For the human ear, these factors do not usually represent difficulties. The brain copes with these deformations of speech by taking into consideration, almost unconsciously, non-verbal and contextual elements that allow it to eliminate ambiguities.
It is only by taking into account these elements surrounding the sound itself that voice recognition software can achieve high degrees of reliability.
Today the software packages that give the best results are all based on a probabilistic approach.
The aim of speech recognition is to reconstruct a sequence of words M from a recorded acoustic signal A.
In the statistical approach, we consider all the sequences of words M which could match the signal A.
In this set of possible sequences we then choose the one which is most likely, that is to say the one that maximizes the probability P(M/A) that M is the correct interpretation of A, which is written M* = arg max_M P(M/A).
Figure 21: reconstruction of a sequence of words M from a recorded acoustic signal A.
Note that P(A/B) represents the probability of event A given that event B has occurred. The axiom of Bayes computes the probability of the co-occurrence of two events A and B by the following equalities:
P(A and B) = P(A/B) P(B) = P(B/A) P(A)
where P(A) is the probability that event A occurs.
Thus, the axiom of Bayes allows us to rewrite the expression:
P(M/A) = P(A/M) P(M) / P(A)
and, as P(A) is a constant in the search for the best M, we finally have the equation:
M* = arg max_M P(A/M) P(M)
This last equation is the key to the probabilistic approach to speech recognition. In fact, the first term, P(A/M), represents the probability of observing the acoustic signal A if the sequence of words M was pronounced: it is a purely acoustic problem.
The second term, P(M), represents the probability that the sequence of words M was pronounced: it is a linguistic problem.
The above equation thus tells us that we can divide the problem of speech recognition into two independent parts: we will model separately the acoustic aspects and the language problems.
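The decomposition M* = arg max P(A/M) P(M) can be illustrated with a toy example: the acoustic and language scores below are made-up numbers, not real model outputs, but they show how the language model can overrule an acoustically better hypothesis.

```python
# Toy illustration of M* = arg max P(A/M) * P(M).
# All probabilities are invented for the example.

acoustic = {   # P(A/M): how well each word sequence explains the signal
    "drop database": 0.30,
    "drop the base": 0.35,
    "rock the bass": 0.20,
}
language = {   # P(M): how plausible each sequence is a priori
    "drop database": 0.50,
    "drop the base": 0.10,
    "rock the bass": 0.05,
}

# Combine the two independent models and keep the best hypothesis.
best = max(acoustic, key=lambda m: acoustic[m] * language[m])
print(best)  # "drop database": its combined score beats the acoustically better hypothesis
```

Here "drop the base" has the highest acoustic score, but the language model makes "drop database" the overall winner, which is exactly the point of the probabilistic decomposition.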
Thus, the transcription is divided into several modules:
- feature extraction, which produces A;
- the acoustic model, which computes P(A/M) and looks for the hypotheses M that are likely associated to A;
- the language model, which computes P(M) in order to select one or more hypotheses M depending on the language knowledge.
The following schema illustrates the components of a transcription system.
Figure 22: The transcription system
3- Features extraction:
The sound signal to be analyzed takes the form of a wave whose intensity varies over time. The first stage of the transcription process is to extract from it a series of numerical values that are sufficiently informative on the acoustic level to decode the signal thereafter.
The signal may contain areas of silence, noise or music. These areas are first removed in order to keep only the portions of the signal useful to the transcript, that is to say, those corresponding to speech.
The sound signal is then segmented into what are described as breath groups, using sufficiently long silent pauses (about 0.3 s) as delimiters. The advantage of this segmentation is to obtain continuous segments of a reasonable size with respect to the computational capabilities of the ASR system. Later in the transcription process, the analysis is done separately for every breath group.
To track the changes in the signal, which generally varies rapidly over time, the breath group is itself divided into analysis windows of a few milliseconds (usually 20 or 30 ms). In order to avoid losing important information at the beginning or end of the windows, they are made to overlap, which leads to extracting features every 10 ms.
From the signal contained in each analysis window, numerical values characterizing the human voice are calculated. After this step, the signal becomes a sequence of so-called acoustic vectors, whose dimension is often greater than or equal to 39.
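The windowing described above (30 ms windows extracted every 10 ms, so consecutive windows overlap) can be sketched as follows; the sample rate and signal are illustrative, and real systems would additionally apply a tapering window and compute features such as MFCCs per frame.

```python
# Sketch of the framing step: win_ms-long analysis windows taken
# every hop_ms, so consecutive windows overlap by win_ms - hop_ms.

def frame_signal(samples, rate_hz, win_ms=30, hop_ms=10):
    win = int(rate_hz * win_ms / 1000)   # samples per window
    hop = int(rate_hz * hop_ms / 1000)   # samples between window starts
    return [samples[i:i + win]
            for i in range(0, len(samples) - win + 1, hop)]

rate = 1000                         # toy 1 kHz rate -> 30-sample windows, 10-sample hop
signal = list(range(100))           # 100 ms of fake samples
frames = frame_signal(signal, rate)
print(len(frames), len(frames[0]))  # 8 30
```

With a realistic 16 kHz sample rate the same call would yield 480-sample windows with a 160-sample hop; each frame would then be mapped to an acoustic vector of dimension 39 or more.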
4- The acoustic model:
The next step is to associate with the acoustic vectors, which are, as we have seen, numeric vectors, a set of word hypotheses (symbols). Referring to the equation of the statistical modeling above, this amounts to estimating P(A/M). The techniques for calculating this value form what is called the acoustic model.
The most used tool for building the acoustic model is the Hidden Markov Model presented above. HMMs have indeed shown their effectiveness in practice for recognizing speech. Even if they have some limitations in modeling signal characteristics, such as the duration or length of successive acoustic observations, HMMs offer a well-defined mathematical framework for calculating the probabilities P(A/M).
Acoustic models involve three levels of HMM, shown in the figure below.
Figure 23: The acoustic model
They first seek to recognize the types of sound, in other words to identify the phones (the
sounds actually pronounced by speakers, defined by specific acoustic characteristics). To
do this, each phone is modeled by an HMM, usually with 3 states representing its
beginning, middle and end. The hidden variable is then the sub-phone, and the
observations are the acoustic vectors.
To calculate the observation probabilities in each state, two approaches are often
considered, one based on representing the probability densities with Gaussian mixtures
and the other based on neural networks. These methods establish hypotheses about the
likelihood of the phones uttered. However, the aim of acoustic models is to determine a
sequence of words. For this purpose, acoustic models use a pronunciation dictionary,
which maps each word to its pronunciations. As a word may be pronounced in different
ways, depending on its predecessor and successor, or simply on the habits of the speaker,
there may be multiple entries in the lexicon for the same word. These variants are
expressed through the phoneme-level pronunciation features.
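The Gaussian-based scoring mentioned above can be sketched as follows. For simplicity this uses a single diagonal-covariance Gaussian per state, with invented means and variances; real acoustic models use mixtures of many Gaussians per HMM state and work in log space.

```java
// Illustrative sketch: scoring an acoustic vector against one HMM state
// whose observation density is a single diagonal-covariance Gaussian.
// The means and variances here are invented for the example.
public class GaussianScore {
    /** Log density of vector x under a diagonal-covariance Gaussian. */
    static double logLikelihood(double[] x, double[] mean, double[] var) {
        double ll = 0.0;
        for (int d = 0; d < x.length; d++) {
            double diff = x[d] - mean[d];
            ll += -0.5 * (Math.log(2 * Math.PI * var[d]) + diff * diff / var[d]);
        }
        return ll;
    }

    public static void main(String[] args) {
        double[] x    = {0.0, 1.0};              // a toy 2-dimensional acoustic vector
        double[] mean = {0.0, 0.0};
        double[] var  = {1.0, 1.0};
        // The state whose Gaussian best matches x receives the highest score.
        System.out.println(logLikelihood(x, mean, var));
    }
}
```

During decoding, each incoming acoustic vector is scored this way against every active HMM state, and the scores drive the search described below.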
The second level of HMM models the words, from the HMMs representing the phones and
from the pronunciation lexicon. It takes the form of a lexical tree initially containing all
the words in the vocabulary, pruned progressively as phones are accepted. Since the
first-level HMMs model phones, not phonemes, the phonemes found in the pronunciation
dictionary are converted into phones in order to recognize words. Transformation rules
depending on the context of each phoneme are then applied.
The third level finally models the sequence of words M in a breath group, and can then
incorporate the knowledge provided by the language model about M. To build the HMM
equivalent to a word graph, the HMM corresponding to the lexical tree is duplicated each
time the acoustic model hypothesizes that a new word has been recognized.
The functioning of the acoustic model just described faces a major problem: the search
space of the higher-level HMM is often considerable, especially if the vocabulary is large
and if the breath group to be analyzed contains multiple words. Algorithms based on
dynamic programming can compute the probabilities efficiently. These are mainly the
Viterbi algorithm and stack decoding, also called A* decoding. In addition, very regular
pruning is used to keep only the most promising hypotheses.
The role of the acoustic model is thus to align the sound signal with word hypotheses
using acoustic cues only. Its last level incorporates the information about word sequences
introduced by the language model.
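To make the dynamic programming concrete, here is a minimal Viterbi decoder over a toy two-state HMM. The model (initial, transition and emission probabilities) is invented for the example; real recognizers apply the same recurrence, in log space, over the much larger search graph described above.

```java
// Minimal Viterbi decoder over a toy HMM. Probabilities are kept in
// log space, as in real recognizers, to avoid numerical underflow.
public class Viterbi {
    /** Returns the most likely hidden state sequence for the observations. */
    static int[] decode(double[] logInit, double[][] logTrans,
                        double[][] logEmit, int[] obs) {
        int nStates = logInit.length, T = obs.length;
        double[][] delta = new double[T][nStates]; // best log score ending in state s at t
        int[][] backPtr = new int[T][nStates];     // predecessor of that best path
        for (int s = 0; s < nStates; s++)
            delta[0][s] = logInit[s] + logEmit[s][obs[0]];
        for (int t = 1; t < T; t++)
            for (int s = 0; s < nStates; s++) {
                double best = Double.NEGATIVE_INFINITY; int arg = 0;
                for (int p = 0; p < nStates; p++) {
                    double score = delta[t - 1][p] + logTrans[p][s];
                    if (score > best) { best = score; arg = p; }
                }
                delta[t][s] = best + logEmit[s][obs[t]];
                backPtr[t][s] = arg;
            }
        // Trace back from the best final state.
        int[] path = new int[T];
        double best = Double.NEGATIVE_INFINITY;
        for (int s = 0; s < nStates; s++)
            if (delta[T - 1][s] > best) { best = delta[T - 1][s]; path[T - 1] = s; }
        for (int t = T - 1; t > 0; t--) path[t - 1] = backPtr[t][path[t]];
        return path;
    }

    public static void main(String[] args) {
        double[] init = {Math.log(0.9), Math.log(0.1)};
        double[][] trans = {{Math.log(0.7), Math.log(0.3)},
                            {Math.log(0.4), Math.log(0.6)}};
        double[][] emit = {{Math.log(0.9), Math.log(0.1)},
                           {Math.log(0.2), Math.log(0.8)}};
        // The observations drift from symbol 0 to symbol 1, so the decoder
        // should follow with a single state change.
        System.out.println(java.util.Arrays.toString(
                decode(init, trans, emit, new int[]{0, 0, 1, 1})));
    }
}
```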
5- The language model:
The language model is intended to find the most likely sequences of words, in other words
those that maximize the value P(M) of equation 1. If one refers to the highest level of the
HMM acoustic model (see previous figure), the values P(M) are the probabilities of word
successions.
a) Functioning of a language model
Writing M = m1 ... mn, where mi is the word of rank i in the sequence M, the probability
P(M) decomposes by the chain rule as:
P(M) = P(m1) P(m2 | m1) ... P(mn | m1 ... mn-1)
The evaluation of P(M) then reduces to calculating the values P(mi) and
P(mi | m1 ... mi-1), which are obtained by relative frequency from a training corpus:
P(mi) = C(mi) / Σ C(m) and P(mi | m1 ... mi-1) = C(m1 ... mi) / C(m1 ... mi-1)
where the sum runs over all words m of V, the vocabulary used by the ASR system, and
C(mi) and C(m1 ... mi) represent the respective numbers of occurrences of the word mi
and of the word sequence m1 ... mi in the training corpus. Unfortunately, the number of
parameters P(mi | m1 ... mi-1) of the language model to estimate grows exponentially
with n. To reduce this number, P(mi | m1 ... mi-1) is modeled by an N-gram, that is, a
Markov chain of order N-1 (with N > 1), using the approximation:
P(mi | m1 ... mi-1) ≈ P(mi | mi-N+1 ... mi-1)
This approximation indicates that every word mi may be predicted from the N-1
preceding words. For N = 2, 3 or 4, the model is respectively called bigram, trigram or
quadrigram. For N = 1, the model is called unigram and reduces to estimating P(mi).
Generally, bigram, trigram and quadrigram models are the ones used in language
models for ASR.
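The counting formulas above can be sketched directly in code. The following bigram estimator is our own illustration, trained on invented command sentences; a real language model would add smoothing so that unseen word pairs do not receive zero probability.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the relative-frequency estimate
// P(m_i | m_{i-1}) = C(m_{i-1} m_i) / C(m_{i-1}).
public class Bigram {
    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();

    /** Accumulates unigram and bigram counts from one training sentence. */
    void train(String[] words) {
        for (int i = 0; i < words.length; i++) {
            unigrams.merge(words[i], 1, Integer::sum);
            if (i > 0) bigrams.merge(words[i - 1] + " " + words[i], 1, Integer::sum);
        }
    }

    /** Relative-frequency estimate of P(word | previous); 0 if unseen. */
    double prob(String previous, String word) {
        int c1 = unigrams.getOrDefault(previous, 0);
        if (c1 == 0) return 0.0;
        return bigrams.getOrDefault(previous + " " + word, 0) / (double) c1;
    }
}
```

For example, after training on the two invented sentences "open the database" and "open the table", the model gives P(the | open) = 1.0 and P(database | the) = 0.5.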
6- The choice of the Sphinx API:
Sphinx-4 is a speech recognizer written entirely in Java. Its goals are to provide speech
recognition flexible enough to equal commercial products, and to support collaborative
research between several universities, the laboratories of Sun and HP, and MIT.
While being highly configurable, Sphinx-4 supports the recognition of single words as
well as phrases (through the use of grammars). Its architecture is scalable, to enable new
research and the testing of new algorithms.
The recognition quality depends directly on the quality of the voice data, that is, the
information describing the voices themselves: for example the different phonemes, the
individual words (vocabulary) and the different ways of pronouncing them. The more
such information is available and known to the system, the better its reaction and its
choices.
As shown in the following figure, which represents its architecture, Sphinx-4 is based on
3 modules.
Figure 24: General architecture of the Sphinx-4
6.1- The Architecture of Sphinx-4:
Figure 25: Detailed Architecture of Sphinx-4
The main blocks are the FrontEnd, the Decoder and the Linguist. The supporting blocks
include the ConfigurationManager and the tool blocks.
The FrontEnd takes one or more input signals and parameterizes them into a sequence of
features. The Linguist translates any kind of standard language model, together with
pronunciation information from the dictionary and structural information from one or
more sets of acoustic models, into a search graph. The SearchManager in the Decoder
uses the features from the FrontEnd and the search graph from the Linguist to perform
the actual decoding, generating results. At any time before or during the recognition
process, the application can issue controls to each module, thus becoming a partner in
the recognition process.
a) The FrontEnd
The FrontEnd cuts the recorded voice into different parts and prepares them for the decoder.
Its aim is to transform an input signal (for example, audio) into a sequence of output
features. As illustrated in Figure 26, the FrontEnd comprises one or more parallel chains
of replaceable, communicating signal-processing modules called "DataProcessors".
Support for multiple chains allows the simultaneous computation of different types of
parameters from identical or different input signals. This permits the creation of systems
that can simultaneously decode features derived from speech and non-speech signals.
Figure 26: Parallel chains of communicating DataProcessors
b) The Linguist:
The Linguist generates the SearchGraph that is used by the decoder during the search,
while hiding the complexity of generating this graph. As elsewhere in Sphinx-4, the
Linguist is a pluggable module, which allows people to dynamically configure the system
with different Linguist implementations.
A typical implementation constructs the SearchGraph using the structure of the language
represented by a given LanguageModel and the topological structure of the AcousticModel
(HMMs for the basic sound units used by the system).
During the generation of the SearchGraph, the Linguist may also incorporate sub-word
units with contexts of arbitrary length.
By allowing different Linguist implementations to be plugged in at execution time,
Sphinx-4 lets individuals provide different configurations for different recognition
systems. For example, a simple digit-recognition application may use a simple Linguist
that keeps the search space entirely in memory. The Linguist is based on three
components, which are described in the following sections:
The language model
The dictionary
The acoustic model
b.1) The language model
Role:
Describes what can be said in a very specific context.
Helps narrow the search space.
There are three kinds of language models: the simplest is used for isolated words, the
second for command-and-control applications, and the last for natural language.
The language model implementation supports several types of grammars; we opted for
the JSGF grammar, which supports the Java TM Speech API Grammar Format (JSGF)
[20], a BNF-style, platform-independent and vendor-independent Unicode representation
of grammars.
b.2) The Dictionary
The dictionary gives the pronunciation of the words found in the LanguageModel. The
pronunciations break the words into sequences of sub-word units found in the
AcousticModel. The Dictionary interface also supports the classification of words,
allowing a single word to belong to several classes.
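For illustration, entries in such a pronunciation dictionary (in the format of the CMU Pronouncing Dictionary used by Sphinx) look like the following; the words are chosen arbitrarily, and a word with two pronunciations gets a numbered variant:

```
OPEN  OW P AH N
DATABASE  D EY T AH B EY S
CLOSE  K L OW Z
CLOSE(2)  K L OW S
```

Each line maps one word to the sequence of sub-word units (phones) that the acoustic model can score.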
b.3) The AcousticModel
The AcousticModel module provides a correspondence between a unit of speech and an
HMM that can be scored against the incoming features provided by the FrontEnd.
b.4) The SearchGraph
The SearchGraph is the main data structure used during the decoding process.
It is a directed graph in which each node, called a SearchState, represents either an
emitting or a non-emitting state. Emitting states can be scored against incoming acoustic
features, while non-emitting states are generally used to represent higher-level linguistic
constructs, such as words and phonemes, that are not directly scored against the incoming
features. The arcs between states represent possible state transitions, each with a
probability representing the likelihood of transitioning along the arc.
How the SearchGraph is built affects the memory footprint, speed and accuracy of
recognition. The modular design of Sphinx-4, however, allows different SearchGraph
compilation strategies to be used without changing other aspects of the system.
The choice between static and dynamic construction of the language HMMs depends
mainly on the size of the vocabulary, the complexity of the language model and the
desired memory footprint of the system, and can be made by the application.
c) The Decoder
The decoder is the heart of Sphinx-4. It processes the information received from the
FrontEnd, analyzes it and compares it with the knowledge base to deliver a result to the
application.
The main role of the Sphinx-4 Decoder block is to use the features from the FrontEnd, in
collaboration with the SearchGraph from the Linguist, to generate result hypotheses. The
Decoder block includes pluggable SearchManagers and other supporting code that
simplifies the decoding process for an application. As such, the most interesting element
of the Decoder block is the SearchManager.
The Decoder simply tells the SearchManager to recognize a set of feature frames. At each
step of the process, the SearchManager creates a Result object that contains all the paths
that have not yet reached a final non-emitting state.
7- Technical use case diagram
The overall operation of the system and the various actions of the actors are summarized
in the diagram below.
The study of the needs of the actors who interact with our system leads to the following
use case diagram:
Figure 27: Technical use case diagram
The different use cases are:
Listen for instructions: capture the signal from the microphone
Save the speech signal: save the signals coming from the microphone
Analyze the speech signal: segment the signal into phonemes
Match the speech signal: match the signal against the database
Match the feature vector: match the characteristics of the analyzed signal against the
database
Extract the feature vector: analyze the signal by extracting its significant characteristics
Classify the feature vector: classify the analyzed signals by category
The different actors are:
User
The dictionary (codebook)
II- The Generic Design
The generic design defines the components needed to build the technical architecture.
This design is completely independent of the functional aspects. It aims to standardize
and reuse the same mechanisms for all systems. The technical architecture forms the
backbone of the system; its importance is such that it is advisable to build a prototype
of it.
Figure 28: The generic design
Software layers
Sphinx-4 has been compiled and tested on Solaris, Mac OS X, Linux and Windows.
Running, compiling and testing Sphinx-4 require additional software. The following
software must be installed on the machine:
- Java SDK 5.1. http://java.sun.com.
- The various libraries that make up Sphinx-4
Exploitation and Configuration Software:
a) Integrating the library with Eclipse
Integrating Sphinx-4 into an arbitrary application is relatively easy. The first step is to
create a new project (menu File - New - Project). The figure below shows how to create
a new project in Eclipse.
Figure 29: Creation of a new project
The second step is to insert the Sphinx-4 libraries into the project. To do this, we
right-click on the project and open the project properties. We then choose the menu "Java
Build Path". Finally, we click on "Add External JARs" to add the various libraries
provided by Sphinx. The libraries to add are the following:
Figure 30: Inserting the libraries into the Sphinx-4 project
js.jar.
jsapi.jar (this file must be created by launching the application jsapi.exe located in the
lib directory of the downloaded archive). This library is used by Java, among other
things, to record sound.
sphinx4.jar.
TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar.
Only for recognition of numbers.
WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.
WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.
b) Writing the Grammar
To perform recognition, we must write a grammar, that is, a file describing the terms
that must be recognized by the program. The grammars used by Sphinx are in JSGF
format (Java Speech Grammar Format). We must therefore create a file with the
extension ''.gram''. This file contains the grammar used by the application, that is, the
words and phrases that can potentially be pronounced.
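As an illustration of the format (the actual grammar of our application is shown in Figure 31), a minimal .gram file for a small, invented set of voice commands could look like this:

```
#JSGF V1.0;
grammar commands;

public <command> = <action> <object>;
<action> = open | close | show;
<object> = database | table | connection;
```

This grammar accepts the nine sentences formed by combining one action with one object, such as "open database" or "close connection"; only the public rule is visible to the recognizer.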
b.1) Example of grammar
Figure 31: Grammar file
The grammar file above allows the understanding of all of the following sentences:
Figure 32: List of sentences that can be pronounced
The following figure shows the grammar above graphically.
Figure 33: Graphical grammar structure
c) Writing the configuration file for Sphinx
After writing the grammar file, we must create the configuration file
Filename.config.xml. The easiest way is to adapt the configuration file of one of the
several demonstrations provided in the downloaded archive. This file specifies, among
other things, the dictionary and the grammar used.
Figure 34: XML configuration file
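The full configuration file is long; the fragment below is a simplified, hypothetical excerpt showing how such a file declares the JSGF grammar component (component names, property values and paths vary between the demos shipped with Sphinx-4):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <!-- Hypothetical excerpt: declares the JSGF grammar used by the recognizer. -->
    <component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
        <property name="dictionary" value="dictionary"/>
        <property name="grammarLocation" value="resource:/grammar/"/>
        <property name="grammarName" value="commands"/>
        <property name="logMath" value="logMath"/>
    </component>
</config>
```

Each component is wired to the others by name, which is how the ConfigurationManager assembles the FrontEnd, Linguist and Decoder described earlier.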
Chapter VI THE MIDDLE PART
I- The design part:
Fig 35: The design part
The design model organizes the system into components delivering technical and
functional services. This model combines the information from the right branch and the
left branch. It can be considered as the transformation of the analysis model obtained by
projecting the analysis classes onto the software layers.
The preliminary design is a delicate step because it integrates the functional analysis
model into the technical architecture, in order to draw the map of the system components
to be developed.
The detailed design then examines how to realize each component.
The coding step produces the components, with unit tests performed as the code units
are completed.
The acceptance step finally validates the functionality of the developed system.
1- Detailed Design:
Fig 36: The detailed class diagram
The detailed class diagram is derived from the general class diagram (described in Part
"2.1- the class diagram").
NB: Note that some classes will be transformed as follows:
Class Codebook: will become our dictionary (database).
Class Instruction: will become the grammar file.
Class LanguageInstruction: will become the grammar file.
II- Realization part
1- Description of the application interfaces:
In this part of the project we will show the first example of our application.
This interface presents the home interface for all users.
Fig 36: The home interface
This interface shows the process of adding a new application.
Fig 37: The home interface
This interface shows how to edit an existing application.
Conclusion
This project has led to the creation of an application for manipulating vocally some
other applications, namely MySQL and SKYPE.
To this end, research on the internet and a careful study of the working tools were
carried out to choose the most appropriate architecture for the system.
Throughout this project we have done our best to improve our application, but we faced
a major problem: the development of an acoustic model customized to each user of our
application.
Concretely, the difference between the applications present on the market (Dragon
NaturallySpeaking, SpeakQ, etc.) lies in the degree of perfection of the acoustic model.
This may be considered the most important task, as it requires additional time beyond
the deadline of our project.