The future of robot assistants: Building a
hands-free voice-controlled quadcopter
Frau Amar, Pedro
Academic Year 2015-2016
Director: EMILIA GÓMEZ GUTIÉRREZ
Degree in Engineering of Audiovisual Systems
Treball de Fi de Grau (Bachelor's Thesis)
The future of robot assistants:
Building a hands-free voice-controlled quadcopter
Pedro Frau Amar
FINAL PROJECT
DEGREE IN ENGINEERING OF AUDIOVISUAL SYSTEMS
POLYTECHNIC SCHOOL UPF
JUNE 2016
Director:
Emilia Gómez Gutiérrez
Department of Information and Communication Technologies
Dedication
Every challenge needs self-effort. Every project needs commitment. Every goal is reached
with perseverance and hard work. No matter what you do, take the tools and knowledge
your environment gives you.
I want to dedicate my humble effort to my mother, who has always looked out for my
future and my well-being; to my uncle Rodney, who has always been interested in my
projects regardless of their nature; but I especially want to dedicate this project to my
beloved father, may he rest in peace. He taught me how to be a real engineer, concerned
about the future, and gave me knowledge in several fields. He shared many hobbies with
me and helped me countless times with my projects.
Regardless of the final mark, this project is a success for me, and it carries a little bit of
my father in it.
We will never forget you and will always love you.
Special thanks to:
First of all, I would like to thank Mrs. Verónica Moreno. After my father’s death, she has
always been concerned about me and my academic career, and has always been very
helpful when I needed someone to guide me with my studies.
Secondly, I would like to thank my supervisor, Dr. Emilia Gómez. She helped me when she
was my teacher and guided me all along this project. She has been very attentive and
helpful and has always kept her office open for me.
I would also like to thank Dr. Boris Bellalta and Dr. Antoni Ivorra as they also shared their
time with me to advise me on the steps of my project.
Finally, I would like to thank my mother, my father, my brother and my sister, the rest of
my family and my friends. They have always been interested in my progress and have
supported all my decisions.
Abstract:
The development and applications of drones have grown over the last few years. Their
uses are extensive and their versatility has no apparent limits. The evolution of these
unmanned aerial vehicles raises questions about the future. Although people currently
use them for entertainment, new functionalities can be found. In this project we are
particularly interested in their potential to assist people with physical disabilities, so we
propose to transform the drone into a hands-free device using voice control. We show
potential applications for this solution and the challenges it involves. We build a 1 kg
quadcopter, let the user control it with her/his voice, and incorporate a voice synthesizer to
communicate relevant information to the user and make the device more human-friendly.
The project is framed within the ongoing evolution of intelligent systems and robotics.
Resumen (translated from Spanish):
The development and applications of drones have grown in recent years. Their uses are
extensive and their versatility has no potential limits. The evolution of these unmanned
aerial vehicles raises questions about the future. Although people currently use drones for
entertainment, new functionalities can be found. In this project we are particularly
interested in their potential to assist people with physical disabilities, so we propose to
transform the drone into a hands-free device using voice control. This confronts us with a
series of technological challenges that open up a range of different applications. In this
project we have built a 1 kg quadcopter that the user can control by voice, and we
incorporate a speech synthesizer to communicate relevant information to the user. The
project aims to contribute to the current evolution of intelligent systems and robotics.
Preface
With the omnipresence of camera-carrying multicopters at trade shows, we can only ask
ourselves how huge the drone industry will become and how long it will take to get there.
Multicopter technology is clearly focusing on cinematography, even though it is still in its
early days. This can be seen in the improvements in image stabilization and the fall in prices.
Nowadays, enterprising shooters have captured amazing footage by flying tiny
multicopters into places that no other machine can safely reach.
In addition, companies are raising funding on the promise of their products.
Multicopters are becoming more powerful and acquiring new skills, among which we can
see the first appearances of hands-free technology.
Most current commercial multicopters include a GPS transceiver, which is used to
geolocate the device in 2D space. Companies such as DJI and Parrot are now improving
their technology by including simple image recognition algorithms in the mobile apps that
control their multicopters. These are the first steps toward hands-free control of this
technology, and that is what we aim to achieve with this project.
Regarding this project and the challenges it involves, we face three major groups of
issues. First of all, everything related to building the quadcopter, including all the
research that has to be done prior to construction. Then, the programming and first tests
of the quadcopter, with all the issues related to quadcopter stability (and the later
modification of the drone's hardware if needed). And finally, everything related to
programming, testing and modifying the speech controller, including the training of the
recognizer.
Summary
Abstract
Preface
Figures List
Tables List
Chapter 1. INTRODUCTION, MOTIVATION AND CONTEXT
1. INTRODUCTION
1.1 - Context
1.2 - Personal motivation
1.3 - Goals of the project
1.4 - Structure of the report
1.5 - A little bit of history
1.6 - The culmination of an engineering degree
1.7 - Requirements and challenges
Chapter 2. BUILDING THE QUADCOPTER
1. THE COMPONENTS
1.1 - The processor
1.2 - The Frame
1.3 - Motors, Propellers and ESC's
1.4 - The Gyro
1.5 - Others
2. BUILDING THE QUADCOPTER
2.1 - Electronic circuit
2.2 - The program
2.3 - Testing and changes
2.4 - Future work
Chapter 3. THE SPEECH CONTROLLER
1. INTRODUCTION
2. SPEECH RECOGNITION
2.1 - Recognizing the voice
2.2 - Options
2.3 - Tests and Fails
2.4 - Dictionary
2.5 - Conclusions
3. SPEECH GENERATION
3.1 - Introduction
3.2 - The synthesizer
3.3 - Conclusions and future work
4. PROGRAMMING AND TRAINING
Chapter 4. SYSTEM INTEGRATION
1. ARDUINO AND THE SPEECH CONTROLLER
1.1 - The WiFi transceiver
1.2 - Linking Arduino with external software
2. THE FINAL CONTROLLER
2.1 - Testing
2.2 - Fixing problems
2.3 - Expected final result and future work
Chapter 5: CONCLUSIONS
1. CONCLUSIONS
Bibliography
Annex
Figures List
[1] Micro quadcopter
[2] DJI Phantom (similar dimensions to the project's quadcopter)
[3] Structure of the solution
[4] Motors rotating for a stable upward movement of the quadcopter. The opposite
configuration (i.e. all blue) would make the multicopter go down.
[5] Motors rotating for the quadcopter to go forward. The opposite configuration would
make the multicopter go backward.
[6] Motors rotating for the quadcopter to go left. The opposite configuration would make
the multicopter go right.
[7] Motors rotating for the quadcopter to rotate left. The opposite configuration would
make the multicopter rotate right.
[8] Quadcopter circuitry
[9] Quadcopter seen from its side (without the landing gear)
[10] Quadcopter seen from the top
[11] Fully mounted quadcopter
[12] Schematic showing the structure of speech processing in this project
[13] Phases of the speech recognition
[14] Flow chart explaining the behaviour of the whole solution
[15] Conceptual behaviour of speech synthesis
[16] Flow chart explaining the structure of the program
[17] ESP8266 module mounted on the quadcopter
[18] Including the WiFi transceiver in the circuit
Tables List
[1] Recognizer APIs comparison table
Chapter 1.
INTRODUCTION, MOTIVATION
AND CONTEXT
1. INTRODUCTION
1.1 - Context
Nowadays, we are all surrounded by technology. In the last few years, a phenomenon has
emerged and invaded our world, our homes, and sometimes even our privacy. This
phenomenon is commonly known as the "drone". A drone is an Unmanned Aerial Vehicle
(from now on UAV), meaning there is no pilot aboard; it may therefore have different
degrees of autonomy, ranging from a remote-control operator to a fully autonomous
computer-driven machine.
We can find several types of drones: military airplane-shaped drones, quadcopters,
hexacopters, octocopters, decacopters and even dodecacopters. Throughout this project,
we will focus on multicopters, more precisely on commercial quadcopters.
At present, the use of these commercial multicopters has expanded because they are
cheap, reliable, and can satisfy several human needs, from providing an aerial view to
serving as a radio-controlled toy. Anyone can acquire a multicopter at a wide range of
prices, from micro quadcopters (approximately 40€) to big multicopters (between roughly
150€ and 4000€, depending on the brand, specifications, accessories, etc.).
Figure 1¹: Micro quadcopter. Figure 2: DJI Phantom 2.
An important point is that the multicopter has some advantages over other aerial
devices. Good multicopters are easily maneuverable and very stable, and they usually
carry a camera that provides a First Person View (from now on FPV), sending the image to
another device. All in all, we can be anywhere at any time without leaving home, and
this is what makes the multicopter as popular as it currently is.
1 The first image has been obtained from Sharper Image (http://www.sharperimage.com);
its author is unknown. The second image has been obtained from Wikipedia
(http://wikipedia.org); its author is unknown.
However, there is a problem. Multicopters generally have to be controlled with our hands,
by holding a transmitter. This restricts the multicopter's use and capabilities to people
who can use their hands, excluding disabled people and people whose hands are busy. Let
us imagine a few contexts:
- What if you are a disabled person and want to enjoy the multicopter's
capabilities?
- What if you are on a rescue mission, holding your equipment, and want
to fly the multicopter for terrain reconnaissance?
- What if you are cycling, running or climbing and want the
multicopter to follow you and take pictures along your route?
Some solutions for following people have already been implemented on commercial
multicopters. For example, DJI includes an image recognition algorithm in some of its
multicopters to make the device follow someone. But this does not give the remote pilot
full control of the multicopter.
1.2 - Personal motivation
After completing three of the four years of the degree, with all the exams and new
knowledge behind me, I thought I should start a personal summer project and put my
acquired skills into practice.
Robotics is something that captivates me, which is why I bought myself an Arduino starter
kit. I thought it was a good way to start building and programming simple robots. In
addition, I have always played with radio-controlled devices, and in the last few years I
had been looking into quadcopters.
Searching on the internet I found several tutorials explaining how to build a quadcopter
using Arduino, so I started the first phase of the project: building and programming an
entertainment radio-controlled quadcopter.
Then, at the start of the fourth year of the degree, I realized that I had to find an
end-of-degree project, so I thought to myself: why not complete my summer project? I then
started searching for possible needs around the quadcopter, and I found that commercial
multicopters cannot be controlled without the use of the hands.
Hands-free multicopters could be a revolution, greatly improving their versatility. But
what is the best way to control a device without using the hands?
1.3 - Goals of the project
A first proposal based on speech recognition was made to my current tutor. She suggested
it might be interesting to use Brain-Computer Interfaces (from now on BCI); however,
after discussing that possibility, we found that the response of a BCI might not be
versatile enough to control all the movements of the quadcopter, so we returned to the
first proposal.
The main goal of this project is therefore to find a good speech recognition Application
Programming Interface (from now on API), in order to create an appropriate vocabulary and
program a speech-based controller for the quadcopter.
What will most likely give us the biggest headaches is training the speech controller
once we find a good API. Since we are talking about a quadcopter, recognition accuracy
has to be close to one hundred percent, or we risk problems such as hurting someone or
crashing the quadcopter. In addition, linking the speech controller with the actual
quadcopter controller (the one on the Arduino) may also be difficult.
All in all, the whole controller has to be very robust to be usable. Otherwise it may be a
failure.
1.4 - Structure of the report
This report follows a logical structure based on the distinct parts of the project, each one
with its sub-parts.
The first chapter provides an introduction to the motivation and context of this project. We
present the technological background and link with the degree on audiovisual systems
engineering.
Chapter 2 focuses on the quadcopter itself. We present the main components and phases of
circuitry, testing and improvement.
Then, chapter 3 presents the speech controller, where we integrated and trained a
state-of-the-art speech recognition engine and a speech synthesizer to give feedback and
make the device more human-friendly.
Finally, the fourth chapter provides an overview of the integrated system and discussion on
the main conclusions and future challenges of this project.
1.5 - A little bit of history
On August 22nd, 1849, the Republic of San Marco surrendered to Austria. The Republic
had been born of a revolt against Austria in Venice in 1848. The Austrians ended up
besieging Venice, which led to starvation and outbreaks of cholera. It was here that the
first UAVs appeared: a number of bomb-filled balloons sent by Austria to attack Venice.
Then, in the early nineteen-hundreds, drone development and innovation began, as one
might imagine, for military purposes. We can mention a pilotless torpedo invented by the
Dayton-Wright Airplane Company during World War I, or A. M. Low's "Aerial Target"
of 1916, the first attempt at an unmanned aerial vehicle. After that, a succession of UAVs
appeared over the years: a succession of weapons, machines built to kill people.
It was not until 2012 that commercial multicopters became popular. Daniel
Mellinger and Alex Kushleyev, two students from the University of Pennsylvania,
developed the first quadcopter as we know them today. This first prototype was presented
in a TED talk by Professor Vijay Kumar².
This is how UAVs became agile, light and small. Big UAVs weighing several tons and
measuring several meters became small multicopters weighing a few grams and measuring
less than a meter.
After that, several companies such as Parrot³ or DJI⁴ started developing their own
multicopters for commercial use.
1.6 - The culmination of an engineering degree
This project is an end, the culmination of my undergraduate studies. Over those past years I
have been learning to program in different languages; studying basics of calculus, algebra
and physics; learning how sound is created, transformed, propagated, recorded and studied.
I have learnt how image processing works, how video is recorded, stored and played. I have
studied how robots are designed, created and programmed and how their circuitry works. I
have seen the basics of internet communication systems.
Now let us think about the project. It involves designing a flying robot that has to be
controlled through a speech recognizer. All in all, this project brings together robotics;
aerodynamics; calculus for the quadcopter controller; electronics for the whole circuitry;
sound for the speech controller; networks for the PC-quadcopter communication; and
programming in different languages across the whole project.
2 https://www.youtube.com/watch?v=4ErEBkj_3PY
3 http://www.parrot.com/
4 http://www.dji.com/
As we can see, the project includes a little bit of everything seen in the degree, except for
image processing. That last part could be implemented in later versions of the drone,
using a camera and image-processing software to detect targets, for example.
This work is a complete engineering project that culminates four years of studies.
1.7 - Requirements and challenges
The idea is to build and test an affordable quadcopter that integrates a speech
recognition engine working in real time, in noisy acoustic conditions (motors and other
surrounding noises). In addition, it adds a speech synthesis engine, and we try to document
the whole process, keeping it as open as possible to make the project reproducible in the
future.
This project focuses on a quadcopter with four legs, each measuring approximately 22 cm.
We need the following components:
- The structure (commonly known as the f450 structure): a four-legged
plastic structure weighing approximately 300 g.
- The battery: we will look for a battery with a good weight-to-duration ratio,
approximately 200 g for approximately 20 min of flight (3000 mAh).
- The gyroscope: its weight is negligible, but we will look for a gyro that
accurately measures the quadcopter's orientation along the X, Y and Z axes.
- The Electronic Speed Controllers (from now on ESCs): through these controllers
we send the signal that sets the speed of the motors. They weigh approximately
4 × 30 g.
- The WiFi transceiver ESP8266: this module allows us to communicate with the
computer via WiFi. Its weight is negligible.
- The Arduino⁵ board: this board contains an ATmega328P, an 8-bit
microcontroller. Arduino is very versatile and can be used in several fields such as
robotics, networking, sensing and data acquisition. Our particular board weighs
approximately 40 g.
This makes a total of approximately 700 g, including wires and other parts.
Now we can start searching for the motor-propeller combination best able to lift that
weight, taking into account that we must use motors suited to the f450 frame. Usually, for
an f450 we can use 9045 or 1045 propellers (the common ones). The digits of the number
give the length of the propeller and the inclination (pitch) of the blade: a 9045 propeller,
for instance, is 9.0 inches long with 4.5 inches of pitch.
5 https://www.arduino.cc/
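As a sanity check, the weight budget above can be turned into a rough thrust requirement per motor. The sketch below is illustrative only: the figures come from the component list in this section (with a "wires and others" allowance so the total matches the ~700 g stated above), and the 2:1 thrust-to-weight margin is a common rule of thumb, not a value from this project:

```python
# Rough thrust-budget check for the f450 build described above.
# Figures taken from the component list; "wires_and_others" is a rough
# allowance so the total matches the ~700 g stated in the text.
COMPONENT_WEIGHTS_G = {
    "f450_frame": 300,
    "battery_3000mah": 200,
    "escs_4x30g": 120,
    "arduino_uno": 40,
    "wires_and_others": 40,
}

def required_thrust_per_motor_g(weights_g, n_motors=4, margin=2.0):
    """Thrust (in grams) each motor should provide.

    A thrust-to-weight ratio of about 2:1 (margin=2.0) is a common
    rule of thumb for a maneuverable quadcopter.
    """
    return sum(weights_g.values()) * margin / n_motors

print(sum(COMPONENT_WEIGHTS_G.values()))                 # 700
print(required_thrust_per_motor_g(COMPONENT_WEIGHTS_G))  # 350.0
```

With these assumed figures, each motor-propeller pair should deliver roughly 350 g of thrust, which is the kind of number one would check against the motor's thrust tables when choosing between 9045 and 1045 propellers.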
Chapter 2.
BUILDING THE QUADCOPTER
1. THE COMPONENTS
This is an overview of the built-in components of the quadcopter:
Figure 3: Structure of the whole solution
1.1 - The processor
A processor allows us to determine the conditions under which the motors accelerate.
Apart from that, we need it to read and interpret the gyro's response; we need to read the
signal received by the transceiver and, eventually, tell the transceiver to send information
about the quadcopter's status back to the controller. We need a board into which we can
plug all the components and on which we can write a program to control everything. This
is why we use an Arduino UNO board for this project.
An Arduino board contains a microcontroller and provides a set of digital and analog
input/output pins that can interface with various expansion boards and other circuits. These
boards include several communication interfaces for loading programs, which are written
using the Arduino integrated development environment, based on a language named
Processing⁶, and which also supports C and C++.
The idea of using this processor comes from the ease of use and versatility of these
boards.
6 https://processing.org/
1.2 - The Frame
There is not too much to say about the frame, but a few things have to be clear:
- We are talking about an f450 frame. This means we are using a structure with
22 cm long legs, which conditions the rest of the quadcopter's components.
- It has to be light and robust, as the quadcopter may eventually fall.
- We have to make sure we have all the screws needed to fix the motors, as not all
motor packs come with them.
- In our case, the frame has an internal circuit to power the components. This means
we can solder the battery terminals to the frame's board, together with the
terminals of each Electronic Speed Controller (ESC).
1.3 - Motors, Propellers and ESC’s
An important question to take into account: in which direction should each motor rotate?
Let us think about all the possible movements of the quadcopter:
- The quadcopter has to be able to go up and down. This can be performed by
a “simple” increase or decrease of power of all the motors. See Figure 4.
Figure 4⁷: Motors rotating for a stable upward movement of the quadcopter. The opposite
configuration (i.e. all blue) would make the multicopter go down.
7 Figures 4, 5, 6 and 7 have been obtained from Pinterest; their author is Ricardo Cámara
(https://es.pinterest.com/pin/566679565588950278/). The images have been modified.
- The quadcopter has to be able to go forward and backwards. This can be
performed by an increase of power of the rear motors with respect to the
front ones to go forward, and the other way around to go backwards. See
figure 5.
Figure 5: Motors rotating for quadcopter to go forward. The opposite configuration would
make the multicopter go backwards
- The quadcopter must be able to go side to side. This is the same principle
used in forward/backwards movement but using side motors. See figure 6.
Figure 6: Motors rotating for quadcopter to go left. The opposite configuration would
make the multicopter go right.
There is a final case, in which the quadcopter rotates right and left, and this is where the
problem appears. To make this movement happen, the direction of rotation of the motors
has a direct repercussion on the rotation of the quadcopter. For this reason, and for better
stability, we need two motors rotating clockwise and two counterclockwise.
Motors of the same type must sit opposite each other so the quadcopter can turn around its
central axis. Given that, with the clockwise motors turning faster than the counterclockwise
ones, the quadcopter turns counterclockwise; the other way around, with the
counterclockwise motors turning faster, it turns clockwise. See figure 7.
Figure 7: Motors rotating for quadcopter to rotate left. The opposite configuration would
make the multicopter rotate right.
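The four movements above can be summarized in a single "motor mixing" step. The sketch below is a conceptual Python illustration, not the code running on the Arduino; the motor layout (front/rear spinning clockwise, left/right counterclockwise) and the sign conventions are assumptions chosen to match the figures:

```python
def mix(throttle, pitch, roll, yaw):
    """Map the four commands onto the four motor speeds.

    Assumed conventions (illustrative, matching figures 4-7):
      pitch > 0 -> go forward (rear motors speed up),
      roll  > 0 -> go left    (right-side motors speed up),
      yaw   > 0 -> rotate left (clockwise front/rear motors speed up).
    """
    return {
        "front": throttle - pitch + yaw,
        "rear":  throttle + pitch + yaw,
        "left":  throttle - roll - yaw,
        "right": throttle + roll - yaw,
    }
```

Pure throttle changes move all four motors together (figure 4), while each of the other commands creates the speed imbalance shown in figures 5 to 7.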
1.4 - The Gyro
The gyro is the device that allows us to control the stability of the quadcopter. We use a
3-axis gyro, which means it measures the quadcopter's response on the x, y and z axes.
The idea here is to read the drone's current angular speed on all three axes and send an
order to correct the inclination of the quadcopter if needed.
1.5 - Others
Apart from the components mentioned above, we need something to control the quadcopter.
In the first version, we built a conventional radio-controlled drone, which means we bought
a transmitter with its receiver. In our case it was a six-channel transmitter, although only
4 channels are needed to control the quadcopter. In the second version, this transmitter is
no longer needed; instead, we use a WiFi transceiver to communicate with the computer.
2. BUILDING THE QUADCOPTER
2.1 - Electronic circuit
The following schematic shows the circuit that has been built for the quadcopter:
Figure 8: Quadcopter circuitry
As we can see, a battery powers the circuit (11.1 V). The Arduino board has to be
powered by a 5 V source, which is why we have a voltage divider. The gyro and the
receiver are powered by the Arduino itself. Finally, the ESCs are connected to digital pins
4-7, the receiver inputs to digital pins 8-11, and the gyro to analog pins 4 and 5.
2.2 - The program
As I was building the quadcopter myself and wanted it to fly perfectly, I did not write the
control code myself. I used the one provided by Mr. Joop Brokking (see the videos in the
bibliography).
We are not going to explain the whole controller, because it is quite involved, but the
main idea is to set up a proportional-integral-derivative (from now on PID) controller
whose main goal is to keep the gyro's angular rates equal to the transceiver inputs.
The idea is to sum a proportional part, an integral part and a derivative part, following
this equation:

    output(t) = Kp · e(t) + Ki · ∫ e(t) dt + Kd · de(t)/dt

where Kp stands for the proportional gain, Ki for the integral gain, Kd for the derivative
gain, and e(t) for the difference between the gyro output and the receiver output.
The proportional part consists of the difference between the gyro output and the receiver
output, multiplied by the proportional gain, which we obtain by trial and error. What we
want to achieve with it is to keep the multicopter oscillating around the center position.
The integral part is that same difference multiplied by the integral gain, also found by
trial and error, and summed with the previous integral output. What we see is that the
multicopter overcompensates, just as it did with the proportional part.
Finally, the derivative part consists of the difference between the current error (gyro
output minus receiver output) and the previous error, multiplied by the derivative gain,
which again is found by trial and error. This part responds only to changes in angular
motion, which means it acts only on the initial movement and keeps the motors at the
same throttle afterwards.
This essentially makes the quadcopter stabilize itself, avoiding abrupt changes in the
multicopter's behaviour. This program will later need to be modified to take in the speech
recognition results, but the stabilization part always remains the same.
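The three parts above can be sketched as a small PID loop. This is a conceptual Python illustration of the equation, not Mr. Brokking's Arduino code; the gains are placeholders to be found by trial and error, as explained above:

```python
class PID:
    """Conceptual sketch of the stabilization loop described above.

    The actual controller runs in C on the Arduino; this version only
    illustrates the equation. Gains are placeholder values.
    """

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0     # running sum for the integral part
        self.prev_error = 0.0   # previous error for the derivative part

    def update(self, gyro_rate, receiver_rate):
        # e(t): difference between the gyro output and receiver output.
        error = gyro_rate - receiver_rate
        # Proportional part: reacts to the current error.
        p = self.kp * error
        # Integral part: current contribution summed with the previous
        # integral output, as described in the text.
        self.integral += self.ki * error
        # Derivative part: reacts only to *changes* in the error.
        d = self.kd * (error - self.prev_error)
        self.prev_error = error
        return p + self.integral + d
```

Calling `update()` once per control cycle, with the gyro rate and receiver (later, speech) setpoint for one axis, yields the correction applied to the motors on that axis.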
2.3 - Testing and changes
The first design of the quadcopter used four f250 motors with 5030 propellers. In our first
tests we lifted the quadcopter several meters, but it was not stable: it kept flipping
upside-down. We saw that f450 drones used another type of motor, and we discovered that
our problem was related to the size of the frame, not the weight. We tried 6030
propellers, but we actually needed 9045 ones, which could not be mounted on our motors,
so we had to buy the proper f450 motors.
Figure 9: Quadcopter seen from the side, without the landing gear.
Figure 10: Quadcopter from the top
Once we had solved this problem, we ran tests again and saw that the quadcopter had
difficulty lifting off the ground and was taking too many hits on landing. We decided to
buy a landing gear to cushion the landings and raise the quadcopter further off the
ground.
Figure 11: The fully mounted quadcopter
For footage of the quadcopter flying, see the link to the SCIFO YouTube channel in the annex.
2.4 - Future work
Currently the quadcopter has two more small problems, both directly related to the
transmitter. First of all, I am using a 6-channel transmitter designed for airplanes. This
means its throttle stick has no spring, and the neutral position is not recovered
automatically. As a result, we cannot keep the quadcopter stable in one position by trying
to return the stick to neutral, and the quadcopter slowly flips over. Second, the
transmitter does not allow changing flight modes. The correct flight mode would be
"stabilize mode", which would level out the quadcopter after directional changes. The
transmitter I bought was not prepared for this, so it does not send the actual position of
the sticks to the controller, but accumulates the value instead. For clarification: if you
move the stick to position X, the quadcopter will remain in that state as long as the stick
moves within the range [0, X]. The only way to stabilize it is to move the stick out of this
range.
Chapter 3.
THE SPEECH CONTROLLER
1. INTRODUCTION
In this chapter we explain the basics of speech recognition and speech synthesis. The idea
is to capture a voice input and generate a voice output that depends on the recognition
results. Mainly, we integrate a state-of-the-art speech recognizer, configure it with our
specific vocabulary, and test and train it with our own acoustic input in order to achieve
good recognition accuracy.
We first capture the input audio from a speaker, analyze it, and compare it with a set of
acoustic models generated through training with annotated audio excerpts. We interpret the
results in order to make a decision and map it to a set of quadcopter actions. We then
synthesize a message to provide feedback to the user. Figure 12 provides an overview
of the process.
Figure 12: Schematic showing the structure of speech processing in this project
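The decision and feedback steps of this pipeline can be illustrated with a small sketch. The vocabulary and action names below are hypothetical examples, not the project's actual dictionary (which is described in section 2.4 of this chapter):

```python
# Hypothetical command vocabulary and action names, for illustration
# only; the project's real dictionary is described in section 2.4.
COMMANDS = {
    "up": "increase_throttle",
    "down": "decrease_throttle",
    "forward": "pitch_forward",
    "back": "pitch_backward",
    "left": "roll_left",
    "right": "roll_right",
    "land": "land",
}

def decide(transcript):
    """Map a recognized transcript onto a quadcopter action.

    Unknown words yield no action: with a flying robot it is safer
    to do nothing than to guess.
    """
    return COMMANDS.get(transcript.strip().lower())

def feedback(action):
    """Message handed to the speech synthesizer as user feedback."""
    if action is None:
        return "Command not understood, please repeat."
    return "Executing " + action.replace("_", " ")
```

The synthesized feedback closes the loop of figure 12: the user always hears either a confirmation of the action or a request to repeat the command.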
2. SPEECH RECOGNITION
2.1 - Recognizing the voice
To put the speech recognition process in place, we follow this diagram:
Figure 13: Phases of the speech recognition
As we can see in the diagram above, there is a phase prior to speech recognition, which
consists of training the recognizer. To do that, we first have to create a database of
recordings, a dictionary with the appropriate phonetic transcriptions, and the language model.
The challenges we face during the project are:
- Correct training: a poorly trained recognizer may give very low
recognition accuracy, with results as low as 0-10%, which is not acceptable
at all.
- Noise: we need to take into account all possible scenarios. This includes
noisy scenarios that may degrade recognition accuracy. For that, we
have to run several tests in different places and under different acoustic
conditions.
- Speaker independence: there is a huge difference between a system that
must serve several speakers and one used by a single speaker. In this
case, we train the recognizer for one speaker, to simplify the training task.
2.2 - Options
The options that we have considered are the following:
Sphinx4⁸: widely used in the open source community, CMU Sphinx is designed specifically for low-resource platforms; it has a flexible design and focuses on practical application instead of research. It also has an active development and release schedule and a large, active community.
Voce⁹: it is based on CMU Sphinx, but it is a prebuilt library that you can use but can neither adapt nor modify. It is very easy to obtain and use.
Pypi¹⁰: it is simply a package bundling several built-in recognizers, which allows you to test different tools. In our case we wanted it because of the possibility of using the Google speech recognition tool. One of its advantages is that it is written in Python, which is a very simple and versatile programming language.
8 http://cmusphinx.sourceforge.net/
9 http://voce.sourceforge.net/
10 https://pypi.python.org/pypi/SpeechRecognition/
2.3 - Tests and Fails
We made the first tests with Sphinx4, but we were unable to set it up correctly on our first attempt, so we turned to Voce.
We found Voce a very interesting and simple-to-use tool, but it has a major problem: it cannot be trained, and the results we were getting were terrible. After testing more than 15 times with our own voice, we discovered that Voce was using a complete US English dictionary, so there were far too many words to compare against our untrained patterns. The accuracy was almost 0%, as the results were essentially random.
Then we found Pypi, which gave us two possibilities: to use the built-in version of Sphinx4 or, more interestingly for us, to use the Google speech recognizer. The Google API was perfect in terms of word recognition, but there was a major problem: it took more than 5 seconds to return the recognition result, which is far too long. It was recording the audio, sending it to the server, performing the recognition, and getting back the result. Controlling a quadcopter requires an almost instantaneous response, so this solution was not adequate. Trying to use Sphinx4 through Pypi, we were unable to set up the PocketSphinx module correctly. In the end, we found the way to use Sphinx4 directly.
The following table compares the three options in terms of simplicity, accuracy, trainability
and profitability:
Sphinx4:
  Simplicity: very simple to use.
  Accuracy: acceptable (60-80%), but needs to be trained.
  Trainability: can be trained quite simply.
  Profitability: can be exploited.
Voce:
  Simplicity: very simple to use and includes a synthesizer.
  Accuracy: 0-20%.
  Trainability: cannot be trained, as it is precompiled.
  Profitability: cannot be exploited because of its lack of trainability.
Pypi:
  Simplicity: very simple to use and includes Google speech recognition.
  Accuracy: 80-100%.
  Trainability: cannot be trained, but this is not necessary.
  Profitability: cannot be exploited because it takes too much time to give the result.
Table 1: Recognizer APIs comparison table
2.4 - Dictionary
For this project we use a simple vocabulary to control the quadcopter. The vocabulary consists of a listener word, SCIFO, followed by two-word orders for controlling the multicopter. In all, our dictionary (.dict) file contains the following words:
scifo, start, stop, motors, go, rotate, up, down, left, right
So, some examples of orders would be: “scifo start motors”, “scifo go up” or “scifo rotate left”.
The following flowchart shows the structure of the whole solution and the execution sequence:
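As a sketch, the dictionary file maps each word to its phonetic transcription in ARPAbet notation. The transcriptions below are illustrative (in particular that of the invented word "scifo") and should be checked against the CMU pronouncing dictionary:

```text
scifo    S K IY F OW
start    S T AA R T
stop     S T AA P
motors   M OW T ER Z
go       G OW
rotate   R OW T EY T
up       AH P
down     D AW N
left     L EH F T
right    R AY T
```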
Figure 14: Flow chart explaining the behaviour of the whole solution
2.5 - Conclusions
To conclude this section, we can say that speech recognition is a very helpful tool, but it has major issues that must be taken into account. Not every API can be used in every scenario, which means we have to know exactly what our needs and capabilities are. In addition, training is very important, so it has to be done correctly.
For future work, it would be interesting to improve the system training in order to obtain better results.
3. SPEECH GENERATION
3.1 - Introduction
To give feedback to the user, we want to implement a simple solution that is human friendly.
Text-to-speech (TTS) technology is improving in naturalness day after day. It is based on the simple principle of translating a string of words into synthesized audio. A TTS engine converts written text to a phonemic representation, and then uses fundamental frequency (pitch), duration, the position of phonemes in the syllable and neighboring phones to convert the phonemic representation into waveforms that can be output as sound.
The following schematic shows the conceptual behaviour of text-to-speech:
Figure 15: Conceptual behaviour of speech synthesis
As this project is not focused on audio synthesis, we do not spend much time on modifying the synthesizer to get more natural results, even though naturalness is the main challenge of a good synthesizer. Here we are satisfied with a synthesizer that can be understood easily and is easy to use.
3.2 - The synthesizer
We are using FreeTTS¹¹, a speech synthesis system written entirely in Java. The only thing we have to do is import its jar libraries into our project.
The idea is to make the quadcopter give us feedback on every modification of its behaviour. So anytime the quadcopter gets an order, it replies with feedback information.
We do not need a specific vocabulary for speech synthesis, as we only want the quadcopter to give us understandable feedback. This means that the standard US English synthesis model is enough.
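A minimal sketch of how FreeTTS can be used from Java follows. The voice name "kevin16" is one of the voices bundled with the standard FreeTTS distribution, and the feedback string is illustrative:

```java
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

class Feedback {
    public static void main(String[] args) {
        // Pick one of the voices bundled with the FreeTTS jars.
        Voice voice = VoiceManager.getInstance().getVoice("kevin16");
        voice.allocate();               // load the voice data
        voice.speak("Motors started");  // synthesize and play the feedback message
        voice.deallocate();             // release resources
    }
}
```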
3.3 - Conclusions and future work
Using FreeTTS we get good results in terms of intelligibility, but naturalness could be greatly improved: the resulting voice is very robotic. FreeTTS allows us to use several voices that come precompiled in the jar libraries, or to create new ones.
In our project we use the synthesizer just for replies and simple interaction. In future versions, we could implement new features in order to create a more intelligent system that we can talk to. In addition, an interesting improvement would be a smoother, more natural voice.
11 http://freetts.sourceforge.net/docs/index.php
4. PROGRAMMING AND TRAINING
Our program is structured as follows:
Figure 16: Flow chart explaining the structure of the program
First of all, we need to import all the libraries involved in our project, i.e. the CMU Sphinx libraries, FreeTTS and java.net.
The second phase is to initialize all the parameters of the synthesizer and set up all the configuration parameters of the recognizer. This means that we have to specify the paths of the acoustic model, the dictionary and the language model.
Then we create a socket for the PC-quadcopter communication, specifying the quadcopter IP and the communication port.
Finally, we implement the interpreter, which takes the information given by the speaker and sends an order to the quadcopter.
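Putting these phases together, the initialization can be sketched as follows. The paths, IP address and port are placeholders, and the calls shown are from the high-level Sphinx4 API, which may differ between versions:

```java
import java.net.Socket;
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

class Scifo {
    public static void main(String[] args) throws Exception {
        // Recognizer configuration: adapted acoustic model, our dictionary and language model.
        Configuration config = new Configuration();
        config.setAcousticModelPath("models/en-us-adapted"); // placeholder path
        config.setDictionaryPath("models/scifo.dict");       // placeholder path
        config.setLanguageModelPath("models/scifo.lm");      // placeholder path

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(config);

        // Socket to the quadcopter (placeholder IP and port).
        try (Socket socket = new Socket("192.168.1.50", 23)) {
            recognizer.startRecognition(true);
            while (true) {
                SpeechResult result = recognizer.getResult();
                String hypothesis = result.getHypothesis();
                // ... interpret the hypothesis and write a command character to the socket
            }
        }
    }
}
```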
To train the recognizer we need to change the acoustic model. The process is quite complex, and we have two possibilities:
- We can create a new acoustic model to substitute the existing one. The problem is that we would have to record several hours of audio, which might be unnecessary given the second option.
- We can adapt the existing acoustic model, which is what we do.
Adapting the acoustic model is still an involved process. Mainly, we have to record the entire dictionary in separate wave files to create an adaptation corpus. Then we have to generate all the acoustic feature files using the sphinx_fe tool provided by CMU Sphinx.
Finally, we update the existing acoustic model with the new parameters, and we obtain our modified acoustic model, which we include in our Java project.
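For reference, the feature-extraction step can be sketched as below, following the CMU Sphinx adaptation tutorial. The file names are illustrative, and feat.params ships with the acoustic model:

```text
# Extract MFCC feature files (.mfc) from the adaptation recordings
# listed in scifo.fileids (names are placeholders).
sphinx_fe -argfile en-us/feat.params \
          -samprate 16000 \
          -c scifo.fileids \
          -di wav -do mfc -ei wav -eo mfc \
          -mswav yes
# The resulting feature files are then used (with tools such as bw and
# map_adapt) to update the means and variances of the existing model.
```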
Chapter 4.
SYSTEM INTEGRATION
1. ARDUINO AND THE SPEECH CONTROLLER
1.1 - The WiFi transceiver
Once we get the result of the speech recognition, we need to send an order to the multicopter. Until now, we had been using a 6-channel transmitter with its receiver and controlling the quadcopter by hand. Now, we need a simple way to send commands from our PC to the quadcopter. We considered three different solutions:
- Send the information via cable using the Arduino communication port. In this case we would not be able to let the quadcopter fly freely, so this is not the best solution for us.
- Send the information using Bluetooth. This would be a good solution, but it would not give us much range, maybe just a few meters.
- Send the information using WiFi. WiFi communication is easy to set up and gives us a wide range of possible distances depending on our needs and possibilities, from several meters to several kilometers. In addition, WiFi is a reliable technology in terms of communication issues.
What we use, then, is a simple and cheap WiFi module called ESP8266, which we connect to the same local area network as our PC, sending information from the computer to the multicopter using the Telnet protocol.
Figure 17: ESP8266 module mounted on the quadcopter
The following scheme shows how the Arduino board and the ESP8266 are connected:
Figure 18: Including the WiFi transceiver to the circuit
1.2 - Linking Arduino with external software
As soon as we have the WiFi module connected to the Arduino UNO, we need to link the speech recognizer with the quadcopter. With this link, we are able to send simple character-oriented codes that the multicopter can interpret and act on accordingly.
To do this, we use a protocol called Telnet. It is an application-layer protocol that provides bidirectional, interactive, text-oriented communication between two nodes over a virtual terminal connection.
All in all, what we do is connect the PC to the IP of the quadcopter on a certain port. Then, depending on the speech recognition result, we obtain a certain character that we send through Telnet to the multicopter.
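Sending one such character over this link can be sketched in Java as follows; the host and port are placeholders that depend on how the ESP8266 is configured:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

class CommandLink {
    // Open a raw TCP (Telnet-style) connection and send one command character.
    public static void sendCommand(String host, int port, char command) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(command);
            out.flush();
        }
    }
}
```

For example, `CommandLink.sendCommand("192.168.1.50", 23, 'w')` would ask the quadcopter to go up, assuming that IP and port.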
We send the following information:
- “1”: Start motors
- “0”: Stop motors
- “w”: Go up
- “s”: Go down
- “a”: Go left
- “d”: Go right
- “q”: Rotate left
- “e”: Rotate right
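A minimal sketch of the interpretation step, mapping a recognized hypothesis such as “scifo go up” to one of the characters above, could look like this (method and class names are illustrative; the length guard avoids the out-of-bounds problem discussed in the testing section):

```java
class CommandMapper {
    // Map a recognized order ("scifo <verb> <argument>") to a command character.
    // Returns 0 for hypotheses that are too short or unknown, instead of throwing.
    public static char mapOrder(String hypothesis) {
        String[] orders = hypothesis.toLowerCase().split(" ");
        if (orders.length < 3 || !orders[0].equals("scifo")) {
            return 0; // incomplete or unaddressed order: ignore it
        }
        String verb = orders[1];
        String argument = orders[2];
        switch (verb) {
            case "start": return '1';
            case "stop":  return '0';
            case "go":
                switch (argument) {
                    case "up":    return 'w';
                    case "down":  return 's';
                    case "left":  return 'a';
                    case "right": return 'd';
                }
                break;
            case "rotate":
                switch (argument) {
                    case "left":  return 'q';
                    case "right": return 'e';
                }
                break;
        }
        return 0; // unknown order
    }
}
```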
2. THE FINAL CONTROLLER
2.1 - Testing
After the first tests we have seen that the solution works, but with major issues. The recognition works at first, but once the motors start, there is too much acoustic noise to continue with the program execution. This noise is created by the motors, and the “start motors” state cannot be interrupted.
In addition, we have to handle some exceptions that were not initially taken into account. For example, if the listener detects fewer than three words (for example “SCIFO GO”), the program throws an “Out of bounds” exception that stops the execution of the recognizer. This exception is due to the attempt to access a nonexistent position of the array of orders (“orders[2]”).
2.2 - Fixing problems
Regarding the noise that may break the recognition, we try to increase the recognizer accuracy by doing a better acoustic model adaptation. We increase the number of recordings and test different scenarios so that every case is taken into account.
For the “Out of bounds” exception, we change the implementation so that the program can handle the exception and act accordingly.
2.3 - Expected final result and future work
For the final result, we expect the quadcopter to interact easily with the speaker and behave normally. Most probably, as it is a handmade multicopter, it will have stability problems; those problems already exist in the hand-controlled version of the quadcopter.
Regarding future work, it would be interesting to improve the stability of the quadcopter and, depending on the final result, also improve the recognizer behaviour and accuracy. In addition, it would be interesting to add some features to the quadcopter, such as an FPV camera and a GPS module.
Chapter 5.
CONCLUSIONS
1. CONCLUSIONS
There are many possible configurations when it comes to programming a multicopter. What we have seen in this project is that we can always add capabilities to machines in order to make them smarter and more helpful to humans.
As always, it is not difficult to implement a solution for a given need; what is more difficult is to improve this solution in order to make it work perfectly. During the project, we have faced many difficulties and unexpected problems, which are always present in every project. What we always need to control is the time available to achieve our goals. In this project, time has been invested correctly, as we have obtained promising outcomes.
In addition, it is always important to gather all the possible information before starting a project of this scope, as it may save a lot of time and headaches. As a matter of fact, if we had spent a little more time searching for information about motors, we would have spent only €100 instead of €180 for that purpose.
This is just one possible implementation of a solution for speech-controlled quadcopters, but other possibilities may arise and improve on this one. The main idea that we want to share is that intelligent systems and robotics are the future. Extending this solution to more user scenarios may be a revolution, along with other projects involving artificial intelligence.
Bibliography
Information
[1] Anonymous, Wikipedia [https://en.wikipedia.org/], March 2016, Unmanned aerial
vehicle.
[2] Anonymous, Wikipedia [https://en.wikipedia.org/], April 2016, First-person view
(Radio Control)
[3] Brett Holman, Airminded [http://airminded.org/], August 2009, The first aerial
bomb
[4] Anonymous, Wikipedia [https://en.wikipedia.org/], May 2016, History of unmanned
aerial vehicles
[5] Anonymous, Nesta [http://www.nesta.org.uk/], Unknown, Drones: a history of
flying robots
[6] CMU Sphinx developers, CMU Sphinx [http://cmusphinx.sourceforge.net/],
December 2010, Before you start
[7] CMU Sphinx developers, CMU Sphinx [http://cmusphinx.sourceforge.net/],
November 2015, Sphinx-4 application programmer’s guide
[8] CMU Sphinx developers, CMU Sphinx [http://cmusphinx.sourceforge.net/],
December 2011, Generating a dictionary
[9] CMU Sphinx developers, CMU Sphinx [http://cmusphinx.sourceforge.net/], March
2016, Adapting the default acoustic model
[10] FreeTTS developers team, FreeTTS
[http://freetts.sourceforge.net/docs/index.php], Unknown, What is FreeTTS
[11] Anonymous, Wikipedia [https://en.wikipedia.org/], March 2016, Telnet
[12] Lawrence R. Rabiner and Ronald W. Schafer. 2007. Introduction to digital
speech processing. Found. Trends Signal Process. 1, 1 (January 2007), 1-194.
DOI=http://dx.doi.org/10.1561/2000000001
[13] Anonymous, Wikipedia [https://en.wikipedia.org/], June 2016, Speech
Synthesis
[14] Anonymous, Wikipedia [https://en.wikipedia.org/], June 2016, Arduino
[15] Bryant Frazer, Studio Daily [http://www.studiodaily.com/], June 2015,
Studio Special Report: The State of the Art in Drone Technology
Third-Party Images
[1] Ricardo Cámara, Pinterest, https://es.pinterest.com/pin/566679565588950278/,
Drone schematic.
[2] Unknown, Sharper Image,
http://www.sharperimage.com/si/view/product/RC+Micro+Drone/203803, Micro
drone image.
[3] Unknown, Wikipedia,
https://upload.wikimedia.org/wikipedia/commons/4/41/DJI_Phantom_2_Vision%2
B_V3_hovering_over_Weissfluhjoch_%28cropped%29.jpg, DJI phantom
Building the drone tutorials
Here are the 6 video-tutorials that I followed to build the drone.
Author: Mr. Joop Brokking
Date: April - May 2015
[Part 1]: https://www.youtube.com/watch?v=2pHdO8m6T7c
[Part 2]: https://www.youtube.com/watch?v=bENjl1KQbvo
[Part 3]: https://www.youtube.com/watch?v=nCPEJTUYch8
[Part 4]: https://www.youtube.com/watch?v=fqEkVcqxtU8
[Part 5]: https://www.youtube.com/watch?v=JBvnB0279-Q
[Part 6]: https://www.youtube.com/watch?v=2MRiVSyedS4
ANNEX
Link to my github repository:
https://github.com/pedrofrau/SCIFO
Youtube SCIFO videos:
https://www.youtube.com/channel/UC0ug-lOxs-FQt0cTlnkCLhQ
Link to prof. Vijay Kumar’s TED talk:
https://www.youtube.com/watch?v=4ErEBkj_3PY
Consulted web pages for multicopter examples:
http://www.parrot.com/
http://www.dji.com/
http://www.arduino.cc/
https://processing.org/
http://cmusphinx.sourceforge.net/
http://voce.sourceforge.net/
https://pypi.python.org/pypi/SpeechRecognition/
http://freetts.sourceforge.net/docs/index.php