VIDEO SYNCHRONIZATION IN THE
CLOUD USING VISIBLE LIGHT
COMMUNICATION
Maziar Mehrabi
Master of Science Thesis
Supervisor: Dr. Sébastien Lafond
Instructor: Le Wang
Embedded Systems Laboratory
Faculty of Science and Engineering
Åbo Akademi University
January 2015
ABSTRACT
Video synchronization refers to the time-based alignment of several audio/video streams.
The growth of heterogeneous social media networks demands faster and more efficient
synchronization methods that can satisfy the real-time requirements of the media cloud.
Although many synchronization techniques and methods have been proposed or are
already in use, this thesis suggests a novel approach that harnesses the capabilities of
Visible Light Communication (VLC) to provide more robust and efficient video
synchronization.
This thesis introduces the design and implementation of a VLC-based video
synchronization prototype. The synchronization of different video streams is provided
by means of VLC through Light Emitting Diode (LED) lights and digital phone
cameras. This is achieved by embedding the necessary information as light patterns
in the video content. These patterns can later be recognized by processing the video
streams. In addition to synchronization, the information transmitted through VLC can
enable many other applications.
This method of synchronization is needed in cases where several heterogeneous
camera-equipped devices (e.g. cellular smart phones) are live-streaming video content
to a media server or cloud environment. In addition, the means of VLC can
be exploited to carry information for purposes other than video synchronization.
The approach presented in this work does not require modification of software or
hardware components of the camera device.
Keywords: LED, Digital Camera, Rolling Shutter, Video Stream, Video Frame, Microcontroller, Carrier Frequency
CONTENTS
Abstract

Contents

List of Figures

Glossary

1 Introduction
  1.1 Objective of the thesis
  1.2 Thesis structure

2 Background and Related Work
  2.1 Synchronization of Social Video Streams
  2.2 Previous and Related Work
  2.3 Summary

3 VLC-based Video Synchronization
  3.1 Visible Light Communication
    3.1.1 The Rolling Shutter Effect
  3.2 A Video Synchronization System
  3.3 Summary

4 Design and Implementation
  4.1 Physical Layer and Devices
    4.1.1 Electronic Components
    4.1.2 Circuit Isolation
    4.1.3 Circuitry
  4.2 VLC Transmitter
    4.2.1 Modulation Techniques
    4.2.2 Bandwidth Limitations
    4.2.3 Frequency Selection
    4.2.4 Lifetime of Frequencies
    4.2.5 Flicker Improvement
  4.3 VLC Receiver
    4.3.1 Architectural Overview
    4.3.2 Thresholding
    4.3.3 Discrete Fourier Transform
  4.4 Summary

5 Results and Evaluation
  5.1 Evaluation Setup
  5.2 Evaluation Benchmarks
  5.3 GPU Acceleration
  5.4 Quality
  5.5 Summary

6 Conclusion and future work
  6.1 Conclusion
  6.2 Future work
  6.3 Discussion

Bibliography

A Appendix
  A.1 Direct Communication With The Arduino Board
  A.2 Interfacing The Serial Connection
  A.3 Character Lookup Table
LIST OF FIGURES
2.1 Use case scenario
2.2 Unsynchronized video streams
2.3 Requirements of synchronization with reference clock
2.4 Synchronization using audio patterns

3.1 Data transmission through visible light
3.2 The rolling shutter effect on fast moving train
3.3 The rolling shutter effect on blinking LEDs
3.4 Fast blinking light sources captured by the rolling shutter
3.5 Architecture of the video synchronization system
3.6 Inter-frame synchronization
3.7 Closed GOP
3.8 Open GOP

4.1 SMD LED plate of IKEA Ledare E27
4.2 Circuit schematic for modulating low power LED
4.3 Darlington Pair implemented in TIP120 ICs
4.4 Optocoupler
4.5 Circuit schematic for the transmitter
4.6 Hardware components of the prototype
4.7 Modulation techniques
4.8 Block-based video compression effect
4.9 FT of the frame shown in Figure 3.4
4.10 Block-based motion estimation effect
4.11 Guard intervals
4.12 Odd harmonic frequencies in a square wave
4.13 Breaking of a frequency in two frames
4.14 Flicker compensation using duty-cycle modification
4.15 Receiving server
4.16 Picture conversion and image processing
4.17 Finding the threshold value
4.18 A frame carrying three frequencies
4.19 Time to frequency conversion
4.20 Spectrum centralization
4.21 DFT conversion of the picture shown in Figure 4.18

5.1 Test scenario for the video synchronizer
5.2 Unsynchronized video streams on playback
5.3 Synchronized video streams on playback
5.4 Performance profiling results
5.5 Sequence of frames with maximum one VLC frame

6.1 Adding sinusoid frequencies
6.2 Stitching images - Sliding window
6.3 Processing times in pipelined and non-pipelined task scheduling
GLOSSARY
• 3G
Third Generation.
• 4G
Fourth Generation.
• BJT
Bipolar Junction Transistor.
• BT
Bluetooth.
• CCD
Charge-Coupled Device.
• CFL
Compact Fluorescent Lamp.
• CMOS
Complementary Metal-Oxide-Semiconductor.
• DFT
Discrete Fourier Transform.
• EU
European Union.
• FEC
Forward Error Correction.
• FFT
Fast Fourier Transform.
• FPS
Frames Per Second.
• FSK
Frequency Shift Keying.
• IC
Integrated Circuit.
• IoT
Internet of Things.
• IR
Infrared.
• ISI
Inter-Symbol Interference.
• LAN
Local Area Network.
• LED
Light Emitting Diode.
• MOSFET
Metal-Oxide-Semiconductor Field-Effect Transistor.
• OFDM
Orthogonal Frequency Division Multiplexing.
• QR code
Quick Response Code.
• SMD
Surface Mount Device.
• SNR
Signal to Noise Ratio.
• THD
Total Harmonic Distortion.
• UV
Ultraviolet.
• VLC
Visible Light Communication.
• Wi-Fi
Wireless local area networking technology (IEEE 802.11).
1 INTRODUCTION
In recent years the number of hand-held devices with multimedia capabilities has
increased. In addition, these devices usually have the ability to share and access video
content via the Internet [1]. This allows ordinary mobile phone users to
generate and distribute high-quality content [2] such as images and video. Studies in
[3] and [4] have forecast a dramatic growth in both the number of connected devices
and the share of video content in global consumer Internet traffic. Moreover, the
increasing popularity of social networking, media sharing and network-enabled
cameras (i.e. IP cameras [5]) leads to a situation in which many video streams are
available for a particular live event.
One of the challenges of maintaining a live media streaming service is to keep
multiple video streams synchronized [6]. The synchronization problem arises because the
video streams are distributed through different network infrastructures (e.g. 3G, 4G,
Wi-Fi, LAN, or any combination of these [7]) with different characteristics (such
as jitter, delay and speed); hence each video stream may be exposed to a certain
amount of delay [8] [9], resulting in unsynchronized video streams at the destination. The
other main reason for unsynchronized video streams, regardless of network facilities,
is the different starting point of each recording camera, which leads to different timestamps for
identical frames among several video streams. One way to achieve video synchronization
is to make use of the visual information available in the video [10]. In this work,
features of visible light are utilized to provide the necessary visual information for the
synchronization.
Visible Light Communication (VLC) refers to wireless communication using the
visible light spectrum, i.e. wavelengths from 380 nm (violet) to 780 nm (red) [11]. One
of the main advantages of VLC is its ability to be combined with existing lighting
sources in our environment, making it efficient and suitable for ubiquitous computing
applications. VLC can be utilized in transportation systems, machine-to-machine
communication [12], underwater communication [13] and so on. Unlike other radio
technologies, light cannot travel through non-transparent material. This feature makes
VLC an ideal solution for e.g. indoor wireless communication [14] [12].
The energy efficiency of LEDs compared to other lighting technologies
such as Compact Fluorescent Lamps (CFL) or incandescent light bulbs [15] makes them
the next dominant solution for the lighting industry [16] [17]. LEDs' share of the world
lighting market has been increasing over the past few years and is predicted to exceed
30% in 2016 [18] [19] [20]. Furthermore, as LED light
modules are driven by DC power and can be rapidly modulated, they are
suitable candidates for smart lighting [21], smart spaces and cities [22] [23], Internet of
Things (IoT) and VLC applications. The combination of LEDs and data transmission
makes VLC an emerging technology and an interesting research topic.
Moreover, ultraviolet and infrared technologies are considered outdated
for many applications, as newly manufactured camera-equipped devices
(such as smart phones) are provided with ultraviolet (UV) and infrared (IR)
blocking filters [24] [25] [26]. Also, as explained in chapter 8 of [27], characteristics
such as distance impact and noise vulnerability are relatively the same between visible
light and infrared communication technologies.
1.1 Objective of the thesis
The main objective of this thesis is to design, implement and evaluate a VLC-based
video synchronization system. In order to make VLC available, it is necessary to
implement a platform that can be utilized to build VLC-based applications. In this case
a real-time video synchronization system is implemented to exploit the capabilities of
such a platform. Additionally, the VLC platform could be utilized for other VLC-based
use cases as well. This platform is designed and implemented with the
scalability and isolation characteristics of cloud computing in mind.
This thesis explains the implementation details of the mentioned platform. The
platform consists of hardware and software components that transmit information through
physical light bulb modules. In addition, camera-equipped mobile phones are used to
generate video content. The VLC receiver and video synchronization software
components of this platform are implemented in an isolated and scalable manner. In this
way, the scalability and transparency features of the cloud environment can be harnessed for
application deployment.
This thesis also explains the challenges and constraints of implementing a camera-
based VLC system. Additionally, solutions, instructions and suggestions are also
presented in this thesis in order to overcome the mentioned challenges.
1.2 Thesis structure
The thesis is divided into six chapters. Chapter 2 explains the background and
previous work related to VLC and video synchronization. Chapter 3 describes
the proposed architecture for the video synchronization in detail. Chapter 4 describes
the implementation details of all elements of the prototype, namely the electronic
components, the transmitter software and a receiving server. Evaluation results are presented in
Chapter 5. Chapter 6 presents the conclusions and future work, followed by a
discussion about the future topics of lighting and VLC systems.
2 BACKGROUND AND RELATED WORK
This chapter first states the problem of video synchronization and then surveys related
work and prior art in video synchronization. Traditional synchronization methods
mainly focus on the synchronization of audio and video streams (lip synchronization) [28]
or the synchronization of separate video streams in a centralized system. However, these
solutions are not usually suitable for a decentralized heterogeneous system.
2.1 Synchronization of Social Video Streams
It is necessary for a multi-camera system to maintain the synchronization of separate
video streams. The need for video synchronization arises for a variety of
reasons, such as analysing visual information [10], identification, activity recognition
and also live broadcasting of sport events [29]. In addition, providing time-based
alignment becomes more challenging in a non-centralized multi-camera system [30]
consisting of heterogeneous camera-equipped smart phones. Whitehead et
al. define the video synchronization problem as follows: "Given k different video
sequences that overlap in time, identify one frame from each of the different sequences
that refer to the same point in (universal) time." [31].
Figure 2.1 shows a use case in which a number of sources such as audience members,
skycams and camera operators (called producers) are streaming live video content to
a cloud-based media server. The video streams are then tailored (e.g. transcoded)
according to the next component's needs in software containers and sent to the director.
The director has the responsibility of selecting the optimal video stream based on certain
constraints (e.g. video quality, best viewpoint, etc.) and re-streaming the selected
video stream to a broadcasting server. Therefore, it is necessary for the director to have
the video streams synchronized in order to make a fair decision.
[Figure: a scene/arena/object is captured by producers, who stream over 3G, 4G and Wi-Fi to software containers, then to the director and on to the broadcaster.]
Figure 2.1: Use case scenario
The reasons why video streams are delivered unsynchronized fall into several
categories, such as the impact of heterogeneous network infrastructures, a different start
time for each camera and frame rate variations. Figure 2.2 shows a scenario where two
different types of camera have started recording an event at different times and from
different viewpoints. In the meantime the recorded media streams are being streamed
to a media gateway through different network connections. However, when the streams
arrive at the media gateway there is a time offset (shown as ∆t) among identical
frames in the different video streams. In the example shown in Figure 2.2, ∆t has a
minimum value of 7 seconds (neglecting network impact). The solution to the
synchronization problem is to delay the earlier video stream by ∆t. However,
finding ∆t is the challenge that broadcasting services are struggling with.
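Once ∆t is known, the alignment itself reduces to simple timestamp arithmetic. The following sketch (illustrative only; the function name and HH:MM:SS format are assumptions, not part of the thesis prototype) computes the minimum ∆t of Figure 2.2 from the two record times:

```python
from datetime import timedelta

def stream_offset(record_time_a, record_time_b):
    """Offset between two streams whose wall-clock record times
    (HH:MM:SS) mark the same instant of the event."""
    def parse(t):
        h, m, s = (int(x) for x in t.split(":"))
        return timedelta(hours=h, minutes=m, seconds=s)
    return abs(parse(record_time_a) - parse(record_time_b))

# Figure 2.2: record times 00:02:36 and 00:02:43 give a minimum
# offset of 7 seconds; the earlier stream is delayed by this amount.
dt = stream_offset("00:02:36", "00:02:43")
print(dt.total_seconds())  # 7.0
```

In practice the hard part is estimating ∆t in the first place, since the cameras do not share record times; that is what the VLC approach of this thesis provides.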
[Figure: two cameras with record times 00:02:36 and 00:02:43 stream live video over LAN and 4G to a media gateway; at the current playback time, video streams 1 and 2 show an offset of Δt.]
Figure 2.2: Unsynchronized video streams
2.2 Previous and Related Work
As explained in Chapter 7 of [32], the synchronization problem imposed by the
networks can be solved by defining a global clock (using e.g. the methods of the Network Time
Protocol [33]) as a reference for all cameras. This solution is shown in Figure 2.3.
[Figure: two cameras stamp frames on their video timelines against reference clocks that are kept aligned by a clock synchronization protocol; video buffers, network packets and sister protocol packets all refer to the same clocks, so the offset Δt between the timelines can be detected.]
Figure 2.3: Requirements of synchronization with reference clock
Figure 2.3 shows a setting where two different cameras are streaming video content
over the network. These cameras stamp the video content (i.e. each frame) by referring
to a reference clock. Moreover, each network packet becomes ready after its
header is updated with the current time. The reference clocks of the
cameras are synchronized using a synchronization protocol. Hence any delay
caused by a camera's internal characteristics or network jitter can be detected at the
receiver using the globally synchronized timestamps. If there are any sister protocols
in use, the reference clock can be used to form the necessary packets. For example,
the Real-time Transport Control Protocol (RTCP) is a sister protocol of the Real-time
Transport Protocol (RTP).
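For concreteness, the offset and delay estimation that such a clock synchronization protocol performs can be sketched with the classic NTP formulas (this is the standard NTP calculation, not an implementation from the thesis; the timestamps are invented):

```python
def ntp_estimate(t0, t1, t2, t3):
    """One NTP request/response round: t0 = client send, t1 = server
    receive, t2 = server send, t3 = client receive (seconds).
    Returns (clock offset of server relative to client, round-trip delay)."""
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay

# Client clock running 0.5 s behind the server, 0.1 s one-way network delay:
offset, delay = ntp_estimate(t0=10.0, t1=10.6, t2=10.7, t3=10.3)
print(round(offset, 6), round(delay, 6))  # 0.5 0.2
```

Each camera would repeat such exchanges to keep its reference clock within a bounded error of the global clock.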
However, the solution shown in Figure 2.3 requires the system to be centralized,
as each camera needs to communicate with a central server in order to make itself aware
of the global consensus on time. It therefore requires the camera-equipped device
to have this feature implemented. In contrast, smart phones tend to have
independent implementations of their video recording software, in which the starting
timestamp of every separate video is set to zero.
Another approach to achieve synchronization is through visual information. Many
of the related prior works rely on homographic relations between pictures, which require the cameras to
be very close to each other (to ignore the parallax effect) and in a static position [10]
[34] [35] [36] [37], and/or are limited in the number of cameras, leading to scalability and
mobility limitations [38]. These constraints limit harnessing the scalability features of
cloud computing. A common approach in many of the mentioned methods is to identify
and track a feature or a moving object in the scene [39] [40] [41]. Nonetheless, these
methods require the object to be in the line of sight of all cameras.
Researchers in [42] and [43] have developed a synchronization method that
uses the patterns in audio channels as a reference to find identical frames. Figure 2.4
illustrates this method.
[Figure: audio tracks of streams 1, 2 and 3 plotted over time, with the selected common audio pattern highlighted in each track.]
Figure 2.4: Synchronization using audio patterns
Figure 2.4 depicts three audio tracks, each of which belongs to a different video
stream recording the same event. Although these audio streams are not exactly the
same, they often contain similar patterns. These patterns (shown as rectangular
shapes in Figure 2.4) can be, for example, a high-pitched tune or a specific announced
word. Once the synchronizer detects a common pattern between these audio streams,
it can determine the conveyed delay and align the video frames corresponding to the
audio pattern. The advantage of this method over other pixel-based methods is that
the audio information usually contains much less data to process and the algorithms are
relatively more straightforward to implement.
However, the drawback of this method is that the same audio footprint must be
present in all video channels. It is challenging to satisfy this requirement in big and
crowded events (e.g. a football match in a stadium), mute spaces and audio-edited (i.e.
remixed/remuxed) channels. Furthermore, this method is not suitable for real-time
video streaming because the audio tracks must be buffered long enough to provide the
necessary data for pattern recognition.
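As a rough illustration of how such audio pattern matching can work, the sketch below estimates the delay between two tracks as the peak of their cross-correlation. This is a simplified stand-in for the methods of [42] and [43], not their actual algorithm, and all signal parameters are invented for the example:

```python
import numpy as np

def audio_delay(track_a, track_b, rate):
    """Seconds by which the common pattern appears later in track_b
    than in track_a (peak of the cross-correlation)."""
    corr = np.correlate(track_b, track_a, mode="full")
    shift = np.argmax(corr) - (len(track_a) - 1)
    return shift / rate

rate = 1000  # samples per second
t = np.arange(0, 1.0, 1 / rate)
ping = np.sin(2 * np.pi * 50 * t) * np.exp(-5 * t)  # a short distinctive sound
a = np.concatenate([np.zeros(200), ping])  # pattern starts at 0.2 s
b = np.concatenate([np.zeros(500), ping])  # pattern starts at 0.5 s
print(audio_delay(a, b, rate))  # 0.3
```

Note that both tracks must already be buffered over a window containing the shared pattern, which is exactly why this approach struggles with real-time streaming.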
2.3 Summary
This chapter explained the problem of video synchronization in social video streaming
environments. The need for video synchronization has attracted a lot of research effort.
However, traditional methods of synchronization are not always suitable for today's
expansion of social networking, due to its decentralized and heterogeneous characteristics.
High computation costs and low system accuracy are other drawbacks of traditional
synchronization techniques. In the following chapter we will explain how inter-frame
video synchronization can be done by using VLC.
3 VLC-BASED VIDEO SYNCHRONIZATION
This chapter introduces Visible Light Communication (VLC) systems and their
applications and provides a brief description of the proposed architecture for the video
synchronization system. More details on the implementation of the transmitter and
receiver are given in Chapter 4. In this thesis, the term receiver usually refers to
the pure VLC receiver (i.e. a camera and other video processing components), while
a video synchronization system (server) is an application that is developed on top of
the implemented VLC system using the provided Application Programming Interface
(API). This is because the implemented VLC prototype can be used for various types
of applications.
This chapter is divided into three sections. Section 3.1 presents the concept of VLC
and the rolling shutter effect. Section 3.2 gives an overview of the architecture of the
synchronization system and its synchronization servers, and Section 3.3 summarizes the chapter.
3.1 Visible Light Communication
VLC is a form of optical wireless communication carried on top of illumination
(i.e. visible) light [44]. In recent years, a growing interest in VLC has
been seen among researchers [27]. The advantages of VLC have opened a new range
of applications in ubiquitous computing, IoT and wireless communications. VLC has
advanced significantly with the growth of LEDs [44]. As LEDs are
suitable candidates for VLC transmitters [45], some of the advantages of VLC systems
are inherited from LED characteristics. The main advantages are the following:
(a) Safety: Visible light, as the most natural form of radiation, is known to be harmless
to the human body [46] [47]. The ability of LED lights to be modulated faster than
the eye can perceive also ensures a level of comfort for human observers [11]. Moreover, compared
to other radio technologies, VLC does not interfere with aircraft equipment or
medical devices [48].
(b) Availability: The deployment of dual-service VLC (lighting and data communication)
applications becomes easier with the high availability of artificial
light in human environments [49]. The growing interest in solid-state lighting and
LEDs also helps this development [50].
In addition, exploiting VLC does not require a license or certificate for the radio
spectrum [48]. Furthermore, the massive usage of camera-equipped devices
strengthens the availability feature of VLC in our application.
(c) Efficiency: LED lighting is known to be more efficient than other sources
of lighting [15]. Long life expectancy, high humidity tolerance and minimal heat
dissipation are other advantageous aspects of LED lights [49].
The receiving end of an optical transmission system is a photodetector, a
device that generates an electrical charge (i.e. current) when exposed
to visible light [27]. Although photodetectors can convert high-rate light pulses
accurately, their sensitivity to background light noise results in a very low Signal-to-Noise
Ratio (SNR) [51]; therefore, implementations should provide a level of
channel isolation. Complementary Metal-Oxide-Semiconductor (CMOS) and Charge-Coupled
Device (CCD) sensors in digital cameras are also examples of light detector
sensors [52] [53]. Nakagawa, M. and Haruyama, S. have invented a camera-equipped
cellular device capable of receiving VLC through the camera and a secondary light
receiving unit [54].
[Figure: the transmitter (data source → modulator → light source) sends modulated light over the optical medium to the receiver (light sensor → demodulator → output data).]
Figure 3.1: Data transmission through visible light.
Figure 3.1 shows how data transmission through visible light takes place. The
VLC transmitter modulates the input data using a modulation technique (explained in
Section 4.2.1) and transfers the modulated signals to the light source units (typically
LEDs). The modulated light then travels through the optical medium (i.e. free space,
water or other forms of transparent matter). On the receiver side, the modulated light
is captured and transformed into electrical signals using light sensor units (i.e.
photodetector, CCD, CMOS, LED, etc.). These signals are then demodulated and decoded
into output data using dedicated hardware or software components.
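As a minimal illustration of this modulate/demodulate pipeline, the sketch below simulates On-Off Keying over a noisy optical channel. This is a simplified model for explanation only, not the modulation scheme of the prototype (which is discussed in Section 4.2.1):

```python
import numpy as np

def ook_modulate(bits, samples_per_bit=8):
    """On-Off Keying: each bit becomes a run of light-on (1.0) or
    light-off (0.0) samples driving the LED."""
    return np.repeat(np.array(bits, dtype=float), samples_per_bit)

def ook_demodulate(signal, samples_per_bit=8, threshold=0.5):
    """Average the captured brightness over each bit period and threshold."""
    frames = signal.reshape(-1, samples_per_bit)
    return [int(m > threshold) for m in frames.mean(axis=1)]

bits = [1, 0, 1, 1, 0]
light = ook_modulate(bits)                      # transmitter side
noisy = light + np.random.default_rng(0).normal(0, 0.1, light.size)  # channel
print(ook_demodulate(noisy))  # [1, 0, 1, 1, 0]
```

Averaging over the bit period is what gives the receiver its noise margin; the same idea carries over to the frequency-based scheme used later, where the DFT plays the role of the averaging step.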
Although LEDs are designed to emit light, they can also act as photodetectors [55].
As suggested by Schmid, S. et al. in [56], this feature can be exploited to
build bi-directional and cost-efficient optical communication channels. However,
implementing a VLC system using this feature requires precise synchronization between
send and receive phases and special circuitry for detecting the current generated by the
LEDs. Alternatively, bi-directional communication can be achieved by designing
hybrid networks where the downlink is carried out by visible light and the uplink
is carried by WiFi [57] [58], infrared [59], etc.
The IEEE 802.15.7 standard [60] defines three operating modes for the VLC
physical layer. The highest data rate in the standard is 96 Mb/s. However, several studies
have reported higher data rates, such as 575 Mb/s in [61] and 875 Mb/s in [62].
Nevertheless, the high data rates in the mentioned demonstrations are achieved by using
photodetectors in the receiver; implementation examples that use a commercial
camera as the receiver are scarce.
The papers [63], [64] and [65] demonstrate VLC systems that take advantage of high
speed cameras (i.e. 1000 fps). However, high speed cameras are not widely available
on mobile phones.
3.1.1 The Rolling Shutter Effect
In order to use a digital camera as the VLC receiver, we exploit the rolling shutter
effect of digital cameras. Although this effect was not intentionally designed to
be a feature of digital cameras, it can be exploited to enable communication through
visible light.
The image sensors built into digital cameras fall into two main categories, namely
CCD and CMOS [66]. In our application, the main difference between CCD and
CMOS technologies is their shuttering mechanism.
The pixels of a picture in CCD sensors are captured globally during one exposure
period, while CMOS sensors capture the scene by sweeping lines of pixels [67] (either
horizontally or vertically). Hence, the shuttering mechanism is called global [68] in
CCD sensors and rolling [69] in CMOS sensors. The energy efficiency of CMOS
sensors [70] has made them a suitable candidate for battery-powered devices such as
smartphones [71]. The rolling shutter effect becomes visible when the recorded video
contains a fast moving object such as a fan or helicopter propeller. An example of
the rolling shutter effect is shown in Figure 3.2, where the vertical lines on the moving
train (borders of doors and windows) appear tilted in relation to still objects in
the picture.
Figure 3.2: The rolling shutter effect on fast moving train.
Figure 3.3 shows how the rolling shutter effect captures a fast blinking LED. The
example depicts the position of a horizontal rolling shutter in a
scene with respect to time. The rolling shutter scans the scene to form a picture consisting
of M rows. A fast blinking light source (i.e. an LED) with a blinking period of T exists
in the background of the scene. This causes the finalized picture to be divided into
sets of rows, where each set of rows represents either the on or the off state of the light source.
Figure 3.4 shows the same effect in a real application, where two blinking LED light
bulbs are captured by a digital phone camera. The on and off states of the LEDs appear
as dark and bright bands in the picture. Researchers have used this technique
to calibrate cameras for rolling-shutter rectification in [72].
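The banding mechanism can be illustrated with a small simulation: each image row samples the LED state at its own readout time, so a square-wave light source leaves alternating bright and dark row bands. The row count, readout time and LED frequency below are invented for illustration and are not the prototype's parameters:

```python
import numpy as np

def rolling_shutter_rows(num_rows, row_time, led_freq, duty=0.5):
    """Return 1 (bright) or 0 (dark) per image row, assuming row i is
    exposed at time i * row_time while the LED blinks as a square wave
    at led_freq Hz with the given duty cycle."""
    t = np.arange(num_rows) * row_time
    phase = (t * led_freq) % 1.0
    return (phase < duty).astype(int)

# 720 rows read out over ~1/60 s, LED blinking at 1 kHz:
rows = rolling_shutter_rows(720, row_time=1 / (60 * 720), led_freq=1000)
# Contiguous runs of 1s and 0s are the bright/dark bands of Figure 3.4.
bands = np.diff(rows).nonzero()[0].size + 1
print(bands)
```

With these numbers roughly 16–17 LED periods fit into one frame readout, so a few dozen bands appear; the band spacing encodes the blinking frequency, which is what the receiver later recovers.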
[Figure: the rolling shutter position sweeps from row 0 to row M over time while the LED blinks with period T; in the resulting picture, alternating sets of rows correspond to the on state and the off state of the light source.]
Figure 3.3: The rolling shutter effect on blinking LEDs
Figure 3.4: Fast blinking light sources captured by the rolling shutter
Combining modulated visible light with the effect of the rolling shutter has been a
point of interest for researchers. For example, Rajagopal, N. et al. in [73] presented
a technique for sending data from solid-state luminaires (i.e. LEDs) to rolling shutter
cameras on smartphones, with a focus on a communication channel using Frequency
Shift Keying (FSK). However, the server end of their receiver is implemented in
Matlab, which is not appropriate for a real-time system because of the inefficiency and
delay that Matlab imposes on the system. Moreover, offline recorded video content
does not reflect the characteristics of a real-time system. In addition, the camera end of
their receiver is controlled by a specifically implemented application [74]. In this way
the end-users lose the freedom to select their desired camcorder software applications
and are forced to use customized applications.
A similar approach is used in [75] to perform accurate indoor positioning using LED
lights and smartphone cameras. In this technique, the LEDs broadcast a constant
landmark beacon while the camera captures pictures and sends them to a cloudlet,
in which the images are processed in order to identify the landmarks. The accurate
position is then calculated based on the angle between a light source and the camera.
Moreover, Matsumoto, Y. et al. have developed a CMOS sensor for positioning
purposes in visible light ID systems [76].
Moreover, researchers in [77] suggest a camera-based VLC system which uses
display screens as the sender and transmits data by modulating the density of the color
channels (i.e. red, green and blue). Although this approach appears suitable for
barcode scanning [78] (e.g. Quick Response (QR) codes), the limitations of display
screens in luminosity and refresh rate do not allow faster data transmission or further
advances in VLC systems built on display screens.
3.2 A Video Synchronization System
Figure 3.5 illustrates the architecture of the video synchronization system that uses
visible light as its reference for synchronization. The light sources are controlled by a
microcontroller placed between the power supply and the light sources. In this
prototype, standard commercial LED light bulbs are used that can be plugged
into normal AC power sockets. Since LEDs are powered by DC current, the light bulbs
are equipped with a built-in AC-to-DC converter.
The microcontroller can be pre-programmed or can receive commands in real time
through a serial connection (or an interface to the serial connection, e.g. USB, WiFi,
etc.) from another device with human interaction, e.g. another microcontroller, PC
or handheld device. The microcontroller is then responsible for interpreting the
commands and translating the information into light-compatible pulses.
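The final translation step can be sketched as follows. This is a minimal illustration rather than the thesis firmware, and `halfPeriodMicros` is a hypothetical helper name: it converts a commanded carrier frequency into the interval at which the LED driver pin must be toggled.

```cpp
#include <cstdint>

// Hypothetical helper: convert a carrier frequency commanded over the
// serial link (in Hz) into the half-period, in microseconds, at which
// the microcontroller toggles the LED driver pin. A square wave of
// frequency f completes one on/off cycle every 1e6 / f microseconds,
// and each half-cycle (on or off) lasts half of that.
uint32_t halfPeriodMicros(uint32_t freqHz) {
    return 1000000u / (2u * freqHz);
}
```

For instance, a commanded 1.077 kHz carrier corresponds to toggling the pin every 464 µs.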
Figure 3.5: Architecture of the video synchronization system
The transmitted information is then captured by the smartphone cameras and streamed
to the synchronization servers over the network. The synchronization server is placed
in a virtualized environment and receives the incoming multimedia streams. In the
meantime, the multimedia streams are broadcast out of the server for playback. In
this way the receivers (whether a director in the use cases explained in Chapter 2, an
end-user receiver or even another broadcasting server) are not aware of the activities
of the middle servers, hence a level of transparency is provided at this stage. While the
multimedia streams are being streamed out, the system checks whether any of the video
streams require synchronization. In case the video streams are out of sync, the system
imposes the needed amount of delay on the earlier stream. In this way, the outgoing
video streams become synchronized.
The isolation provided by the virtualized servers makes the cloud a suitable
platform for the deployment of this system. In addition, cloud computing can
provide a scalable and elastic environment for this application as the number of
incoming and outgoing video streams changes over time. This can be achieved by
allocating more virtual machines in the cloud environment as more users start
uploading video content to the cloud.
As explained in Chapter 2, the existing methods of video synchronization are not
suitable for a cloud-based and heterogeneous video streaming system. Hence, the
means of VLC are exploited to embed the information required for video
synchronization in the video stream. The information sent by the light sources can
represent timestamps that serve as checkpoints for the synchronization server.
Figure 3.6 shows how this process takes place.
Figure 3.6: Inter-frame synchronization
In Figure 3.6, a streaming thread refers to a software thread which handles a video
stream. These threads are spawned by the main thread (orchestrator), one for every
incoming media stream. The incoming video frames are buffered, decoded and then
processed (explained in more detail in Chapter 4.3). If a video frame contains the
transmitted information (a time stamp or a flag), the position of that particular video
frame in the video stream is used as a checkpoint for alignment with the other video
streams. In this case the Presentation Time Stamp (PTS) of the frame is sent back to
the main thread. The main thread uses the returned PTSs to determine the actual time
difference between identical frames in separate video streams. This time difference (if
any) is then used as the reference for delaying the earlier media stream.
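As a sketch of this decision step (illustrative only, not the thesis code; a two-stream case with checkpoint PTS values expressed in a shared time base is assumed):

```cpp
#include <cstdint>

// Decision returned to the streaming threads: which stream must be
// delayed, and by how many PTS ticks.
struct SyncDecision {
    int streamToDelay;   // index of the stream that is running ahead
    int64_t delayTicks;  // time difference between the two checkpoints
};

// The same light checkpoint was observed at checkpointPts0 in stream 0
// and at checkpointPts1 in stream 1. The stream in which the checkpoint
// appears at the smaller PTS is taken to be ahead of the other and must
// be delayed by the difference. (Convention assumed for illustration.)
SyncDecision align(int64_t checkpointPts0, int64_t checkpointPts1) {
    if (checkpointPts0 < checkpointPts1)
        return {0, checkpointPts1 - checkpointPts0};
    return {1, checkpointPts0 - checkpointPts1};
}
```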
The codec structure of the video stream might provide several types of time stamps.
For example, in H.264 there are two time stamps available for a frame: the Decode
Time Stamp (DTS) and the PTS. The PTS of a frame is a field in the frame's metadata
which indicates the time at which the video frame should be played back by the video
player. Similarly, the DTS refers to the time at which the frame should be decoded.
Since some video frames may serve as reference frames for other past or future
frames, the DTS and PTS of a frame are not necessarily equal.
Figure 3.7 illustrates an example of a closed Group Of Pictures (GOP) in a video
stream and Figure 3.8 illustrates an example of an open GOP. The arrows in
Figure 3.7 and Figure 3.8 indicate the data dependencies between the frames.
I-frames and P-frames are reference frames, while B-frames are non-reference
frames. The λ value depends on the frame rate of the video stream (e.g. 33 ms for a
30 fps video stream). The ε value ensures that a frame is decoded before its playback
time; in practice it can be proportional to the time it takes to decode one frame. For
simplicity, the starting time for this GOP is assumed to be 0. In practice, if this value
is t (where t ≠ 0), the DTS values become t − ε, t + λ − ε, t + 2λ − ε and so on. The
same pattern also applies to the PTS values.
[Figure content: frame sequence I B B P B B P B B P B B (GOP n) with PTS values 0, λ, 2λ, ..., 11λ and the corresponding DTS values offset by −ε.]
Figure 3.7: Closed GOP
In an open GOP, the B-frames at the beginning of the GOP depend on the reference
frames in the previous GOP.
[Figure content: the same frame sequence spanning GOP n−1, GOP n and GOP n+1; the leading B-frames of each GOP reference frames of the neighbouring GOP, with PTS values continuing through 12λ, 13λ, 14λ, 15λ and DTS values offset by −ε.]
Figure 3.8: Open GOP
The PTS and DTS values are assigned to each frame by the encoder at encode time.
Moreover, the video frames might not be streamed in the same order as they were
encoded: it is more practical to send the reference frames before the frames that
depend on them. This means that although the frame sequence after the encoding
phase is IBBPBBP. . . , the received frame sequence after network transmission might
be IPBBPBB. . . . Since there is no control over the video encoding and transmission
phases, the PTS is selected as the main time stamp reference. The actual time stamp
has to be calculated from the time base units of the video stream.
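The time-base arithmetic can be sketched as follows; this assumes FFmpeg's convention of expressing a stream's time base as a rational number num/den (seconds per PTS tick):

```cpp
#include <cstdint>

// One PTS tick corresponds to num/den seconds, so a frame's wall-clock
// presentation time is pts * num / den. For example, MPEG-TS uses a
// 1/90000 time base, so PTS 90000 marks the one-second point.
double ptsToSeconds(int64_t pts, int num, int den) {
    return static_cast<double>(pts) * num / den;
}
```

With a 1001/30000 time base (one tick per frame at 29.97 fps), frame 30 plays at roughly 1.001 seconds.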
In order to impose the necessary amount of delay on the earlier video stream, the
thread responsible for streaming that video should skip outputting a certain number
of frames (equivalent to the required delay). This can be achieved by using a seek
method on the video buffer or by repeatedly sending the current video frame for a
certain amount of time (i.e. freezing the picture). Manipulating the PTS/DTS fields
in a frame's metadata, in contrast, might not work properly with all video players:
many players assume that non-monotonically increasing timestamps are due to
network packet loss, so they ignore the time offset and continue playing the video
stream normally.
3.3 Summary
In this chapter we introduced the basics of VLC and explained how VLC can take
place through conventional digital cameras. This can be done by taking advantage of
the rolling shutter effect of digital cameras. Although the rolling shutter effect was not
a design intention at the beginning, its side effects can be harnessed for other applica-
tions. In addition, this chapter explained how video synchronization becomes possible
by the means of VLC. Finally, the initial set-up and architecture of the synchronization
mechanism was described. More details on implementation aspects of this design will
be presented in the next chapter.
4 DESIGN AND IMPLEMENTATION
This chapter explains the implementation details in three subsections. Section 4.1
describes the physical components required for setting up a controllable LED lighting
system, and also covers the considerations regarding the high voltage characteristics
of commercial LEDs. Other transmission details such as modulation, bandwidth and
physical improvements are given in Section 4.2. Finally, the aspects related to the
design and implementation of the receiver are given in Section 4.3.
The implementation of this work is split into hardware and software parts. The
hardware design starts with setting up the electronics and LED light bulbs to form
a basic VLC transmitter. This part continues with interfacing and programming the
microcontrollers and defining the means of transmission in the physical layer. The
software implementation consists of video streaming (from smart phone cameras to
media servers and from media servers to external media players), bit pattern detection
and machine vision.
The improvements and optimizations performed in this work are also presented in
this chapter alongside the implementation of each part. Other optimization methods
and improvement suggestions that were out of the scope of this thesis will be
presented in Chapter 6. Moreover, some performance optimization techniques are
given in Chapter 5.
This implementation applies outsourcing techniques to perform some operations
of the presented solution. A. Bingham and D. Spardlin in [79] explain how open
source can be beneficial as a tool for outsourcing; hence, making use of open source
software and hardware is one outsourcing technique adopted in this work. Although
it is crucial to perform several investigations to manage diversity and risk sharing in
the outsourcing process [80], this thesis does not cover the investigation process in
detail.
C and C++ are selected as the main programming languages. The open source API
of FFMPEG [81] is used to perform the network-based streaming operations and
transcoding computations. OpenCV [82] libraries are utilized to perform the machine
vision operations. The main hardware components are built using the Arduino [83]
open source platform. The lighting components consist of commercial LED light
bulbs (IKEA Ledare).1
4.1 Physical Layer and Devices
This section describes the electronic components and circuitry needed to implement
the infrastructure for the transmitter end. In order to be able to control the light bulbs
in a desired manner and send the information through them, a micro-controller or any
other central/distributed logical controller is needed. The high voltage/power nature of
commercial LED light bulbs requires a level of safety and isolation from the controller
side of the physical layer. This causes the circuitry for this project to be more complex
than a common micro-controller driving a small LED.
4.1.1 Electronic Components
An important feature of VLC is its ability to embed the communication within the
ambient lighting of the environment [84]. In order to exploit this feature and provide
enough luminance for this communication, high power LED light bulbs are advised to
be used. This prototype uses the IKEA Ledare E27 LEDs which can produce 400 units
of luminance. Figure 4.1 shows a sample Surface Mount Device (SMD) LED plate of
the same light bulb.
1http://www.ikea.com/se/sv/catalog/products/70266765/
Figure 4.1: SMD LED plate of IKEA Ledare E27.
Figure 4.2 shows basic circuitry for modulating a low power LED. However, normal
micro-controllers are not able to provide enough electrical power to drive our targeted
high power LED light bulbs, as these micro-controllers operate at low voltage levels.
Therefore a middle component is needed between the micro-controller and the LEDs,
one that can be pulsed by the micro-controller while also driving the LEDs.
Figure 4.2: Circuit schematic for modulating low power LED.
A common component for connecting high voltage circuits to low voltage circuits is
the relay [85]. However, the electromechanical nature of relays makes them too slow
to support high frequency pulsing; this task should therefore be done by a fast
electronic component, such as a Bipolar Junction Transistor (BJT) [86] or a
Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET) [86], with support for
high voltage and high switching speed. In case the LED package comes with a
built-in or separate electronic driver, the micro-controller can simply be placed
between the driver and the LEDs. Otherwise, all low-speed electric components must
be detached and replaced with CMOS-based equivalent circuits. The Darlington
Pair [87] is a well-known configuration of two BJT transistors which behaves like a
single transistor but with a high current gain. In our implementation high voltage
TIP120 [88] ICs are used. These ICs implement the Darlington Pair as shown in
Figure 4.3.
Figure 4.3: Darlington Pair implemented in TIP120 ICs.
4.1.2 Circuit Isolation
An optocoupler is used to isolate the low power circuits (i.e. the microcontroller)
from the high power section. An optocoupler (or opto-isolator) is an electronic
component that transfers electronic signals between two isolated circuits by means
of light [89].
The optocoupler (shown in Figure 4.4) has two main components: an LED that
converts electronic signals to light, and a phototransistor. A phototransistor [90] is a
transistor (similar to the photodetectors explained in 3.1) sensitive to light, meaning
that its base can be triggered by light rather than by an electric current. In this
implementation the CNY17 [91] ICs are used for isolation purposes.
Figure 4.4: Optocoupler.
The CNY17 package provides an isolation voltage of up to 5000 volts, its input can
be triggered by 1.4 volts (which an Arduino output pin can easily provide) and its
switching characteristics are fast enough to support the modulation speed needed in
our application.
4.1.3 Circuitry
The complete circuit of the transmitter is shown in Figure 4.5. The microcontroller
used in this prototype is an ATmega32 [92] mounted on the Arduino Uno [93] board.
The Arduino Uno is a single-board microcontroller that provides easy-to-use
prototyping functionalities.
The board has a built-in ATmega16U2 [94] microcontroller that provides serial
connectivity to the ATmega32 chip through the USB port. Moreover, the API
libraries provided in this work wrap this serial connectivity. This makes application
development more portable and allows developers to communicate with the
ATmega32 microcontroller through other types of channels as well (e.g. Wi-Fi, BT,
etc.). Examples of usage of the API are given in Appendix A.2.
Figure 4.5: Circuit schematic for the transmitter.
The role of the resistors in Figure 4.5 is to cancel any unwanted noise at the gates of
the transistors (a.k.a. current-limiting resistors [95]). Depending on the threshold
value of a transistor, a small amount of noise might otherwise trigger the gate and
switch the transistor on or off. A picture of the hardware components of the prototype
is shown in Figure 4.6.
Figure 4.6: Hardware components of the prototype.
4.2 VLC Transmitter
This section covers the definition, selection and transmission of the carrier
frequencies. Other aspects that are directly related to the selection of carrier
frequencies, such as quality of experience and bandwidth, are also surveyed. The
section starts by explaining the modulation techniques that are used (or have been
researched) for VLC systems. After selecting a modulation technique that fits the
requirements of a camera-based VLC system, the bandwidth constraints are
explained, followed by the frequency selection process. Finally, we explain how
manipulating some characteristics of the modulating frequencies can improve the
quality of experience. Tutorials on the usage of the implemented transmission APIs
are given in Appendices A.1 and A.2.
4.2.1 Modulation Techniques
Modulation is a key factor in a VLC system: the proper selection of the modulation
technique can improve the performance and accuracy of the system [96] [97].
Several modulation methods have been suggested in the literature, along with
comparisons of their advantages and disadvantages. However, not all of them are
suitable for a camera-based VLC system.
The authors of [11] categorize modulation techniques for VLC into three groups:
On-Off Keying (OOK), Pulse Position Modulation (PPM) and Color Shift Keying
(CSK). OOK, the simplest one, is based on the brightness intensity of the light: high
luminance represents a logical one and low luminance (or the off state) represents a
logical zero. The combination of OOK modulation and a coding scheme can provide
beneficial utilization of bandwidth [96]. However, the impact of ambient noise makes
OOK modulation unsuitable for non-isolated environments and camera-based VLC
systems. PPM variations encode information in the pulse position. PPM techniques
are generally more complex to implement due to the synchronization requirements
between the sender and the receiver [98]; hence, PPM variations are not suitable for
camera-based VLC systems, although they are widely used in systems with
photodetectors [99] [100]. CSK is achieved by modulating the color components of
the white light. Its limitations reside on both the sender and receiver sides: the LED
lights should support different colors and provide an interface for manipulating them,
while the receiver should be able to detect the different colors and demodulate the
transmitted information.
Spatial Modulation [101] is a newer technique which maps bit patterns onto
constellations of light sources detectable by the receiver. Since the transmitter and
receiver have to remain in fixed positions without movement, this technique is not
applicable for our purpose.
Because of the characteristics of the rolling shutter (explained in Chapter 3.1.1) and
of LEDs, Frequency Shift Keying (FSK) is the best candidate for modulation in our
implementation [102] [73]. The OOK, PPM and FSK modulation techniques are
shown in Figure 4.7.
Figure 4.7: Modulation techniques
In FSK, different frequencies are used to represent different logic levels and symbols.
A blinking light has a frequency corresponding to the intervals between its "on" and
"off" states, and various sets of frequencies are used to represent different bit
patterns. On the receiver side, a Fourier Transform of the received image helps
determine the magnitude of the frequencies present in the frequency spectrum. More
details on the Fourier Transform are given in Section 4.3.3.
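The single-bin detection idea can be sketched with the Goertzel algorithm, a cheap way to estimate the magnitude of one frequency bin; this is a simplified, illustrative stand-in for the full Fourier Transform described in Section 4.3.3, and the function name is an assumption:

```cpp
#include <cmath>
#include <vector>

// Goertzel algorithm: magnitude of a single frequency bin. For a
// rolling-shutter receiver, the samples can be the mean intensities of
// successive pixel rows, and the sample rate is the row-readout rate
// (rows per second).
double goertzelMagnitude(const std::vector<double>& samples,
                         double freqHz, double sampleRate) {
    const double pi = std::acos(-1.0);
    const double coeff = 2.0 * std::cos(2.0 * pi * freqHz / sampleRate);
    double s1 = 0.0, s2 = 0.0;
    for (double x : samples) {
        const double s0 = x + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return std::sqrt(s1 * s1 + s2 * s2 - coeff * s1 * s2);
}
```

Scanning the candidate carriers with such a detector and thresholding the returned magnitudes yields the received symbol; carriers whose magnitude stays below the threshold are treated as noise.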
4.2.2 Bandwidth Limitations
This section explains the practical restrictions that limit the bandwidth of the
transmission frequencies. These restrictions fall into two groups: limitations imposed
by the characteristics of the human eye, and limitations caused by video compression
techniques. Both categories are explained in the following.
The second category of restrictions originates at the receiver side. However, these
limitations impose design regulations that have to be dealt with at the sender side.
Therefore, in order to explain the design regulations related to the transmitter, some
of the results acquired at the receiver are presented in this section. The receiver side
is explained in detail in Section 4.3.
Restrictions Caused by Human Eye
A light source which blinks faster than a certain rate appears constantly on to the
human eye. This allows VLC to be embedded in the existing ambient light of the
environment. However, a blinking light source with a frequency lower than a certain
threshold is detectable by the human eye [103] [104] [105] [106], which can cause
negative or harmful physiological changes in humans [11].
In order to avoid flickering, low frequencies should not be used as a carrier for
information transmission; this limits the lower bound of the transmission bandwidth.
Although there is no consensus on what this limit should be [107] [108], recent
studies [109] suggest that any frequency above 200 Hz is safe. However, in our
experiments frequencies below 1 kHz were easily perceptible. Hence, 1 kHz is set as
the lowest carrier frequency used in our system.
Restrictions Caused by Video Compression
The bandwidth is further limited when the receiver sits in the post-compression
stages of the video stream. This side effect arises because many common video
codecs (e.g. H.264) use block-based compression [110], applying a low-pass filter
that blurs the portions of the picture containing high frequencies.
Although the sample rate of the camera (one row of pixels at a time) defines the
highest possible frequency (i.e. 10.8 kHz in our settings), narrowing the period of the
flashing lights down to two rows of pixels (one row for the dark band and one row for
the bright band) does not provide accurate results: at very high frequencies the
camera captures gray pixels instead of precise dark and bright ones. Equation 4.1
shows how the maximum frequency for a horizontal rolling shutter can be calculated,
where R is the number of rows in one video frame and Fps is the frame rate of the
video stream. The division by 2 reflects that one period spans two rows: one for the
bright band and one for the dark band. For example, in a 30 fps video with 720p
resolution the maximum blinking frequency for the LEDs is 10.8 kHz.

Fmax = (Fps × R) / 2    (4.1)
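Equation 4.1 can be checked numerically (a trivial sketch):

```cpp
// Equation 4.1: the rolling shutter reads Fps * R rows per second, and
// one carrier period needs at least two rows (one bright, one dark),
// so the highest capturable blinking frequency is Fps * R / 2.
double maxCarrierHz(double fps, int rows) {
    return fps * rows / 2.0;
}
```

For a 30 fps, 720-row stream this evaluates to 10 800 Hz, matching the 10.8 kHz figure above.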
However, in our experiments the highest on-off keying frequency could not exceed
3.2 kHz due to the side effects of video compression. Figure 4.8 shows this effect in
practice. As can be seen in Figure 4.8a, in the portion of the picture containing the
high frequency data the dark and bright bands are merged, causing a blurry result
which cannot be detected in the Fourier Transform of the picture; the Fourier
Transform of the same picture is shown in Figure 4.8b. On the contrary, the picture
shown in Figure 3.4 depicts a clear shot of the carrier frequency (1.8 kHz in that
example) where the dark and bright bands are distinguishable. The FT of the video
frame shown in Figure 3.4 is presented in Figure 4.9, where the presence of the
carrier frequency appears as bright dots in the corresponding frequency bins.
(a) Blured frame carrying a 3.5kHz signal
(b) Fourier Transform of the frame shown in (a)
Figure 4.8: Block-based video compression effect
Figure 4.9: FT of the frame shown in Figure 3.4
Moreover, the inter-frame motion estimation mechanism used in video encoders
[110] causes another side effect that can affect the accuracy of the receiver in
detecting the bit patterns. Our experiments reveal that motion estimation can
introduce parasite frequencies in non-reference frames (i.e. P and B frames),
lowering the SNR. The red rectangle in Figure 4.10 marks a portion of the frame in
which the previous frame (which was also a reference frame) was carrying a
frequency. A comparison between this portion and the rest of the frame shows the
after-effect of the block-based motion estimation. The Fourier transform of the frame
in Figure 4.10a is shown in Figure 4.10b, where the red circles mark the notch effect
caused by the block-based motion estimation. These notched points might confuse
the receiver by indicating the existence of a frequency.
(a) Non-reference frame coming after a high frequency carrying frame
(b) Fourier Transform of the frame shown in (a)
Figure 4.10: Block-based motion estimation effect
4.2.3 Frequency Selection
As explained in Section 4.2.2, the lower and upper bounds of the bandwidth are
restricted by many factors. In this prototype the carrier frequencies can be selected
between 1 kHz and 3.2 kHz. In spite of that, a few other factors have to be taken into
account when choosing the carrier frequencies.
One key point in this process is considering guard bands. Guard bands are gaps
placed between the adjacent frequencies in order to prevent Inter-Symbol Interference
(ISI) [111]. This concept is shown in Figure 4.11. Consequently, the presence of guard
bands will consume more bandwidth [112].
Figure 4.11: Guard intervals.
The authors of [75] suggest a 200 Hz guard band between adjacent frequencies for a
VLC system based on mobile cameras. Given the limitations mentioned earlier, there
are theoretically ten separate frequencies that can be used in this implementation.
But, as explained in the following, in practice the transmission is vulnerable to
further limiting factors that have to be taken into account.
Another key point in frequency selection is the effect of harmonic frequencies. For a
fundamental frequency f, the harmonic frequencies are 2f, 3f and so on [113]. For
example, a fundamental frequency of 1 kHz (the first harmonic being itself) has a
second harmonic of 2 kHz, a third harmonic of 3 kHz and so on. Generating a high
frequency square wave such as 1 kHz naturally also generates its odd harmonics (in
this example 3 kHz, 5 kHz, etc.), but with lower energy [114]. Figure 4.12 shows this
phenomenon in the time domain and the frequency domain.
Figure 4.12: Odd harmonic frequencies in a square wave.
One should be careful when selecting the carrier frequencies because, for example, a
transmitted 1 kHz square wave also carries energy at 3 kHz, so the receiver might
mistakenly detect both 1 kHz and 3 kHz. Proper threshold definition on the
magnitude of the received signals can solve this problem to some extent; however,
the problem gets much more complicated when a combination of different
frequencies is used to represent one symbol. This phenomenon is known as harmonic
distortion and is measured by the Total Harmonic Distortion (THD) [115].
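A selection aid can be sketched as a check that no carrier sits near an odd harmonic of another. This is illustrative code, not part of the thesis; `hasOddHarmonicClash` and the tolerance value are assumptions:

```cpp
#include <cmath>
#include <vector>

// Flag carrier pairs where one carrier falls near an odd harmonic
// (3f, 5f, 7f) of another, within a tolerance that loosely models the
// guard band / FFT bin width.
bool hasOddHarmonicClash(const std::vector<double>& carriersHz,
                         double toleranceHz) {
    for (double f : carriersHz)
        for (int k = 3; k <= 7; k += 2)      // 3rd, 5th, 7th harmonics
            for (double g : carriersHz)
                if (std::fabs(g - k * f) < toleranceHz)
                    return true;
    return false;
}
```

Under this check, 1 kHz and 3 kHz clash (3 × 1 kHz = 3 kHz), whereas the third harmonic of 1.077 kHz (3.231 kHz) clears the 3.02 kHz carrier by about 211 Hz.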
Eventually the number of applicable frequencies is reduced to five. The set of
defined frequencies for this implementation is given in Table 4.1.
Index Frequency Period/2 Next guard interval
1 1.077kHz 464µs 440Hz
2 1.51kHz 330µs 480Hz
3 1.99kHz 251µs 410Hz
4 2.4kHz 208µs 620Hz
5 3.02kHz 165µs ∞
Table 4.1: Defined frequencies
4.2.4 Lifetime of Frequencies
The lifetime of a frequency refers to the time period during which the light sources
blink at that frequency. The frequency lifetime has an impact on the accuracy and
robustness of the system, so proper selection of this value is necessary.
A long frequency lifetime yields a higher magnitude per frame at the receiver side,
because of the increased SNR in the communication channel. If the received carrier
frequencies are distinguishable with a high magnitude contrast compared to the other
(noisy) frequencies, the thresholding mechanism becomes easier and more robust,
which results in more accurate output. Thresholding is explained in more detail in
Section 4.3.2.
In this work, the minimum frequency lifetime is set to twice the minimum value
detectable by the receiver: if the receiver is able to detect frequencies with a
minimum lifetime of λ, then the actual time dedicated to a frequency is set to 2λ. The
reason is that the computational unit of the implemented system is one video frame,
and a transmitted frequency may not fit completely into one frame. In the worst case,
half of the lifetime of the frequency is captured in the nth frame and the other half in
the (n+1)th frame; with a 2λ lifetime the receiver can still detect the transmitted
frequency within one frame. This scenario is shown in Figure 4.13. A solution to this
problem is discussed in Chapter 6.
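The rule can be stated compactly (an illustrative sketch of the reasoning above; the function names are assumptions):

```cpp
// If the receiver needs a tone to last at least minDetectableMs inside
// a single frame, transmit it for twice that long.
double transmitLifetimeMs(double minDetectableMs) {
    return 2.0 * minDetectableMs;
}

// Worst case: the tone straddles a frame boundary and its lifetime is
// split evenly between frames n and n+1, so the portion landing in one
// frame is at least half of the transmitted lifetime.
double worstCaseInFrameMs(double lifetimeMs) {
    return lifetimeMs / 2.0;
}
```

Even in the worst-case split, a tone transmitted for 2λ leaves at least λ inside one of the two frames, which is exactly the receiver's detection limit.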
Figure 4.13: Breaking of a frequency in two frames.
In our experiments, the minimum lifetime of a frequency could be narrowed down
to 4.125 milliseconds (i.e. one eighth of a frame lasting 33 milliseconds). However,
because the number of available frequencies is limited to six, the 2λ value is set to
5.5 milliseconds (i.e. one sixth of a frame lasting 33 milliseconds). Filling this gap by
lengthening the lifetime of the frequencies increases the energy of the signals and
leads to a higher SNR.
4.2.5 Flicker Improvement
One challenge in embedding VLC in surrounding ambient light is to avoid flicker and
change in illumination. The flicker in this case does not only mean the speed of flashing
37
light (explained in Section 4.2.2) but also a short change in illumination can also be
perceived as a flicker.
The reason is that although the human eye cannot detect the on-off states of a fast
blinking light source, the blinking is perceived as a constant beam of light with a
luminance equal to the average luminance over the "on" and "off" states of the light
source. This means that if a blinking light source is on for 50% of the time at its
highest luminance capacity Lmax, then the perceived constant luminance will be
Lmax/2. This relationship, given in Equation 4.2 [116], underlies Pulse Width
Modulation (PWM), which has been in use for dimming purposes [11] where altering
the current is not possible [117]:

Lv = Lmax × (τon / T)    (4.2)
In Equation 4.2, T is the PWM period, Lmax is the maximum luminance (100% duty
cycle), τon is the pulse duration during which the LED is on, and Lv is the resulting
perceived luminance. Obviously, using frequency modulation techniques for VLC
will cause a change in the level of perceived luminance, and if the communication
period is short this change in luminosity is perceived as a flicker.
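Equation 4.2 in code form (a trivial sketch): a 50% duty cycle yields half of the full-on luminance, matching the Lmax/2 example above.

```cpp
// Equation 4.2: the perceived luminance of a PWM-driven LED equals
// the full-on luminance scaled by the duty cycle tau_on / T.
double perceivedLuminance(double lmax, double tauOnUs, double periodUs) {
    return lmax * tauOnUs / periodUs;
}
```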
A lot of research has been done to solve the flicker problem and to provide dimming
support for VLC-based LED light sources. The authors of [118] suggest a modulation
method called Pulse Dual Slope Modulation (PDSM) for flicker improvement in
VLC. PDSM is a variation of Pulse Slope Modulation (PSM), which modulates the
signal by changing the slope of the leading edge of the pulse; however, PSM
techniques do not seem appropriate for camera-based VLC systems. Overlapping
Pulse Position Modulation (OPPM) is suggested in [97] as the best candidate for joint
dimming control and high data-rate communication, but the PPM modulation family
requires the sender and receiver to be highly synchronized.
One way to reduce flicker when using FSK as the modulation method is to modify
the duty cycle of the modulating signal without changing its frequency [119]. This
means that when a shift in the carrier frequency happens, the duty cycle of the new
frequency is adjusted to compensate for the brightness alteration caused by the
frequency change. Figure 4.14 illustrates this method. Note that L in Figure 4.14
refers to the maximum luminance that the LEDs can produce at a 100% duty cycle.
[Figure: three signals compared — frequency 3F/2 at 50% duty-cycle yields luminance L/3; frequency 3F/2 at 62.5% duty-cycle yields luminance L/2; frequency F at 50% duty-cycle yields luminance L/2.]
Figure 4.14: Flicker compensation using duty-cycle modification.
Figure 4.14 shows that increasing the frequency reduces the perceived brightness,
and that increasing the duty-cycle by 12.5% compensates for the brightness loss.
However, the limitations of this method and its effect on camera-based sensors
need further study. For example, the author of [119] reports that a duty-cycle
higher than 75% can lead to erroneous sampling output.
Another method to reduce flicker is to add a DC bias to the transmitted signal
[120]: logic zero is then represented not by the off state of the LEDs but by a
state at a reduced distance from the on state. However, this method can
significantly reduce the SNR and accuracy.
4.3 VLC Receiver
This section explains the design and implementation of the receiver. It starts by
giving an abstract overview of the camera-based VLC receiver. The video synchronizer
software (explained in Chapter 3) is developed on top of this platform as a use case.
The machine vision techniques used in this implementation are also explained in
this section.
4.3.1 Architectural Overview
As explained in Chapter 3 of this thesis, the receiver side of the system consists of two
parts. The first part is the smartphone camera, which captures videos and streams them
for further processing. The second part receives the live streams from the cameras,
handles the machine vision tasks and performs the targeted application (i.e. video
synchronization in this case). This part resides on the server side of the system.
The flowchart in Figure 4.15 illustrates the procedure implemented in the receiving
server. The procedure starts by initializing the network configuration and
registering the necessary media codecs in the system. For scalability, the system
creates an individual thread for each incoming video stream. The incoming video
stream is buffered in memory until a decodable frame is ready. After the video
frame is decoded, it is converted to YUV format, as the image processor only needs
the Y plane (luminance) of the picture. The buffering, decoding and conversions
are implemented using the FFMPEG libraries. Meanwhile, during buffering, the same
video stream is restreamed and broadcast outside of the system. If the quality
enhancement techniques (explained in Chapter 5) are to be used, the restreaming
should be performed by re-encoding the previously decoded frames.
[Flowchart: Start → Initialization → Define receiving ends → Thread generator → (OnReceive) Video buffer → Video decoder → Picture conversion → Image processor → Message handler → Broadcast handler → End.]
Figure 4.15: Receiving server.
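The thread-per-stream structure can be sketched as follows. This is a minimal illustration, not the actual server code: handle_stream and serve are hypothetical names, and the placeholder append stands in for the FFMPEG decoding and OpenCV processing work:

```python
import queue
import threading

def handle_stream(stream_id, frame_queue, results):
    # One worker per incoming stream: in the real system this loop buffers
    # packets, decodes frames and hands the Y plane to the image processor.
    # list.append is atomic in CPython, so the shared list needs no lock here.
    while True:
        frame = frame_queue.get()
        if frame is None:              # sentinel: the stream has ended
            break
        results.append((stream_id, frame))

def serve(streams):
    # 'streams' maps a stream id to the frames received on that stream.
    results, threads = [], []
    for stream_id, frames in streams.items():
        q = queue.Queue()
        t = threading.Thread(target=handle_stream, args=(stream_id, q, results))
        t.start()
        threads.append(t)
        for frame in frames:
            q.put(frame)
        q.put(None)
    for t in threads:
        t.join()
    return results
```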
The Y plane of the picture is then converted to a matrix compatible with OpenCV's
Mat data type. Using the OpenCV API, the matrix is then converted to its frequency
domain by a Discrete Fourier Transform (DFT) algorithm. These steps are shown in
Figure 4.16. Once the frequency domain is obtained, the post-DFT computations take
place and the system looks for the magnitude of the desired frequencies using a
dynamic thresholding algorithm explained in Section 4.3.2. The post-DFT
computations include the centralization process (explained in Section 4.3.3), the
separation of the magnitude and phase planes (the result of the DFT function is a
complex number representing the magnitude and phase components of the signal), and
a logarithmic scaling of the magnitude values to the range between 0 and 1.
[Figure: network packets enter the video buffer; each frame is decoded, converted to YUV, its Y plane separated and converted to a Mat (time domain); a DFT converts it to the frequency domain, followed by the post-DFT computations and message decoding.]
Figure 4.16: Picture conversion and image processing.
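The implementation uses OpenCV's DFT; the same sequence of post-DFT steps can be approximated with NumPy as a sketch (the function name is ours):

```python
import numpy as np

def dft_magnitude(y_plane):
    # DFT of the Y plane, followed by the post-DFT steps: centralization,
    # separation of the magnitude from the phase, and logarithmic scaling
    # of the magnitudes to the range [0, 1].
    spectrum = np.fft.fft2(y_plane.astype(np.float64))
    spectrum = np.fft.fftshift(spectrum)        # centralization
    magnitude = np.abs(spectrum)                # drop the phase plane
    magnitude = np.log1p(magnitude)             # logarithmic scaling
    return magnitude / magnitude.max()          # normalize to [0, 1]
```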
The detected frequencies are then handed over to the message handler, where the
message decoding takes place. The messages are then passed to the broadcast
handler, where the decisions, based on the received messages and the application,
are made. In the case explained in Section 3.2 (i.e. video synchronization), the
broadcast handler decides about delaying the necessary streams in order to provide
synchronization, and the appropriate delay is calculated at this stage if
necessary. Because FSK is selected as the modulation technique, the message
decoder builds a set of detected frequency elements and compares it to a
pre-defined look-up table in order to decode the transmitted symbols. In the
example shown in Figure 4.19, the detected frequencies are f1, f2 and f3, which
can (depending on the protocol) be translated to 111 in binary or 7 in decimal. An
example of the lookup table used in this implementation is given in Appendix A.3
of this thesis.
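The look-up based decoding can be sketched as follows, with a hypothetical three-frequency table (the real table is given in Appendix A.3):

```python
# Hypothetical frequency-to-bit mapping; the actual lookup table used in
# the implementation is listed in Appendix A.3 of the thesis.
FREQ_BITS = {"f1": 0b001, "f2": 0b010, "f3": 0b100}

def decode_symbol(detected_frequencies):
    # OR together the bit values of all carrier frequencies detected in
    # one frame to recover the transmitted symbol.
    symbol = 0
    for freq in detected_frequencies:
        symbol |= FREQ_BITS[freq]
    return symbol

print(decode_symbol({"f1", "f2", "f3"}))  # 7 (111 in binary)
```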
4.3.2 Thresholding
In order to determine whether a picture contains a certain frequency, it is
necessary to evaluate the magnitude (a.k.a. energy or power) of the signal
representing that frequency. This evaluation can be done by comparing the
magnitude value to a pre-defined threshold. As explained previously in this
chapter, static thresholding does not result in accurate measurements; therefore,
a dynamic thresholding mechanism is used in this work.
Dynamic thresholding here means that instead of following a static approach for
every frame, the threshold value is determined per frame by considering the
maximum range values, the minimum range values and the quantity of these values in
each frame. This method (shown in Figure 4.17) was selected because transmitting
the same symbols in different takes (e.g. in different environments) does not
always produce exactly the same values in the frequency domain, even though the
recorded takes might look very similar to the human eye. The transmitted
frequencies are not immune to the noise imposed by the ambient background light
and other elements in the scene, and the non-deterministic nature of the captured
videos contributes to this phenomenon significantly. This way of thresholding also
helps suppress the effect of harmonic frequencies, as explained in Section 4.2.3.
The process starts by excluding a set of maximum and minimum values from the main
set of frequencies. The median value of the remaining set is then used as an
anchor for setting the threshold value. Calculating the mean value at this stage
is not advisable because the large quantity of small, noisy signals might pull the
anchor towards the smaller signals. In this implementation, the threshold is set
20% above the median value.
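The per-frame thresholding steps above can be sketched as follows (the function name and the trim count of two are illustrative choices; the thesis does not fix how many extremes are excluded):

```python
import statistics

def dynamic_threshold(magnitudes, trim=2, margin=0.20):
    # Exclude the 'trim' largest and smallest magnitudes, anchor on the
    # median of the rest, and place the threshold 20% above the anchor.
    core = sorted(magnitudes)[trim:len(magnitudes) - trim]
    return statistics.median(core) * (1.0 + margin)

print(dynamic_threshold([0, 0, 1, 2, 3, 10, 100]))  # 2.4
```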
[Figure: magnitude spectrum with the maximum and minimum values excluded; the threshold value is placed 20% above the median of the remaining values.]
Figure 4.17: Finding the threshold value.
However, extra care should be taken when evaluating the maximum values, because
the DC component of the frequency domain might mistakenly be considered a maximum
value, resulting in a poor choice of threshold. We solve this problem by applying
a notch filter to the DC bins of the matrix obtained after centralizing the DFT.
These steps are explained in the following section. The high-valued DC components
are caused by sharp edges and borders in the picture.
4.3.3 Discrete Fourier Transform
The picture shown in Figure 4.18 represents a video frame carrying three different
signals. The key phenomenon in camera-based VLC applications is that the width of
the dark and bright bands in the picture is proportional to the blinking frequency
of the LED light sources. If the system can detect how often a dark/bright band is
repeated in the picture, it can roughly estimate the blinking frequency of the
transmitter.
[Figure: a frame with three band regions labelled f1, f2 and f3.]
Figure 4.18: A frame carrying three frequencies.
To provide this ability, an implementation of the Fourier Transform algorithm is
used. A Fourier Transform function, as shown in Figure 4.19, takes a signal in the
time domain and returns its frequency-domain representation. Equation 4.3 defines
a two-dimensional DFT [121]. In Equation 4.3, f is the signal in the time domain
and m = 0 . . .M − 1 and n = 0 . . . N − 1 are the coordinates. The resulting
Fourier transform F is also a two-dimensional matrix of the same size.
F(m, n) = (1/√(MN)) Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} f(u, v) · e^{−2πi(mu/M + nv/N)}    (4.3)
There are several different algorithms for and implementations of the Fourier
Transform. In this work, the DFT function in OpenCV's API is used to obtain the
desired result.
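The band/frequency relationship described above can be demonstrated on a synthetic frame; a NumPy sketch (the function name and the 8-cycle test pattern are illustrative):

```python
import numpy as np

def detect_band_frequency(frame):
    # With a vertical rolling shutter the bands are horizontal, so the
    # energy sits in the zero-horizontal-frequency column of the 2D
    # spectrum; return the strongest non-DC vertical bin.
    spectrum = np.abs(np.fft.fft2(frame))
    column = spectrum[:, 0].copy()
    column[0] = 0.0                      # suppress the DC bin
    return int(np.argmax(column[:len(column) // 2 + 1]))

# Synthetic 128x64 frame whose brightness oscillates 8 times from top to
# bottom, mimicking the bands produced by a blinking LED.
rows = np.sin(2 * np.pi * 8 * np.arange(128) / 128)
frame = np.tile(rows[:, None], (1, 64))
print(detect_band_frequency(frame))  # 8
```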
[Figure: a time-domain signal containing f1, f2 and f3 is converted by the DFT to a frequency-domain magnitude plot in which f1, f2 and f3 appear as peaks above the threshold.]
Figure 4.19: Time to frequency conversion.
As explained earlier, in order to filter the DC components with a notch filter, it
is more convenient to centralize the resulting spectrum matrix [122]. This process
starts by dividing the spectrum matrix into four quadrants and ends by swapping
these quadrants diagonally. The centralization process is demonstrated in
Figure 4.20. In this way the DC bias values gather in the center of the matrix and
the transmitted frequencies line up in the middle column.
[Figure: the spectrum matrix quadrants Q1 Q2 / Q3 Q4 are swapped diagonally into Q4 Q3 / Q2 Q1, moving the DC values from the corners to the center.]
Figure 4.20: Spectrum centralization.
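The quadrant swap can be sketched directly (the function name is ours; for even-sized matrices the result matches NumPy's fftshift):

```python
import numpy as np

def centralize(spectrum):
    # Swap the quadrants diagonally (Q1<->Q4, Q2<->Q3) so the DC values in
    # the corners gather in the center of the matrix.
    h, w = spectrum.shape[0] // 2, spectrum.shape[1] // 2
    out = np.empty_like(spectrum)
    out[:h, :w] = spectrum[h:, w:]   # Q4 -> top-left
    out[:h, w:] = spectrum[h:, :w]   # Q3 -> top-right
    out[h:, :w] = spectrum[:h, w:]   # Q2 -> bottom-left
    out[h:, w:] = spectrum[:h, :w]   # Q1 -> bottom-right
    return out
```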
Digital pictures are treated as two-dimensional matrices, and it is therefore
common in machine vision to perform a two-dimensional DFT on pictures as well.
However, the authors of [73] argue that performing a one-dimensional vertical
Fourier Transform is sufficient for VLC-based purposes, because turning the camera
will not affect the rolling shutter effect and the captured dark/bright bands in
the picture will always remain horizontal.
This argument is supported by a comparison of the computational and memory
complexity of the two methods. The computational complexity of a two-dimensional
DFT (assuming width = height) is O(N^3), which can be reduced to O(N^2 log2 N) if
the Fast Fourier Transform (FFT) is used [123], while the complexity of a
one-dimensional DFT is O(N^2), or O(N log2 N) with FFT [124].
Nevertheless, the authors of [73] neither suggest any technique for performing the
one-dimensional Fourier transform on a two-dimensional matrix nor study the
trade-off between the 2D/1D Fourier transform and accuracy (i.e. the SNR value).
Converting the two-dimensional matrix into a one-dimensional vector (array) could
be one way to exploit the efficiency of the one-dimensional Fourier transform, yet
how this conversion should be performed remains a challenge. One way to convert
the two-dimensional matrix into a vertical array is to compute the average value
of each row (or the average of samples from each row) and store that average as
the vector element representing the corresponding row. However, background objects
might cause bright bands to average out to gray pixels and decrease the SNR.
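The row-averaging conversion sketched above could look as follows; this is our own illustration of one possible realization, not a method evaluated in the thesis:

```python
import numpy as np

def vertical_spectrum(frame):
    # Collapse each row to its average value, turning the 2D frame into a
    # vertical 1D profile, then take a 1D FFT of the profile with its DC
    # level removed.
    profile = frame.mean(axis=1)
    return np.abs(np.fft.rfft(profile - profile.mean()))
```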
OpenCV's DFT function selects an FFT implementation when the matrix size makes FFT
faster than a plain DFT [125]. This condition is satisfied when the width and
height of the matrix are powers of two (i.e. 2, 4, 8, etc.). To take advantage of
this property, the matrix is padded (if necessary) with extra columns/rows of
zero-valued elements (i.e. black pixels). This process is called zero padding
[126] [127]. Figure 4.21 demonstrates the two-dimensional Fourier transform of the
picture shown in Figure 4.18 using OpenCV's DFT function.
Figure 4.21: DFT conversion of the picture shown in Figure 4.18.
The brightness of each pixel in Figure 4.21 represents the magnitude of the
frequency in the corresponding bin. If the camera has a vertical rolling shutter,
the bright dots will be located on the middle vertical column of the frequency
spectrum.
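The zero-padding step described above can be sketched as follows (the function name is ours; OpenCV's own helper for choosing an efficient size is getOptimalDFTSize):

```python
import numpy as np

def pad_to_power_of_two(frame):
    # Append zero-valued (black) rows/columns so both dimensions become
    # powers of two, which lets the DFT fall back to an FFT.
    def next_pow2(n):
        p = 1
        while p < n:
            p *= 2
        return p
    h, w = frame.shape
    padded = np.zeros((next_pow2(h), next_pow2(w)), dtype=frame.dtype)
    padded[:h, :w] = frame
    return padded

print(pad_to_power_of_two(np.ones((720, 1280))).shape)  # (1024, 2048)
```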
4.4 Summary
In this chapter we explained the details of the design and implementation of the
camera-based VLC system. The chapter started by describing the hardware components
needed to form the physical layer of the communication. We then explained the
transmission concepts and issues, such as modulation techniques, bandwidth
constraints and the process of carrier frequency selection. Finally, the design
and implementation details of the camera-based VLC receiver were presented. The
implemented receiver mainly consists of video stream handling and image processing
modules. In the next chapter we evaluate the implemented system by performing
actual tests and benchmarking.
5 RESULTS AND EVALUATION
Performance and quality are key factors in evaluating a real-time system. Midway
computations in a live multimedia service can lead to intolerable latency and/or
distorted quality. For example, in a video streaming service it is necessary that
the video can be played back at the desired frame rate; therefore the blocking
computations that can lead to an unacceptable experience need to be optimized or
eliminated.
This chapter evaluates the implemented prototype in terms of performance and
quality by analysing the behaviour of the implemented software. A few relevant
suggestions and techniques for further improvement are also given in this chapter.
More details on future work possibilities are discussed in Chapter 6.
5.1 Evaluation Setup
In order to provide an isolated test environment for the implemented VLC system,
it is necessary to feed the synchronization servers with video streams that
actually contain unaligned identical frames. Figure 5.1 illustrates this scenario.
To introduce an artificial delay between the video streams, three seconds of blank
frames are added at the beginning of one of the pre-captured test videos.
Moreover, a timestamp counter is burned into the video frames in order to give the
tester (i.e. the human eye) a notion of time and of the synchronization events.
[Figure: two video streams with numbered frames enter the video synchronizer; on unsynchronized playback their frame counters are offset, while after the synchronization point the observer sees both streams aligned.]
Figure 5.1: Test scenario for the video synchronizer
In order to simulate a live video streaming environment for the synchronizer, the
pre-recorded videos are streamed over the network using the FFMPEG software. In
this way, the synchronizer can treat the video streams as if they were live.
Figure 5.2 shows a screenshot of an example where two incoming video streams are
played back in an unsynchronized manner. The resolution of the videos in this
experiment is 720p (i.e. 1280x720 pixels) and the frame rate is 30fps.
The example shown in Figure 5.2 indicates that a delay of approximately 3 seconds
exists between the two streams. Figure 5.3 shows screenshots of the same video
streams after being synchronized. The console log of the synchronizer server
indicates the detection of checkpoint information at around the 6th second of the
first video stream and the 9th second of the second video stream. Hence, the
synchronizer can calculate the exact delay between the two video streams (i.e.
around 3.5 seconds) and impose the necessary amount of delay on the earlier video
stream.
Figure 5.2: Unsynchronized video streams on playback
5.2 Evaluation Benchmarks
In this section the performance analysis results are presented. Optimization and
acceleration techniques are discussed in the next section. The characteristics of
the testing platform are summarized in Table 5.1. The CUDA programming model is
exploited to harness the manycore features of the GPU.
Processing Unit   Number of Cores   Speed     Memory   Vendor
CPU               4                 2.6GHz    3.7GB    Intel
GPU               96                1.25GHz   1GB      nVidia
Table 5.1: Platform characteristics
The performance profiling results of the Intel platform are illustrated as a pie chart
[Screenshots: video stream 1, video stream 2 and the synchronizer console log.]
Figure 5.3: Synchronized video streams on playback
in Figure 5.4. It can be seen that OpenCV's DFT function is the most
time-consuming, followed by the decoding and lookup operations.
The DFT operation takes first place, consuming more than 53% of the application's
execution time. The video decoding operations, stream handling and codec
conversions (all performed by the FFMPEG libraries) are categorized into one
group, taking 26.4% of the total execution time. Element lookup in Figure 5.4
refers to the element-by-element tracing of the magnitude matrix that results from
the DFT operation. This operation, which in total is twelve times shorter than the
DFT operation, ranks third. Any other operation that takes less than 4.5% of the
whole application's time is categorized under the Other group. This group mainly
consists of other OpenCV-related operations (such as logarithm operations and type
casts), thread handling and other small calculations. The conversion of the video
frames into OpenCV's matrix data structure also belongs to this category.
[Pie chart: DFT 53.7%, decoding and codec conversions 26.4%, Other 15.4%, element lookup 4.5%.]
Figure 5.4: Performance profiling results
5.3 GPU Acceleration
Task consolidation is a key method for optimization and resource utilization in
cloud environments [128]. One way to achieve higher performance is to transfer
data-parallel tasks to co-processor units such as the GPU [129]. The high
computational requirements of DFT/FFT operations and the advantages of GPGPU in
performing data-parallel tasks provide enough motivation to migrate the DFT
operations to GPU units [130] [131].
Table 5.2 shows the benchmarking results on the different processing units. The
µs/f (microseconds per frame) unit in Table 5.2 indicates the time each task (or
set of tasks) takes for one video frame. It can be seen that migrating the DFT
computations to the GPU speeds up the performance by 62%. This comparison includes
the time each frame takes to be uploaded into GPU memory and the time it takes to
download the results back to CPU memory.
Moreover, the data transfer between the memory spaces can be reduced by 50% if the
necessary calculations on the DFT results are also done by the GPU, as there is
then no need to download the huge matrices back to CPU memory. At the time of
writing this thesis we were unable to gain direct access to individual matrix
elements in GPU memory space, so the matrices had to be downloaded back to CPU
memory for further analysis.
Method   GPU Computation Time   CPU Computation Time   GPU Upload + Download Time   Gain
DFT      5927 µs/f              24940 µs/f             3536 µs/f                    62.05%
Table 5.2: Results of GPU acceleration
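The gain figure in Table 5.2 can be reproduced from the reported timings:

```python
# Timings from Table 5.2, in microseconds per frame.
cpu_time = 24940.0       # DFT on the CPU
gpu_time = 5927.0        # DFT on the GPU
transfer_time = 3536.0   # upload to + download from GPU memory

gain = 1.0 - (gpu_time + transfer_time) / cpu_time
print(round(gain * 100, 2))  # 62.06 (the table reports 62.05, a truncation)
```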
Another advantage of GPU acceleration is the ability to gain further performance
through pipelined computing. The details of the pipelining methods are presented
in Section 6.2.
5.4 Quality
It is necessary to measure the impact of the distortion caused by the transmitted
VLC data on the recorded media. Although the data transmission is not perceptible
to the human eye in the physical environment [132], it might become noticeable on
playback of the recorded video. On the other hand, the accuracy of the system on
the receiver side can and should be increased by channel coding techniques, such
as Forward Error Correction (FEC) [133], at the physical layer. These techniques
usually require allocating extra bits [134], which consumes additional bandwidth.
Although there is a trade-off between system accuracy and the quality of the
recorded video content, if the VLC transmission period is shorter than a certain
amount of time and the transmission intervals are long enough, the quality of the
captured video will not be degraded by any noticeable distortion. The impact of
the distortion can be measured by the proposed formula given in Equation (5.1):
QoS = (t × α) / (Fps × τ)    (5.1)
wherein t is the lifetime of a video frame in milliseconds, α is the portion of a
video frame that contains a carrier frequency (a value between 0 and 1), Fps is
the frame rate of the video in frames per second (fps) and τ is the time interval
between two consecutive frames containing VLC signals in milliseconds. These
relations are illustrated in Figure 5.5.
[Figure: a sequence of frames N to M+1 in which only one frame carries a VLC signal; α·t marks the carrier portion within that frame and τ the interval until the next VLC-carrying frame.]
Figure 5.5: Sequence of frames with maximum one VLC frame
Depending on the defined bit rate of the transmission, the lifetime of a carrier
frequency can be as short as a small portion of the total lifetime of a frame,
e.g. one sixth of a frame (α). The lifetime of a video frame (t) on a common 30fps
(Fps) recording device is about 33 milliseconds. This makes the lifetime of our
example carrier frequency roughly 5.5 milliseconds. By setting the VLC
transmission interval (τ) to e.g. 5 seconds, a video stream with a playtime of 5
seconds may contain only 5.5 milliseconds' worth of VLC data. This portion is too
small to be noticed in live playback and does not constitute more than 0.0036% of
the whole playtime.
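The worked example above can be checked against Equation 5.1 directly (the function name is ours):

```python
def vlc_distortion_ratio(t, alpha, fps, tau):
    # Equation 5.1: QoS = (t * alpha) / (Fps * tau).
    return (t * alpha) / (fps * tau)

# Worked example from the text: 33 ms frames, carrier covering 1/6 of a
# frame, 30 fps video, one VLC-carrying frame every 5000 ms.
qos = vlc_distortion_ratio(33.0, 1.0 / 6.0, 30, 5000.0)
print(round(qos * 100, 4))  # 0.0037 (percent; the text quotes ~0.0036%)
```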
In addition, before the final broadcasting takes place, the distorted video frame
can be dropped or replaced with one of its neighbouring frames. This does not
impose noticeable changes on the playback of high frame rate videos and results in
a distortion-free video sequence. However, this process must take place after the
video frames are decoded and re-encoded; the introduced delay therefore has to be
accounted for to keep the real-time requirements satisfied. As shown in
Figure 4.15, one way to accelerate the broadcasting is to transmit the unpacked
video frames to the broadcaster before the video decoding and frame processing
take place. However, as long as the processing time for each frame (i.e. 25ms,
shown in Table 5.2) does not exceed the lifetime of a frame (i.e. 33ms for a 30fps
video), the playback will operate smoothly.
5.5 Summary
In this chapter we explained how the implemented system can be evaluated in an
isolated environment, and presented the proof-of-concept test results. The
benchmarking results were presented together with the acceleration gained by using
the GPU. Finally, quality measurement and improvement techniques were explained.
The next chapter summarizes the conclusions, findings and future work.
6 CONCLUSION AND FUTURE WORK
This chapter presents the conclusions based on the work performed in this thesis
in Section 6.1, followed by a presentation of future work possibilities to improve
and advance the implemented VLC system in Section 6.2. Further discussion is
presented at the end of this chapter in Section 6.3.
6.1 Conclusion
This thesis presents the implementation of a camera-based VLC system that can be
used for real-time applications in a cloud environment. The implemented VLC system
was used to provide inter-frame synchronization of multiple video streams. As
explained in Chapter 2, one of the many use cases of video synchronization is a
broadcasting director who wishes to view a number of video streams in a
synchronized manner.
The challenges and limitations in designing and implementing this system were also
presented. The results given in Chapter 5 revealed sufficient accuracy and agility
of the system for a real-time cloud-based application. At the time of writing this
thesis, the final version of the implemented VLC system is able to support up to
150 bits per second (i.e. 5 bits per frame on mobile devices).
However, the concept of VLC is still emerging, and more research on this topic is
needed. For example, this thesis found that communication through visible light
and digital cameras is not immune to the impact of video compression and motion
estimation techniques. This phenomenon needs to be investigated and studied
further. Moreover, the trade-off between flicker improvement techniques and system
accuracy needs more investigation for quality-of-experience purposes.
6.2 Future work
This section presents further research ideas that can help improve the existing
camera-based VLC system.
Increasing The Bitrate
In this work the lighting sources blink at a single frequency at a time. One way
to increase the bitrate of the communication is to transmit several frequencies at
a time. This can be achieved by individualizing the light sources and dedicating
each light source to a different frequency. However, this method requires the
emitted light of all light sources to be in the line of sight of the camera.
Another suggestion is to merge different frequencies into one blinking pattern and
make the light sources blink according to that pattern. Figure 6.1 shows how two
different sinusoid frequencies are added together. In vision science such spatial
frequencies (a.k.a. sinusoidal gratings) are used as visual stimuli [135] [136].
In Figure 6.1 each horizontal section represents a single frequency, while the
third frequency is the result of adding the first two. The right part of the
picture illustrates the signals as square pulses and the left part illustrates the
equivalent seen by the rolling shutter sensor. In this example, adding frequencies
means applying a logical AND to the states of the square pulses at each point in
time. In practice this operation could also be e.g. NAND. In addition, a different
modulation technique (such as FDM) could be more beneficial when applying this
method [137].
[Figure: two square-pulse signals F1 and F2 and their combination F1 + F2, shown both as pulsed signals and as the band patterns captured by the rolling shutter.]
Figure 6.1: Adding sinusoid frequencies
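The AND-based merging described above can be sketched on sampled square pulses (the function names and the 16-sample patterns are illustrative):

```python
def square_wave(freq, samples, duty=0.5):
    # Sample an on/off blinking pattern with 'freq' cycles over 'samples'
    # points and the given duty cycle.
    return [1 if (i * freq / samples) % 1.0 < duty else 0
            for i in range(samples)]

def merge_and(a, b):
    # Merge two blinking patterns with a logical AND, as in Figure 6.1.
    return [x & y for x, y in zip(a, b)]

f1 = square_wave(2, 16)  # 2 cycles over 16 samples
f2 = square_wave(4, 16)  # 4 cycles over 16 samples
print(merge_and(f1, f2))  # [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
```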
Both of these methods require more research on how the blinking frequencies merge
in the physical layer and on how the camera reacts to and captures the resulting
dark/bright bands. Moreover, the detection mechanisms (e.g. the DFT) should also
be studied further.
Narrowing The Lifetime of Frequencies
The method of increasing the lifetime of frequencies suggested in Section 4.2.4
has the drawback of consuming spatial bandwidth. Researchers in [73] suggest a
method in which consecutive frames are stitched together into one long image and a
sliding window (the size of one frame) is then slid across the image, performing
an FFT conversion at each step.
Although this method could solve the problem of frame discontinuity, the
computational complexity and memory requirements might prevent the system from
meeting its real-time requirements. Depending on the movement of the sliding
window, the number of required FFT calls increases. Figure 6.2 illustrates a
situation where two frames are stitched together and a window slides by half a
frame in each step. Note that for every n, the nth frame has to be stitched to the
following (n+1)th frame.
[Figure: the nth and (n+1)th frames stitched together, each carrying a frequency (f1 and f2) with lifetime λ; a frame-sized window slides in half-frame steps, detecting nothing in slide 1, f1 and f2 in slide 2, and f2 in slide 3.]
Figure 6.2: Stitching images - Sliding window.
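The stitching-and-sliding procedure of [73] can be sketched as follows; the 1D row-average reduction inside the window is our own simplification, not part of the cited method:

```python
import numpy as np

def sliding_window_spectra(stitched, window, step):
    # Slide a frame-sized window down an image built by stitching
    # consecutive frames, returning the vertical magnitude spectrum at
    # each window position (one FFT call per step).
    spectra = []
    for top in range(0, stitched.shape[0] - window + 1, step):
        profile = stitched[top:top + window].mean(axis=1)
        spectra.append(np.abs(np.fft.rfft(profile - profile.mean())))
    return spectra
```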
The trade-off between bitrate and computational complexity in this method needs
more investigation. Moreover, new optimization techniques could be exploited to
overcome the high computational complexity. For example, a new technique could
combine our method with the stitching method presented in [73], where only a small
portion of the top/bottom borders of the frames is stitched together and checked
for frequency discontinuation.
CPU/GPU Cooperation
In Section 5.3 we explained how task migration to the GPU can accelerate the
performance of the computational tasks in the VLC receiver. In addition, CPU/GPU
cooperation provides pipelining possibilities that can speed up the heavy
computational tasks. With the DFT operations moved from the CPU to the GPU, the
CPU can perform other tasks in parallel, such as fetching and decoding the next
video frame from the buffer. The task scheduling schematic shown in Figure 6.3a
represents this scenario. Figure 6.3b illustrates the same tasks being executed on
the CPU only.
[Figure: (a) pipelined CPU-GPU cooperation, where the CPU runs the pre- and post-DFT stages α(n) and Ω(n) while the GPU runs the DFT stage λ(n) of another frame; (b) the same tasks executed sequentially on the CPU only.]
Figure 6.3: Processing times in pipelined and non-pipelined task scheduling
In Figure 6.3, α, λ and Ω represent the pre-DFT processes, the DFT process and the
post-DFT processes, respectively. The letter n indicates the nth video frame; for
example, λ(1) refers to the DFT process of the first video frame. As explained in
Section 4.3, the pre-DFT processes include threading operations, video decoding,
frame conversion, etc., and the post-DFT processes include message decoding and
the necessary computations on the DFT result. In Figure 6.3 the computation time
for the DFT operations on the CPU and the GPU is assumed to be the same, and the
context switching times are neglected.
6.3 Discussion
Visible Light Communication, like any other new technology, offers endless
possibilities. However, the ongoing research in this field should not be limited
to further development of the technology itself. Without sufficient consideration
and research, the advancement of a technology can end up endangering our health
and that of other species, and/or lead to technology obsolescence. It is therefore
necessary that our research cover all aspects of an evolving technology, including
its by-products.
One problem with the advancement of outdoor lighting is light pollution. Light
pollution is a form of environmental degradation that refers to unnecessary or
misdirected light [138]. The recent rise in popularity of LED lighting has (again)
raised awareness among researchers of the adverse effects of light pollution and
artificial lighting. The adverse consequences of light pollution include, but are
not limited to:
(1) Affecting the lifestyle of nocturnal and non-nocturnal species.
Inappropriate lighting conditions can affect the health and lifestyle of many
species, especially mammals and nocturnal animals [139] [140]. Although it is
believed that the development of LED lighting can help reduce the side effects of
light pollution on animals, this remains controversial. For example, the research
in [141] shows that although LED light sources affect bat species less than older
technologies do, these so-called green lighting technologies still impose
ecological impacts that need to be considered in further research.
(2) Affecting the human psyche.
In Chapter 3 of this thesis some health-related benefits of communication over
light were presented. The fact that visible light does not have the drawbacks of
other radio waves makes it interesting for many applications. However, recent
research has drawn attention to the side effects of artificial light on the human
psyche [142]. These side effects include, for example, changes in melatonin levels
[143] that can affect sleeping routines or even lead to cancer [144].
(3) Skyglow.
Skyglow is the illumination of the night sky when artificial light is scattered by
atmospheric molecules or aerosols and returned to Earth [145]. The negative
effects of skyglow include, for example, reduced stellar visibility (to the extent
that more than half of the EU population has already lost naked-eye visibility of
the Milky Way) [146], disturbed bird orientation during migration [147] and many
other impacts on biological and ecological systems [148].
The transition from traditional lighting to LED-based lighting is not believed to
increase or decrease the skyglow effect per se, but the computerized
controllability of LED-based systems in terms of direction of emission, brightness
and color can help reduce the skyglow effect [149].
Nonetheless, many researchers have focused on the reduction and prevention of these
adverse effects by suggesting the following methods [150] [151]:
(1) Shielding and redesigning
(2) Limiting coverage area
(3) Dimming, shortening (time limitation) and even shutting down
(4) Growth limitation
(5) Spectrum shifting
It is inevitable that drastic changes in the design and rethinking of lighting
systems will have an impact on the development, advancement and applications of
indoor/outdoor VLC systems. Therefore, more research on related topics is
needed [152].
BIBLIOGRAPHY
[1] Longfei Wu, Xiaojiang Du, and Xinwen Fu. Security threats to mobile multimedia applications: Camera-based attacks on mobile phones. Communications Magazine, IEEE, 52(3):80–87, 2014.
[2] K. Breitman, M. Endler, R. Pereira, and M. Azambuja. When tv dies, will it go to the cloud? Computer, 43(4):81–83, April 2010. doi:10.1109/MC.2010.118.
[3] Sylwia Kechiche. Cellular m2m forecasts: unlocking growth. Technical report, GSMA Intelligence, February 2015.
[4] Cisco. Cisco visual networking index: Global mobile data traffic forecast update, 2014–2019. Technical report, Cisco, February 2015.
[5] Ming-Jiang Yang, Jo Yew Tham, Dajun Wu, and Kwong Huang Goh. Cost effective ip camera for video surveillance. In Industrial Electronics and Applications, 2009. ICIEA 2009. 4th IEEE Conference on, pages 2432–2435. IEEE, 2009.
[6] K. Jeffay, D.L. Stone, T. Talley, and F.D. Smith. Adaptive, best-effort delivery of digital audio and video across packet-switched networks. In P. Venkat Rangan, editor, Network and Operating System Support for Digital Audio and Video, volume 712 of Lecture Notes in Computer Science, pages 1–14. Springer Berlin Heidelberg, 1993. URL: http://dx.doi.org/10.1007/3-540-57183-3_1, doi:10.1007/3-540-57183-3_1.
[7] Yung-Chih Chen and Don Towsley. On bufferbloat and delay analysis of multipath tcp in wireless networks.
[8] Yung-Chih Chen, Don Towsley, Erich M Nahum, Richard J Gibbens, and Yeon-sup Lim. Characterizing 4g and 3g networks: Supporting mobility with multipath tcp. School of Computer Science, University of Massachusetts Amherst, Tech. Rep, 22, 2012.
[9] Eralda Caushaj, Ivan Ivanov, Huirong Fu, Ishwar Sethi, and Ye Zhu. Evaluating throughput and delay in 3g and 4g mobile architectures. Journal of Computer and Communications, 2014, 2014.
[10] Dmitry Pundik and Yael Moses. Video synchronization using temporal signals from epipolar lines. In Kostas Daniilidis, Petros Maragos, and Nikos Paragios, editors, Computer Vision – ECCV 2010, volume 6313 of Lecture Notes in Computer Science, pages 15–28. Springer Berlin Heidelberg, 2010. URL: http://dx.doi.org/10.1007/978-3-642-15558-1_2, doi:10.1007/978-3-642-15558-1_2.
[11] Sridhar Rajagopal, Richard D Roberts, and Sang-Kyu Lim. Ieee 802.15.7 visible light communication: modulation schemes and dimming support. Communications Magazine, IEEE, 50(3):72–82, 2012.
[12] Hyun-Seung Kim, Deok-Rae Kim, Se-Hoon Yang, Yong-Hwan Son, and Sang-Kook Han. An indoor visible light communication positioning system using an rf carrier allocation technique. Lightwave Technology, Journal of, 31(1):134–144, 2013.
[13] Felix Schill, Uwe R Zimmer, and Jochen Trumpf. Visible spectrum optical communication and distance sensing for underwater applications. In Proceedings of ACRA, volume 2004, pages 1–8, 2004.
[14] Soo-Yong Jung, Swook Hann, and Chang-Soo Park. Tdoa-based optical wireless indoor localization using led ceiling lamps. Consumer Electronics, IEEE Transactions on, 57(4):1592–1597, 2011.
[15] N Khan and N Abas. Comparative study of energy saving light sources. Renewable and Sustainable Energy Reviews, 15(1):296–309, 2011.
[16] Siddha Pimputkar, James S Speck, Steven P DenBaars, and Shuji Nakamura. Prospects for led lighting. Nature Photonics, 3(4):180–182, 2009.
[17] Maury Wright. Us government accelerates led street light push in doe program. LEDs Magazine, Online Articles, 2015. URL: http://www.ledsmagazine.com/articles/2015/01/us-government-accelerates-led-street-light-push-in-doe-program.html.
[18] S. Haruyama. Progress of visible light communication. In Optical Fiber Communication (OFC), collocated National Fiber Optic Engineers Conference, 2010 Conference on (OFC/NFOEC), pages 1–3, March 2010.
[19] Aníbal De Almeida, Bruno Santos, Bertoldi Paolo, and Michel Quicheron. Solid state lighting review – potential and challenges in europe. Renewable and Sustainable Energy Reviews, 34(0):30–48, 2014.
[20] Roland Haitz and Jeffrey Y Tsao. Solid-state lighting: ‘the case’ 10 years after and future prospects. physica status solidi (a), 208(1):17–29, 2011.
[21] E Fred Schubert and Jong Kyu Kim. Solid-state light sources getting smart. Science, 308(5726):1274–1278, 2005.
[22] Jurgen Hase. Intelligent lighting paves the way for the smart city. LEDs Magazine, 73, 2014.
[23] M. Castro, A.J. Jara, and A.F.G. Skarmeta. Smart lighting solutions for smart cities. In Advanced Information Networking and Applications Workshops (WAINA), 2013 27th International Conference on, pages 1374–1379, March 2013. doi:10.1109/WAINA.2013.254.
[24] Walter Karlen, Joanne Lim, J Mark Ansermino, Guy Dumont, and Cornie Scheffer. Design challenges for camera oximetry on a mobile phone. In Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE, pages 2448–2451. IEEE, 2012.
[25] Tim Hayes. Next-generation cell phone cameras. Optics and Photonics News, 23(2):16–21, 2012.
[26] Damien Igoe, Alfio Parisi, and Brad Carter. Characterization of a smartphone camera’s response to ultraviolet a radiation. Photochemistry and Photobiology, 89(1):215–218, 2013. doi:10.1111/j.1751-1097.2012.01216.x.
[27] Zabih Ghassemlooy, Wasiu Popoola, and Sujan Rajbhandari. Optical wireless communications: system and channel modelling with Matlab®. CRC Press, 2012.
[28] Jingfeng Zhang, Ying Li, and Yanna Wei. Using timestamp to realize audio-video synchronization in real-time streaming media transmission. In Audio, Language and Image Processing, 2008. ICALIP 2008. International Conference on, pages 1073–1076. IEEE, 2008.
[29] Robert T Collins, Omead Amidi, and Takeo Kanade. An active camera system for acquiring multi-view video. In ICIP (1), pages 527–520, 2002.
[30] Marc Pollefeys, Sudipta N. Sinha, Li Guan, and Jean-Sébastien Franco. Chapter 2 - multi-view calibration, synchronization, and dynamic scene reconstruction. In Hamid Aghajan and Andrea Cavallaro, editors, Multi-Camera Networks, pages 29–75. Academic Press, Oxford, 2009. URL: http://www.sciencedirect.com/science/article/pii/B9780123746337000045, doi:http://dx.doi.org/10.1016/B978-0-12-374633-7.00004-5.
[31] A. Whitehead, R. Laganiere, and P. Bose. Temporal synchronization of video sequences in theory and in practice. In Application of Computer Vision, 2005. WACV/MOTIONS ’05 Volume 1. Seventh IEEE Workshops on, volume 2, pages 132–137, Jan 2005. doi:10.1109/ACVMOT.2005.114.
[32] Colin Perkins. Rtp: Audio and Video for the Internet. Addison-Wesley Professional, first edition, 2003.
[33] D. Mills. Network time protocol (version 3) specification, implementation. Technical report, United States, 1992.
[34] T. Tuytelaars and L. Van Gool. Synchronizing video sequences. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, volume 1, pages I–762–I–768 Vol.1, June 2004. doi:10.1109/CVPR.2004.1315108.
[35] Yaron Caspi and Michal Irani. A step towards sequence-to-sequence alignment. In Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, volume 2, pages 682–689. IEEE, 2000.
[36] Philip A Tresadern and Ian Reid. Synchronizing image sequences of non-rigid objects. In BMVC, pages 1–10, 2003.
[37] Lisa Spencer and Mubarak Shah. Temporal synchronization from camera motion. In Proceedings of Asian Conference on Computer Vision, pages 515–520, 2004.
[38] Cheng Lei and Yee-Hong Yang. Tri-focal tensor-based multiple video synchronization with subframe optimization. Image Processing, IEEE Transactions on, 15(9):2473–2480, Sept 2006. doi:10.1109/TIP.2006.877438.
[39] L. Lee, R. Romano, and G. Stein. Monitoring activities from multiple video streams: establishing a common coordinate frame. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(8):758–767, Aug 2000. doi:10.1109/34.868678.
[40] Jingyu Yan and Marc Pollefeys. Video synchronization via space-time interest point distribution. In Advanced Concepts for Intelligent Vision Systems, pages 501–504, 2004.
[41] Lior Wolf and Assaf Zomet. Correspondence-free synchronization and reconstruction in a non-rigid scene. In Proc. Workshop on Vision and Modelling of Dynamic Scenes, Copenhagen, 2002.
[42] Prarthana Shrstha, Mauro Barbieri, and Hans Weda. Synchronization of multi-camera video recordings based on audio. In Proceedings of the 15th international conference on Multimedia, pages 545–548. ACM, 2007.
[43] Ken Goldberg, Camille Crittenden, Abram Stern, and John Scott. The rashomon project, Jan 2015. URL: http://rieff.ieor.berkeley.edu/rashomon/.
[44] Shlomi Arnon. Visible light communication. Cambridge University Press, 2015.
[45] Dobroslav Tsonev, Hyunchae Chun, Sujan Rajbhandari, Jonathan JD McKendry, Stefan Videv, Erdan Gu, Mohsin Haji, Scott Watson, Anthony E Kelly, Grahame Faulkner, et al. A 3-gb/s single-led ofdm-based wireless vlc link using a gallium nitride. Photonics Technology Letters, IEEE, 26(7):637–640, 2014.
[46] S. Okada, T. Yendo, T. Yamazato, T. Fujii, M. Tanimoto, and Y. Kimura. On-vehicle receiver for distant visible light road-to-vehicle communication. In Intelligent Vehicles Symposium, 2009 IEEE, pages 1033–1038, June 2009. doi:10.1109/IVS.2009.5164423.
[47] Navina Kumar, Nuno Lourenco, Michal Spiez, and Rui L Aguiar. Visible light communication systems conception and vidas. IETE Technical Review, 25(6):359–367, 2008.
[48] Woo-Chan Kim, Chi-Sung Bae, Soo-Yong Jeon, Sung-Yeop Pyun, and Dong-Ho Cho. Efficient resource allocation for rapid link recovery and visibility in visible-light local area networks. Consumer Electronics, IEEE Transactions on, 56(2):524–531, 2010.
[49] T. Komine and M. Nakagawa. Fundamental analysis for visible-light communication system using led lights. Consumer Electronics, IEEE Transactions on, 50(1):100–107, Feb 2004. doi:10.1109/TCE.2004.1277847.
[50] Jong Kyu Kim and E Fred Schubert. Transcending the replacement paradigm of solid-state lighting. Optics Express, 16(26):21835–21842, 2008.
[51] Ashwin Ashok, Marco Gruteser, Narayan Mandayam, Jayant Silva, Michael Varga, and Kristin Dana. Challenge: Mobile optical networks through visual mimo. In Proceedings of the sixteenth annual international conference on Mobile computing and networking, pages 105–112. ACM, 2010.
[52] Silvano Donati. Photodetectors. Prentice Hall PTR, 1999.
[53] Sanka Gateva. Photodetectors. InTech, 2012.
[54] Masao Nakagawa and Shinichiro Haruyama. Camera-equipped cellular terminal for visible light communication, February 1 2005. US Patent App. 10/588,009.
[55] Paul Dietz, William Yerazunis, and Darren Leigh. Very low-cost sensing and communication using bidirectional leds. In UbiComp 2003: Ubiquitous Computing, pages 175–191. Springer, 2003.
[56] Stefan Schmid, Giorgio Corbellini, Stefan Mangold, and Thomas R Gross. Led-to-led visible light communication networks. In Proceedings of the fourteenth ACM international symposium on Mobile ad hoc networking and computing, pages 1–10. ACM, 2013.
[57] Aleksandar Jovicic, Junyi Li, and Tom Richardson. Visible light communication: Opportunities, challenges and the path to market. Communications Magazine, IEEE, 51(12):26–32, 2013.
[58] Michael B Rahaim, Anna Maria Vegni, and Thomas DC Little. A hybrid radio frequency and broadcast visible light communication system. In GLOBECOM Workshops (GC Wkshps), 2011 IEEE, pages 792–796. IEEE, 2011.
[59] Hany Elgala, Raed Mesleh, and Harald Haas. Indoor optical wireless communication: potential and state-of-the-art. Communications Magazine, IEEE, 49(9):56–62, 2011.
[60] Ieee standard for local and metropolitan area networks–part 15.7: Short-range wireless optical communication using visible light. IEEE Std 802.15.7-2011, pages 1–309, Sept 2011. doi:10.1109/IEEESTD.2011.6016195.
[61] Yuanquan Wang, Yiguang Wang, Nan Chi, Jianjun Yu, and Huiliang Shang. Demonstration of 575-mb/s downlink and 225-mb/s uplink bi-directional scm-wdm visible light communication using rgb led and phosphor-based led. Optics Express, 21(1):1203–1208, 2013.
[62] Yuanquan Wang, Yufeng Shao, Huiliang Shang, Xiaoyuan Lu, Yiguang Wang, Jianjun Yu, and Nan Chi. 875-mb/s asynchronous bi-directional 64qam-ofdm scm-wdm transmission over rgb-led-based visible light communication system. In Optical Fiber Communication Conference, pages OTh1G–3. Optical Society of America, 2013.
[63] Shinya Iwasaki, Chinthaka Premachandra, Tomohiro Endo, Toshiaki Fujii, Masayuki Tanimoto, and Yoshikatsu Kimura. Visible light road-to-vehicle communication using high-speed camera. In Intelligent Vehicles Symposium, 2008 IEEE, pages 13–18. IEEE, 2008.
[64] Halpage Chinthaka Nuwandika Premachandra, Tomohiro Yendo, Mehrdad Panahpour Tehrani, Takaya Yamazato, Hiraku Okada, Toshiaki Fujii, and Masayuki Tanimoto. High-speed-camera image processing based led traffic light detection for road-to-vehicle visible light communication. In Intelligent Vehicles Symposium (IV), 2010 IEEE, pages 793–798. IEEE, 2010.
[65] Toru Nagura, Takaya Yamazato, Masaaki Katayama, Tomohiro Yendo, Toshiaki Fujii, and Hiraku Okada. Improved decoding methods of visible light communication system for its using led array and high-speed camera. In Vehicular Technology Conference (VTC 2010-Spring), 2010 IEEE 71st, pages 1–5. IEEE, 2010.
[66] Albert J.P. Theuwissen. CMOS image sensors: State-of-the-art. Solid-State Electronics, 52(9):1401–1406, 2008. Papers Selected from the 37th European Solid-State Device Research Conference - ESSDERC’07. URL: http://www.sciencedirect.com/science/article/pii/S0038110108001317, doi:http://dx.doi.org/10.1016/j.sse.2008.04.012.
[67] Heinz Helmers and Markus Schellenberg. Cmos vs. ccd sensors in speckle interferometry. Optics & Laser Technology, 35(8):587–595, 2003.
[68] Marci Meingast, Christopher Geyer, and Shankar Sastry. Geometric models of rolling-shutter cameras. arXiv preprint cs/0503076, 2005.
[69] Chia-Kai Liang, Li-Wen Chang, and H.H. Chen. Analysis and compensation of rolling shutter effect. Image Processing, IEEE Transactions on, 17(8):1323–1330, Aug 2008. doi:10.1109/TIP.2008.925384.
[70] Dave Litwiller. Ccd vs. cmos. Photonics Spectra, 35(1):154–158, 2001.
[71] Omar Ait-Aider, Nicolas Andreff, Jean Marc Lavest, and Philippe Martinet. Simultaneous object pose and velocity computation using a single view from a rolling shutter camera. In Computer Vision–ECCV 2006, pages 56–68. Springer, 2006.
[72] P-E Forssén and Erik Ringaby. Rectifying rolling shutter video from hand-held devices. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 507–514. IEEE, 2010.
[73] Niranjini Rajagopal, Patrick Lazik, and Anthony Rowe. Visual light landmarks for mobile devices. In Proceedings of the 13th international symposium on Information processing in sensor networks, pages 249–260. IEEE Press, 2014.
[74] Niranjini Rajagopal, Patrick Lazik, and Anthony Rowe. Demonstration abstract: How many lights do you see? In Information Processing in Sensor Networks, IPSN-14 Proceedings of the 13th International Symposium on, pages 347–348. IEEE, 2014.
[75] Ye-Sheng Kuo, Pat Pannuto, Ko-Jen Hsiao, and Prabal Dutta. Luxapose: Indoor positioning with mobile phones and visible light. In Proceedings of the 20th annual international conference on Mobile computing and networking, pages 447–458. ACM, 2014.
[76] Yoshinori Matsumoto, Takaharu Hara, and Yohsuke Kimura. Cmos phototransistor array detection system for visual light identification (id). In Networked Sensing Systems, 2008. INSS 2008. 5th International Conference on, pages 99–102. IEEE, 2008.
[77] Robert LiKamWa, David Ramirez, and Jason Holloway. Styrofoam: a tightly packed coding scheme for camera-based visible light communication. In Proceedings of the 1st ACM MobiCom workshop on Visible light communication systems, pages 27–32. ACM, 2014.
[78] Samuel David Perli, Nabeel Ahmed, and Dina Katabi. Pixnet: interference-free wireless links using lcd-camera pairs. In Proceedings of the sixteenth annual international conference on Mobile computing and networking, pages 137–148. ACM, 2010.
[79] A. Bingham and D. Spradlin. The Open Innovation Marketplace: Creating Value in the Challenge Driven Enterprise. Pearson Education, 2011.
[80] Joel West and Scott Gallagher. Challenges of open innovation: the paradox of firm investment in open-source software. R&D Management, 36(3):319–331, 2006. URL: http://dx.doi.org/10.1111/j.1467-9310.2006.00436.x, doi:10.1111/j.1467-9310.2006.00436.x.
[81] Ffmpeg project. Ffmpeg, Jan 2015. URL: http://www.ffmpegs.org/.
[82] G. Bradski. The opencv library. Dr. Dobb’s Journal of Software Tools, 2000.
[83] Massimo Banzi. Getting Started with Arduino. Make Books - Imprint of: O’Reilly Media, Sebastopol, CA, ill edition, 2008.
[84] Kaiyun Cui, Gang Chen, Zhengyuan Xu, and Richard D Roberts. Line-of-sight visible light communication system design and demonstration. In Communication Systems Networks and Digital Signal Processing (CSNDSP), 2010 7th International Symposium on, pages 621–625. IEEE, 2010.
[85] F.F. Mazda. Electronics Engineer’s Reference Book. Elsevier Science, 2013.
[86] R.F. Pierret. Semiconductor Device Fundamentals. Addison-Wesley, 1996.
[87] G.R. Jones. Electrical Engineer’s Reference Book. Elsevier Science, 2013.
[88] STMicroelectronics. Complementary power Darlington transistors, October 2008. Rev. 4.
[89] Rudolf F Graf and William Sheets. Encyclopedia of Electronic Circuits, Vol. 4, volume 4. Granite Hill Publishers, 1992.
[90] Kwok K. Ng. Phototransistor, pages 462–469. John Wiley & Sons, Inc., 2009. URL: http://dx.doi.org/10.1002/9781118014769.ch59, doi:10.1002/9781118014769.ch59.
[91] CNY17 Series. Optocoupler with phototransistor output. Vishay Telefunken, 1999.
[92] Atmel, http://www.atmel.com/Images/doc2503.pdf. Atmel ATMega32 microcontroller datasheet, February 2011.
[93] Alessandro D’Ausilio. Arduino: A low-cost multipurpose lab equipment. Behavior research methods, 44(2):305–313, 2012.
[94] Atmel, http://www.atmel.com/Images/doc7799.pdf. Atmel ATMega16U2 microcontroller datasheet, September 2012.
[95] Steve Winder. Power supplies for LED driving. Newnes, 2011.
[96] M Saadi, L Wattisuttikulkij, Y Zhao, and P Sangwongngam. Visible light communication: opportunities, challenges and channel models. International Journal of Electronics & Informatics, 2(1):1–11, 2013.
[97] Bo Bai, Zhengyuan Xu, and Yangyu Fan. Joint led dimming and high capacity visible light communication by overlapping ppm. In Wireless and Optical Communications Conference (WOCC), 2010 19th Annual, pages 1–5. IEEE, 2010.
[98] O. Bouchet. Wireless Optical Telecommunications. ISTE. Wiley, 2013. URL: https://books.google.fi/books?id=HBFSt4O64VgC.
[99] Shin-Yi Chang, Jo-Ping Li, Hua-Min Tseng, and Pai H Chou. Greendicator: Enabling optical pulse-encoded data output from wsn for display on smartphones.
[100] Ubolthip Sethakaset and T. Aaron Gulliver. Differential amplitude pulse-position modulation for indoor wireless optical communications. EURASIP J. Wirel. Commun. Netw., 2005(1):3–11, March 2005. URL: http://dx.doi.org/10.1155/WCN.2005.3, doi:10.1155/WCN.2005.3.
[101] Xiao Zhang, Svilen Dimitrov, Sinan Sinanovic, and Harald Haas. Optimal power allocation in spatial modulation ofdm for visible light communications. In Vehicular Technology Conference (VTC Spring), 2012 IEEE 75th, pages 1–5. IEEE, 2012.
[102] R.D. Roberts. Undersampled frequency shift on-off keying (ufsook) for camera communications (camcom). In Wireless and Optical Communication Conference (WOCC), 2013 22nd, pages 645–648, May 2013. doi:10.1109/WOCC.2013.6676454.
[103] JE Farrell, Brian L Benson, and Carl R Haynie. Predicting flicker thresholds for video display terminals. In Proc SID, volume 28, pages 449–453, 1987.
[104] DH Kelly. Diffusion model of linear flicker responses. JOSA, 59(12):1665–1670, 1969.
[105] D.H. Kelly. Sine waves and flicker fusion. Documenta Ophthalmologica, 18(1):16–35, 1964. URL: http://dx.doi.org/10.1007/BF00160561, doi:10.1007/BF00160561.
[106] Barry B. Lee, Joel Pokorny, Paul R. Martin, Arne Valbergt, and Vivianne C. Smith. Luminance and chromatic modulation sensitivity of macaque ganglion cells and human observers. J. Opt. Soc. Am. A, 7(12):2223–2236, Dec 1990. URL: http://josaa.osa.org/abstract.cfm?URI=josaa-7-12-2223, doi:10.1364/JOSAA.7.002223.
[107] Samuel Sokol and Lorrin A Riggs. Electrical and psychophysical responses of the human visual system to periodic variation of luminance. Investigative Ophthalmology & Visual Science, 10(3):171–180, 1971.
[108] T. Keppler, N. Watson, and J. Arrillaga. Computation of the short-term flicker severity index. Power Delivery, IEEE Transactions on, 15(4):1110–1115, Oct 2000. doi:10.1109/61.891490.
[109] Samuel M Berman, Daniel S Greenhouse, Ian L Bailey, Robert D Clear, and Thomas W Raasch. Human electroretinogram responses to video displays, fluorescent lighting, and other high frequency sources. Optometry & Vision Science, 68(8):645–662, 1991.
[110] I.E. Richardson. The H.264 Advanced Video Compression Standard. Wiley, 2011. URL: http://www.google.fi/books?id=k7nOAiIUo9IC.
[111] Tzi-Dar Chiueh, Pei-Yun Tsai, and I-Wei Lai. Baseband Receiver Design for Wireless MIMO-OFDM Communications. John Wiley & Sons, 2012.
[112] Digital Signal Processing. Laxmi Publications Pvt Ltd, 2007.
[113] Jack D Gaskill. Linear systems, fourier transforms, and optics. 1978.
[114] SJ Ranade and W Xu. An overview of harmonics modeling and simulation. IEEE Task Force on Harmonics Modeling and Simulation, page 1, 2007.
[115] Iaroslav V Blagouchine and Eric Moreau. Analytic method for the computation of the total harmonic distortion by the cauchy method of residues. Communications, IEEE Transactions on, 59(9):2478–2491, 2011.
[116] L Svilainis. Led pwm dimming linearity investigation. Displays, 29(3):243–249, 2008.
[117] Prathyusha Narra and Donald S Zinger. An effective led dimming approach. In Industry Applications Conference, 2004. 39th IAS Annual Meeting. Conference Record of the 2004 IEEE, volume 3, pages 1671–1676. IEEE, 2004.
[118] M Anand and Prasoon Mishra. A novel modulation scheme for visible light communication. In India Conference (INDICON), 2010 Annual IEEE, pages 1–3. IEEE, 2010.
[119] Richard D Roberts. Space-time forward error correction for dimmable undersampled frequency shift on-off keying camera communications (camcom). In Ubiquitous and Future Networks (ICUFN), 2013 Fifth International Conference on, pages 459–464. IEEE, 2013.
[120] Christos Danakis, Mostafa Afgani, Gordon Povey, Ian Underwood, and Harald Haas. Using a cmos camera sensor for visible light communication. In Globecom Workshops (GC Wkshps), 2012 IEEE, pages 1244–1248. IEEE, 2012.
[121] The discrete fourier transform in 2d. In Digital Image Processing, Texts in Computer Science, pages 343–366. Springer London, 2008. URL: http://dx.doi.org/10.1007/978-1-84628-968-2_14, doi:10.1007/978-1-84628-968-2_14.
[122] C. Solomon and T. Breckon. Fundamentals of Digital Image Processing: A Practical Approach with Examples in Matlab. Wiley, 2011. URL: https://books.google.fi/books?id=NoJ15jLdy7YC.
[123] Amir Averbuch, Ronald R Coifman, David L Donoho, Michael Elad, and Moshe Israeli. Fast and accurate polar fourier transform. Applied and computational harmonic analysis, 21(2):145–167, 2006.
[124] Pierre Duhamel and Martin Vetterli. Fast fourier transforms: a tutorial review and a state of the art. Signal processing, 19(4):259–299, 1990.
[125] G. Bradski and A. Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library. O’Reilly Media, 2008. URL: https://books.google.fi/books?id=seAgiOfu2EIC.
[126] Paolo Prandoni and Martin Vetterli. Signal processing for communications. CRC Press, 2008.
[127] S. Qureshi. Embedded Image Processing on the TMS320C6000TM DSP: Examples in Code Composer StudioTM and MATLAB. Springer, 2005. URL: https://books.google.fi/books?id=w3BZ0PrmqtkC.
[128] YoungChoon Lee and Albert Y. Zomaya. Energy efficient utilization of resources in cloud computing systems. The Journal of Supercomputing, 60(2):268–280, 2012. URL: http://dx.doi.org/10.1007/s11227-010-0421-3, doi:10.1007/s11227-010-0421-3.
[129] Shane Ryoo, Christopher I Rodrigues, Sara S Baghsorkhi, Sam S Stone, David B Kirk, and Wen-mei W Hwu. Optimization principles and application performance evaluation of a multithreaded gpu using cuda. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pages 73–82. ACM, 2008.
[130] Thilaka Sumanaweera and Donald Liu. Medical image reconstruction with the fft. GPU gems, 2:765–784, 2005.
[131] Yasuhiko Ogata, Toshio Endo, Naoya Maruyama, and Satoshi Matsuoka. An efficient, model-based cpu-gpu heterogeneous fft library. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1–10. IEEE, 2008.
[132] Hany Elgala, Raed Mesleh, Harald Haas, and Bogdan Pricope. Ofdm visible light wireless communication based on white leds. In Vehicular Technology Conference, 2007. VTC2007-Spring. IEEE 65th, pages 2185–2189. IEEE, 2007.
[133] Sunghwan Kim and Sung-Yoon Jung. Novel fec coding scheme for dimmable visible light communication based on the modified reed-muller codes. Photonics Technology Letters, IEEE, 23(20):1514–1516, Oct 2011.
[134] D.R. Smith. Digital Transmission Systems. Springer, 1993.
[135] Sang-Hun Lee and Randolph Blake. Detection of temporal structure depends on spatial structure. Vision research, 39(18):3033–3048, 1999.
[136] Davida Y. Teller. Vision and the Visual System. University of Washington, 2014.
[137] Yuichi Tanaka, Toshihiko Komine, Shinichiro Haruyama, and Masao Nakagawa. Indoor visible light data transmission system utilizing white led lights. IEICE transactions on communications, 86(8):2440–2454, 2003.
[138] K. Narisada and D. Schreuder. Light Pollution Handbook. Number v. 322 in Astrophysics and Space Science Library. Springer, 2004. URL: http://www.google.fi/books?id=61B_RV3EdIcC.
[139] David E Blask, George C Brainard, Robert T Dauchy, John P Hanifin, Leslie K Davidson, Jean A Krause, Leonard A Sauer, Moises A Rivera-Bermudez, Margarita L Dubocovich, Samar A Jasser, et al. Melatonin-depleted blood from premenopausal women exposed to light at night stimulates growth of human breast cancer xenografts in nude rats. Cancer research, 65(23):11174–11184, 2005.
[140] Eric Warrant and Marie Dacke. Visual orientation and navigation in nocturnal arthropods. Brain Behav Evol, 75:156–173, 2010.
[141] Emma L Stone, Gareth Jones, and Stephen Harris. Conserving energy at a cost to biodiversity? impacts of led lighting on bats. Global Change Biology, 18(8):2458–2465, 2012.
[142] Stephen M Pauley. Lighting for the human circadian clock: recent research indicates that lighting has become a public health issue. Medical hypotheses, 63(4):588–596, 2004.
[143] Helen R Wright and Leon C Lack. Effect of light wavelength on suppression and phase delay of the melatonin rhythm. Chronobiology international, 18(5):801–808, 2001.
[144] Richard G Stevens and Mark S Rea. Light in the built environment: potential role of circadian disruption in endocrine disruption and breast cancer. Cancer Causes & Control, 12(3):279–287, 2001.
[145] Christopher CM Kyba and Franz Hölker. Do artificially illuminated skies affect biodiversity in nocturnal landscapes? Landscape Ecology, 28(9):1637–1640, 2013.
[146] Pierantonio Cinzano, Fabio Falchi, and Christopher D Elvidge. The first world atlas of the artificial night sky brightness. Monthly Notices of the Royal Astronomical Society, 328(3):689–707, 2001.
[147] Ron Chepesiuk. Missing the dark: health effects of light pollution. Environmental Health Perspectives, 117(1):A20–A27, 2009.
[148] Catherine Rich and Travis Longcore. Ecological consequences of artificial night lighting. Island Press, 2013.
[149] A Bierman. Will switching to led outdoor lighting increase sky glow? Lighting Research and Technology, 44(4):449–458, 2012.
[150] Fabio Falchi, Pierantonio Cinzano, Christopher D Elvidge, David M Keith, and Abraham Haim. Limiting the impact of light pollution on human health, environment and stellar visibility. Journal of environmental management, 92(10):2714–2722, 2011.
[151] Kevin J Gaston, Thomas W Davies, Jonathan Bennie, and John Hopkins. Review: Reducing the ecological consequences of night-time light pollution: options and developments. Journal of Applied Ecology, 49(6):1256–1266, 2012.
[152] Franz Hölker, Timothy Moss, Barbara Griefahn, Werner Kloas, Christian C Voigt, Dietrich Henckel, Andreas Hänel, Peter M Kappeler, Stephan Völker, Axel Schwope, et al. The dark side of light: a transdisciplinary research agenda for light pollution policy. 2010.
A APPENDIX
A.1 Direct Communication With The Arduino Board
The first layer of the API can be used to program the Arduino board to send inform-
ation by directly communicating with the Arduino. A program built with this API can
take care of the characteristics of the frequencies, their lifetime, the output pins, and so on.
In this section we demonstrate the use of the API for this purpose through ex-
ample applications. The second layer of the API can be used in later stages
for a higher level of communication. The first step is to include the header file and
instantiate the necessary objects.
#include <VLCTX.h>
FREQS freqs;
VLCTX vlctx;
There are mainly two classes in this API, and instances of both are necessary
for a program to function properly. The class FREQS simply defines the frequencies that
are going to be used for the communication, while the class VLCTX defines the lifetime of
the frequencies, the output pins and the methods for generating these frequencies. The latter
class also has some built-in methods for developing test units, which can automatically
generate bit patterns for test purposes.
The second step is to construct the objects with the desired characteristics. This is
done in the void setup() function along with other initialization tasks. There are two
overloaded constructors for this purpose, both called FREQS_init.
If no arguments are passed to the constructor method, all variables will be set to
their default values. This is helpful for fast prototyping, where one does not want to get
involved with too many details. The second constructor takes the number of different
frequencies as its first argument, followed by the half-cycle period of each frequency
in microseconds. In the example below we will have three frequencies, namely 3 kHz,
2.4 kHz and 1.992 kHz.
freqs.FREQS_init(3, 165, 208, 251);
This construction has to be done before constructing the object of the VLCTX class,
as the constructor of the VLCTX class requires a reference to an object of the FREQS
class as its first argument. The second argument is the lifetime of each frequency in
microseconds and the third argument is the number of output pins on the Arduino
board. The remaining arguments are the actual pin numbers of the outputs. In this
example the lifetime of a frequency is set to 6600 microseconds, which is one fifth of
the lifetime of an entire frame of a 30 fps video, and there are two output pins, numbers
8 and 12.
vlctx.VLCTX_init(freqs, 6600, 2, 8, 12);
The communication with the Arduino board is made through the serial port. There-
fore the initialization of the serial port is also done in the void setup() function. In
case one desires to use any other means of communication (e.g. WiFi, Ethernet), they
should provide the necessary interfaces in their Arduino sketch. The final setup() func-
tion would be similar to the one provided below.
void setup() {
  freqs.FREQS_init(3, 165, 208, 251);
  vlctx.VLCTX_init(freqs, 6600, 2, 8, 12);
  Serial.begin(9600);
  Serial.flush();
}
The next step is to generate all the necessary bit patterns in the loop() function. The
simplest way of doing so is to read characters one by one from the serial port. For this
purpose one can use the Start_Logic(int) function of the API in the following form.
void loop() {
  if (Serial.available() > 0)
    vlctx.Start_Logic(Serial.read());
}
Note that the serial communication, timing and delays should be taken care of by the programmer. By default, during idle mode, the program keeps all the output pins HIGH. Note also that this API does not have any prevention mechanism to warn the user about sending "wrong" characters through the serial port; any character that is not found in the defined scope will eventually translate to some random frequency.
In case one does not want to send any information over a communication channel, but rather to provide the information in a built-in manner at compile time, the Start_Str(String) method can be used, which takes the bit pattern as its string argument. In the example below F0, F2 and F1 will be generated right after each other.
vlctx.Start_Str("021");
In order to generate automatic bit patterns, one of the Auto_Start methods can be used. These methods take advantage of different permutations of a bit pattern and are useful for testing the robustness of the system. For example, the function call below generates the 5th permutation of the bit pattern "012", i.e. "201", repeats this permuted bit pattern 6 times and places a 200 millisecond delay between each repetition.
vlctx.Auto_Start(5, "012", 200, 6);
Note that if the user sets the delay to 0, the program will automatically place a 493 millisecond delay (i.e. 15 frames, including the delay introduced by the microcontroller) between each repetition. The reason for this is to make the testing environment more isolated by providing enough of a gap between iterations. In this way we reduce the chance of the bit patterns overlapping and confusing the receiver; it also becomes easier to study the behaviour of different cameras when enough of a gap is provided in the test bench. In case the user does not want to place any delay between the iterations, they can pass a negative integer value for this argument.
The function call below will iterate over 25 permutations of "01234", from the 10th to the 35th (i.e. all permutations from "02413" to "12430"), with a 400 millisecond delay between each iteration. This process will be repeated twice before the function returns.
vlctx.Auto_Start(10, 35, "01234", 400, 2);
It is also possible to generate only a portion of all possible permutations. In the example below, the final quarter of all 120 possible permutations of "01234" will be generated.
vlctx.Auto_Start("01234", 400, 3, 4);
This method does not support built-in repetition at the moment, and the user should be careful about the selection of portions. For example, the following statements do not cover all possible permutations in the intended scope, as 120 is not divisible by 7.
for (int i = 0; i < 7; i++)
  vlctx.Auto_Start("01234", 400, i, 7);
A.2 Interfacing The Serial Connection
The second layer of the transmission API makes it possible to develop applications that communicate with the Arduino board through a serial connection. This, however, requires the Arduino board to be programmed accordingly beforehand.
To begin, it is necessary to include the header file of the API.
#include "srlintrfc.h"
After that, a number of objects can be created using the SerialInterface class. This
class takes care of serial port initialization and communication.
vlc::SerialInterface srlInt;
This class provides four overloaded constructors that can initialize the serial connection. The developer is free to specify the baud rate, the serial port, or both. If no arguments are given to the constructor, the default values for the baud rate and serial port are 9600 and "/dev/ttyACM0" respectively.
srlInt.SerialInterface("/dev/ttyACM0", 9600);
The following method can be used to send data over the serial connection. The input can be a standard string or a pointer to a character (the beginning of a C string). The string to be transmitted can be a set of digits that indicate the frequency indices forming the bit pattern for the Arduino board; for example "23145" means generation of F2, F3, F1, F4 and F5 in that order. This method returns a negative integer value on error, or the total number of sent characters on success.
srlInt.serialport_write("23145");
This API also prototypes a simple lookup table for character conversion. This means that if the user wishes to send the word "Hello!", the built-in functions of the API can convert every character of the string into the proper bit patterns that can be transmitted by the Arduino board. The same lookup table is implemented on the receiver side to detect and translate the bit patterns back into characters. An example of this lookup table is presented in Section A.3. To use this functionality the following method can be used. It returns a negative integer value on error, or the total number of sent characters on success.
srlInt.serialport_directwrite("Hello VLC!");
Finally, the following method can be used to read from the serial connection. However, this requires the Arduino board to be programmed in a way that sends data (acknowledgements) back. This method is used mostly for debugging purposes. The first argument is a pointer to a pre-defined buffer, which is used to store the values read from the serial port. The second argument is a character that indicates the end of the message; the Arduino board should send this "until" character at the end of each acknowledgement. The return value of this method is the total number of read characters.
srlInt.serialport_read_until(buffer, until);
A.3 Character Lookup Table
Table A.1 shows an example of how the presence of each frequency can be interpreted as a character in the upper layers of communication.
The binary value for each frequency in Table A.1 indicates its presence in the current video frame. If no frequencies are detected, the frame is considered empty and no character is assigned to it. The defined set column in Table A.1 refers to the set of frequencies that will be requested from the microcontroller to be transmitted. For example, "00000" means F0 should be transmitted five times sequentially (filling exactly one video frame). Similarly, "00012" means F0 will be transmitted three times, followed by an F1 and an F2. The reason that F0 is repeated more than the other frequencies in this case is to increase the SNR for the weaker (i.e. higher) frequencies.
Note that the sequence in which the frequencies appear does not matter in this transmission, meaning that "00011" and "01010" are considered to carry the same logic. This is because once a video frame is converted to its Fourier form with the DFT function, we lose the spatial information of the video frame; there is no information about which frequency appeared first and which one appeared last. The solution to this problem is to narrow the DFT window (explained in Chapter 6.2) down to the lifetime of a frequency (explained in Chapter 4.2.4) and then move the DFT window row by row, followed by a Hanning window. This method yields a higher bitrate at the cost of computational complexity.
Frequencies (F4 F3 F2 F1 F0)   Defined Set   Assigned Character
0 0 0 0 0                      none          RESERVED
0 0 0 0 1                      "00000"       'b'
0 0 1 0 0                      "22222"       'd'
0 0 0 1 1                      "00011"       'g'
1 0 0 0 1                      "00044"       'j'
0 1 1 0 0                      "22233"       'n'
0 0 1 1 1                      "00012"       'q'
1 1 1 0 1                      "00234"       '2'
1 1 1 1 1                      "01234"       'a'

Table A.1: Character lookup table.