Atlanta University Center
DigitalCommons@Robert W. Woodruff Library, Atlanta University Center
ETD Collection for AUC Robert W. Woodruff Library
5-1-2009
A model for complex event processing
Mohammad Ali Sazegarnejad
Clark Atlanta University
Follow this and additional works at: http://digitalcommons.auctr.edu/dissertations
Part of the Computer Sciences Commons
This Thesis is brought to you for free and open access by DigitalCommons@Robert W. Woodruff Library, Atlanta University Center. It has been accepted for inclusion in ETD Collection for AUC Robert W. Woodruff Library by an authorized administrator of DigitalCommons@Robert W. Woodruff Library, Atlanta University Center. For more information, please contact [email protected].
Recommended Citation
Sazegarnejad, Mohammad Ali, "A model for complex event processing" (2009). ETD Collection for AUC Robert W. Woodruff Library. Paper 1510.
ABSTRACT
COMPUTER AND INFORMATION SCIENCE
SAZEGARNEJAD, MOHAMMAD ALI B.S. PORTLAND STATE, 2005
A MODEL FOR COMPLEX EVENT PROCESSING
Advisor: Dr. Roy George, Ph.D.
Thesis date: May 2009
Advances in sensor technology will revolutionize the way that real-world events
are collected and interpreted. The ability to ubiquitously capture data will generate an
unprecedented amount of data making distributed data management and decision making
key challenges in the deployment of this technology. The demands for intelligently
managing real-time data and integrating it into applicable business processes have
propelled the emergence of a new breed of distributed software systems. The challenges
are broader than simply creating a software platform to manage and integrate the sheer
volume of sensor data. Mechanisms that permit the application of contextual and
application knowledge into the distributed decision making infrastructure are required.
The design of such software is based on a theory of events which permits events to be states or processes.
In managing real-time data and information from distributed heterogeneous
sensors, the notion of the event is attractive for several reasons. First, modeling data in
terms of events parallels the way humans conceptualize and relate information. Second,
the notion of events, especially the differentiation between significant and non-significant
events may be used to filter data. Third, the definition of an event provides an implicit data wrapper that may be used to link sensor data through event relationships. These
relationships may be used to reason in an enterprise application context. Finally, the
event-based approach is well suited to associating autonomous, heterogeneous sensor
nodes by means of the inherent properties of events such as time and space. Thus these sensor nodes may be integrated into complex decision-making networks through event-based communication.
In this thesis, the design and development of a distributed software platform which can acquire data from heterogeneous sensors, integrate it, and provide distributed decision support is described. Raw data is processed at multiple levels of abstraction and, using context information, combined to form higher-level events that enable real-time decision making. A multi-layered event representation and reasoning model is implemented that feeds sensory data derived from low-level sensors into higher-level event structures, which can then be exploited by appropriate event handlers. Alternate approaches to the “sense making” problem are discussed and the advantages of the proposed model are explained.
A MODEL FOR COMPLEX EVENT PROCESSING
A THESIS
SUBMITTED TO THE FACULTY OF CLARK ATLANTA UNIVERSITY
IN PARTIAL FULFILLMENT OF REQUIREMENTS FOR
THE DEGREE OF MASTER OF SCIENCE
BY
MOHAMMAD ALI SAZEGARNEJAD
DEPARTMENT OF COMPUTER AND INFORMATION SCIENCES
ATLANTA, GEORGIA
MAY 2009
© 2009
MOHAMMAD ALI SAZEGARNEJAD
All Rights Reserved
ACKNOWLEDGMENTS
First and foremost, I praise God for giving me the opportunity to start a wonderful journey filled with enormous lessons and extraordinary experiences, as well as giving me the strength to reach this point successfully. I would like to thank my incredible wife for her endless love, support, and the great sacrifices she has made, especially during this chapter of our life. My special thanks to my parents, who helped me in every moment of this journey, for their kindness and never-ending love. I am grateful to Dr. Roy George, my advisor, and Dr. Khalil Shujaee for providing me the opportunity to join the graduate program at the Department of Computer and Information Science of Clark Atlanta University. They always supported me with their enlightening advice and by creating a warm and friendly environment for advancing my education.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
Chapter
1. INTRODUCTION: COMPLEX EVENT PROCESSING (CEP)
2. BACKGROUND
    Literature Review
    Stream Processing Models
    Comparison of Stream Processing Models
    Enhanced Stream Processing Model
    Definition of Terms and their Relation
3. THEORETICAL FRAMEWORK: EVENT PROCESSING MODEL
    Introduction
    Event Relationships
    Temporal Relationship
    Spatial Relationship
    Causality Relationship
    Event Hierarchical Model
4. ARCHITECTURE OF EVENT PROCESSING ENGINE
    Introduction
    Event Filtering
    Event Assimilation
    Event Interpretation
5. DISCUSSION
6. FUTURE WORK
APPENDIX A
BIBLIOGRAPHY
LIST OF FIGURES

Figure
1. Straight-through processing of messages
2. Windows define the scope of operations
3. Aurora system model
4. Network connection to a server using a single thread with Java NIO
5. Overview of SensorML schema
6. Illustration of Rete nodes and states
7. Tandem-style processing ensures high availability for real-time stream processing
8. A basic architecture of a database system
9. Basic architecture of a rule engine
10. Basic architecture of a Stream Processing Engine (SPE)
11. Causality and conflict relationships
12. Hierarchical architecture of the event model
13. Architecture of event processing engine
14. Event Processing Engines (EPEs) interacting via web services
LIST OF TABLES

Table
1. The capabilities of various systems
2. The thirteen possible temporal relationships
CHAPTER 1
INTRODUCTION: COMPLEX EVENT PROCESSING (CEP)
Sensor-based automation is increasingly being deployed in everyday life. Sensors of differing types are being used in applications ranging from green homes and smart houses to alarm and security systems and navigation systems.
The demand for automation in the enterprise and industry dwarfs the needs of everyday users. For example, in a modern fleet management system, by adding Global Positioning System (GPS) sensors or engine monitoring systems to vehicles and trucks, supervisors can locate each vehicle in real time and keep track of the vehicles’ performance status. This permits fleet operators such as United Parcel Service (UPS) and Federal Express (FedEx) to adjust schedules dynamically, taking into account such parameters as traffic and vehicle breakdown. Real-time engine monitoring could prevent costly breakdowns by allowing vehicles to be serviced in time. Real-time monitoring and automation can thus improve utilization of resources and improve productivity.
In healthcare, new sensor-based monitoring systems help patients, doctors and nurses keep track of patients’ health and records more efficiently and respond to patients’ needs more accurately. Prioritizing treatments based on the time and location of patients would be a major factor in resource allocation and improved response time.
Radio-frequency identification (RFID) tags are omnipresent in libraries,
warehouses, grocery stores, datacenters, etc. to monitor and locate items, enforce security
or update inventory levels. In warehouses where large numbers of assets are constantly
moved around, it is crucial to be able to locate items. Security, space management, and
specialized handling requirements dictate the need for asset management.
In software, there are several applications that produce “sensor like” data. The Options Price Reporting Authority (OPRA), which is in charge of monitoring and collecting data from all stock exchanges, had a projection of 1.3 billion messages per day in 2006 (Corrigan 2005). It is impossible for humans to handle and monitor this magnitude of transactions and processes.
All these examples have similarities in terms of functionality and operation. They all rely on a data source, which could be a sensor, an engine monitoring system, fraud detection software or asset tracking. They all share the need for processing and analyzing captured information as well as dynamically responding to events and conditions as the data is being processed. The outcome of analysis may be automatic actions or alerts in the functional system.
The objective of this thesis is to identify a framework of components for sensor-aided decision making. These components include data acquisition, data storage and processing modules, analysis engines, and the methodologies for setting processing criteria. In this thesis, existing processing models are evaluated and a new model for
data stream processing is proposed. A multi-layered event representation and reasoning model is implemented that feeds sensory data derived from low-level sensors into higher-level event structures. Theoretical issues in the development of event-based processing are described.
CHAPTER 2
BACKGROUND
Literature Review
The scenarios discussed previously, while superficially different, have several
features in common that have been the focus of research in recent years. In all cases
large amounts of data are being generated over short periods of time and it is difficult to deal with such massive amounts of information, let alone perform “sense making.” The
spatial data characteristics and the time sensitivity of the information are critical in “sense
making.” Finally, the ability to use processed data for decision making and taking
actions is a critical need.
To close the gap between capturing, processing and decision making, there is a need for a framework capable of linking these components to improve the performance of stream processing. Recently, Complex Event Processing (CEP) was proposed to analyze stream data as it flows into a system (CEP Interest). The initial focus of CEP was on stock market exchange rates; it later expanded to a broader definition that includes any live data stream processing (CEP Interest). CEP is a new approach in data analysis and information systems such as “Business Activity Monitoring, Business Process Management, Enterprise Application Integration, Event-Driven Architectures, Network and Business Level Security and Real time conformance to regulations and policies” (ComplexEvents.com).
CEP has become the solution for many challenging applications. Eight characteristics have been identified as the basis of complex data processing (Zdonik, Çetintemel and Stonebraker 2005). The first requirement is minimizing latency as much as possible to keep data flowing by avoiding costly storage operations, as shown in Figure 1. There are “passive” systems which constantly look for a condition to be met in order to take actions or initiate a process. In these systems costly overhead exists due to the fact that some resources will be in use without participating in overall processing. The solution is to develop an “event driven processing” architecture where resources are utilized on demand. Figure 1 illustrates how preventing latency factors can speed up the processing phase (Zdonik, Çetintemel and Stonebraker 2005).
Figure 1. “Straight-through processing of messages” (Zdonik, Çetintemel and Stonebraker 2005)
The second rule is using a high-level language like StreamSQL to reduce development time by exploiting the built-in functionality of the SQL operators. The temporal component of the data stream is a key driver in most processing engines, and SQL queries can construct tables of records within a time interval. Thus, there is a need for monitoring elements to use the available operations in SQL, such as aggregation or joins, while keeping track of the data sequence in the stream. This component can be defined as a “scope” or “window” over the data stream and is responsible for securing the order of processing, as shown in Figure 2.
Figure 2. Windows define the scope of operations; the window has a size of 5 messages and slides by 1 each time the associated operator is executed, so consecutive windows overlap (Zdonik, Çetintemel and Stonebraker 2005).
Depending on the nature of the data stream and processing engine, different models of time window can be applied for best results. Figure 2 shows how a persistent time window can cover the data stream by sliding over time. The overlap factor may be used to boost recent-history lookup in case there is a need to integrate live data with recently captured data while processing in real time.
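The sliding behavior described above can be sketched in a few lines of Python (a minimal count-based illustration, not part of any particular engine; the moving-average operator at the end is a hypothetical example of a per-window aggregation):

```python
from collections import deque

def sliding_windows(stream, size=5, slide=1):
    """Yield overlapping windows over a message stream (cf. Figure 2:
    a window of 5 messages sliding by 1, so consecutive windows overlap)."""
    window = deque(maxlen=size)
    for i, msg in enumerate(stream, start=1):
        window.append(msg)
        # Emit once the window is full, then again every `slide` messages.
        if i >= size and (i - size) % slide == 0:
            yield tuple(window)

# An aggregation operator applied per window, e.g. a moving average:
readings = [10, 12, 11, 13, 15, 14, 16]
averages = [sum(w) / len(w) for w in sliding_windows(readings)]
```

With a slide of 1 and a window of 5, consecutive windows share four messages, which is what allows aggregates over the recent past to be refreshed as each new message arrives.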
Aurora is a research project using the stream approach to develop Database Management Systems (DBMSs). In this model trigger boxes, which can keep state information in memory for a certain amount of time, are defined (Figure 3). This overcomes the high cost of looking up previous states.
Figure 3. Aurora system model (Abadi et al. 2003)
The storage of historical data in memory allows a faster lookup of the state of data in the recent past. In addition, these trigger boxes try to persist recent inputs and prepare for new data. The system can generate new trigger boxes on demand or use existing boxes for operations such as aggregation of old and live data (Abadi et al. 2003).
Dealing with “stream imperfection” is the third component of this approach. Examples of this include out-of-order packets and blocking Input/Output reads, which introduce a very expensive overhead to the system. Defining timeouts or having a feedback system to adjust out-of-order packets are a few ways to solve these problems (Zdonik, Çetintemel and Stonebraker 2005).
Sun Microsystems (Sun Microsystems, Inc. 2002) has developed a new Input/Output operation concept which performs much faster on both network sockets and disk operations, thereby circumventing the problem of blocking Input/Output operations. This method, which requires less thread management overhead, can serve multiple network connections and data channels. It supports real-time stream processing characteristics such as accessing disk storage or network resources with shorter latency, and improves performance with fewer threads per set of channels. Figure 4 shows an overview of this new approach (Travis 2003).
Figure 4. Network connection to a server using a single thread with Java NIO
In the Java New I/O (NIO) library (Sun Microsystems, Inc. 2002), a selector iterates through a list of connected channels and picks up a fixed amount of data in buffers within each key object and then transfers the data. The key object has information about its corresponding channel; the selector knows the status of each channel and coordinates Input/Output operations over live connections.
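The same readiness-selection pattern exists in other languages; the following minimal sketch uses Python's selectors module (which is modeled on the Java NIO selector), with a local socket pair standing in for network channels:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()          # stand-in for a pair of network channels
a.setblocking(False)
b.setblocking(False)

# Register channel b for read-readiness; the selector tracks its key object.
sel.register(b, selectors.EVENT_READ, data="channel-b")

a.sendall(b"sensor-reading")        # data arrives on the channel
events = sel.select(timeout=1)      # one thread waits on all registered channels
for key, mask in events:
    payload = key.fileobj.recv(4096)  # non-blocking read of the buffered data
```

A single selector thread can multiplex many such channels, which is the source of the reduced thread-management overhead described above.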
Guaranteed predictability of data is an important property in such systems. If the system expects a certain order in the data path, that property should be maintained throughout the system (Zdonik, Çetintemel and Stonebraker 2005). Data refinement and persistence is one of the crucial challenges in real-time data processing.
Each enterprise has a domain of operation with specific hardware, unique sensors and other sources of information. However, developing a data acquisition component for a large number of sensor information sources is very hard and time consuming. The lack of universal standards for sensor data formats has a direct effect on the development and scalability of applications. Although there are new initiatives that permit standardization, such as the Sensor Model Language (SensorML) (Botts 2003) shown in Figure 5, such resources are yet to be universally adopted.
Figure 5. Overview of SensorML schema (Botts 2003)
The SensorML standard, initiated in the academic sector, has support from the Open GIS Consortium, Inc. (OGC), the National Imaging and Mapping Agency, and NASA (Figure 5). The basic SensorML schema (Botts 2003) is a comprehensive schema for serving a variety of sensors regardless of type or unit of measurement, along with the spatial and temporal context associated with each sensor.
The state of information is maintained to be accessible and adjustable during processing time with respect to the length of the sliding window (or time-window). By maintaining processing states, other data operations, such as integrating or aggregating stored data with live data, can be sped up (Zdonik, Çetintemel and Stonebraker 2005).
Other solutions may be deployed that eliminate the need for database access to recent history. The Rete algorithm, as shown in Figure 6, is one of the most commonly used algorithms for developing rule engines. The Rete algorithm can store states and route data dynamically through chain forwarding. In Rete, by looking up valid stored states in memory, all pre-computed results may be made available very rapidly (ILOG, Inc. 2005).
(Forgy, Charles. “Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem.” 1982.)
Figure 6. Illustration of Rete nodes and states (Young 2008).
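The state-caching idea behind Rete can be illustrated with a toy sketch (this is not the full Rete network; the condition names and facts below are hypothetical). Each arriving fact is tested once and routed into the memories of the conditions it satisfies, so rule activation does not rescan old facts:

```python
# A toy illustration of Rete's key idea: cache which facts already matched
# each condition (an "alpha memory"), so each arriving fact is tested once
# instead of re-scanning the whole working memory on every assertion.
conditions = {
    "hot": lambda f: f["type"] == "temp" and f["value"] > 30,
    "dry": lambda f: f["type"] == "humidity" and f["value"] < 20,
}
alpha_memory = {name: [] for name in conditions}

def assert_fact(fact):
    """Route a new fact into the alpha memories whose tests it passes."""
    for name, test in conditions.items():
        if test(fact):
            alpha_memory[name].append(fact)
    # Join step: the hypothetical "fire-risk" rule activates only when both
    # alpha memories are non-empty -- no rescan of old facts is needed.
    return bool(alpha_memory["hot"] and alpha_memory["dry"])

assert_fact({"type": "temp", "value": 35})
fired = assert_fact({"type": "humidity", "value": 10})
```

In a real Rete network the join step is itself incremental (via beta memories); the point here is only that stored states make pre-computed results available immediately.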
Defining redundant data paths in the overall system architecture is necessary in case of failure. Alternate routes help systems by protecting the integrity of data through the control path. Installing more than one gateway for data acquisition, data streaming and processing is useful for load balancing during system uptime, and in case of a broken data pipeline the system can divert data to a working path (Figure 7).
Figure 7. Tandem-style processing ensures high availability for real-time stream processing (Zdonik, Çetintemel and Stonebraker 2005).
Exploiting low-cost equipment and upgrading existing systems by creating processing clusters, using load-balancing techniques, and utilizing new processors by developing multithreaded applications are techniques in stream processing. It is essential that the code be optimized with respect to the hardware platform (Zdonik, Çetintemel and Stonebraker 2005).
In real-life scenarios, sensors are located in remote locations and deployment of ordinary computer systems is not feasible. The sensor data processing has to be done using systems that can survive remote locations and harsh environments. In addition, these systems should be able to communicate with other systems and possibly with a central system for transferring collected data. The alternate approach is to connect large sets of sensor networks to a central computer, such as a cluster or supercomputer, which is capable of handling large amounts of data.
The embedded systems concept has been introduced and added to the framework because of its special characteristics. Simplicity, real-time response and reactivity, and low power consumption are characteristics of embedded systems. From automated braking systems in cars to handhelds and mobile phones, embedded systems play a key role in data processing today (Vaandrager 2007). Embedded systems are good candidates for data acquisition and the primary stage of processing. In building stream processing engines, it is obvious that the software platform for computer clusters with large resources will be different from that for an embedded system.
Stream Processing Models
Three models have been actively used for stream data processing: traditional Database Management Systems (DBMSs), Rule Engines and Stream Processing Engines (SPEs) (Zdonik, Çetintemel and Stonebraker 2005).
In the DBMS processing model, one or more DBMSs are used in combination with loading applications that direct the data stream to storage, where it is available to other applications for processing (Figure 8). A disadvantage of the DBMS model is the latency caused by constant disk access. In order to compensate for this key factor, a new generation of main-memory databases has been developed (Zdonik, Çetintemel and Stonebraker 2005). McObject, HSQL and SQLite are a few examples of such developments.
McObject, one of the leading developers of in-memory database systems (IMDSs), has developed 64-bit technology to provide in-memory management of databases. Its benchmark test shows improved performance, managing a 1.17 terabyte, 15.54 billion row database (McObject LLC 2008). These records are impressive, especially since in-memory databases are also available for embedded computing.
Figure 8. A basic architecture of a database system (Zdonik, Çetintemel and Stonebraker 2005)
Rule engines have been around since the 1970s in the artificial intelligence community (Zdonik, Çetintemel and Stonebraker 2005). A rule engine is a “sophisticated interpreter of if-then statements. The if-then statements are the rules” (Mahmoud 2005). There are many flavors of rule engines for the purpose of information processing. Almost all major software development companies have developed rule engines, which are used to process data using business rules. For example, in order to monitor sales we can create a rule consisting of IF/THEN clauses and the rule engine will take actions accordingly. For example: IF (sales amount >= $100), THEN (offer discount 5%) (Mahmoud 2005).
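The sales rule above can be sketched as a condition/action pair (a minimal illustration; the rule and its field names follow the example in the text, everything else is hypothetical):

```python
# A minimal sketch of the if/then rule from the text: each rule is a
# condition predicate plus an action, and the engine applies every
# matching rule to an incoming data item.
rules = [
    (lambda sale: sale["amount"] >= 100,        # IF sales amount >= $100
     lambda sale: {**sale, "discount": 0.05}),  # THEN offer discount 5%
]

def run_engine(sale):
    """Apply the action of every rule whose condition the input satisfies."""
    for condition, action in rules:
        if condition(sale):
            sale = action(sale)
    return sale

result = run_engine({"amount": 120})
```

Production rule engines differ mainly in how they avoid re-evaluating every condition on every input, which is where algorithms such as Rete come in.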
Figure 9 illustrates how the data stream is fed to a rule engine. The rule engine verifies the input with respect to the rules in the rule base and takes action(s) accordingly. The outcome may cause further rule execution or modification of the internal states of parameters (Zdonik, Çetintemel and Stonebraker 2005).
Figure 9. Basic architecture of a rule engine (Zdonik, Çetintemel and Stonebraker 2005)
Stream Processing Engines (SPEs) use SQL-style processing; the data stream is not stored by default, only as needed. Existing SQL queries do not have the capability of monitoring a time-window, nor the special primitive types needed to deal with real-time data streams. SPEs, as shown in Figure 10, are therefore developed by adding new directives and functionality to existing SQL queries (Zdonik, Çetintemel and Stonebraker 2005).
Figure 10. Basic architecture of a Stream Processing Engine (SPE) (Zdonik, Çetintemel and Stonebraker 2005)
Comparison of Stream Processing Models
In a quick comparison between these systems, the DBMS is categorized as a “process-after-store” model that is not optimized for stream processing, which could lead to delays and in some cases blockage of data flow. On the other hand, neither rule engines nor SPEs require storing data before processing, and both models process data on the fly. The primary intent of SQL is dealing with finite sets of data and applying predefined operations on the finite dataset.
In rule engines and SPEs there is a place to introduce the notion of a time-out when dealing with unresponsive operations such as Input/Output. However, DBMSs cannot integrate such a concept due to the nature of SQL design. This can be resolved only if it is known a priori that a time-out is required, forcing the SQL query to have look-ahead knowledge about sets of operations. Despite the semantics of triggers in DBMSs, time-out behavior is not reliable using triggers. Real-time processes cannot therefore be handled adequately by database systems (Zdonik, Çetintemel and Stonebraker 2005).
The predictability of outcomes is one of the important features that a system needs to preserve while processing a real-time data stream. In the DBMS processing model there are many applications that interact with the stored data.
Atomicity, Consistency, Isolation, Durability (ACID) are properties of DBMS transactions. Traditionally, each database application will process stored data in isolation from other applications. Since there is no coordination between applications, the temporal order of the data during processing is undermined. This is one of the major differences between SPEs and traditional database models. The DBMS model needs a monitoring system to coordinate the order of execution as data is accessed among processing applications. Table 1 shows a summary of the different models and their capabilities (Zdonik, Çetintemel and Stonebraker 2005).
Integration of archived data and live data for stream processing is important. Most rule engines have adopted the Rete algorithm, which applies a fixed time-window for storing processing states. Depending on the length of the time-window, rule engines cannot perform seamless integration of processed data with the real-time data stream beyond a point. Both the DBMS and SPE models, with their ability to store processing states without limitation, can operate over historical and live data.
Table 1. The capabilities of various systems.*

                                 DBMS        Rule engine    SPE
  Keep the data moving           No          Yes            Yes
  SQL on streams                 No          No             Yes
  Handle stream imperfections    Difficult   Possible       Possible
  Predictable outcome            Difficult   Possible       Possible
  High availability              Possible    Possible       Possible
  Stored and streamed data       No          No             Yes
  Distribution and scalability   Possible    Possible       Possible
  Instantaneous response         Possible    Possible       Possible

*Source: (Zdonik, Çetintemel and Stonebraker 2005)
Enhanced Stream Processing Model
Most current technologies focus on SPEs, which have a strong flavor of SQL queries and often work as wrappers for traditional DBMSs. Aggregation operations on live and archived data streams and the storing of data states are the only two factors that a rule engine lacks. A new version of the SPE can be defined by providing these two factors to a rule engine.
The new model will maintain the robustness of the rule engine and is capable of integrating archived data and the live stream without any problem. A new component of the processing system will be an in-memory database system (IMDS); instead of defining new primitive data types and directives, along with the new embedded database, new sets of SQL queries will be added to the rule engine structure to fetch stored data past the current time-window on demand. All aggregation functionality is available internally and can always be embedded into the rule engine structure.
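A minimal sketch of this combination, assuming SQLite as the embedded in-memory store and a single hand-written rule (both stand-ins for the components described above), might look as follows:

```python
import sqlite3

# In-memory database holds events that have already left the time-window.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (ts INTEGER, value REAL)")
db.executemany("INSERT INTO events VALUES (?, ?)",
               [(1, 10.0), (2, 20.0), (3, 30.0)])

def rule_with_history(live_value, since_ts):
    """Fire when a live reading exceeds the historical average; the rule
    fetches data past the current time-window via SQL, on demand."""
    (avg,) = db.execute(
        "SELECT AVG(value) FROM events WHERE ts >= ?", (since_ts,)
    ).fetchone()
    return live_value > avg

fired = rule_with_history(live_value=25.0, since_ts=1)
```

The rule body stays a plain condition, as in any rule engine; only the history lookup is delegated to the embedded SQL store, which is the division of labor the proposed model describes.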
Another advantage of the new model is its deployment as a very light engine for embedded systems handling local data and low data bandwidth. At the higher-performance end, the new system may be deployed as a cluster for regional processing within enterprises, with the capability of handling very large data bandwidth.
Definition of Terms and their Relation
In this section definitions of event processing technology are provided (Luckham
and Schulte 2008).
Anything that happens, or is contemplated as happening, is called an Event. Event is a very generic term whose meaning relies on the domain in which it occurs. Any physical or virtual phenomenon can be considered an event to be consumed by an SPE. Examples of events are: an airplane takeoff or landing, a financial transaction or exchange rate change, natural occurrences like an earthquake or a change in wind direction, sensor output data, and social or historical happenings such as elections. There are many domains in which such entities can be classified as events (Luckham and Schulte 2008).
An Event object represents an event or happening for the purpose of processing in computer systems. For instance, a purchase order can be classified as an event object and considered a financial trade event type. An email notification for an airline ticket reservation is a representation of the reservation in the form of an email object, which may be processed by a computer as an event. There are many ways to represent an event as an object for processing (Luckham and Schulte 2008). However, it is the domain of the processing engine and the event definitions included in that domain that result in the acceptance or rejection of an object by a processing engine. A virtual event is an extension of the event definition to cover things which are not physically existent, like a weather prediction produced by a weather simulation (Luckham and Schulte 2008).
The key definition in event processing is the notion of an Event Type, which allows event engines to identify the primary type of an event object for the purpose of initiating the processing phase. An event type is defined in a strongly typed computer language, such as an XML schema or a Java class. This definition has an abstract structure into which a category of event objects with certain properties and attributes would fit (Luckham and Schulte 2008). In the following discussions the Event Object is referred to as event.
Different dimensions and properties of an event are referred to as Event Attributes. Attributes are the factors which allow processing to distinguish one event from another (Luckham and Schulte 2008).
These attributes are used for the evaluation of events against a type definition, or for comparing events which have different type definitions but comparable attributes. For example, to monitor the environmental elements of an area, different events such as wind, humidity and temperature are gathered. There are two particular attributes of these events that define the overall status of the given area. One is the time of occurrence of the events, which is the temporal context and is used to check things like the validity of an event within a time-window. The second attribute is the geographical coordinates, which is the spatial context of these events and defines the area of interest. In addition, event attributes allow comparison between events with the same definition.
The temporal context has some primitive elements used in the ordering of event processing. The most basic element of the temporal dimension is the Clock, which is defined as “A process that creates an ordered ascending sequence of values of type Time with a uniform interval between them and each value is produced at a tick (or clock tick).” Granularity is “the length of the interval between clock ticks.” The most used attribute of an event, and an indication of temporal context, is the Timestamp, defined as “A time value attribute of an event, recording the reading of a clock in the system in which the event was created or observed” (Luckham and Schulte 2008).
In order to define a Complex Event, a few more definitions are needed to build relations between events. An event can be the Cause of another event; for example, sending an email is the primary cause of receiving a reply to it. The chain of event generation introduces the notion of Abstraction, where “an event is an abstraction of a set of events if it summarizes, represents, or denotes that set of events.” Based on the definition of abstraction, a Complex Event is defined as “An event that is an abstraction of other events called its members.” For instance, a successfully completed online purchase is a complex event which abstracts a set of events such as item selection and the payment transaction. Keep in mind that the temporal, causality and abstraction relations are the components of event relationships.
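The purchase example can be sketched as follows (a minimal illustration; the summary attributes chosen here, the timestamp span of the members, are hypothetical):

```python
# Sketch: a complex event summarizes ("abstracts") a set of member events,
# per the definition above; the member events follow the purchase example.
def complex_event(name, members):
    return {
        "name": name,
        "members": members,
        # The abstraction carries summary attributes derived from its
        # members, e.g. the span of their timestamps.
        "start": min(m["time"] for m in members),
        "end":   max(m["time"] for m in members),
    }

purchase = complex_event("purchase-completed", [
    {"name": "item-selected",     "time": 100},
    {"name": "payment-processed", "time": 105},
])
```

Higher layers of an event hierarchy can then treat the complex event as a single event, without reinspecting its members.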
For processing events, Rules are required as prescribed methods of processing for any given event. In order to define rules a high-level language is needed, which is called an Event Processing Language (EPL). In some of the SPEs, SQL-type languages are the primary base definition of the EPL. For the complete list of definitions and explanations, please refer to Appendix A.
The proposed model will use a rule engine. Rule engines have a built-in language which is used to define, create, and interpret rules.
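As an illustration of the rule-engine idea, the following hypothetical Python sketch registers rules as (condition, action) pairs; in a real SPE the conditions would be written in an EPL rather than as Python lambdas, so everything here is an assumption made for illustration.

```python
# Hypothetical sketch: a rule pairs a condition (an event pattern
# constraint) with an action, as a rule engine's EPL would prescribe.
rules = []

def rule(condition):
    def register(action):
        rules.append((condition, action))
        return action
    return register

alerts = []

@rule(lambda e: e["name"] == "temperature" and e["value"] > 40)
def high_temperature_alert(e):
    alerts.append(f"overheat at {e['time']}")

def process(event):
    # fire every rule whose condition matches the incoming event
    for condition, action in rules:
        if condition(event):
            action(event)

process({"name": "temperature", "value": 45, "time": 12})
process({"name": "temperature", "value": 20, "time": 13})
print(alerts)  # ['overheat at 12']
```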
CHAPTER 3
THEORETICAL FRAMEWORK: EVENT PROCESSING MODEL
Introduction
The design of the proposed event processing model is based on the notion of
events for modeling the spatial-temporal information of sensor data in the context of
enterprise applications. An expanded definition of an event is "an occurrence or
happening of significance that can be defined as a region or collection of regions in
spatial-temporal-attribute space" (Jain 2003).
Given k events e1, ..., ek, the i-th event is formally denoted as ei(t, s, a1, ..., an) and
uniquely identified by its event identifier, called eID. Here t characterizes the event
temporally, s denotes the spatial location(s) associated with the event, and a1, ..., an are
the attributes associated with the i-th event. Conceptually, an event may be considered to
be semi-structured data.
For simplicity of implementation, we create the event data type based on the
following mandatory attributes: eID, space, time, sensor data, event-name, event-
granularity, event-associations, and event-topic. The attribute space provides three
3. Stream processing is in reality event processing and has the same meaning and semantics from an overall processing perspective. In this document, stream processing and event processing are interchangeable.
meanings to locate an event: geographic information, abstract symbolic information such
as a postal address or room number, and a virtual address on the Internet such as a MAC
address or IP address.
In a real-time system which reads data within a time interval, the attribute time is a
concept of timestamp which may include start and end times for an event. The attribute
sensor data describes the sensor data gateway and the associated reads by the gateway.
The attribute event-name is meant to provide a human-readable description. The attribute
event-granularity presents the complexity of an event by marking it as a sub-event or
super-event. The attribute event-topic provides a simple means to attach a semantic
description to the event. Currently, we interpret it as a set of keywords; however, topics
may also be organized to reflect a domain dependent taxonomy.
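The mandatory attributes above can be sketched as a simple data type; the field types and the sample values below are assumptions made for illustration only, not part of the thesis specification.

```python
from dataclasses import dataclass, field
from typing import Any

# Minimal sketch of the event data type with the mandatory attributes
# listed in the text; all field types are illustrative assumptions.
@dataclass
class Event:
    eID: str
    space: dict             # geographic, symbolic, and virtual address
    time: tuple             # (start, end) timestamps
    sensor_data: Any
    event_name: str         # human-readable description
    event_granularity: str  # 'sub-event' or 'super-event'
    event_associations: list = field(default_factory=list)
    event_topic: set = field(default_factory=set)

e = Event(
    eID="e-001",
    space={"geo": (33.75, -84.39), "symbolic": "Room 210", "virtual": "10.0.0.7"},
    time=(1200, 1205),
    sensor_data={"temperature": 22.5},
    event_name="room temperature reading",
    event_granularity="sub-event",
    event_topic={"hvac", "temperature"},
)
print(e.eID, e.space["symbolic"])
```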
Expressing data in terms of events introduces a structure on the information space
that is not only similar to the cognitive way of organizing information, but also has these
additional characteristics: (1) Events strongly utilize the notion of transclusion with
respect to the sensor standards based on which they are defined (Nelson 1992). On one
hand, transclusion allows events to transcend the different sensor standards on which the
data is based. On the other, a strong link can be maintained between particular sensor
data through event relationships, which could also be used for reasoning in an enterprise
application context. (2) The relationships among events are made prominent. The
flexibly organized data can be shared between events. (3) The mapping and reasoning
model is viable through the evolution of events.
Event granularity is an important attribute for distinguishing primitive events from
complex events in CEP (Complex Event Processing) (D. Luckham 2002). The complexity
of an event derives from two aspects. First, physical objects associated with events are often
an aggregation of multiple objects. For example, a car consists of thousands of parts
which may all be tagged with RFID tags. Second, an event itself is associated with
multiple granularities, meaning it could be broken down into several sub-events, each of
which in turn could be divided into finer sub-events.
From a theoretical perspective, an event can be divided into arbitrarily small units.
However, event granularity is generally not considered in terms of the smallest unit
technologically attainable but in terms of business requirements. In the event
decomposition process, the granularity graph of an event is created (Jain 2003).
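One way to picture a granularity graph is as a decomposition tree over sub-events; the event names below are hypothetical, chosen to echo the car-assembly example above.

```python
# Sketch: an event decomposed into sub-events forms a granularity
# graph (here a simple tree); names are illustrative only.
granularity = {
    "car_assembled": ["chassis_assembled", "engine_installed"],
    "engine_installed": ["block_mounted", "wiring_connected"],
}

def leaf_events(event):
    """Return the finest sub-events reachable from `event`."""
    children = granularity.get(event, [])
    if not children:
        return [event]
    leaves = []
    for child in children:
        leaves.extend(leaf_events(child))
    return leaves

print(leaf_events("car_assembled"))
# ['chassis_assembled', 'block_mounted', 'wiring_connected']
```

In practice the recursion would stop at the granularity the business requirements demand, not at the finest technologically attainable unit.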
Event Relationships
Event processing is different from signal processing because it must deal with the
relationships between events. The ability to perceive business activities as consisting
of discrete events that have some orderly relationship aids the generation of hypotheses,
guides our understanding of what is happening, helps control actions, and forms the basis
of our later recollection of what took place. Based on our research, the three common
and important relationships between events are the following.
Temporal Relationship
All events have temporal boundaries. In the event structure, the fundamental
element is the event which is then related to other events by a temporal ordering such that
events occur in a particular sequence; e.g., event e2 happened after event e1 means either
that e2 occurs after e1 or that the occurrence of e1 is necessary for the occurrence of e2.
Thus, a critical part of handling event relationships is to identify beginnings and endings
of events, and to understand their temporal relations. Allen’s temporal interval algebra
for representing and reasoning about temporal relationships may be applied for
presenting the temporal relationship between events (Allen 1983).
Thirteen relationships including seven basic relations and their symmetrical
relations are shown in Table 2. In addition, the temporal relationship between events
depends on the timing unit, e.g., hour, month, or year.
Table 2. The thirteen possible temporal relationships*

Relation           Symbol   Symbol for Inverse
e1 before e2       <        >
e1 equal e2        =        =
e1 meets e2        m        mi
e1 overlaps e2     o        oi
e1 during e2       d        di
e1 starts e2       s        si
e1 finishes e2     f        fi

*Source: (Wu, et al. 2005)
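The relations in Table 2 can be stated as predicates over (start, end) intervals on a common clock; the following is an illustrative sketch, not a component of the proposed engine.

```python
# Sketch of Allen's basic interval relations; intervals are
# (start, end) tuples on a common clock. Inverses (>, mi, oi, di,
# si, fi) follow by swapping the arguments.
def before(a, b):   return a[1] < b[0]
def meets(a, b):    return a[1] == b[0]
def overlaps(a, b): return a[0] < b[0] < a[1] < b[1]
def during(a, b):   return b[0] < a[0] and a[1] < b[1]
def starts(a, b):   return a[0] == b[0] and a[1] < b[1]
def finishes(a, b): return a[1] == b[1] and a[0] > b[0]
def equal(a, b):    return a == b

e1, e2 = (1, 3), (3, 8)
print(meets(e1, e2), before(e1, e2))  # True False
```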
Spatial Relationship
To describe the relationships between events, we enhance the temporal
relationships by the order of the space since events are, on the other hand, bounded in
space. Spatial relations are classified into two types: topological relations that describe
neighborhood and incidence (e.g. overlap, disjoin) and directional relations that describe
order in space (e.g. south, northwest). Though there are additional virtual spatial
relations, such as IP addresses emerging from cyberspace, we ignore them here because
they can be translated into an actual address represented by the above spatial relations. A general
spatial relationship considers 12 directional relationships and 6 topological relationships
(Li, Ozsu and Szafron 1996). The directional relationships are classified into the
following three categories: strict directional relations (north, south, west, and east), mixed
directional relations (northeast, southeast, northwest, and southwest), and positional
relations (above, below, left, and right). The topological relationships include equal,
inside, cover, overlap, touch, and disjoint.
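As a toy illustration of the strict and mixed directional relations named above, the following sketch classifies the direction of one point event relative to another; the planar (x, y) coordinates are an assumption made for illustration.

```python
# Illustrative sketch: classify the directional relation between two
# point events from their (x, y) coordinates. Strict relations are
# north/south/east/west; mixed relations combine them (e.g. northeast).
def direction(a, b):
    """Direction of point b relative to point a."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    ns = "north" if dy > 0 else "south" if dy < 0 else ""
    ew = "east" if dx > 0 else "west" if dx < 0 else ""
    return (ns + ew) or "equal"

print(direction((0, 0), (2, 5)))   # northeast
print(direction((0, 0), (-1, 0)))  # west
```

Topological relations (equal, inside, cover, overlap, touch, disjoint) would be computed over regions rather than points and are omitted from this sketch.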
Causality Relationship
Causality is the relation between causes and effects. It is a complex logical
concept (Kshemkalyani 1998), and we adopt the definition of Luckham (2002) that
causality is a dependence relationship
between events in an application. For example, if the activity signified by event E1 had
to happen in order for the activity signified by event E3 to happen, then E1 caused E3.
Corresponding to the dependency relationship indicated by causality, there is a
conflict relation such that the occurrence of both the events participating in this relation is
prohibited. This means that E4 cannot happen at the same time as E5 if there is a
conflict relationship between them. In addition, there are obvious temporal relationships
between events which have a causality relationship. The causality and conflict
relationships between events are shown in Figure 11. If E1 caused E3, for example, E1
must happen before E3. Using the symbols of Table 2, the temporal relationship
between them must be one of three possibilities: E1 < E3, E1 m E3, or E1 o E3.
Figure 11. Causality and conflict relationships (solid arrows denote cause; dashed lines denote conflict)
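The constraints implied by these relations can be checked mechanically: a cause must precede its effect, and conflicting events must not co-occur. The event identifiers and timestamps below are illustrative.

```python
# Sketch: verify that observed timestamps are consistent with declared
# cause and conflict relations between events.
causes = {("E1", "E3")}      # E1 caused E3
conflicts = {("E4", "E5")}   # E4 and E5 cannot co-occur

def consistent(times):
    """times maps event identifiers to their timestamps."""
    for a, b in causes:
        if not times[a] < times[b]:   # cause must precede effect
            return False
    for a, b in conflicts:
        if times[a] == times[b]:      # prohibited co-occurrence
            return False
    return True

print(consistent({"E1": 1, "E3": 4, "E4": 2, "E5": 3}))  # True
print(consistent({"E1": 5, "E3": 4, "E4": 2, "E5": 2}))  # False
```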
Event Hierarchical Model
Events produce real-time, heterogeneous data with spatial and temporal
dimensions. However, low-level sensory information is not directly useful. The raw
sensory inputs have to be processed at multiple levels of abstraction and, using context
information, combined to form higher-level events that model useful responses, both
system- and human-initiated. We propose a three-layered event representation and
reasoning model to maximize the flexibility and responsiveness of the system during the
course of event management and processing. As
shown in Figure 12, this layered event architecture transforms data from low-level
sensors such as RFIDs into higher-level event structures, which can then be exploited by
appropriate event handlers.
In Figure 12, the Data Event Level is responsible for creating basic events that can
be computed directly from sensor input. The Elemental Event Level processes data
events, and then maps or collates them. The event mapping relies on the event
relationships detected in sensor data streams. The Domain Event Level interprets the
interactions of entities’ primitive events in a business context or any application context
in which the engine resides. The interactions are decided by domain knowledge or
formalized through an Event Ontology.
In summary, the three-tier architecture detects domain events at the most primitive
level using data and elemental events. The Data Events are then composed into
Elemental Events and Domain Events through a process that aggregates and abstracts
data using the event ontology at different levels of abstraction. When abstracting data
events from raw data, there is also a filtering process which can remove (informational)
noise through specialized event detectors.
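The data-to-elemental-to-domain flow just described can be sketched as a small pipeline; the thresholds, field names, and domain rule below are assumptions made purely for illustration.

```python
# Sketch of the three-tier flow: raw readings are filtered into data
# events, fused into elemental events, and interpreted as domain events.
def data_events(readings, threshold=0.5):
    # data event level: filter out noisy readings
    return [r for r in readings if r["signal"] >= threshold]

def elemental_events(data_evts):
    # elemental event level: fuse data events sharing a location
    by_loc = {}
    for e in data_evts:
        by_loc.setdefault(e["loc"], []).append(e["value"])
    return [{"loc": loc, "avg": sum(v) / len(v)} for loc, v in by_loc.items()]

def domain_events(elem_evts, hot=30.0):
    # domain event level: apply domain knowledge (an ontology stand-in)
    return [f"overheat at {e['loc']}" for e in elem_evts if e["avg"] > hot]

readings = [
    {"loc": "A", "value": 35.0, "signal": 0.9},
    {"loc": "A", "value": 33.0, "signal": 0.8},
    {"loc": "B", "value": 20.0, "signal": 0.9},
    {"loc": "B", "value": 21.0, "signal": 0.2},  # dropped by the filter
]
print(domain_events(elemental_events(data_events(readings))))
# ['overheat at A']
```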
Figure 12. Hierarchical architecture of the event model (Wu, et al. 2005)
The process from data events to elemental events is a fusing process, since forming
the elemental events requires the aggregation of several data events. The elemental
events provide the basic template to map data events into different domain events. The
domain level is based on the domain ontology and prepares a dynamic domain event
model that captures aggregation behavior involving patterns and relationships.
(Legend for Figure 12: Data Event No. i, Elemental Event No. i, and Domain Event No. i; operation symbols denote abstraction, inference, and relation.)
CHAPTER 4
ARCHITECTURE OF EVENT PROCESSING ENGINE
Introduction
The architecture of the Event Processing Engine (EPE) is shown in Figure 13.
The EPE is located in the local server where the local sensor data is collected. In order to
realize the tasks of data filtering, aggregation, and interpretation described previously, the
event processing engine is composed of two modules: the event filter engine and real-
time event management. The raw sensor data is converted into meaningfully associated
events, so that they can be handled in real time in the event management module.
In this module, a three-tier event model is exploited for the mapping from low-
level events to semantic domain events, supported by the event ontology. The event data
type is defined for representing the attributes and structures of events, which are then
populated into an event-based in-memory database.
In summary, such an event processing engine has to address three key issues: event
filtering, event assimilation, and event interpretation. We discuss these issues in the
following sections.
Figure 13. Architecture of event processing engine (Wu, et al. 2005)
Event Filtering
Event filtering is performed by the event filter engine module to reduce the volume
of raw sensor data. Removing the unusable data could help to lower data collisions,
reduce the data latency on the network and reduce the stress on data storage. There are
three practical ways to filter events. First, a criterion or threshold derived from the event
ontology is set to automatically select only important events, such as validated RFID tag
arrivals and departures.
Second, defined business rules could help separate meaningful information from
unwanted data as close to the readers as possible. In this case, the event filter engine
controls the propagation of events by allowing consumers to subscribe to the exact subset
of the events in which they are interested.
Finally, the scale of event granularity could be set based on rule requirements to
filter sensor data. The related configuration of filtering condition could be characterized
as multiple conditions and be specified for each individual sensor object (Wu, et al.
2005).
In the implementation of event filtering in an OO programming language, the
event filter can be designed as a character string, a function, or an object. Character
strings provide a textual representation of filter constraints. The event filter class,
represented as an object, carries out filtering by executing its functions.
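A possible shape for such a filter object, compiled from a textual constraint, is sketched below; the tiny "attribute op value" constraint grammar is a deliberate simplification and not part of the proposed engine.

```python
# Sketch of the three filter representations named above: a character
# string (textual constraint), a function, and a filter object.
class EventFilter:
    """Filter object compiled from a textual constraint of the assumed
    form 'attribute op value' (an illustrative mini-grammar)."""
    def __init__(self, constraint: str):
        self.constraint = constraint
        attr, op, value = constraint.split()
        ops = {">": lambda a, b: a > b, "==": lambda a, b: a == b}
        # the compiled function carries out the actual filtering
        self.fn = lambda e: ops[op](e[attr], float(value))

    def __call__(self, event) -> bool:
        return self.fn(event)

f = EventFilter("rssi > 0.4")
stream = [{"tag": "t1", "rssi": 0.9}, {"tag": "t2", "rssi": 0.1}]
print([e["tag"] for e in stream if f(e)])  # ['t1']
```

Applied near the readers, such a filter would discard low-signal reads before they ever reach the network or the in-memory database.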
Event Assimilation
Event assimilation is an event analysis and mapping process from the data event level
to the domain event level. With event assimilation, the processing engine can use all of the
gathered real-time raw data streams to construct a model of the sequence of actions that
constitute one entire event at the domain level. Event Ontology plays a central role in this
automatic process. The ontology-based event assimilation infrastructure uses shared
concepts expressed through common ontology as a basis for interpretation of data and
metadata.
In this model, event content is represented using a self-describing data model
called the Event-Graph. The Event-Graph refers to concepts from
local/domain ontology to enable the semantically correct interpretation of event content.
In this approach, an Event-Graph parses the data as it arrives and assimilates it to
build an environment model that reflects knowledge about the event on the basis of the
information collected so far (Wu, et al. 2005).
The Event-Graph attributes of an event are represented as triples of the form EG =
<S, E, t>, with S referring to the evolution of an event, E representing the actual event
value defined by the event data type, and t providing the temporal passage of an event. In
such <S, E, t> triples, the State Space (S), Event Array (E), and Time (t) are the three basic sets.
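The EG triple can be sketched as follows; the states, event values, and timestamps are illustrative stand-ins, not part of the thesis specification.

```python
from collections import namedtuple

# Sketch of the Event-Graph triple <S, E, t>: S tracks the evolving
# state, E the event value (per the event data type), t the time.
EG = namedtuple("EG", ["S", "E", "t"])

graph = []
state = "idle"
for value, t in [("tag_arrived", 1), ("tag_departed", 5)]:
    # the state evolves as each event value is assimilated
    state = {"tag_arrived": "present", "tag_departed": "absent"}[value]
    graph.append(EG(S=state, E=value, t=t))

print([(g.S, g.t) for g in graph])  # [('present', 1), ('absent', 5)]
```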
Event Interpretation
Events may be grouped together and interpreted based on event relationships in
collections, called event categories. Formally, an event category can be represented as
C = {e1, e2, ..., ek}, where {e1, e2, ..., ek} is the set of events that comprise the category.
The inclusion of events in a category is defined either in terms of event
associations (or constraints) or by simple enumeration and inclusion. As an example of
the former, consider an event called car assembling that includes multi-channel RFID tag
data describing the spatial associations of parts. An event category may be defined to
contain all events of its parts, where the topological relationship of parts is used to
automatically associate all pertinent events in the assembling event with each other. An
event category that names “the price of orders is more than $10,000” is an example of
one that is defined through enumeration and inclusion. The more complex event category
can be formed by integrating several event relationships together. For example, “a sales
order happened in Atlanta” groups all events associated with a location and a business
event (Wu, et al. 2005).
Event categories can be used as complex queries on the data. In this context, event
categories are similar to stored queries. As new data is entered into the system, re-
computing event categories brings events of interest to notice. Event categories can also
be used as a tool for data exploration and hypothesis formulation.
The definition of an event category can be motivated by a hypothesis. The events
that comprise the category are a subset of the data that conform to the hypothesis. This
data can be analyzed in terms of its distribution in the spatial-temporal-attribute space.
Furthermore, different event categories can be related to each other in terms of the events
that comprise them.
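Treating a category as a stored query might look like the sketch below; the constraint mirrors the order-price example above, and the field names are illustrative assumptions.

```python
# Sketch: an event category as a stored query re-evaluated as new
# events arrive, per C = {e : constraint(e)}.
events = []

def category(constraint):
    """Return the current members of the category defined by constraint."""
    return [e for e in events if constraint(e)]

# constraint mirroring the "orders over $10,000" example
big_orders = lambda e: e.get("kind") == "order" and e.get("price", 0) > 10_000

events.append({"kind": "order", "price": 25_000, "city": "Atlanta"})
events.append({"kind": "order", "price": 500, "city": "Atlanta"})
print(len(category(big_orders)))  # 1
```

Re-running the stored query after each insertion is what "brings events of interest to notice" as new data enters the system.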
CHAPTER 5
DISCUSSION
The increasing application of sensor networks, such as RFID networks, to the enterprise
has ramifications for back-end and enterprise systems. Sensor nodes generate large
amounts of real or near real-time data. However, understanding and managing these data
in the context of enterprise applications bring many challenges for the middleware which
has to connect sensor networks with enterprise applications. The proposed idea in this
thesis is an event processing paradigm of organizing sensor data based on the notion of
an event. The primary characteristics of an event, namely its temporal, spatial, and
causal properties, and their relationships are discussed. A three-layered event
representation and reasoning model is proposed to map raw sensory information to
semantic events applied in application domains (Wu, et al. 2005).
Filtering sensory data to generate event streams suitable for the proposed
processing engine is a challenge by itself, and as the application domain changes, new
sensors and data sources will be introduced to the model. Due to the lack of a universal
sensor data standard, all SPEs need a custom wrapper around the sensor channel to
generate readable event objects for the processing engine. Thus, the need for a set of
strongly typed language models like SensorML or EventML, which have a schema and
precise definitions, is crucial. Until then, the overhead of custom modules for data
acquisition will remain in any event processing engine.
CHAPTER 6
FUTURE WORK
The model presented in this paper has the core components of a fully functional
Event Processing Engine. However, several components are needed to complete the
framework. The first is developing the business layer interface, which enables any
enterprise to communicate with the engine in order to collect events and modify the
filtering and processing criteria.
Another component which needs attention is the Event Portal Server. It is often
used for representation and visualization of the live data processing status and for direct
access to server configuration, such as adding new sensor objects or adjusting existing
sensor configurations. Rule definitions and processing criteria can also be updated via
this portal. The Event Portal should be able to act as a bi-directional web services component.
The importance of the web service module is most obvious for large and even
international enterprises. In other words, the web service module acts as an event
channel to fire abstract events to other processing agents outside or inside the enterprise,
and at the same time is able to receive events from other agents in the business layer and
integrate the external abstract events with live, locally acquired data (Figure 14).
Figure 14. Event Processing Engines (EPEs) interacting via web services
This is also an initial step toward distributed event processing, in which dealing
with raw data and generating primary events is done using embedded systems. Since
the initial processing is very specific to the sensors, embedded systems will be the best
candidates for this phase. In addition to the low cost of embedded systems and their low
power consumption, they can reduce traffic in the event channels between processing
agents through primary filtering.
APPENDIX A
This list is populated from the Event Processing Glossary v.1.1 by David Luckham
and Roy Schulte (Luckham and Schulte 2008).
Abstraction: An event is an abstraction of a set of events if it summarizes, represents, or denotes that set of events.

Architecture (from IEEE): The fundamental organization of a system embodied in its components, their relationships to each other and to the environment, and the principles guiding its design and evolution.

Architecture style (from Roy T. Fielding): A coordinated set of architectural constraints that restricts the roles/features of architectural elements and the allowed relationships among those elements within any architecture that conforms to that style.

Cause: An event A is a cause of another event B if A had to happen in order for B to happen.

Clock: A process that creates an ordered ascending sequence of values of type Time with a uniform interval between them. Each value is produced at a tick (or clock tick).

Complex event: An event that is an abstraction of other events called its members.

Complex-event processing (CEP): Computing that performs operations on complex events, including reading, creating, transforming or abstracting them.

Composite event: A derived, complex event that is created by combining base events using a specific set of event constructors such as disjunction, conjunction, sequence, etc. A composite event always includes the base (member) events from which it is derived.

Constraint (event pattern constraint): A Boolean condition that must be satisfied by the events observed in a system.

Derived event (synthesized event): An event that is generated as a result of applying a method or process to one or more other events.

Event: Anything that happens, or is contemplated as happening.

Event (event object, event message, event tuple): An object that represents, encodes or records an event, generally for the purpose of computer processing.

Event attribute (event property): A component of the structure of an event.

Event channel (event connection, event pathway, event topic): A conduit in which events are transmitted from event sources (emitters) to event sinks (consumers).

Event cloud: A partially ordered set of events (poset), either bounded or unbounded.

Event-driven: The behavior of a device, software module or other entity whose execution is in response to the arrival of events from external or internal sources.

Event-driven architecture (EDA): An architectural style in which some of the components are event driven and communicate by means of events.

Event sink (event consumer): An entity that receives events.

Event source (event emitter or event producer): An entity that sends events.

Event processing: Computing that performs operations on events, including reading, creating, transforming and deleting events.

Event stream: A linearly ordered sequence of events.

Event stream processing (ESP): Computing on inputs that are event streams.

Event pattern: A template containing event templates, relational operators and variables. An event pattern can match sets of related events by replacing variables with values.

Event pattern triggered reactive rule: A rule that prescribes actions to be taken whenever an instance of a given event pattern is detected.

Event processing agent (EPA) (event processing component, event mediator): A software module that processes events.

Event processing language (EPL): A high level computer language for defining the behavior of event processing agents.

Event processing network (EPN): A set of event processing agents (EPAs) and a set of event channels connecting them.

Event template: An event form or descriptor some of whose parameters are variables. An event template matches single events by replacing the variables with values.

Event timing (timing): The time value attributes of an event.

Event type (event class, event definition, or event schema): An event type is a class of event objects.

Granularity (chronon): The length of the interval between clock ticks.

Pattern instance (event pattern instance): A set of related events resulting from an event pattern by replacing the variables by values.

Raw event: An event object that records a real-world event.

Relationships between events: Events are related by time, causality, abstraction and other relationships. Time and causality impose partial orderings upon events.

Rule (in event processing): A prescribed method for processing events.

Simple event: An event that is not an abstraction or composition of other events.

Timestamp: A time value attribute of an event recording the reading of a clock in the system in which the event was created or observed.

Virtual event: An event that does not happen in the physical world but appears to signify a real world event; an event that is imagined or modeled or simulated.

Window: A bounded portion of an event stream.
BIBLIOGRAPHY
CEP Interest. Complex Event Processing (CEP) History. http://www.eventstreamprocessing.com/cep-history.htm (accessed December 20, 2008).

Abadi, Daniel, et al. "Aurora: a new model and architecture for data stream management." The VLDB Journal, 2003.

Allen, J.F. "Maintaining knowledge about temporal intervals." Communications of the ACM, 1983: 832-843.

Botts, Mike. "A Sensor Model Language: Moving Sensor Data onto the Internet." www.sensorsmag.com. April 1, 2003. http://www.sensorsmag.com/articles/0403/30/main.shtml (accessed August 2007).

ComplexEvents.com. About CEP. http://complexevents.com/?page_id=3 (accessed December 13, 2008).

Corrigan, J. P. OPRA Traffic Projections for 2005 and 2006. August 3, 2005. http://www.opradata.com/specs/projections_2005_2006.pdf (accessed December 20, 2008).

Forgy, Charles. Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem. 1982.

ILOG, Inc. "Policy-Oriented Enterprise Management." ILOG JRules Performance Analysis and Capacity Planning. 2005. http://logic.stanford.edu/POEM/externalpapers/JRules/jrules_cap_wp.pdf (accessed November 10, 2008).

Jain, R. "Out-of-the-Box Data Engineering - Events in Heterogeneous Data Environments." Keynote Talk at 19th International Conference on Data Engineering. Bangalore, India, 2003.

Kshemkalyani, D. "Efficient evaluation of causality relations between nonatomic events." In Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing. 1998. 322.

Li, J. Z., T. Ozsu, and D. Szafron. "Modeling of video spatial relationships in an object database management system." The 1996 International Workshop on Multi-Media Database Management Systems. 1996. 124.

Luckham, D. "The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems." Addison-Wesley Pub Co. 2002.

Luckham, David, and Roy Schulte. "Event Processing Glossary - Version 1.1." Complex Event Processing. July 2008. http://complexevents.com/?p=409 (accessed October 2008).

Mahmoud, Qusay H. "Getting Started With the Java Rule Engine API (JSR 94): Toward Rule-Based Applications." Sun Developer Network (SDN). July 26, 2005. http://java.sun.com/developer/technicalArticles/J2SE/JavaRule.html (accessed September 2008).

McObject LLC. In-Memory Database Scales Massively For Social Networking Web Site. November 10, 2008. http://www.mcobject.com/November10/2008 (accessed December 2008).

Moreno Díaz, Roberto, Franz Pichler, and Alexis Quesada Arencibia. Computer Aided Systems Theory - EUROCAST 2007. Vol. 4739. Berlin: Springer-Verlag, 2007.

Nelson, T. "Literary Machines." Mindful Press. Sausalito, CA, 1992.

Sun Microsystems, Inc. New I/O API. 2002. http://java.sun.com/j2se/1.5.0/docs/guide/nio/index.html (accessed March 2007).

Travis, Greg. "Getting started with new I/O (NIO)." ibm.com/developerWorks. July 9, 2003. http://www.ibm.com/developerworks/edu/j-dw-java-nio-i.html (accessed April 2008).

Vaandrager, Frits. Master Theme Embedded Systems. May 5, 2007. http://www.cs.ru.nl/~fvaan/embedded_systems_theme.html (accessed June 2008).

Wu, Bin, Z.J. Liu, R. George, and K. Shujaee. "eWellness: Building a Smart Hospital by Leveraging RFID Networks." IEEE EMBS 2005 Conference. Shanghai, China, 2005.

Young, Charles. Rete algorithm. September 18, 2008. http://en.wikipedia.org/wiki/Rete_algorithm (accessed November 10, 2008).

Zdonik, Stan, Ugur Cetintemel, and Michael Stonebraker. "The 8 requirements of real-time stream processing." ACM SIGMOD Record, December 2005: 42-47.