
Network and Infrastructure Considerations for Hard and Soft Information Fusion Processes

Jeffrey C. Rimland
College of Information Sciences & Technology
Pennsylvania State University
University Park, PA, USA
[email protected]

James Llinas
Center for Multisource Information Fusion
University at Buffalo
Buffalo, NY, USA
[email protected]

Abstract— The changing landscape of defense and security applications, as well as existing needs in other domains, has given rise to the need to design and develop information fusion systems that entail the combination of hard (physical sensor) data and observations from humans (soft data) in a distributed networked environment ([1], [2]). Such fusion processes stress traditional fusion-based algorithmic design strategies and also impose requirements on the design of appropriate decentralized architectures. In designing a framework for such an environment, the system architecture must be considered both at the level of information infrastructure and at the network/software level of implementation. From the information infrastructure perspective, primary concerns include which observations will be fed into the system, the intended goals (e.g. data mining, hypothesis generation/testing), how humans interact with the system (e.g. information gathering, analysis, process evaluation/refinement), and which levels of state estimation and fusion are appropriate at a given stage of the process [3]. Bisantz et al. [4] discuss aspects of how humans interact with such new distributed networked information fusion systems and identify "touch points" where innovations can support improved system performance.

An effective network/software architecture is tightly intertwined with the information infrastructure. The former must meet the functional requirements of the latter without adding unnecessary complexity, prohibiting scalability or extensibility, putting undue performance limitations on the system, or compromising system security. Additionally, the system must have the agility and flexibility to successfully and rapidly complete complex tasks that were not possible to anticipate at design-time.

This paper presents an information infrastructure for hard and soft fusion that balances the features offered by the current state of the art in computing paradigms (e.g. SOA, ESB) and data representation (e.g. RDF, OWL, TML, EML) with the requirements of our overarching concept of employment for hard and soft fusion capability. The paper provides guidance on determining the optimal software and network architecture design for enabling human-centric and distributed operations in time-critical, vastly heterogeneous, and highly unpredictable conditions. The designs described have been implemented using both synthetic data from our SYNCOIN data set ([5], [6]) and actual sensor/observational data collected at a Penn State test site.

Keywords: hard soft information fusion, SOA, infrastructure, service-oriented, participatory sensing

I. INTRODUCTION

Although the service-oriented approach to system design has been in use for over a decade, the role of this paradigm continues to evolve with the complex systems required to handle emerging technology needs. One such need is the requirement for fusing physical sensor data with human observations in a distributed environment. Fusing data from physical sensors in order to obtain improved state and feature estimates is a mature field with many well-established techniques, time-proven methods for test and evaluation (T&E), and extensive real-world utilization [7]. However, evolving data fusion needs for defense, security, and other domains require the integration of human observations and reports (or "soft" data) with the "hard" data generated by physical sensors. Although the inclusion of soft data provides certain types of information that are impossible or infeasible to obtain from physical sensors, this new paradigm also presents a new series of challenges. These challenges include representation of "fuzzy" terms [8], characterization and representation of uncertainty in human reports, tasking and knowledge elicitation of humans, and fusing hard sensor data with soft human-based information [9]. Additionally, the environments in which these systems must operate are typically geographically dispersed and often include areas of low-bandwidth or intermittent network access.

The service-oriented paradigm has tremendous benefits for such applications. The modular and loosely-coupled nature of SOA enables scalability, survivability, rapid system upgrades and updates, increased uptime, and improved quality of service. Perhaps most importantly, SOA supports the utilization of orchestration services to dynamically create composite services for the completion of tasks that may not have even been anticipated when the system was created. This is of tremendous benefit to complex dynamic systems, and will be the subject of a later section in this paper. First, it is necessary to clarify what is meant by SOA and define the current taxonomy of related technologies and tools.


II. SOA CLARIFICATIONS AND TAXONOMY

A. Key Tenets of SOA

The concept of Service-Oriented Architecture (SOA) is clouded by misinformation, folklore, and an alphabet-soup of acronyms. Complex enterprises look to the SOA paradigm for responsiveness, scalability, quality of service (QoS), and cost-effectiveness, but these attributes do not come automatically with the decision to adopt service-oriented methodologies. In fact, SOA can even rapidly propagate incorrect or out-of-context data if not properly designed and configured [10]. This section will clarify the taxonomy of related and intertwined technologies; explain the benefits, drawbacks, and misconceptions of SOA; and present guidelines for avoiding some common SOA pitfalls.

The key tenet of SOA is that certain distributed applications are best implemented using a multitude of stateless, loosely coupled software components, each of which provides functionality in the form of a service [11]. Various mechanisms are then used to allow for discovery (finding available services), selection (choosing an appropriate service for a task, often autonomously), binding (temporarily linking or coupling between services), and composition (solving complex tasks through the combined outputs of multiple services). While older designs tended to result in a compromise-driven "lesser of evils" approach to coordinating the physical network architecture with the software intended to run on it, SOA allows the software to run optimally as the network dynamically expands, contracts, or reconfigures as a result of changing user needs, resource availability, and system connectivity. An overall information architecture to meet the needs of complex dynamic distributed systems requires a synergistic relationship between the network topology, information pipeline, and software-level tools.
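To make these four mechanisms concrete, the following minimal Python sketch models discovery, selection, binding, and composition against an in-process registry. The names (ServiceRegistry, publish, compose) and the naive selection rule are illustrative assumptions, not part of any particular SOA product or of the system described in this paper.

    # Minimal sketch of SOA discovery, selection, binding, and composition.
    # All names are illustrative assumptions; a production SOA would use a
    # registry service (e.g. UDDI) rather than an in-process dictionary.
    from typing import Callable, Dict, List

    class ServiceRegistry:
        def __init__(self) -> None:
            self._services: Dict[str, List[Callable]] = {}

        def publish(self, capability: str, service: Callable) -> None:
            self._services.setdefault(capability, []).append(service)

        def discover(self, capability: str) -> List[Callable]:
            # Discovery: find all services advertising a capability.
            return self._services.get(capability, [])

        def select(self, capability: str) -> Callable:
            # Selection: choose one candidate (naively, the first found).
            candidates = self.discover(capability)
            if not candidates:
                raise LookupError("no service offers " + capability)
            return candidates[0]

    def compose(registry: ServiceRegistry, capabilities: List[str], data: dict) -> dict:
        # Composition: solve a task by chaining selected services;
        # each call is a transient binding that is released afterwards.
        for capability in capabilities:
            service = registry.select(capability)
            data = service(data)
        return data

    registry = ServiceRegistry()
    registry.publish("georeference", lambda obs: {**obs, "lat": 40.8, "lon": -77.9})
    registry.publish("associate", lambda obs: {**obs, "track_id": 42})
    print(compose(registry, ["georeference", "associate"], {"sensor": "cam-1"}))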

B. Taxonomy of Related Terms

The recent surge in popularity of "cloud computing" has led to increased confusion over terminology describing distributed architectures. The National Institute of Standards and Technology (NIST) definition of cloud computing [12] is rather broad and includes deployment models for private clouds, community clouds, public clouds, and hybrid clouds. It is informative to briefly discuss these models. A private cloud is intended for the sole utilization of a single organization. It may, however, be hosted and/or managed by a third party. A community cloud is shared between multiple organizations that (generally) are working together toward a shared goal or pursuit. In a public cloud, the cloud service provider offers usage of the cloud to the general public under whatever terms of service (e.g. pricing, privacy, functionality) the provider decides on in accordance with their business model. This is currently the most common model and includes services such as web-based email (e.g. Gmail, Yahoo Mail, etc.) and cloud-based storage (e.g. dropbox.com, box.net). The final model included in the NIST definition is the hybrid cloud, in which some combination of private, public, and community clouds are interconnected for a variety of purposes. Connecting multiple clouds in a hybrid configuration can combine the cost- and time-saving benefits of a third-party cloud provider with the security of an in-house private cloud. However, this configuration requires special attention to data interoperability and to providing adequate authentication across domains. Design-time considerations must be made to ensure that data intended for the private cloud does not become accessible to less secure areas.

Although cloud computing may sometimes appear to encompass SOA, there is an important distinction. While the cloud model describes the end-user experience and the nature of how that end-user accesses data and services, SOA offers a well-defined architectural style for the actual design and implementation of such systems.

Another related category of network architecture worth discussing is that of grid computing. While both cloud and grid computing are forms of resource virtualization, they exist for different purposes. Cloud computing emphasizes accessibility to data and services with very little reliance on client-side computation (i.e. "thin" vs. "thick" client), whereas the goal of grid (or the similar cluster) computing is to maximize raw computing capability for highly intensive tasks that would be impossible or infeasible to perform on a single workstation or server [13].

There is also some confusion regarding the term Enterprise Service Bus (ESB). By its most essential definition, an ESB supports SOA by providing a communication layer between multiple services [14]. ESBs provide a good deal of the flexibility and other advantages that exist in SOA by virtualizing the services that are available to the service requestors. This encapsulation of the underlying details of a service allows rapid reconfiguration and dynamic growing/shrinking of the architecture to meet changing conditions and demands. While ESBs can deliver API-level access to functionality offered by service providers, along with a host of additional features such as intelligent routing of messages, mapping of data, security enforcement, and system monitoring, an ESB may also be much simpler. In the simplest sense, an ESB may be nothing more than an agreed-upon messaging protocol that is adhered to by all participants in the system. The balance between complexity and simplicity in ESBs is never a one-size-fits-all decision. Certain vendors offer middleware solutions labeled as ESBs that have varying functionality and applicability to the facilitation of communication between services, which has led to some degree of ambiguity over the term.

It should be noted that each of the paradigms and models mentioned above (and summarized in Table 1) is highly complementary to the others, and a working knowledge of all of them is beneficial when designing or implementing systems in any given one of them.

Table 1. Taxonomy of terms
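As a concrete illustration of the "agreed-upon protocol" view of an ESB described above, the following Python sketch defines a minimal message envelope that every participant could honor. The field names are assumptions chosen for illustration; they are not drawn from any specific ESB product.

    # Sketch of the "simplest" ESB notion: nothing more than an agreed-upon
    # message envelope that every participant in the system honors.
    import json
    import time
    import uuid

    def make_envelope(source: str, topic: str, payload: dict) -> str:
        return json.dumps({
            "id": str(uuid.uuid4()),   # unique message identifier
            "source": source,          # originating service
            "topic": topic,            # basis for intelligent routing
            "timestamp": time.time(),  # enables staleness checks downstream
            "payload": payload,        # service-specific content
        })

    msg = make_envelope("soft-pipeline", "entity.update",
                        {"entity": "vehicle-7", "confidence": 0.6})
    print(json.loads(msg)["topic"])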

C. Brief Argument for SOA in Distributed H+S Fusion

While SOA has drawbacks and challenges, in a well-designed system the benefits far outweigh them. As mentioned earlier, the service-oriented paradigm allows orchestration services to dynamically create composite services for the completion of tasks that may not have even been anticipated when the system was created. Also of great value is the capability to enable communication and collaboration across multiple domains of varying organizations, security levels, physical network topologies, software platforms, and user access requirements. Additionally, the modular nature of an SOA facilitates scalability, survivability, rapid system updates/upgrades, uptime, and quality of service.

In systems that are homogeneous in nature, relatively constant in demand, and primarily concerned with handling predictable and pre-defined conditions, the costs and added complexity of a SOA/ESB implementation are likely to outweigh the benefits. However, in the environment of distributed hard and soft information fusion, the benefits of such a design are vital to the success of the system. The following section describes the problem at hand in greater detail and presents a high-level information architecture. Details of a specific SOA implementation of this architecture are then given in Section V.

III. ADDRESSING THE HARD-SOFT INFORMATION FUSION PROCESS AND ITS ARCHITECTURAL IMPACTS

Experiences in Iraq, Afghanistan, and other places in the world in dealing with insurgency/counter-insurgency problems have required the (ongoing) formulation of new paradigms of intelligence analysis and dynamic decision-making. Depending on the phase of counter-insurgency ("COIN") operations [15], the nature of decision-making ranges from conventionally military to socio-political ("kinetic" to influential). Since automated Information Fusion (IF) processes provide some of the support to such decision-making, IF process design must address these varying requirements, which poses considerable challenges.

Further, these experiences have also shown that some of the key observational and intelligence data in COIN operations comes from dismounted soldiers reporting on their patrol activities; such data, reported in sometimes formatted, sometimes unformatted ways, is rather unstructured, involves all the nuances of language and human perceptive/cognitive limitations, and has come to be called "soft" data and information. (The broader definition of "soft" data has not yet settled to an agreed standard, but has come to include data from social media such as Twitter, Facebook, blogs, etc.; like data from human observers, these data are also not well calibrated or verified.) However, the input side of these surveillance systems also includes data from the usual repertoire of modern physical, electromechanical sensor systems, including radio frequency (RF) sensors, video and other imaging systems, as well as SIGINT and satellite imagery; these data have come to be called "hard" data. There are distinctive and non-trivial differences in these data types that impact processing details, architectural options, and the downstream concern of human interpretation of fusion process results. A summary of some of the main distinctions in these data types is shown in Table 2 below:

Table 2: Distinctions in Hard and Soft Data Types

Perhaps the most important impact of this new disparate data environment has to do with the last row of the above table, where there are significant differences in what could be called the overall quality of each data type. Traditionally, fusion process designs have exploited the a priori knowledge designers had about the error characteristics of the varied (hard) input data; for example, knowledge of calibrated detection probability has been a crucial parameter affecting the choice of technique for data association and other fusion operations. Simply put, no such data exist for uncalibrated human observers or social media sources, so some different way has to be found by which the quality of soft data is assessed and accounted for in designing the various fusion operations.
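The following Python fragment is a hypothetical illustration (not the authors' method) of the asymmetry just described: a hard observation can be weighted using calibrated a priori characteristics, while a soft report must rely on some surrogate quality estimate.

    # Illustrative sketch of the hard/soft quality asymmetry. The surrogate
    # measure for soft sources is an assumption for illustration only; a
    # real system would require far richer uncertainty handling.
    def hard_weight(detection_prob: float, false_alarm_prob: float) -> float:
        # Calibrated a priori characteristics, known at design time.
        return detection_prob * (1.0 - false_alarm_prob)

    def soft_weight(source_history: list) -> float:
        # Surrogate: fraction of a source's past reports later confirmed.
        if not source_history:
            return 0.5  # uninformed prior for an unknown observer
        return sum(source_history) / len(source_history)

    print(hard_weight(0.9, 0.05))             # radar-like sensor
    print(soft_weight([True, True, False]))   # patrol observer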

One initial and important question in developing an architectural approach is: "At what point in the processing flow of such hard and soft data streams do you fuse the data?"; i.e., what is the architectural framework? It is often argued that the data should be joined at the point closest to the source observing mechanism, i.e. where the data are of a "raw" nature. This approach is often advanced on the basis of an information-theoretic argument, claiming that any operation on data loses valuable information. However, there are at least two mitigating factors that prevent the choice of this option. One is that, while there is access to raw data for the hard sensors, such as what could be called the primitive perceptual data of, say, a radar (I and Q data) or an imaging sensor (blobs of some type in the pixelized data), there is no equivalent raw data access on the soft data side. That is, there is no access to the primitive perceptual and early-cognitive operations in the mind of the human observer. By and large, access on the soft side is at the reported-entity level; humans generally report about "things". The other mitigating factor is simply technical risk. Knowledge of how to process and manipulate raw data from hard sensors is fairly mature, but attempting to access and process such raw data during human observation is considered of very high technical risk. (As will be argued, and as has been experienced, there is enough difficulty even in processing the entity-level data from human observers.) Thus, as regards this architectural choice, we have chosen to fuse the data at the entity level. As just remarked, this level is inherent to the soft reporting, but this choice imputes a need to process the hard data stream to the entity level, meaning in effect that the hard data are operated on to the point of generating entity-level estimates. This can be done either by a single hard sensor or as the result of multiple-hard-sensor fusion operations to the state estimation level. The entity-level focus is considered quite natural for the domain of intelligence analysis [16], so it seems a good choice of abstraction level as a basis for architectural partitioning as well.

On the soft data side, a major difficulty and design choice is defining a robust natural-language-processing (NLP) capability; such methods are also called text extraction methods. The problem is that a true natural language understanding capability has been a goal of ongoing research for many years and has still not been achieved. On the other hand, automated fusion in software is a garbage-in/garbage-out process, so one hopes to extract the richest possible semantic meaning from the reported linguistic data. In our case we have made an effort to do this with a process called "TRACTOR", which has been reported on in the literature [17-19].
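One way to picture the entity level as a common abstraction is a record type that both pipelines can emit; the sketch below, with assumed field names, shows hard and soft evidence meeting at the same level.

    # Sketch of an entity-level record as the common abstraction: hard data
    # are processed up to entity estimates, while soft reports arrive at
    # this level natively. All field names are illustrative assumptions.
    from dataclasses import dataclass, field

    @dataclass
    class EntityEstimate:
        entity_id: str
        kind: str                  # person, vehicle, event, behavior, ...
        attributes: dict = field(default_factory=dict)
        provenance: str = "hard"   # "hard" sensor chain or "soft" report
        quality: float = 0.5       # surrogate confidence (see sketch above)

    # A radar track and an extracted soft report meet at the same level:
    track = EntityEstimate("E-101", "vehicle", {"speed_mps": 14.2}, "hard", 0.9)
    report = EntityEstimate("E-101", "vehicle", {"color": "white"}, "soft", 0.6)
    print(track, report, sep="\n")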

Another major driving factor affecting the nature of the fusion process design and its architecture is the counterinsurgency (COIN) problem domain being addressed. Historically, fusion processes have been approached with the expectation of the availability of a reliable, deductive dynamic model of the problem domain; broadly speaking, this leads to an expectation-based approach that compares deductively-derived expectations with fusion-based estimates of reality. However, developing reliable dynamic models of modern insurgency and terrorism-based problem spaces has proven unworkable (e.g., see [20-22]). This moves the fusion approach to one based on discovery and dynamic learning in order to evolve the needed situational pictures for decision support. As regards the nature of the COIN intelligence analysis process, we have adopted the "Thinking-Loop" paradigm of Bodnar ([23], [24]; such ideas can be traced to Russell et al. as early as 1993 [25]), which involves two interleaved cycles of "foraging" and "sensemaking", as shown in Figure 1.

Figure 1: The Think Loop model of Intelligence Analysis (adapted from [24])

These ideas give shape to the notions of services on the user side (or Community of Interest side) of our architectural approach. Our approach is also influenced by various papers that address the nature of knowledge discovery services in a grid environment, such as [26, 27]. As [28] asserts, the standardized service-based operations of a SOA can be integrated with the often special-purpose processes of a Grid, in part due to advances in web service technology.

IV. EVOLUTION OF THE ARCHITECTURAL FRAMEWORK

Our current baseline architecture involves three evidence-generating and fusion pipelines: one pipeline that services all soft data (multi-human-observer reporting feeds) and two hard data pipelines that address longer-range and close-range hard sensors, respectively. Each pipeline includes the basic fusion functions of Common Referencing, Data Association, and State Estimation for that pipeline's data. Largely because of our team's organizational structure (three separate university groups), the pipelines are integrated from the sensor ends to the evidential production ends; it was not practical to separate these entities into standalone services. Figure 2 shows these pipelines, where Ha and Hb denote the hard data flows and the lower processing thread shows the soft data flows.


Figure 2: Hard and Soft Evidential and Fusion-based Pipelines

The preprocessing operations in the hard and soft pipelines are considerably different. Hard sensor preprocessing involves detection and signal processing operations, whereas soft data preprocessing involves natural language processing and text extraction operations. The A-B-C-D labels indicate that there are single- and multi-sensor paths possible through these pipelines. These operations yield entity-level evidential sets constructed over given time windows (all the data are streaming), entity-level fusion-based state estimates related to various entities in the COIN environment (people, vehicles, events, behaviors, etc.), or both. All of these data are accessible via our ESB, and these pipelines comprise our Core Enterprise Services.
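As a schematic of one such pipeline, the Python sketch below chains the three basic fusion functions named above; the function bodies are placeholders, since the real operations are sensor- and domain-specific.

    # Schematic of one evidential pipeline: Common Referencing, Data
    # Association, State Estimation. Bodies are illustrative placeholders.
    def common_referencing(obs: dict) -> dict:
        # Align spatial/temporal reference frames (placeholder: tag only).
        return {**obs, "frame": "WGS84/UTC"}

    def data_association(obs: dict, tracks: dict) -> dict:
        # Assign the observation to an existing or new entity (naive rule).
        key = obs.get("entity_id") or "new-" + str(len(tracks))
        tracks.setdefault(key, []).append(obs)
        return tracks

    def state_estimation(tracks: dict) -> dict:
        # Produce entity-level estimates (placeholder: latest observation).
        return {k: v[-1] for k, v in tracks.items()}

    tracks: dict = {}
    for obs in [{"entity_id": "E-1", "pos": (3, 4)}, {"pos": (9, 9)}]:
        data_association(common_referencing(obs), tracks)
    print(state_estimation(tracks))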

On the user end of the architecture, we have defined the human users as either the Intelligence Cell of a Brigade Combat Team [29] or the staff of a Company Operations Intelligence Support Team (COIST; see [30]). These users are provided three basic Service levels: Foraging Services, Sensemaking Services, and Analytic Support Services. These three Services are expanded into a set of component Services as shown in Figure 3.

For Foraging, which basically provides an agile capability to roam over selected evidential-window data sets, there is again a need for Common Referencing and Data Association operations to join the selected data coherently. Our discovery-oriented services include Graph Matching, Dynamic Social Network Analysis (SNA), and Abductive Reasoning processes to allow multi-paradigm mining and learning from the selected evidence. The learned sub-hypotheses need to be composed into an integrated hypothesis that can be examined to see if the estimated current situational state reflects the state defined in the commander's Priority Intelligence Requirements (PIRs); if so, an Alert is generated via Alert Services.
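A minimal sketch of this alerting step, under the assumption that an integrated hypothesis can be reduced to attribute-value pairs, might look as follows; the matching predicate stands in for the graph matching and SNA outputs.

    # Illustrative PIR check: raise an alert when the integrated hypothesis
    # matches a PIR pattern. The pattern representation is an assumption.
    def check_pirs(hypothesis: dict, pirs: list) -> list:
        alerts = []
        for pir in pirs:
            # A PIR "matches" if every required attribute-value is present.
            if all(hypothesis.get(k) == v for k, v in pir["pattern"].items()):
                alerts.append(pir["name"])
        return alerts

    pirs = [{"name": "PIR-1: staging activity",
             "pattern": {"activity": "staging", "area": "route-blue"}}]
    hypothesis = {"activity": "staging", "area": "route-blue", "actors": 3}
    print(check_pirs(hypothesis, pirs))  # -> ['PIR-1: staging activity']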

Figure 3: Prototype Architecture for Hard-Soft Data Fusion and Analysis Operations

V. DESIGNING A SOA FOR H+S FUSION

SOA is not a one-size-fits-all, out-of-the-box solution. Although the architectural paradigm has certain key tenets and best practices, designing a successful SOA is largely a matter of weighing compromises in complexity, cost, performance, and security. Our design decisions were predicated upon several key system requirements discussed in the previous section. First, due to the very nature of multi-source, multi-sensor fusion that includes human observations, the system requires an efficient and effective means for the representation, transmission, and storage of heterogeneous data. Second, the system must be capable of responding to constantly changing needs and environments, including multiple reasoning/inferencing methods and the hybrid roles of humans and computer resources in the system. Third, the system must be capable of integration across multiple domains, networks, and software applications, some of which are new and some of which are legacy systems. This section outlines design decisions that were made to accommodate these requirements.

A. Heterogeneous Data Representation

To support a distributed system that is capable of fusing data from multiple sensors and human observers while still being able to interoperate with existing systems, we chose to embrace the community standards-driven Extensible Markup Language (XML)-based standards designed by the Open Geospatial Consortium (OGC) as part of their Sensor Web Enablement (SWE) initiative [31]. SWE encompasses several standards for representing sensor and observation data (e.g. SensorML, Transducer Markup Language (TML), and Observations and Measurements (O&M)), as well as methods for locating sensors and determining access (Sensor Observation Service (SOS)) and tasking sensors (Sensor Planning Service). For distributed hard and soft information fusion, TML has been especially useful for its ability to represent not only the actual data, but also metadata describing intrinsic attributes (e.g. spherical lens distortion) and extrinsic attributes (e.g. weather or other environmental factors). The capability for TML to be transmitted in segments of variable size (i.e. a single sensor reading or an entire document) is particularly useful in environments of low network bandwidth or intermittent connectivity. In certain instances it can be very useful for the data to be transferred via "sneaker net" using a USB drive or other portable device. We are currently evaluating a newer OGC standard known as Event Pattern Markup Language (EML) [31] as a logical successor to TML.
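The variable-size segmenting idea can be illustrated with a schematic (non-conformant) XML fragment: the same reading can travel as a single small segment or be batched into a larger document, depending on available bandwidth.

    # Schematic illustration of variable-size XML segments; this is not
    # conformant TML, just the segment-vs-document idea in miniature.
    import xml.etree.ElementTree as ET

    def reading_to_xml(sensor_id: str, value: float, t: str) -> bytes:
        elem = ET.Element("reading", sensor=sensor_id, time=t)
        elem.text = str(value)
        return ET.tostring(elem)  # one small, self-contained segment

    # Low bandwidth: transmit single readings as they occur.
    segment = reading_to_xml("cam-1", 21.4, "2012-05-01T12:00:00Z")
    # High bandwidth (or "sneaker net"): batch readings into a document.
    doc = ET.Element("readings")
    doc.append(ET.fromstring(segment))
    print(ET.tostring(doc).decode())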

B. Loose Coupling

SOA literature endlessly touts the benefits of loose coupling. While the capability to loosely couple aspects such as physical connectivity, communication style, control logic, and data store transactions is a key benefit of SOA, each aspect must be carefully weighed. While the SOA "toolbox" contains a multitude of tools for loosely coupling nearly every aspect of the system, it is virtually never a good design decision to use them all. An outline of decision points between loose and tight coupling is provided in Table 3 (from [32]).

Table 3. Loose vs. Tight Coupling Options (from [32])

In general, looser coupling yields improved flexibility, scalability, and fault tolerance [32], while tighter coupling results in improved performance, lower cost, and simpler implementations. We took the approach of "simple where practical, loosely coupled where necessary." In some cases, a hybrid approach was taken. For example, some communications are synchronous (tight coupling) while others are asynchronous (loose coupling). In certain instances, streaming data from a sensor (typically represented in TML and transmitted over UDP) is only relevant within a very small time window. In real-time tracking, for example, it can be deleterious to system performance to resend "old" UDP packets that are no longer relevant to the current tracking sequence. On the other hand, high-level changes in hypothesis or alerts to a human analyst are sometimes important enough to warrant having certain system processes "sit and wait" until message receipt is confirmed or a response is received.
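The time-window rule for streamed readings can be sketched as follows; the half-second window is an assumed value for illustration only.

    # Sketch of the staleness rule: streamed readings outside a small
    # relevance window are dropped rather than resent.
    import time

    RELEVANCE_WINDOW_S = 0.5  # assumed value for illustration

    def still_relevant(reading: dict, now: float) -> bool:
        return (now - reading["timestamp"]) <= RELEVANCE_WINDOW_S

    now = time.time()
    old = {"sensor": "cam-1", "timestamp": now - 2.0}
    fresh = {"sensor": "cam-1", "timestamp": now}
    print(still_relevant(old, now), still_relevant(fresh, now))  # False True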

Another area of particular interest was that of centralized (tight coupling) vs. decentralized (loose coupling) system control. In a strictly centralized system, there is typically a main server (or servers) that houses all service-provision, database, and business logic facilities. This offers the benefits of simplicity and relatively high performance (for certain types of operations), but suffers the downsides of reduced scalability, difficult upgrades/updates, and a single point of failure. On the other hand, a strictly decentralized system may have no central control mechanism at all. Examples of this include many multi-agent systems (see [33]) and optimizations modeled after swarms or ant colonies occurring in nature (see [34]). Such decentralized systems have been shown to perform very well at certain optimization tasks, but it is typically impractical to maintain large systems in a completely decentralized manner. Our approach was to employ an Enterprise Service Bus (ESB) to impart certain necessary aspects of centralized control, with sub-systems that could perform their tasks in a decentralized manner. In some instances, these connections could be performance-optimized by using direct point-to-point (e.g. IP address-level) addressing; in other cases the various sub-systems communicate through ESB-mediated addressing. ESB-mediated addressing, while adding minor performance overhead, is invaluable for certain aspects of a hybrid-sensing/hybrid-cognition distributed architecture [35].
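The hybrid addressing scheme can be sketched as a simple routing rule: use a configured direct point-to-point route when one exists, otherwise fall back to ESB-mediated delivery. The names below (StubBus, direct_routes) are illustrative assumptions.

    # Sketch of hybrid addressing: direct point-to-point where configured,
    # ESB-mediated otherwise. The bus here is a stand-in stub.
    def send(message: dict, service: str, direct_routes: dict, esb) -> None:
        if service in direct_routes:
            host, port = direct_routes[service]
            # Tight coupling: performance-optimized direct path.
            print("direct to " + host + ":" + str(port) + ": " + str(message))
        else:
            # Loose coupling: the ESB resolves the current provider.
            esb.route(service, message)

    class StubBus:
        def route(self, service: str, message: dict) -> None:
            print("ESB-mediated to " + service + ": " + str(message))

    send({"op": "track"}, "tracker", {"tracker": ("10.0.0.5", 9000)}, StubBus())
    send({"op": "alert"}, "alerts", {}, StubBus())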

Another area of consideration was strong (tight coupling) vs. weak (loose coupling) enforcement of data types. In strongly typed systems, it is possible to catch many errors and data integrity issues through the enforcement of observed properties of the data in comparison to expected properties of the data. For example, if a “date” type is expected and the observed value is “13/16/2001”, then an error is detected. In loosely typed systems, the data is typically just treated as a string of characters and accepted as such barring a network or communication-level error. When multiple systems are connected in a SOA, this decision is further complicated by differences in data models between multiple systems.
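Using the paper's own date example, the contrast between strong and weak type enforcement at a service boundary can be sketched as follows.

    # Sketch of strong vs. weak type enforcement at a boundary: a strict
    # parser rejects "13/16/2001" (no 13th month), while a loose boundary
    # accepts any string and lets errors surface downstream.
    from datetime import datetime

    def strict_date(value: str) -> datetime:
        return datetime.strptime(value, "%m/%d/%Y")  # raises ValueError

    def loose_date(value: str) -> str:
        return value  # treated as an opaque string of characters

    try:
        strict_date("13/16/2001")
    except ValueError as exc:
        print("rejected:", exc)
    print("accepted:", loose_date("13/16/2001"))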

Harmonization is the SOA technique of mapping data from a data structure in one system to that in another system when no common object model exists. This can be accomplished through either simple mapping rules (for pre-known cases) or via artificial intelligence (AI) techniques in more complex situations or harmonization between types that are not known in advance. While this technique can be very helpful, caution must be used to ensure that the probability of successful mapping is commensurate with the importance of correctness in that case. In large complex systems, faulty harmonization can rapidly propagate erroneous data.
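A minimal sketch of rule-based harmonization (the simple-mapping case, not the AI-assisted case) follows; the field names and casting rules are assumptions for illustration.

    # Sketch of rule-based harmonization: mapping one system's record
    # layout onto another's when no common object model exists.
    HARMONIZATION_RULES = {
        "lat": ("latitude", float),
        "lon": ("longitude", float),
        "obs_time": ("timestamp", str),
    }

    def harmonize(record: dict) -> dict:
        out = {}
        for src_field, (dst_field, cast) in HARMONIZATION_RULES.items():
            if src_field in record:
                out[dst_field] = cast(record[src_field])
        return out

    print(harmonize({"lat": "40.79", "lon": "-77.86", "obs_time": "12:00Z"}))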

We took an alternative approach to data type enforcement. Rather than devising a strongly-typed object model and enforcing it upon all participants in the system, we chose a hybrid approach that relies on harmonization as well as community standards such as TML (see section above). This approach allows the utilization of accepted community standards (and accompanying XML schema documents for type enforcement) where applicable, and adds the flexibility to integrate with existing systems that may have been created before these standards were put into effect or by designers without knowledge of the standards.


C. Other SOA Tools

There are many elements of our SOA implementation that are beyond the scope of this paper. Some of these include the value of composed services (see Figure 3 above), business process management (BPM) tools such as the Business Process Execution Language for Web Services (BPEL4WS) [36], a host of available security and authentication options [37], and a very helpful and active open-source community. Additionally, the SOA-related field of Complex Event Processing (CEP) holds a great deal of promise in this arena. CEP is a technology for detecting higher-level events by organizing "clouds" of low-level events into a structured event hierarchy in real time. This is accomplished through the definition and detection of event patterns and by modeling the relationships (causality, etc.) of events that occur within these patterns [38]. This field holds untapped potential for "JDL Level 3" [39] data fusion advancements.
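To illustrate the CEP notion of organizing low-level events into a higher-level event, the sketch below asserts a composite event when an assumed three-event pattern occurs within a time window; both the pattern and the window are illustrative, not drawn from the cited work.

    # Sketch of CEP: detect a higher-level event when a pattern of
    # low-level events occurs within a time window.
    from collections import deque

    WINDOW_S = 60.0  # assumed window
    PATTERN = {"vehicle_stop", "person_dismount", "object_placed"}

    def detect(events: deque, event: dict):
        events.append(event)
        # Age out low-level events that fall outside the time window.
        while events and event["t"] - events[0]["t"] > WINDOW_S:
            events.popleft()
        kinds = {e["kind"] for e in events}
        if PATTERN <= kinds:
            return "possible-emplacement-event"  # assumed composite label
        return None

    window: deque = deque()
    result = None
    for e in [{"kind": "vehicle_stop", "t": 0.0},
              {"kind": "person_dismount", "t": 12.0},
              {"kind": "object_placed", "t": 40.0}]:
        result = detect(window, e) or result
    print(result)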

VI. CONCLUSION

There is currently a paradigm shift from conventional sensors observing physical objects to a combination of human observers and "hard" sensors that both observe physical objects and perform the far more nuanced tasks of assessing situations and threats, determining how events and entities are related, and predicting future events. Accomplishing this requires a shift in emphasis to human-centric and highly distributed processing, and the resulting systems become highly complex.

Figure 4: A service-oriented approach to data fusion (from [35])

The SOA field presents a rich set of techniques, best practices, and tools for creating and managing very large and complex systems. While many of these resources were initially developed for managing the complexities and demands of large business enterprises, they are ideally suited for addressing the complexities of distributed hard and soft information fusion. As detailed in this paper (and in the author's previous work in [35]; see Figure 4), there is a direct mapping between the needs of distributed human-centric data fusion and the capabilities provided by the service-oriented architecture paradigm.

ACKNOWLEDGMENT

We gratefully acknowledge that this research activity has been supported in part by a Multidisciplinary University Research Initiative (MURI) grant (Number W911NF-09-1-0392) for "Unified Research on Network-based Hard/Soft Information Fusion", issued by the US Army Research Office (ARO) under the program management of Dr. John Lavery.

REFERENCES

[1] D. L. Hall, M. Liggins, C.-Y. Chong, and J. Llinas, Distributed Data Fusion for Network Operations, CRC Press, in preparation, 2012.
[2] D. Hall and J. Jordan, Human-Centric Information Fusion, Artech House, 2010.
[3] J. Llinas, "Network Centric Concepts: Impacts to Distributed Fusion System Design," Chapter 3 in D. L. Hall, M. Liggins, C. Chong, and J. Llinas, Distributed Data Fusion for Network Operations, CRC Press, in preparation, 2012.
[4] A. Bisantz and J. Pfautz, "Human Engineering Factors in Distributed and Net-centric Fusion Systems," Chapter 16 in D. L. Hall, M. Liggins, C. Chong, and J. Llinas, Distributed Data Fusion for Network Operations, CRC Press, in preparation, 2012.
[5] J. Graham, J. Rimland, and D. Hall, "A COIN-inspired Synthetic Data Set for Qualitative Evaluation of Hard and Soft Fusion Systems," Proceedings of the 14th International Conference on Information Fusion, Chicago, IL, July 2011.
[6] J. Graham, D. Hall, and J. Rimland, "A New Synthetic Dataset for Evaluating Hard and Soft Fusion Algorithms," Proceedings of the SPIE Defense, Security and Sensing Symposium, Orlando, FL, 25-29 April 2011.
[7] D. L. Hall and J. Llinas (Eds.), Handbook of Multisensor Data Fusion, CRC Press, 2001.
[8] L. A. Zadeh, "Fuzzy Logic = Computing with Words," IEEE Transactions on Fuzzy Systems, 4(2), 103-111, 1996.
[9] D. L. Hall and J. M. Jordan, Human-Centered Information Fusion, Artech House, Norwood, MA, 2010.
[10] N. A. Fishman, Viral Data in SOA: An Enterprise Pandemic, IBM Press, 2009.
[11] D. Georgakopoulos and M. P. Papazoglou, Service-Oriented Computing, The MIT Press, 2008.
[12] P. Mell and T. Grance, The NIST Definition of Cloud Computing (draft), NIST Special Publication 800-145, 2011.
[13] T. Dillon, C. Wu, and E. Chang, "Cloud Computing: Issues and Challenges," 24th IEEE International Conference on Advanced Information Networking and Applications (AINA), pp. 27-33, 2010.
[14] M. T. Schmidt, B. Hutchison, P. Lambros, and R. Phippen, "The Enterprise Service Bus: Making Service-Oriented Architecture Real," IBM Systems Journal, 44(4), 781-797, 2005.
[15] U.S. Department of the Army, Counterinsurgency: Field Manual 3-24, Washington, DC, 2006, pp. 1-15. http://www.usgcoin.org/library/doctrine/COIN-FM3-24.pdf
[16] E. A. Bier, S. K. Card, and J. W. Bodnar, "Entity-Based Collaboration Tools for Intelligence Analysis," IEEE Symposium on Visual Analytics Science and Technology, October 2008.
[17] M. Prentice, M. Kandefer, and S. C. Shapiro, "Tractor: A Framework for Soft Information Fusion," Proceedings of the 13th International Conference on Information Fusion (Fusion 2010), July 2010.
[18] M. Kandefer and S. C. Shapiro, "Evaluating Spreading Activation for Soft Information Fusion," Proceedings of the 14th International Conference on Information Fusion (Fusion 2011), July 2011.
[19] M. Prentice and S. C. Shapiro, "Using Propositional Graphs for Soft Information Fusion," Proceedings of the 14th International Conference on Information Fusion (Fusion 2011), 2011.
[20] G. L. Zacharias, J. MacMillan, and S. B. Van Hemel (Eds.), Committee on Organizational Modeling from Individuals to Societies, Behavioral Modeling and Simulation: From Individuals to Societies, National Research Council, 2008.
[21] G. Ackerman (Principal Investigator, WMD Terrorism Project, Chemical and Biological Nonproliferation Program, Center for Nonproliferation Studies, Monterey Institute of International Studies, CA), Literature Review of Existing Terrorist Behavior Modeling, Final Report to the Defense Threat Reduction Agency, August 2002.
[22] A. Kott (Ed.), Information Warfare and Organizational Decision-Making, Artech House, Norwood, MA, 2007.
[23] J. W. Bodnar, Warning Analysis for the Information Age: Rethinking the Intelligence Process, Joint Military Intelligence College, Washington, DC, 2003.
[24] J. W. Bodnar, "Making Sense of Massive Data by Hypothesis Testing," Proceedings of the 2005 International Conference on Intelligence Analysis, 2005.
[25] D. M. Russell, M. J. Stefik, P. Pirolli, and S. K. Card, "The Cost Structure of Sensemaking," INTERCHI '93 Conference on Human Factors in Computing Systems, Amsterdam, 1993.
[26] M. Cannataro and D. Talia, "Knowledge Grid: An Architecture for Distributed Knowledge Discovery," Communications of the ACM, 46(1), 89-93, 2003.
[27] D. Talia, "Knowledge Discovery Services and Tools on Grids," in Foundations of Intelligent Systems, Lecture Notes in Computer Science, Vol. 2871, 2003.
[28] L. Srinivasan and J. Treadwell, An Overview of Service-Oriented Architecture, Web Services, and Grid Computing, HP Software Global Business Unit Report, November 2005.
[29] Headquarters, Dept. of the Army, Stryker Brigade Combat Team, Field Manual No. 3-21.31, March 2003.
[30] Center for Army Lessons Learned, Company Intelligence Support Team Handbook, January 2010.
[31] A. Bröring, J. Echterhoff, S. Jirka, I. Simonis, T. Everding, C. Stasch, S. Liang, and R. Lemmens, "New Generation Sensor Web Enablement," Sensors (Basel), 11(3), 2011. doi:10.3390/s110302652
[32] N. Josuttis, SOA in Practice (1st ed.), O'Reilly, 2007.
[33] J. C. Rimland and D. L. Hall, "A Multi-agent Infrastructure for Hard and Soft Information Fusion," SPIE Proceedings, FL, USA, 2011.
[34] M. Dorigo, M. Birattari, and T. Stützle, "Ant Colony Optimization," IEEE Computational Intelligence Magazine, 1(4), 28-39, 2006.
[35] J. Rimland, "Service Oriented Architecture for Human Centric Information Fusion," Chapter 13 in D. L. Hall, M. Liggins, C. Chong, and J. Llinas, Distributed Data Fusion for Network Operations, CRC Press, in preparation, 2012.
[36] T. Gardner, "UML Modelling of Automated Business Processes with a Mapping to BPEL4WS," 2003.
[37] DTIC Document, "Building Multilevel Secure Web Services-Based Components for the Global Information Grid," 2006.
[38] W. Y. Chang, Network-Centric Service Oriented Enterprise, Springer-Verlag New York Inc., pp. 430-431, 2007.
[39] D. L. Hall and J. Llinas, "An Introduction to Multisensor Data Fusion," Proceedings of the IEEE, 85(1), 6-23, 1997.
