[IEEE 2008 IEEE International Conference on Electro/Information Technology (EIT 2008) - Ames, IA,...

Abstract—The focus of this research was to design a framework to create highly autonomous fault-tolerant distributed sensor networks with plug-and-play capabilities. This would enable diagnosis of faulty sensors and reconfiguration of the network in real time, so that control of the manufacturing process can continue with accurate information in the presence of sensor and processing element faults. The strategy is based on the recently approved IEEE 1451 family of standards. The innovative feature of the proposed effort is the IEEE 1451-based plug-and-play architecture, which could lead to the development of a new member of the IEEE 1451 family of standards that addresses the reliability issues of sensor networks.

Index Terms— intelligent sensor networks, process control, fault diagnosis, fault detection, reliability, fault-tolerance, self-diagnosable, smart sensors, IEEE 1451 family of standards.

1. INTRODUCTION

Today’s competitive global economy requires continued advances in manufacturing technology to achieve and maintain cost advantages. Critical to meeting this challenge is the quality of information that can be extracted from sensor measurements. Thus sensor reliability has a direct impact on process yields, throughput, and product performance. Faulty sensors need to be identified in real time to ensure that the manufacturing process does not continue with incorrect process settings, and the sensing process should be reconfigured so that control can continue with accurate information. Currently, sensor reliability issues are not considered when systems are designed and developed, although the reliability of the information collected and interpreted is highly dependent on the sensors used and the sensor networks employed. In fact, most of the existing schemes used in industry for fault detection are confined to simple strategies for small, predetermined fault sets. For example, reliability of sensors is a critical issue in thin-film deposition systems, where they are typically exposed to harsh chemical and thermal environments. It is not uncommon to experience a high rate of sensor failures in such systems. Furthermore, failure of sensors during a production run typically results in aborting the run because reliable information for monitoring and feedback control is unavailable. Although no formal study has been conducted to quantify production losses in the thin-film industry, the number is expected to be in the high millions.

Bharat Joshi and Reshmi Mitra are with the Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, NC 28223, USA. *Corresponding author – [email protected]

Tyrone Vincent is with the Division of Engineering, Colorado School of Mines, Golden, CO 80401, USA.

Jinran Chen and Arun Somani are with the Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011.

Nicholas Gomez is with ITN Energy Systems, Inc., Littleton, CO 80127, USA.

This project was funded by NSF SBIR Grant No. DMI-0512949.

Furthermore, complex production systems use a wide range of sensors and instruments that are procured from various vendors. It is well recognized that a large number of man-hours are spent on integrating these sensors into the systems. On numerous occasions the vendor interfaces are not compatible, and the end user has to develop short-term software patches to make everything work. Thus industry has begun to recognize the importance of the IEEE 1451 family of standards [9] in sensor networks for manufacturing.

The main contribution of this research is a unique and flexible fault diagnosis and reconfiguration framework within the IEEE 1451-based architecture for creating distributed fault-tolerant sensor networks. It provides a plug-and-play capability by which a user can integrate modules into a system in minimal time and enable fault-tolerant capabilities. This could potentially lead to a new member of the IEEE 1451 family of standards that addresses the reliability issues of sensor networks.

The paper is organized as follows. Section 2 presents the hierarchical fault-tolerant sensor network strategy, and Section 3 discusses the potential incorporation of plug-and-play for reliability into the IEEE 1451 family of standards. Conclusions are drawn in Section 4.

Hierarchical Plug-and-Play Self-Diagnosable Intelligent Sensor Networks for Process Control

Bharat Joshi*, Senior Member IEEE, Tyrone Vincent, Member IEEE, Jinran Chen, Arun Somani, Fellow IEEE, Nicholas Gomez, and Reshmi Mitra

978-1-4244-2030-8/08/$25.00 ©2008 IEEE.

2. HIERARCHICAL FAULT-TOLERANCE

In all existing fault-tolerant schemes, sensor diagnosis is performed on the assumption that the processing elements are fault-free. This could lead to an incorrect and undesirable diagnosis when the processing element is faulty. For example, when a processing element and the associated sensor are both faulty, there is a non-zero probability that the faulty sensor would be identified as healthy. This could potentially lead to catastrophic results. In this paper the problem is addressed by performing diagnosis at two levels: the system level, at which the health of the processing elements and communication channels is diagnosed, and the subsystem level, at which the health of all the sensors is determined. This avoids a scenario where a faulty sensor is identified as healthy. A faulty processing element could still identify a healthy sensor as faulty. However, this misdiagnosis is preferable to a faulty sensor being identified as healthy.
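The two-level idea can be illustrated with a minimal sketch. It assumes that the system-level diagnosis has already produced a set of faulty processing elements, and that each sensor-level test result is tagged with the processing element that produced it. The names `SensorReport`, `Verdict`, and `hierarchical_diagnosis`, and the extra `SUSPECT` outcome, are hypothetical and are not part of the framework described in this paper.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"
    FAULTY = "faulty"
    SUSPECT = "suspect"  # assumption: verdict withheld because the reporting PE is faulty


@dataclass(frozen=True)
class SensorReport:
    sensor_id: str
    pe_id: str                 # processing element that ran the sensor-level test
    passed_sensor_test: bool   # outcome of the subsystem-level test


def hierarchical_diagnosis(reports, faulty_pes):
    """Combine system-level and subsystem-level results.

    A sensor is declared healthy only if the processing element that tested it
    was itself diagnosed as healthy at the system level.
    """
    result = {}
    for report in reports:
        if report.pe_id in faulty_pes:
            # Never trust a "healthy" claim that comes from a faulty PE.
            result[report.sensor_id] = Verdict.SUSPECT
        elif report.passed_sensor_test:
            result[report.sensor_id] = Verdict.HEALTHY
        else:
            result[report.sensor_id] = Verdict.FAULTY
    return result
```

With this ordering, a faulty sensor attached to a faulty processing element can at worst be reported as suspect, never as healthy.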

2.1 Sensor Level Fault Diagnosis

The principal idea underlying the fault detection algorithm presented here is that the ability to autonomously develop simple models of sensor data from the various phases of a particular process will aid the deployment of subsystem fault detection in a network setting [1-5]. The modeling concepts are based on standard statistical methods and are thus well established. For use in a sensor network, however, there is a need to develop computationally efficient modules that can run with little human input.

It is assumed that the process runs in distinct phases that can be determined by actuator values, such as power levels or valve positions. For each phase, a linear data model based on Principal Component Analysis (PCA) is developed, using cross-validation to determine the appropriate model complexity. The model takes the form of a matrix P that defines the principal directions of the correlations among the data. Once models have been sufficiently developed for all phases and processes, new measurements can be evaluated and checked for errors. If no fault is detected, the data is used for an ongoing update of the model. It is assumed that the user selects appropriate groups of sensors for which correlations exist that can be utilized for fault detection. For each group, model development takes place on a designated network capable application processor (NCAP).
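A minimal sketch of PCA-based fault detection for one process phase is given below. It assumes a fixed number of principal components (the cross-validation step is omitted), uses the squared prediction error (SPE) of a sample as the single fault indicator, and sets the threshold from an empirical quantile of fault-free training data. The function names and the quantile-based threshold are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def fit_pca_model(X, n_components):
    """Fit a linear PCA model to fault-free data X (rows: samples, columns: sensors)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant channels
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the scaled data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T  # shape: (sensors, n_components)
    return mean, std, P


def spe(x, mean, std, P):
    """Squared prediction error: the part of a sample the model cannot explain."""
    z = (x - mean) / std
    residual = z - P @ (P.T @ z)
    return float(residual @ residual)


def fit_threshold(X, mean, std, P, quantile=0.99):
    """Empirical SPE threshold taken from fault-free training data (an assumption)."""
    scores = np.array([spe(x, mean, std, P) for x in X])
    return float(np.quantile(scores, quantile))


def is_faulty(x, mean, std, P, threshold):
    """Flag a new measurement vector whose SPE exceeds the threshold."""
    return spe(x, mean, std, P) > threshold


def synthetic_training_data(n_samples=500, seed=0):
    """Four correlated channels driven by one latent variable plus small noise."""
    rng = np.random.default_rng(seed)
    loadings = np.array([1.0, 2.0, -1.0, 0.5])
    latent = rng.normal(size=(n_samples, 1))
    return latent * loadings + 0.05 * rng.normal(size=(n_samples, 4)), loadings
```

As a usage example, fitting the model on `synthetic_training_data()` with one component and then adding a bias to a single channel of an otherwise consistent sample breaks the learned correlation, which raises the SPE above the threshold.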

2.2 System-Level Fault Diagnosis

At the system level, diagnosis of the processors and the communication links has to be performed. Fault management and diagnosis in today’s large, complex, heterogeneous networks is a problem for network administration. There are two main approaches to the management of large networks, depending on the level of monitoring; see, for example, [13, 16-18]. Passive network monitoring is done using devices directly connected to the network media. For active monitoring, ad hoc techniques such as ping and traceroute are used. Both of these programs provide an instantaneous rather than an overall view of the network. Problems such as node overloading, interface status, and other statistics are ignored by these schemes. Alternatively, the status of the nodes can be checked periodically, with the period varying across implementations. This type of monitoring involves communication between the network manager and the managed network devices. There are two main techniques available for this type of monitoring:

1. Event based management

2. Polling based management

Figure 1: Health check protocol

In this paper the Health Check protocol (HCK) is used for system-level fault diagnosis (Figure 1). It is a polling-based protocol that uses UDP for data communication. A process on the agent, referred to as the health check process, generates the periodic requests internally. The agent responds to these requests, and the HCK process sends a message to the client. Compared with current implementations based on the SNMP framework, this reduces the client-to-agent communication overhead. The NMS (network management station) is the client that receives the HCK messages from the network and processes them.
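The exchange can be sketched as follows, under stated assumptions: the agent pushes one UDP datagram per polling period, and the NMS declares a node silent after a configurable number of missed periods. The JSON message layout, the `NmsMonitor` class, and the missed-period rule are hypothetical; the paper does not specify the HCK message format.

```python
import json
import socket
import time


def build_hck_message(node_id, status, seq):
    """Encode one health-check report (hypothetical JSON layout)."""
    return json.dumps({"node": node_id, "status": status, "seq": seq}).encode("utf-8")


def send_hck(sock, nms_addr, node_id, status, seq):
    """Agent side: push one report to the NMS over UDP; no request from the NMS is needed."""
    sock.sendto(build_hck_message(node_id, status, seq), nms_addr)


class NmsMonitor:
    """NMS side: remember when each node last reported and what it reported."""

    def __init__(self, period_s, missed_limit=3):
        self.timeout_s = period_s * missed_limit
        self.last_seen = {}
        self.last_status = {}

    def on_datagram(self, payload, now):
        msg = json.loads(payload.decode("utf-8"))
        self.last_seen[msg["node"]] = now
        self.last_status[msg["node"]] = msg["status"]

    def silent_nodes(self, now):
        """Nodes that have missed more than `missed_limit` reporting periods."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


def loopback_demo():
    """Send a single report from an agent socket to an NMS socket on localhost."""
    nms = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    nms.bind(("127.0.0.1", 0))  # let the OS pick a free port
    agent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    monitor = NmsMonitor(period_s=1.0)
    try:
        send_hck(agent, nms.getsockname(), "H1", "ok", seq=1)
        payload, _ = nms.recvfrom(4096)
        monitor.on_datagram(payload, now=time.monotonic())
    finally:
        agent.close()
        nms.close()
    return monitor.last_status
```

Because the agent generates the reports itself, the only traffic is the one-way datagram from agent to NMS, which is the source of the reduced overhead described above.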

The complete fault diagnosis process is given in Figure 2. The network includes multiple cluster heads, H, connected to the NMS through gateways G1 to G4. The network graph is created from the routing information received in routing messages. In traditional NMS implementations, where packet receipt is used as the indication of node status, if a gateway (cluster head) such as G2 goes down due to some fault, the cluster heads connected to the external network through this gateway lose connectivity. The requests to and responses from these hosts are not received, and the NMS marks the hosts in this cluster as faulty, as shown in Figure 3 (b).

Figure 2: Basic fault diagnosis process
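The gateway scenario above suggests one way the network graph could be used, sketched here as an assumption rather than as the algorithm of Figure 2: silent nodes that are adjacent to a responding part of the network are treated as probable faults, while silent nodes that can only be reached through other silent nodes are treated as merely unreachable. The function names and the example topology are hypothetical.

```python
from collections import deque


def reachable_from(graph, start, down):
    """Breadth-first search that never traverses nodes listed in `down`."""
    if start in down:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen and neighbor not in down:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def classify_silent_nodes(graph, nms, silent):
    """Split silent nodes into probable faults and nodes that are only cut off.

    `graph` is an undirected adjacency mapping built from routing information.
    """
    alive = reachable_from(graph, nms, down=set(silent))
    probable_fault, unreachable = set(), set()
    for node in silent:
        # A silent node next to a responding node has a working path to the NMS
        # up to its own link, so the node (or that last link) is the likely culprit.
        if any(neighbor in alive for neighbor in graph.get(node, ())):
            probable_fault.add(node)
        else:
            unreachable.add(node)
    return probable_fault, unreachable


# Hypothetical topology: two gateways, three cluster heads.
EXAMPLE_GRAPH = {
    "NMS": ["G1", "G2"],
    "G1": ["NMS", "H1"],
    "G2": ["NMS", "H2", "H3"],
    "H1": ["G1"],
    "H2": ["G2"],
    "H3": ["G2"],
}
```

In the example topology, if G2, H2, and H3 all stop reporting, only G2 is classified as a probable fault; H2 and H3 are classified as unreachable rather than faulty. A failed link would produce the same observation as a failed node, so this sketch cannot distinguish the two.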

3. PLUG-AND-PLAY SENSOR NETWORKS

The advent of smart sensors and the development of the IEEE 1451 family of standards offer a realistic potential for designing highly autonomous fault-tolerant distributed sensor networks with plug-and-play capabilities. The IEEE 1451.2 specification [8] defines a smart sensor as a sensor “that provides functions beyond those necessary for generating a correct representation of a sensed or controlled quantity. This function typically simplifies the integration of the transducer into applications in a networked environment.” Furthermore, IEEE 1451.1 specifies the network capable application processor (NCAP) information model. Advances in technology have also made it possible to integrate microcontrollers with sensors. Under these conditions it is fair to assume a scenario in which a distributed sensor network consists of thousands of “plug-and-play” sensors, where a sensor or a group of sensors is connected to a microcontroller or a processor. Thus, the power of the microcontrollers and processors can be exploited to diagnose both the health of each sensor in the network and the health of the complete distributed network.

IEEE 1451 provides a set of common interfaces for connecting transducers (sensors and actuators) to existing instrumentation and control networks. The IEEE 1451.1-1999 Standard defines the behavior of a smart transducer using an object model approach, together with the path for network connectivity. Because sensor usage spans various industries, the transducer interfaces are divided into IEEE 1451.2, 1451.3, and 1451.4 to meet their specific needs. The first of these, focused on an interface for transducers with lower signal bandwidth requirements, has been completed and designated as the IEEE 1451.2-1997 Standard.

IEEE 1451.3, “Smart Transducer Interface for Sensors and Actuators - Digital Communication and Transducer Electronic Data Sheet (TEDS) Formats for Distributed Multidrop Systems,” defines the use of spread-spectrum modulation techniques to allow the following functions to be performed over a single cable:

1. synchronizing data acquisition for an array of sensors;

2. communicating simultaneously with an array of transducer bus interface modules (TBIM);

3. providing power for operation of all transducers on the bus and their associated electronics.

The context for the IEEE 1451.4-2004 transducer and interface is shown in Figure 3.

The IEEE 1451.4 Transducer is a sensor or actuator with typically one addressable device, or node, containing a TEDS. Digital communication can be used to read the TEDS information and to configure an IEEE 1451.4 Transducer. Multiple IEEE 1451.4 Transducers with switch nodes can be connected in a multi-drop configuration, with at most one of these Transducers in “active” mode and the rest in “passive” mode. The switch nodes can be used to change the functional mode of each IEEE 1451.4 Transducer.

An IEEE 1451.4 Transducer may be used to sense or control multiple physical phenomena. Each phenomenon sensed or controlled shall be associated with a node. If a Transducer includes more than one node, one of the nodes is designated to hold the Node-list in a memory block. An IEEE 1451.4 Transducer should not have more than one Node-list. More than one IEEE 1451.4 Transducer can be connected to an IEEE 1451.4 MMI if the Transducers are capable of being in passive mode.

The TEDS residing in the IEEE 1451.4 Transducer provides self-describing capability. The TEDS contains fields that describe the type, operation, and attributes of one or more transducer elements (sensors or actuators). By requiring that the TEDS be physically associated with the IEEE 1451.4 Transducer, the resulting hardware partition encapsulates the measurement aspects in the IEEE 1451.4 Transducer, while the application-related aspects can reside in the NCAP or alternatively be stored in the TEDS.
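The self-description and the active/passive rule for a multi-drop connection can be illustrated with a small data model. This is only a sketch: the `Teds` fields are an illustrative subset chosen for the example and do not reproduce the TEDS layout defined by the standard, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Mode(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


@dataclass
class Teds:
    """Self-description fields (illustrative subset, not the standard's layout)."""
    manufacturer: str
    model: str
    transducer_type: str  # e.g. "sensor" or "actuator"
    units: str


@dataclass
class Transducer:
    teds: Teds
    mode: Mode = Mode.PASSIVE


@dataclass
class MultidropBus:
    """Several transducers on one interface, with at most one in active mode."""
    transducers: List[Transducer] = field(default_factory=list)

    def attach(self, transducer):
        # New devices join in passive mode so the bus rule cannot be violated.
        transducer.mode = Mode.PASSIVE
        self.transducers.append(transducer)

    def activate(self, index):
        """Switch one transducer to active and force all others to passive."""
        if not 0 <= index < len(self.transducers):
            raise IndexError("no transducer at that bus position")
        for i, transducer in enumerate(self.transducers):
            transducer.mode = Mode.ACTIVE if i == index else Mode.PASSIVE

    def active(self):
        return [t for t in self.transducers if t.mode is Mode.ACTIVE]
```

An application on the NCAP could read `teds` from each attached transducer to configure itself without manual setup, which is the plug-and-play behavior described above.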

An IEEE 1451.4 protocol is used to separate the time critical part of the communication of the IEEE 1451.4 interface from the T-block. The T-block object located in the NCAP handles the interpretation of the TEDS data for the end user. Further processing of the data may take place both in the NCAP and in other processors in larger systems. The NCAP includes an IEEE 1451.1 object model with an IEEE 1451.4 T-block.

IEEE 1451.4 addresses analog sensors with respect to their existing wiring and the requirement for wide-bandwidth analog measurements. IEEE 1451.4 will allow analog-output, mixed-mode transducers to communicate digital information with a high-level IEEE 1451 object. To fit into the digital network defined by the other 1451 standards, the bi-directional digital communication of self-identification, test, and programmable signal conditioning functionality is being defined with an eye toward simplicity and low cost. IEEE 1451.4 will provide compatibility with legacy systems and a transition path to 1451. The standards embody an extensive effort to provide sufficient detail to achieve interoperability, while allowing flexibility for manufacturers of components, subsystems, and systems.

Figure 3: Subsystem and system fault detection with Reconfiguration. [Diagram labels: NCAP; Network; TEDS; Sensors & Actuators; Network Layer, Fault Detection, Interface Layer; Network Layer System Fault Detection and/or system reconfiguration]

The IEEE 1451.4-2000 system architecture comprises an IEEE 1451.4 NCAP (network capable application processor) for network interfacing. The NCAP is a device that supports a network interface, application functionality, and access to the physical world via one or more transducers. The transducer-independent interface (TII) is a 10-wire digital communication interface that allows an NCAP or host to obtain sensor readings or control actuator actions, as well as to request TEDS data.

The layered architecture of the NCAP has a network layer, a function layer, and an interface layer. These are sufficient to achieve interoperability while allowing flexibility for manufacturers of components, subsystems, and systems. However, to make this interface smart, we intend to address the fault tolerance issues related to the MMI and the network layer. We propose to develop a fault detection (FD) layer for the NCAP architecture. The FD layer should be able to detect failures in transmission (network layer), in the sensor or actuator (transducer), and in the MMI (interface layer).

Faults in the sensors are identified first. This involves the study of a wide range of sensors under different working conditions. In certain cases the deployment of the sensors is affected by the environment. Sensors can fail for many reasons. One is power failure. Another is a sensor that wakes up and needs to start sensing but fails to do so; in this case it has enough power but is unable to function. A sensor may also fail due to adverse environmental conditions. In all these cases we need to formulate a mechanism that detects a failed node in a short time.

4. CONCLUSIONS AND FUTURE WORK

In this paper, unique and flexible fault diagnosis and reconfiguration strategies within the IEEE 1451-based architecture are proposed for creating distributed fault-tolerant sensor networks. They provide a plug-and-play capability by which a user can integrate modules into a system in minimal time and enable fault-tolerant capabilities. This could potentially lead to a new member of the IEEE 1451 family of standards that addresses the reliability issues of sensor networks.

Integration of the sensor and system level fault diagnosis strategies described above could yield enormous benefits for the hardware used in harsh industrial environments. As a rather large number of parameters have to be monitored, slight deviations in one of them may not be noticed in time. With the use of principal component analysis, a possible sensor fault could be detected by monitoring one indicator. This would enable the operator of the machine to react in a timely manner, preventing negative effects on the quality of the final product. A recovery process can then be designed to make the product fault tolerant. These algorithms are easily portable to any system that has basic communication and processing capabilities. Based on this work future activities will involve implementing the strategies on a production system.

5. REFERENCES

[1] Chow, E. and Willsky, A., “Analytical Redundancy and the Design of Robust Failure Detection Systems,” IEEE Transactions on Automatic Control, vol. 29, 1984, pp. 603-614.

[2] Dunia, R. and Qin, S., “A Subspace Approach to Multidimensional Fault Identification and Reconstruction,” AIChE Journal, vol. 44, 1998, pp. 1813-1831.

[3] Dunia, R. and Qin, S., “A Unified Geometric Approach to Process and Fault Identification and Reconstruction: The Unidimensional Fault Case,” Comp. Chem. Eng., vol. 22, 1998, pp. 927-943.

[4] Frank, P., and Ding, X., “Survey of Robust Residual Generation and Evaluation Methods in Observer-Based Fault Detection Methods,” J. Proc. Cont., vol. 7, no. 6, 1997, pp. 403-424.

[5] Fretheim, T., Vincent, T., and Shoureshi, R., “Optimization Based Fault Detection for Nonlinear Systems,” Proc. American Control Conference, pp. 1747-1752, 2001.

[6] Heinzelman, W., Chandrakasan, A., and Balakrishnan, H., “Energy-Efficient Communication Protocols for Wireless Microsensor Networks,” in Proceedings of Hawaiian International Conference on Systems Science, Hawaii, US, January 2000.

[7] IEEE 1451.2, IEEE Standard for a Smart Transducer Interface for Sensors and Actuators - Transducer to Microprocessor Communication Protocols and Transducer Electronic Data Sheet (TEDS) Formats.

[8] Intanagonwiwat, C., Govindan, R., and Estrin, D., “Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks,” Proceedings of ACM Mobicom 2000, Boston, MA, 2000, pp. 56-67.

[9] Isermann, R., “Process Fault Detection Based on Modeling and Estimation Methods – A Survey,” Automatica, vol. 20, no. 4, 1984, pp. 387-404.

[10] Isermann, R. and Ballé, P. “Trends in the Application of Model-Based Fault Detection and Diagnosis of Technical Processes.” Control Engineering Practice 5(5), 1997, pp. 709-719.

[11] Johnson, B., Design and Analysis of Fault Tolerant Digital Systems, Addison Wesley, 1989.

[12] Joshi, B., Gomez, N., Chaffin, M., Weideman, S., Beck, M., Muha, J., and Britt, J., “Real-Time Fault Diagnosis of Sensors for Fault-Tolerant Control of Thin-Film (CIGS) Photovoltaic Process,” Proceedings of the MRS 2003 Spring Meeting, San Francisco, CA, April 21-25, 2003.

[13] Russel, E., Chiang, L., Braatz, R., “Data-driven Techniques for Fault Detection and Diagnosis in Chemical Processes,” Springer Verlag London, 2000.

[14] Somani, A., Agarwal, V., and Avis, D., “A Generalized Theory for System-Level Diagnosis,” IEEE Transactions on Computers, vol. C-36, no. 5, May 1987, pp. 538-546.

[15] Somani, A. and Agarwal, V., “Distributed Diagnosis Algorithms for Regular Interconnected Structures,” IEEE Transactions on Computers, vol. 41, no. 7, July 1992, pp. 899-906.
