
Technology Assessment Report

PUBLIC/CONFIDENTIAL 1 www.opendai.eu

Project Acronym: Open-DAI
Grant Agreement number: 297362
Project Title: Opening Data Architectures and Infrastructures of European Public Administrations
Work Package: System and Architecture specification
Deliverable Number: D2.1

Revision History
30/3/2012: Release. Authors: Luca Gioppo (CSI-Piemonte), Mats Jonsson (Netport), Federico Cairo (Politecnico di Torino), Caner Tosunoglu (Sampas), Giuseppe Futia (Politecnico di Torino), Marta Palanques (BDigital)
25/2/2013: Petra C Arrenäs (NetPort.Karlshamn)

Legal Disclaimer
Copyright 2012 by CSI-Piemonte, BDIGITAL, SAMPAS, Netport, Regione Piemonte, Karlshamn Municipality, Ordu Municipality, Barcelona Municipality, Lleida Municipality, Politecnico di Torino, DIGITPA.

The information in this document is proprietary to the following Open-DAI consortium members: CSI-Piemonte, BDIGITAL, SAMPAS, Netport, Regione Piemonte, Karlshamn Kommun, Ordu Municipality, Barcelona Municipality, Lleida Municipality, Politecnico di Torino, DIGITPA.

This document contains preliminary information and is available under the terms of the following license: The Open-DAI Data Assessment and Specification Report by Open-DAI Project is licensed under a Creative Commons Attribution 3.0 Unported License.

Statement of originality: This deliverable contains original unpublished work except where clearly indicated otherwise. Acknowledgement of previously published material and of the work of others has been made through appropriate citation, quotation or both.


Table of contents

1 Summary
2 Reference framework for technology assessment
2.1 Partner technology framework
2.1.1 Open-DAI technology assessment
2.2 Component technology framework
2.2.1 Open Source
3 Partner technology assessment
3.1 Network assessment
3.1.1 CSI-Piemonte - SAMPAS
3.1.2 CSI-Piemonte - Netport
3.1.3 CSI-Piemonte - Lleida
3.1.4 SAMPAS – Karlshamn
3.2 Cloud infrastructure assessment
3.3 Legacy assessment
3.3.1 CSI-Piemonte legacy infrastructure
3.3.2 Lleida legacy infrastructure
3.3.3 Ordu legacy infrastructure
3.3.4 Karlshamn legacy infrastructure
4 Component technology assessment
4.1 Cloud components
4.1.1 Puppet
4.1.2 Marionette Collective
4.1.3 Puppet Dashboard and Foreman
4.2 Middleware components
4.2.1 SOA components
4.2.2 TEIID vs WSO2 Data service server
4.2.3 D2RQ


1 Summary

The goal of WP2 is twofold:

1. Assessing the technology present in the partners' legacy data centres
2. Assessing which technology components are usable in the project, in order to finalize their adoption

2 Reference framework for technology assessment

This chapter describes the framework used by the project to assess the different aspects of the technology.

2.1 Partner technology framework

The Open-DAI project faces several potential critical issues, due to its innovative cloud approach, the connection to legacy environments, WAN access and distributed deployment, the convergence of the development of four different PA service providers onto a common middleware infrastructure, and the adoption of a virtualized EII (Enterprise Information Integration) solution. To anticipate and manage possible technical issues, the project decided to assess some key indicators that represent the main potential obstacles to the architectural implementation.

2.1.1 Open-DAI technology assessment

To collect information about the technology in the partner environments, the project defined a checklist aimed at gathering fundamental information on the hardware infrastructure, and distributed it to the partners managing the IT services of the legacy data centres. The assessment aimed at:

• Understanding whether the new load coming from the project could degrade the performance of the legacy applications

• Understanding whether the network connection between the cloud sites and the legacy data centre is adequate for the needs of the virtualization platform

• Understanding whether the network connections between the cloud nodes are adequate for the cloud management layer to work correctly

• Assessing the feasibility of secure connections between all the nodes involved in the project

All these elements were needed to design the virtualization platform accordingly, especially its caching components.

2.1.1.1 Checklist

For each pilot site a checklist was collected to gather the information needed to assess the legacy infrastructure. Part of this data is also relevant to WP3 deliverables, but in this context the focus is on hardware and technical aspects. In particular, apart from name and description, the checklist records the following items:

Legacy DB engine (with version)

This is important because the TEIID tool has a large number of connectors to other DB engines, but the project needs to know the exact engine and version to check that the right connector is available.

Legacy DB hardware

This is important because the TEIID virtual DB will act as a new client of the legacy DB, so the hardware running the legacy DB engine must be able to accept the additional load. Since the amount of load the tool will bring to the legacy infrastructure is not yet known, it is also important to understand how well the legacy solution scales. A possible workaround for slim legacy hardware is a large cache on the virtual DB side.


Legacy DB reachable through VPN?

This is important because, if the legacy infrastructure cannot be accessed, the project will not be able to implement the pilot. A workaround would be to create an ETL process towards accessible hardware, possibly in the legacy data centre, to demonstrate the functionality of the architecture to the PA and, over time, overcome the issues raised.
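The checklist items above, together with the workarounds the text proposes for them, can be captured as a simple record. The following Python sketch is illustrative only: the field names, the `hardware_can_take_load` flag (standing in for the outcome of the hardware review) and the `suggest_workarounds` helper are hypothetical, not part of any project tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChecklistEntry:
    """One legacy data set as recorded in the pilot checklist (hypothetical layout)."""
    data_set: str
    db_name: str
    db_engine: str                # engine with version: drives the TEIID connector check
    db_hardware: str
    reachable_via_vpn: bool
    hardware_can_take_load: bool  # hypothetical flag: outcome of the hardware review


def suggest_workarounds(entry: ChecklistEntry) -> List[str]:
    """Return the workarounds the checklist text proposes for this entry."""
    workarounds = []
    if not entry.reachable_via_vpn:
        # No access to the legacy DB: feed accessible hardware through ETL.
        workarounds.append("ETL towards accessible hardware")
    if not entry.hardware_can_take_load:
        # Slim legacy hardware: absorb the new load with a large cache on the virtual DB side.
        workarounds.append("large cache on the virtual DB side")
    return workarounds


# Values taken from the CSI-Piemonte table in section 3.3.1.
accident = ChecklistEntry("Accident", "NSITPROD", "Oracle 10.2.0.4.0",
                          "Dell 2950 (2 x Xeon E5410, 24 GB RAM)", True, True)
# Hypothetical entry, for illustration only.
isolated = ChecklistEntry("Example", "EXAMPLEDB", "unknown", "unknown", False, False)

print(suggest_workarounds(accident))  # []
print(suggest_workarounds(isolated))
```

Keeping the engine and version as a single free-text field mirrors the checklist wording; a real inventory might split them to match connector names automatically.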

2.1.1.2 Network assessment

For network performance assessment the project used the iPerf tool. Iperf is a commonly used network testing tool, written in C++, that can create TCP and UDP data streams and measure the throughput of the network carrying them. It lets the user set various parameters, either to test a network or to optimize and tune it. Iperf has client and server functionality and measures the throughput between the two ends, either unidirectionally or bidirectionally. It is open source software and runs on various platforms, including Linux, Unix and Windows.

• UDP: when used to test UDP capacity, Iperf lets the user specify the datagram size and reports the datagram throughput and the packet loss.
• TCP: when used to test TCP capacity, Iperf measures the throughput of the payload.

Note that Iperf uses 1024*1024 bytes for a megabyte and 1000*1000 bits for a megabit. Typical Iperf output contains a time-stamped report of the amount of data transferred and the throughput measured.

Iperf is significant because it is a cross-platform tool that can be run over any network and produces standardized performance measurements. It can therefore be used to compare wired and wireless networking equipment and technologies in an unbiased way. Since it is open source, the measurement methodology can also be scrutinized by the user.

The iPerf server was hosted on the cloud nodes (CSI-Piemonte and SAMPAS), and the network connection between each node and the legacy data centres was measured. UDP data streams were used, since results obtained with a UDP stream can be considered more conservative than TCP ones. The target was to assess the data throughput of each connection.
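To make the unit convention concrete, the following Python sketch reproduces the arithmetic behind a typical UDP report, using the first test in section 3.1.1 (11.9 MBytes transferred in 10 s with 1470-byte datagrams, 1017 of 8506 datagrams lost). The helper names are hypothetical and are not part of Iperf.

```python
# Iperf's units as stated above: MBytes are 1024*1024 bytes, Mbits are 1000*1000 bits.
MBYTE = 1024 * 1024
MBIT = 1000 * 1000


def mbits_per_sec(transfer_mbytes: float, seconds: float) -> float:
    """Convert an Iperf 'Transfer' figure (MBytes) over an interval into Mbits/sec."""
    return transfer_mbytes * MBYTE * 8 / seconds / MBIT


def datagrams_for_rate(rate_mbits: float, seconds: float, datagram_bytes: int = 1470) -> float:
    """Idealized number of UDP datagrams needed to sustain rate_mbits for seconds."""
    return rate_mbits * MBIT * seconds / (datagram_bytes * 8)


def loss_percent(lost: int, total: int) -> float:
    """Packet loss as reported in the 'Lost/Total Datagrams' column."""
    return 100.0 * lost / total


# Figures from the first test in section 3.1.1 (-b 10m, 1470-byte datagrams).
print(round(mbits_per_sec(11.9, 10.0), 2))   # 9.98
print(round(datagrams_for_rate(10, 10.0)))   # 8503
print(round(loss_percent(1017, 8506), 1))    # 12.0
```

The 9.98 Mbits/sec result differs from the 10.0 Mbits/sec printed by Iperf only because the transfer column is rounded (8506 datagrams of 1470 bytes are about 11.92 MBytes), and the idealized 8503 datagrams is close to the 8506 actually sent.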

2.1.1.3 Secure connection

The project will require two types of secure connection between partners:

• The connection between the cloud sites, which allows the management network of the cloud infrastructure to reach all the nodes and control all the virtual machines, as described in Figure 1

• The connection between the private cloud of each partner and the data centre, which allows the virtualization platform to access the legacy DB, as shown in Figure 2


Figure 1 - VPN for cloud management network


Figure 2 - VPN through private cloud and legacy environment

2.2 Component technology framework

The project proposal suggested adopting a set of components to implement the technical infrastructure. The pilots' use of these components needs to be assessed, to make sure that nothing is missing and that all the pilots' needs are taken into account. Figure 2 also shows the architectural diagram of the project, with the tools involved and how they will be used. The complete architectural solution will be described in deliverable D2.2; this document focuses on assessing the technology and on defining the list of software components that will be used, and why.

2.2.1 Open Source

The project's spirit of openness calls for the adoption of Open Source Software (OSS) components, for the following reasons:

• OSS components allow broad adoption by PAs even in times of tight budgets
• OSS components make it easier to address the integration needs that often emerge
• Integrations of OSS components can be shared as project deliverables, contributing to a broader return on investment
• OSS components are immediately reusable by PAs without the constraints of public tenders or licence procurement

Adopting OSS tools calls for careful selection, because the ecosystem evolves rapidly and the project needs to propose a model that lasts well beyond the scope of the project itself.

3 Partner technology assessment

Barcelona (Spain) is excluded from this section because the city council of Barcelona already has a data integration infrastructure and an Open Data portal in place. As a result, the pilot in Barcelona will not deal with a legacy environment; data will be retrieved from the existing Open Data platform.

3.1 Network assessment

The network assessment was carried out according to the project framework, using the Iperf tool. The following table summarizes the results, which are detailed in the following sections.

Server | Client country | IP used for test | Bandwidth
CSI-Piemonte | Spain (Lleida) | 62.81.188.235 | approximately 15 Mbit/s
CSI-Piemonte | Turkey | 212.156.105.46 | approximately 8 Mbit/s
CSI-Piemonte | Sweden | 195.84.86.2 | approximately 100 Mbit/s
SAMPAS | Sweden | 88.255.168.28 | approximately 5 Mbit/s

3.1.1 CSI-Piemonte - SAMPAS

C:\>iperf.exe -c 194.116.109.14 -u -d -b 10m
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 8.00 KByte (default)
------------------------------------------------------------
Client connecting to 194.116.109.14, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 8.00 KByte (default)
------------------------------------------------------------
[168] local 192.168.34.149 port 60696 connected with 194.116.109.14 port 5001
[ ID] Interval Transfer Bandwidth
[168] 0.0-10.0 sec 11.9 MBytes 10.0 Mbits/sec
[168] Server Report:
[168] 0.0-10.3 sec 10.5 MBytes 8.59 Mbits/sec 15.836 ms 1017/ 8506 (12%)
[168] Sent 8506 datagrams
********************************************************************************************
C:\>iperf.exe -c 194.116.109.14 -u -d -b 9m
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 8.00 KByte (default)
------------------------------------------------------------
Client connecting to 194.116.109.14, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 8.00 KByte (default)
------------------------------------------------------------
[180] local 192.168.34.149 port 60697 connected with 194.116.109.14 port 5001
[ ID] Interval Transfer Bandwidth
[180] 0.0-10.0 sec 10.7 MBytes 8.99 Mbits/sec
[180] Server Report:
[180] 0.0-10.0 sec 10.2 MBytes 8.57 Mbits/sec 0.203 ms 355/ 7649 (4.6%)
[180] Sent 7649 datagrams
********************************************************************************************
C:\>iperf.exe -c 194.116.109.14 -u -d -b 8m
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 8.00 KByte (default)
------------------------------------------------------------
Client connecting to 194.116.109.14, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 8.00 KByte (default)
------------------------------------------------------------
[164] local 192.168.34.149 port 60992 connected with 194.116.109.14 port 5001
[ ID] Interval Transfer Bandwidth
[164] 0.0-10.0 sec 9.54 MBytes 8.00 Mbits/sec
[164] Server Report:
[164] 0.0-10.0 sec 9.54 MBytes 8.00 Mbits/sec 0.187 ms 0/ 6805 (0%)
[164] Sent 6805 datagrams
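The three runs in section 3.1.1 lower the UDP target rate (-b 10m, 9m, 8m) until the server report shows no lost datagrams, and the same reading is applied in the following sections. A minimal Python sketch of that selection step is shown below, using server-report lines copied from the log. The regular expression assumes the Iperf 2 report layout seen in these logs, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Tail of an Iperf 2 UDP server report as it appears in these logs, e.g.
# "... 8.00 Mbits/sec 0.187 ms 0/ 6805 (0%)": bandwidth, jitter, lost/total.
REPORT_RE = re.compile(
    r"(?P<bw>[\d.]+)\s+(?P<unit>[KM])bits/sec\s+"
    r"(?P<jitter>[\d.]+)\s+ms\s+(?P<lost>\d+)/\s*(?P<total>\d+)"
)


def parse_report(line: str) -> Tuple[float, int, int]:
    """Return (bandwidth in Mbits/sec, lost datagrams, total datagrams)."""
    match = REPORT_RE.search(line)
    if match is None:
        raise ValueError(f"not a UDP server report: {line!r}")
    bandwidth = float(match["bw"])
    if match["unit"] == "K":
        bandwidth /= 1000  # Iperf bit units are 1000-based
    return bandwidth, int(match["lost"]), int(match["total"])


def highest_lossless_rate(reports: Dict[int, str]) -> Optional[int]:
    """Given {target rate in Mbits/sec: server report line}, return the highest
    target rate whose report shows no lost datagrams, or None if every run lost some."""
    lossless = [rate for rate, line in reports.items() if parse_report(line)[1] == 0]
    return max(lossless) if lossless else None


# Server reports of the three CSI-Piemonte - SAMPAS runs, keyed by the -b target rate.
runs = {
    10: "[168] 0.0-10.3 sec 10.5 MBytes 8.59 Mbits/sec 15.836 ms 1017/ 8506 (12%)",
    9: "[180] 0.0-10.0 sec 10.2 MBytes 8.57 Mbits/sec 0.203 ms 355/ 7649 (4.6%)",
    8: "[164] 0.0-10.0 sec 9.54 MBytes 8.00 Mbits/sec 0.187 ms 0/ 6805 (0%)",
}
print(highest_lossless_rate(runs))  # 8
```

The result matches the roughly 8 Mbit/s figure reported for this link in the summary table.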

3.1.2 CSI-Piemonte - Netport

D:\iPerf\iperf-2.0.5-cygwin>iperf -c 194.116.109.14 -u -d -b 100m
WARNING: option -b implies udp testing
iperf: ignoring extra argument -- -u
iperf: ignoring extra argument -- -d
------------------------------------------------------------
Client connecting to 194.116.109.14, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 64.0 KByte (default)
------------------------------------------------------------
[ 3] local 172.16.10.67 port 55874 connected with 194.116.109.14 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 120 MBytes 100 Mbits/sec
[ 3] Sent 85464 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 120 MBytes 100 Mbits/sec 0.087 ms 159/85463 (0.19%)
[ 3] 0.0-10.0 sec 3 datagrams received out-of-order

3.1.3 CSI-Piemonte - Lleida

As the following log shows, a data throughput of 15 Mbits/sec gives 0% packet loss.

Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 122 KByte (default)
------------------------------------------------------------
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 29281
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0-10.3 sec 21.1 MBytes 17.2 Mbits/sec 11.410 ms 32677/47695 (69%)
[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 49593
[ 4] 0.0-10.3 sec 21.3 MBytes 17.5 Mbits/sec 12.304 ms 34721/49945 (70%)
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 61199
[ 3] 0.0-10.1 sec 21.3 MBytes 17.8 Mbits/sec 2.482 ms 34262/49473 (69%)
[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 30691
Client connecting to 62.81.188.235, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 122 KByte (default)
------------------------------------------------------------
[ 5] local 194.116.109.14 port 51877 connected with 62.81.188.235 port 5001
[ 4] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec 0.269 ms 0/ 893 (0%)
[ 5] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec
[ 5] Sent 893 datagrams
[ 5] WARNING: did not receive ack of last datagram after 10 tries.
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 5992
[ 3] 0.0-10.0 sec 1.20 MBytes 1.01 Mbits/sec 0.359 ms 36/ 893 (4%)
[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 9470
[ 4] 0.0-10.0 sec 1.22 MBytes 1.02 Mbits/sec 0.287 ms 16/ 888 (1.8%)
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 29901
------------------------------------------------------------
Client connecting to 62.81.188.235, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 122 KByte (default)
------------------------------------------------------------
[ 5] local 194.116.109.14 port 48076 connected with 62.81.188.235 port 5001


[ 5] 0.0-10.0 sec 120 MBytes 101 Mbits/sec
[ 5] Sent 85467 datagrams
[ 3] 0.0-10.2 sec 21.4 MBytes 17.5 Mbits/sec 10.686 ms 66083/81370 (81%)
[ 5] WARNING: did not receive ack of last datagram after 10 tries.
[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 35178
Client connecting to 62.81.188.235, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 122 KByte (default)
------------------------------------------------------------
[ 6] local 194.116.109.14 port 39144 connected with 62.81.188.235 port 5001
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 26253
[ 5] 0.0-10.0 sec 23.8 MBytes 20.0 Mbits/sec
[ 5] Sent 17008 datagrams
[ 4] 0.0-10.0 sec 20.3 MBytes 17.0 Mbits/sec 0.700 ms 2531/17007 (15%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
read failed: Connection refused
[ 5] WARNING: did not receive ack of last datagram after 10 tries.
[ 3] 0.0-13.7 sec 1.09 MBytes 672 Kbits/sec 0.747 ms 113/ 893 (13%)
[SUM] 0.0-13.7 sec 21.4 MBytes 13.1 Mbits/sec
[ 6] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec
[ 6] Sent 893 datagrams
[ 6] WARNING: did not receive ack of last datagram after 10 tries.
[ 7] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 8299
------------------------------------------------------------
Client connecting to 62.81.188.235, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 122 KByte (default)
------------------------------------------------------------
[ 7] local 194.116.109.14 port 56095 connected with 62.81.188.235 port 5001
[ 3] 0.0-10.0 sec 2.39 MBytes 2.00 Mbits/sec 0.122 ms 0/ 1702 (0%)
[ 5] 0.0-10.0 sec 2.39 MBytes 2.00 Mbits/sec
[ 5] Sent 1702 datagrams
[ 5] WARNING: did not receive ack of last datagram after 10 tries.
[ 4] 0.0-19.7 sec 1.25 MBytes 533 Kbits/sec 0.200 ms 0/ 893 (0%)
[SUM] 0.0-19.7 sec 3.64 MBytes 1.55 Mbits/sec
[ 7] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec
[ 7] Sent 893 datagrams
[ 7] WARNING: did not receive ack of last datagram after 10 tries.
[ 6] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 39560
[ 6] 0.0-10.0 sec 2.39 MBytes 2.00 Mbits/sec 0.212 ms 0/ 1702 (0%)
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 54129
[ 3] 0.0-10.0 sec 20.6 MBytes 17.3 Mbits/sec 0.550 ms 2288/17007 (13%)
[ 3] 0.0-10.0 sec 1 datagrams received out-of-order
[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 34560
[ 4] 0.0-10.0 sec 17.9 MBytes 15.0 Mbits/sec 0.113 ms 0/12755 (0%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 29014
[ 3] 0.0-10.0 sec 11.9 MBytes 9.98 Mbits/sec 0.236 ms 12/ 8504 (0.14%)
[ 3] 0.0-10.0 sec 1 datagrams received out-of-order
[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 35042
[ 4] 0.0-10.0 sec 11.7 MBytes 9.83 Mbits/sec 0.185 ms 145/ 8504 (1.7%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 1680
[ 3] 0.0-10.0 sec 11.9 MBytes 10.0 Mbits/sec 0.233 ms 0/ 8505 (0%)
[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 30616
[ 4] 0.0-10.0 sec 11.9 MBytes 10.0 Mbits/sec 0.366 ms 0/ 8504 (0%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 46293
[ 3] 0.0-10.1 sec 21.4 MBytes 17.8 Mbits/sec 0.575 ms 66127/81425 (81%)
[ 3] 0.0-10.1 sec 1 datagrams received out-of-order
[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 26077
[ 4] 0.0-10.0 sec 20.0 MBytes 16.8 Mbits/sec 0.366 ms 2726/17007 (16%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 10769
[ 3] 0.0-10.0 sec 17.9 MBytes 15.0 Mbits/sec 0.141 ms 0/12755 (0%)
[ 3] 0.0-10.0 sec 1 datagrams received out-of-order
[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 32262
[ 4] 0.0-10.0 sec 17.9 MBytes 15.0 Mbits/sec 0.199 ms 0/12755 (0%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 48219
[ 3] 0.0-10.3 sec 20.9 MBytes 17.1 Mbits/sec 14.744 ms 2112/17000 (12%)
[ 3] 0.0-10.3 sec 1 datagrams received out-of-order


[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 31316
[ 4] 0.0-10.0 sec 19.1 MBytes 16.0 Mbits/sec 0.193 ms 0/13606 (0%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 15178
[ 3] 0.0-10.0 sec 20.3 MBytes 17.0 Mbits/sec 0.084 ms 0/14472 (0%)
[ 3] 0.0-10.0 sec 1 datagrams received out-of-order
[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 6310
[ 4] 0.0-10.0 sec 21.0 MBytes 17.6 Mbits/sec 0.788 ms 302/15314 (2%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 51878
------------------------------------------------------------
Client connecting to 62.81.188.235, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 122 KByte (default)
------------------------------------------------------------
[ 5] local 194.116.109.14 port 57632 connected with 62.81.188.235 port 5001
[ 5] 0.0-10.0 sec 120 MBytes 101 Mbits/sec
[ 5] Sent 85465 datagrams
[ 3] 0.0-10.3 sec 21.3 MBytes 17.4 Mbits/sec 12.251 ms 34454/49662 (69%)
[ 5] WARNING: did not receive ack of last datagram after 10 tries.
[ 4] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 11654
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 18604
------------------------------------------------------------
Client connecting to 62.81.188.235, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 122 KByte (default)
------------------------------------------------------------
[ 5] local 194.116.109.14 port 52860 connected with 62.81.188.235 port 5001
[ 4] 0.0-10.0 sec 19.9 MBytes 16.6 Mbits/sec 1.214 ms 270/14472 (1.9%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
[ 3] 0.0-10.5 sec 1.17 MBytes 935 Kbits/sec 5.312 ms 14/ 852 (1.6%)
[SUM] 0.0-10.5 sec 21.1 MBytes 16.8 Mbits/sec
[ 5] 0.0-10.0 sec 1.19 MBytes 1000 Kbits/sec
[ 5] Sent 852 datagrams
[ 5] WARNING: did not receive ack of last datagram after 10 tries.
[ 6] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 62651
[ 6] 0.0-10.0 sec 20.3 MBytes 17.0 Mbits/sec 0.095 ms 0/14472 (0%)
[ 6] 0.0-10.0 sec 1 datagrams received out-of-order
[ 3] local 194.116.109.14 port 5001 connected with 62.81.188.235 port 8717

3.1.4 SAMPAS – Karlshamn

>iperf -c 88.255.168.28 -b 5m
WARNING: option -b implies udp testing
------------------------------------------------------------
Client connecting to 88.255.168.28, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 64.0 KByte (default)
------------------------------------------------------------
[ 3] local 172.16.10.61 port 57948 connected with 88.255.168.28 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 5.96 MBytes 5.00 Mbits/sec
[ 3] Sent 4253 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 5.93 MBytes 4.97 Mbits/sec 3.906 ms 25/ 4253 (0.59%)
[ 3] 0.0-10.0 sec 1 datagrams received out-of-order

>iperf -c 88.255.168.28 -b 10m
WARNING: option -b implies udp testing
------------------------------------------------------------
Client connecting to 88.255.168.28, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 64.0 KByte (default)
------------------------------------------------------------
[ 3] local 172.16.10.61 port 49529 connected with 88.255.168.28 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 11.9 MBytes 10.0 Mbits/sec
[ 3] Sent 8505 datagrams
[ 3] Server Report:
[ 3] 0.0-10.3 sec 8.50 MBytes 6.92 Mbits/sec 17.680 ms 2446/ 8496 (29%)
[ 3] 0.0-10.3 sec 3 datagrams received out-of-order

3.2 Cloud infrastructure assessment

CSI-Piemonte will dedicate the following public IP networks to the project's cloud infrastructure:

• 194.116.110.0/24: direct attach network
• 194.116.111.0/24: acquired IP network
• 194.116.109.0/28: management network

The cloud infrastructure requires three types of network: a direct attach network for virtual machines that are directly connected to the Internet and acquire a public IP address; an acquired network for machines that have a private cloud IP address and must expose services through NAT; and a management network for the machines dedicated to cloud management, which must be reachable from the Internet but sit on a dedicated network for isolation.
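The size of each of these networks, and which one a given address belongs to, can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the role labels and the helper functions are illustrative names, not project configuration.

```python
import ipaddress
from typing import Optional

# Public networks CSI-Piemonte dedicates to the cloud infrastructure (section 3.2).
NETWORKS = {
    "direct attach": ipaddress.ip_network("194.116.110.0/24"),
    "acquired": ipaddress.ip_network("194.116.111.0/24"),
    "management": ipaddress.ip_network("194.116.109.0/28"),
}


def usable_hosts(net: ipaddress.IPv4Network) -> int:
    """Host addresses left once the network and broadcast addresses are excluded."""
    return net.num_addresses - 2


def role_of(address: str) -> Optional[str]:
    """Return the role of the dedicated network containing address, if any."""
    ip = ipaddress.ip_address(address)
    for role, net in NETWORKS.items():
        if ip in net:
            return role
    return None


for role, net in NETWORKS.items():
    print(f"{role}: {net}, {usable_hosts(net)} usable host addresses")
print(role_of("194.116.109.14"))  # management
```

Run as is, it reports 254 usable host addresses for each /24 network and 14 for the /28 management network, and shows that 194.116.109.14, the Iperf server address used in section 3.1, falls inside the management network.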

3.3 Legacy assessment

3.3.1 CSI-Piemonte/Regione Piemonte legacy infrastructure

As shown in the table below, the hardware serving the legacy DBs is powerful enough to handle the additional load.

Data set | Legacy DB name | Legacy DB description | Legacy DB engine (with version) | Legacy DB hardware | Legacy DB reachable through VPN?
Measurement station information | BDARIA | Data from the measurement stations and their sensors is split across about 10 tables; not all may be useful for the pilot | Oracle8i Enterprise Edition Release 8.1.7.4.0 | Sun Enterprise 15000 | yes
Air quality measurement | BDARIA | Data is collected in a single table per month per measured air parameter | as above | as above | yes
Accident | NSITPROD | Many of the tables concern hospitalization data of people involved in accidents; that data cannot be part of the pilot and will not be opened up | Oracle 10.2.0.4.0 | Dell 2950 (2 x Xeon E5410, 24 GB RAM) | yes
Transport data | SITPROD | Tables regarding timetables and routes of public transport at regional level | Oracle 9.2.0.6.0 | as above | yes

3.3.2 Lleida legacy infrastructure
Lleida has a Cisco VPN concentrator that will enable a secure VPN connection from the cloud environment. The following table details the data sets to be integrated from legacy applications (data sets already published as Open Data are not included).


Data set | Legacy DB name | Legacy DB description | Legacy DB engine | Legacy DB hardware | Legacy DB reachable through VPN?
--- | --- | --- | --- | --- | ---
Road incident | sqlopendai | Contains information on all types of incidents and events collected through the local police call center. Only data regarding road incidents, events and works will be provided. | Microsoft SQL Server 2008 | HP quad core with 8 GB RAM | yes
City neighborhoods | sqlopendai | Definition of the neighborhoods of the city of Lleida | Microsoft SQL Server 2008 | HP quad core with 8 GB RAM | yes
Private bus services | sqlopendai | Provided by the private companies that manage bus services within the city; contains information on the existing buses and their routes | Microsoft SQL Server 2008 | HP quad core with 8 GB RAM | yes
Information offices | sqlopendai | List of the municipal information offices and their locations | Microsoft SQL Server 2008 | HP quad core with 8 GB RAM | yes
Accommodations | sqlopendai | List of the accommodations in the city and their locations | Microsoft SQL Server 2008 | HP quad core with 8 GB RAM | yes
Accessibility | sqlopendai | Accessibility status (certified or not certified) of the establishments belonging to the hospitality federation | Microsoft SQL Server 2008 | HP quad core with 8 GB RAM | yes
Catering businesses | sqlopendai | Catering businesses belonging to the hospitality federation | Microsoft SQL Server 2008 | HP quad core with 8 GB RAM | yes
Hospitality businesses | sqlopendai | Hospitality businesses belonging to the hospitality federation | Microsoft SQL Server 2008 | HP quad core with 8 GB RAM | yes

3.3.3 Ordu legacy infrastructure
The Ordu municipality’s legacy database infrastructure is shown below.

Data set | Legacy DB name | Legacy DB description | Legacy DB engine | Legacy DB hardware | Legacy DB reachable through VPN?
--- | --- | --- | --- | --- | ---
City Dynamics | ORDU | A full or filtered replica of the production DB | Oracle 10g | HP dual core with 12 GB, 2 clustered servers | yes
POIs | ORDU | A full or filtered replica of the production DB | Oracle 10g | HP dual core with 12 GB, 2 clustered servers | yes
Demands & Complaints | ORDU | A full or filtered replica of the production DB | Oracle 10g | HP dual core with 12 GB, 2 clustered servers | yes

3.3.4 Karlshamn legacy infrastructure
Karlshamn’s legacy infrastructure will not be reachable through a VPN connection, because it resides on hardware that also hosts other data and external access cannot be allowed. In this case the project chose to copy the data onto dedicated infrastructure in the Karlshamn data centre; VPN connections will then be made to this proxy infrastructure. Many public administrations (PAs) may find themselves in a similar situation; the goal of the project is to collect and thoroughly document the issues raised by the PAs and to address them, demonstrating the security and validity of the model.

Data set | Legacy DB name | Legacy DB description | Legacy DB engine (with version) | Legacy DB hardware | Legacy DB reachable through VPN?
--- | --- | --- | --- | --- | ---
Reports | ODAIReports | Tables with information about reports: status history, date/time, type, associated equipment, IP of creator | MySQL Community Server 5.5.23 | Windows Server 2008 R2 (x64) | no
Equipment | MapInfo TAB files | One file for each type of equipment (file specification: http://en.wikipedia.org/wiki/MapInfo_TAB_format) | | Windows Server 2008 R2 (x64) | no

4 Component technology assessment
The component technology assessment aims to define the software components used in the Open-DAI project. There are two types of components: those belonging to the cloud infrastructure, which assign virtual resources and manage and operate the cloud; and those that make up the middleware and application infrastructure that the pilots will use to demonstrate the project’s objectives. All components are chosen from the open source ecosystem, taking particular care to select, where possible, software backed by solid vendors that can provide paid support if the service provider requires it.

4.1 Cloud components
These tools are needed to manage the cloud environment, to help provision the Open-DAI solution and to automate its management. Per se, these components could be part of a generic cloud offering, were it not for the different approach used in the project. The proposed model is neither a pure IaaS, where users are left with a bare virtual machine that they must administer on their own from the operating system up to whatever they decide to install, nor a PaaS, where users are given access only to specific platforms on which to deploy the services they develop. Open-DAI proposes an intermediate model: users are given a pool of components that can be combined in a known and defined deployment topology, leaving to them the decision on which pieces to use and how to scale them, while removing the complexity of installing and managing all the core middleware. This can be considered a self-provisioning PaaS over a known deployment model. This way users can concentrate on the business implementation while gaining wide flexibility in the middleware infrastructure. Another principle of the project is that the cloud service provider must know as little as possible about the user’s business: the provider can monitor and manage the infrastructure without being able to see the business data. Reaching this goal requires many management and operations tools, also because OSS solutions do not come with feature-rich administration tools, so different functionalities must be integrated to get the desired results.

Tool | Role | Description | Dependency | Used by
--- | --- | --- | --- | ---
CloudStack | Cloud engine | IaaS cloud solution | NA |
KVM | Virtualization | Virtualization infrastructure for the Linux kernel; KVM supports native virtualization on processors with hardware virtualization extensions | Linux OS | CloudStack
Zenoss | Monitoring and reporting | Monitors the resources of the cloud and within tenants | Python | End user
Puppet | Configuration management | See section 4.1.1 | Ruby | Foreman
MCollective | Orchestration | See section 4.1.2 | Puppet | Will be integrated into a web administration tool
Foreman/Puppet Dashboard | Web console | See section 4.1.3 | Ruby, Ruby on Rails, Puppet | End user
Bind | Domain server | Allows DNS management for tenants on the cloud | Linux OS | End user
Postfix | Mail server | Provides mail service to tenants on the cloud | Linux OS | End user
MySQL | Relational database | MySQL instance dedicated to storing the data of all the management tools; since many of them require a DB, the project will use a single instance to optimize resources | Linux OS | Puppet, Zenoss

Among these components there is a distinction to make: CloudStack is a complex tool that will manage the whole IaaS infrastructure and create the private clouds for all partners, so it is the foundation of the architecture. All the other tools will be deployed inside each private cloud to help manage the middleware components that will be designed.


4.1.1 Puppet
Puppet is one of the most powerful open source frameworks and tools for managing the configuration of computer systems. It can automate nearly every aspect of a system administrator’s job: user management, software installation and the configuration of specific services. Puppet is Ruby-based, licensed under GPLv2 (under the Apache License 2.0 since version 2.7), and can run in either client-server or stand-alone mode. It can manage the configuration of different platforms, such as UNIX, Linux and, more recently, Microsoft Windows. With Puppet it is possible to manage all phases of a host’s lifecycle: building, installation, upgrading and maintenance. Unlike traditional provisioning tools, which leave hosts unmanaged once they are installed, Puppet is designed to interact with hosts continuously. Puppet’s “modus operandi” is based on three components:

• a client-server model for deployment;
• a declarative language to define “resources” that describe the state of a specific configuration;
• a transactional layer that encompasses the process of configuring each host.

In this model the server is called the “Puppet master”, the “Puppet client” acts as an agent, and the host itself is defined as a node. The Puppet master holds the definitions of all the configurations required for the hosts in your environment and runs as a daemon. The Puppet agents connect to the Puppet master and retrieve any configuration to be applied (the connection between server and client is encrypted with standard SSL). Puppet applies configurations according to the relationships among the resources they describe. As a first step, Puppet analyzes your configuration and calculates how to apply it to your agent; to do this, it builds a graph of all resources and of their relationships to each other and to each agent. Puppet currently provides no mechanism to manage dependencies between different nodes directly, so alternative strategies are needed for that case, for example combining its functionalities with Marionette Collective.
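The per-node ordering step can be illustrated with a dependency graph and a topological sort. The following sketch uses Python’s graphlib (Python 3.9+) as a stand-in for Puppet’s own catalog graph. The resource names are hypothetical and follow Puppet’s Type[title] notation, and each entry lists the resources it requires. As in Puppet, dependencies are applied before their dependents, and a dependency cycle is reported as an error instead of being resolved. Like Puppet, the graph covers a single node only.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical resources for one node, each mapped to the resources it requires
# (comparable to Puppet's `require` relationships). Names are illustrative only.
RESOURCES = {
    "Package[jboss]": set(),
    "User[jboss]": set(),
    "File[/etc/jboss/standalone.xml]": {"Package[jboss]"},
    "Service[jboss]": {"Package[jboss]", "User[jboss]", "File[/etc/jboss/standalone.xml]"},
}


def apply_order(resources):
    """Return an order in which the resources can be applied, dependencies first.

    Raises graphlib.CycleError if the relationships contain a cycle.
    """
    return list(TopologicalSorter(resources).static_order())


print(apply_order(RESOURCES))

try:
    apply_order({"Service[a]": {"File[b]"}, "File[b]": {"Service[a]"}})
except CycleError as err:
    # The second element of err.args is the list of nodes forming the cycle.
    print("dependency cycle:", err.args[1])
```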

4.1.1.1 Facter
Puppet needs to know “how” different operating systems and platforms manage certain types of resources. Each resource type has a number of “providers” that specify, for example, how to manage packages using a particular package management tool. To collect information about the agent, Puppet uses Facter, an independent, cross-platform, Ruby-based system inventory tool designed to gather the “facts” on all the nodes managed with Puppet, such as the hostname, IP address, operating system and other configuration items. When the agent connects to the master, Puppet chooses, for instance, the appropriate package provider to install an application: on CentOS it executes yum, on Ubuntu aptitude, and on Solaris the pkg command. Facts also become available to Puppet as variables, which allows the configuration to be customized for each host: for instance, you can define a generic resource for different operating systems and then combine it with the data (“facts”) from your agents. Facter can also be extended with custom facts carrying specific information about your hosts; this becomes more important when you want to manually trigger a specific cluster of agents with Marionette Collective.
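The idea that facts drive provider selection can be sketched in a few lines. In the example below, a generic “install this package” resource resolves to a different command on each node, depending on its operatingsystem fact. The provider table follows the examples in the text (CentOS to yum, Ubuntu to aptitude, Solaris to pkg). The node names and facts are hypothetical, and local_facts() only imitates a small part of what Facter collects. Puppet’s real provider selection relies on far richer rules.

```python
import platform
import socket

# Package tool per operating system, following the examples in the text.
# Puppet's real provider selection uses many more facts and rules.
PACKAGE_PROVIDERS = {"CentOS": "yum", "Ubuntu": "aptitude", "Solaris": "pkg"}


def local_facts():
    """Collect a few Facter-style facts using only the Python standard library."""
    return {
        "hostname": socket.gethostname(),
        "kernel": platform.system(),        # e.g. 'Linux'
        "architecture": platform.machine(),
    }


def package_provider(facts):
    """Choose the package tool from the node's 'operatingsystem' fact."""
    os_name = facts.get("operatingsystem")
    if os_name not in PACKAGE_PROVIDERS:
        raise ValueError(f"no package provider known for {os_name!r}")
    return PACKAGE_PROVIDERS[os_name]


def install_command(package, facts):
    """Resolve one generic 'install this package' resource for a given node."""
    return f"{package_provider(facts)} install {package}"


# Hypothetical facts for two agents, as Facter might report them.
AGENTS = {
    "node-a": {"hostname": "node-a", "operatingsystem": "CentOS"},
    "node-b": {"hostname": "node-b", "operatingsystem": "Ubuntu"},
}

for name, facts in AGENTS.items():
    print(name, "->", install_command("postfix", facts))
```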

4.1.2 Marionette Collective
As explained in the previous paragraph, Puppet usually runs as a daemon: the agent periodically checks with the master to confirm that its configuration is up to date or to retrieve any new configuration (by default, every 30 minutes). Marionette Collective, also known as MCollective, is an orchestration framework that allows real-time commands, making it possible to reconfigure nodes on demand and in a controlled manner. Furthermore, MCollective can use the metadata provided by Facter to collect information and prepare tasks for a well-defined set of machines. Other tools can perform these actions, but they essentially offer the same functionality as the Unix shell, which, though powerful, is not an ideal application programming interface; MCollective instead provides a robust API that allows orchestration to be implemented as small agent plugins. Puppet can work with MCollective through specific Puppet agent plugins, exploiting publish/subscribe messaging to handle communication between the nodes in a collective. This messaging is implemented by asynchronous messaging software (also called message-oriented middleware) such as Apache ActiveMQ and RabbitMQ.
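Fact-based targeting, which lets MCollective address a well-defined set of machines instead of waiting for each agent’s periodic run, can be sketched as a filter over an inventory of facts. The inventory below is hypothetical, and the function is not the MCollective API. In MCollective itself, the equivalent selection is expressed as fact filters on the request. The sketch only shows the matching rule: a node is targeted when every requested fact matches.

```python
# Hypothetical inventory: node name -> facts reported by Facter.
INVENTORY = {
    "teiid01": {"osfamily": "RedHat", "role": "teiid", "tenant": "lleida"},
    "teiid02": {"osfamily": "RedHat", "role": "teiid", "tenant": "ordu"},
    "geo01": {"osfamily": "Debian", "role": "geoserver", "tenant": "lleida"},
}


def select_nodes(inventory, **fact_filter):
    """Return the sorted names of nodes whose facts match every key=value filter."""
    return sorted(
        node
        for node, facts in inventory.items()
        if all(facts.get(key) == value for key, value in fact_filter.items())
    )


print(select_nodes(INVENTORY, role="teiid"))                   # ['teiid01', 'teiid02']
print(select_nodes(INVENTORY, role="teiid", tenant="lleida"))  # ['teiid01']
```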

4.1.3 Puppet Dashboard and Foreman
To support the use of Puppet in production environments, tools have emerged that spare administrators from managing Puppet configurations only through the command line and by editing manifest files. Among these tools, two console products stand out: Puppet Dashboard and Foreman. The first was created by Puppet Labs, the company that supports Puppet development, while the second was created by the Israeli developer Ohad Levy. Both are Ruby on Rails applications (Rails being an open source web framework written in Ruby), but they differ slightly. In particular, Puppet Dashboard can be used as an External Node Classifier (ENC) and as a reporting tool, and it also provides an interface to some features introduced in later versions of Puppet, including the audit and inventory services. Foreman also includes some inventory capabilities, but it focuses strongly on provisioning and data center management. External Node Classification (ENC) stores node information in external resources, so that a large number of nodes need not be specified in the manifest files, an approach that does not scale and is time-consuming. An ENC is a script-based integration that returns information (classes, inheritance, variables and environment configuration) that Puppet can then use to configure your hosts. The node’s configuration is returned as YAML data; YAML is a human-friendly serialization language often used as a configuration file format in several programming languages (for example Ruby). Nevertheless, for an accurate system inventory in multi-master environments, Puppet’s inventory service is much better suited to retrieving, storing and searching node configurations through the REST API exposed to the network by the Puppet master: in this case, the information about a specific node takes the form of Facter facts.
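An ENC is simply an executable that Puppet calls with the node name as its argument. It writes a YAML document with the node’s classes, parameters and environment to standard output. The script below is a minimal sketch of that contract. The node data is hypothetical and held in a dictionary, whereas Foreman or Puppet Dashboard would normally supply it. The YAML is written by hand to avoid third-party dependencies.

```python
#!/usr/bin/env python3
"""Minimal External Node Classifier (ENC) sketch.

Puppet runs the ENC with the node name as its only argument and reads a YAML
document (classes, parameters, environment) from standard output.
"""
import sys

# Hypothetical node data; in Open-DAI, Foreman or Puppet Dashboard would hold it.
NODES = {
    "teiid01.example.org": {
        "classes": ["jboss", "teiid"],
        "parameters": {"tenant": "lleida"},
        "environment": "production",
    },
}
DEFAULT = {"classes": ["base"], "parameters": {}, "environment": "production"}


def classify(certname):
    """Render the ENC YAML for one node, falling back to DEFAULT for unknown nodes."""
    node = NODES.get(certname, DEFAULT)
    lines = ["---", "classes:"]
    lines += [f"  - {cls}" for cls in node["classes"]]
    if node["parameters"]:
        lines.append("parameters:")
        lines += [f'  {key}: "{value}"' for key, value in node["parameters"].items()]
    else:
        lines.append("parameters: {}")
    lines.append(f"environment: {node['environment']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.write(classify(sys.argv[1]))
```

On the Puppet master, such a script would be enabled through the node_terminus = exec and external_nodes settings in puppet.conf. Returning the same document for every run keeps node classification out of the manifests, which is the scalability point made above.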

4.2 Middleware components
Middleware components are the tools that the Open-DAI pilots will leverage to implement new services based on the opened data services. Some of the tools serve to expose the data contained in the legacy silos; others are used, depending on the business case, to implement the new services (these components are marked as used by “software designer”).

Tool | Role | Description | Dependency | Used by
--- | --- | --- | --- | ---
JBoss | Application server | Java application server that will serve both the deployment of architectural components and pilot deployments | Java | TEIID, D2RQ, GeoServer and software designer
TEIID | Virtual database server | Enterprise Information Integration tool; it will host virtual DBs, leveraging the clustering and HA features of the JBoss application server | JBoss | Software designer
D2RQ | Relational to semantic translator | See section 4.2.3 | | Software designer
GeoServer | Spatial server | | | Software designer
Apache | Web server | Server that will be the entry point and proxy for all internal services | Linux OS | JBoss, WSO2 tools, Node.js, software designer
WSO2 BPS | Business process server | Server to manage processes designed in the BPEL language | Java, WSO2 Registry, WSO2 ESB | Software designer
WSO2 ESB | Enterprise service bus | Bus infrastructure guaranteeing message delivery and robustness to processes | WSO2 Registry | Software designer, WSO2 BPS
Node.js | Application server | JavaScript server-side component | Apache | Software designer
 | Push notification server | Server that will push notifications to mobile applications | | Software designer
WSO2 Governance Registry | Shared registry | | Java | Software designer

4.2.1 SOA components
The WSO2 components make up the SOA layer that will allow processes to be implemented without writing code, by designing workflows and orchestrating elementary functions and data services. The project chose these components over other frameworks because they are well known by the partners and are considered a sound solution in the open source software ecosystem.

4.2.2 TEIID vs WSO2 Data service server

4.2.2.1 WSO2 Data Services Server
WSO2 Data Services Server is the data source virtualization component of the WSO2 Carbon suite. It is available either as an OSGi module already integrated in the Carbon product (so that it can be enabled or disabled directly from the Carbon GUI) or as independent software that can be combined with other WSO2 products through Web services. WSO2 Data Services Server provides a Web service interface (SOAP or REST) to one or more data sources of different types (RDBMS, CSV, Excel, etc.) and from different vendors (Oracle, Teradata, Microsoft, etc.). Supported data sources are any RDBMS, CSV, Excel, ODS, Cassandra, Google Spreadsheets, RDF, and any Web page via scraping. Supported databases are MSSQL, DB2, Oracle, OpenEdge, Teradata, MySQL, PostgreSQL/EnterpriseDB, H2, Derby and any database with a JDBC driver. Supported transports are HTTP, HTTPS, JMS, SMTP and others, including (via WSO2 Enterprise Service Bus) FTP, FTPS, SFTP and TCP. WSO2 Data Services Server creates a data abstraction over the underlying data sources and makes it available as one or more high-level services, hiding the details of data retrieval from users. Services are described by the administrator via the GUI and translated by the software into a simple XML file compliant with the Data Services Descriptor Language (DSDL), an XML-based language defined by WSO2 to write data services. The main advantage of WSO2 Data Services Server is the ability to combine many complex operations on data into a single service (for instance, a set of queries launched on multiple databases in a certain order), while preserving the security (using WS-Security) and reliability (using WS-ReliableMessaging) of the service. Through the WSO2 ESB orchestrator, data services can also be integrated with other Web services exposed on the ESB, forming chains of more complex services that may involve the Identity Server, the Business Process Server, the Rules Server, etc.

Another interesting aspect of WSO2 Data Services Server is its recent opening to Semantic Web standards (namely RDF, the Resource Description Framework). Release 2.6.3 supports exposing data in RDF as a service. However, this feature does not meet the Open-DAI architectural requirements, because we want to provide semantic access to a traditional data source, not traditional access to a semantic data source.

4.2.2.2 JBoss Teiid
JBoss Teiid is an open source data virtualization system developed by Red Hat. It consists of several modules that work together to allow application designers to integrate different data sources and expose them via JDBC-SQL, SOAP (Web services), SOAP-SQL or XQuery. These modules are:

• Teiid Designer
• Teiid Server (Teiid Runtime)
• Connector Framework
• Teiid Console & Admin Shell
• Teiid Query Engine


Through its Connector Framework, JBoss Teiid can virtualize various types of data sources (Oracle, DB2, SQL Server, MySQL, PostgreSQL, XML, files, etc.) and even allows developers to write their own connectors for custom or unforeseen sources. On top of these data sources, JBoss Teiid defines relational and XML views that abstract the information structure from the underlying physical data structures. Teiid Server, deployed on JBoss AS, is a scalable and manageable runtime environment that provides additional security, fault-tolerance and administrative features; it gives access to the Virtual Database via JDBC or Web services. Teiid Query Engine processes relational, XML, XQuery and procedural queries over federated data sources; its features include support for homogeneous and heterogeneous schemas, transactions and user-defined functions. The main strength of JBoss Teiid is its Teiid Designer component, an Eclipse-based graphical tool for modeling, analyzing, integrating and testing multiple data sources to produce the relational, XML and Web service views that expose business data. Through Teiid Designer, the system designer can define, integrate and test data services in a model-driven, graphical way, without programming. This makes it possible to start resolving semantic differences between data sources at a high level, which eases their subsequent transformation into RDF.

4.2.2.3 Why we chose JBoss Teiid
What really matters in the Open-DAI project is not predefining a set of standard operations on data sources that can be called via Web service, but obtaining a single virtual database from different data silos, so that:

• virtualization accomplishes the first step towards semantic data integration, defining a unified logical schema on top of a number of source logical schemas. This schema, rather than the individual source models, may then be mapped by means of D2RQ to an ontology specially designed for the system.

• the final result must allow every other system component to query the virtual database in a traditional way as well (via JDBC or Web services, in either case simply passing an SQL query), without forcing them to use services predefined by the data server.


The main disadvantage of the WSO2 Data Services Server approach is that a database, once virtualized by the system, is no longer reachable via JDBC or treatable as a normal database: it becomes a pure Web service. A database virtualized by JBoss Teiid, instead, is still a database (even if it is a logical abstraction over very different data sources), so it can be accessed via JDBC and has its own DB schema on which further mappings, including semantic ones, can be built. JBoss Teiid can therefore be considered a more moderate and flexible solution than WSO2 Data Services Server, leaving the system administrator able to perform traditional database management operations. The main advantages of using JBoss Teiid are the following:

• JDBC/SQL access to data. JDBC APIs are the most convenient and familiar way to access data from Java applications. Other components of the Open-DAI architecture, developed in Java, can query the virtual database directly through JDBC/SQL, without being constrained by prepackaged Web services. Moreover, the D2RQ component can only handle data sources exposed through JDBC, so this kind of access is the only one compatible with the rest of the Open-DAI system.

• JBoss Teiid can be deployed on JBoss AS with very quick and easy configuration. JBoss AS is also used by other components of the Open-DAI architecture (GeoServer, D2RQ, Semantic Manager), so the system designer can deploy these applications on the same AS, integrating them in a single environment and managing them from the same interface.

• JBoss Teiid not only has good documentation freely available online, but can also rely on the active JBoss community for reporting problems, finding solutions and analyzing case studies. The forums, blogs, wikis, chats and mailing lists available on the JBoss Teiid website support developers and designers throughout the software lifecycle.

• With the Teiid Designer component, the system designer can perform data abstraction through an Eclipse-based modeling tool, starting the data integration/homogenization process that the semantic modules of the Open-DAI architecture will then refine.
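The difference argued above, between a virtual database that accepts arbitrary SQL and a data service that exposes only predefined operations, can be sketched with Python’s built-in sqlite3 module. SQLite is only a stand-in for Teiid here: two attached in-memory databases play the role of separate silos, and a TEMP view plays the role of the unified logical schema. The schema names, tables and rows are hypothetical and loosely modelled on the Lleida data sets in section 3.3.2. Teiid’s actual connectors and JDBC driver are not used.

```python
import sqlite3


def build_virtual_db():
    """Attach two in-memory 'silos' and expose one unified logical view over them."""
    con = sqlite3.connect(":memory:")
    con.execute("ATTACH DATABASE ':memory:' AS police")   # hypothetical silo 1
    con.execute("ATTACH DATABASE ':memory:' AS tourism")  # hypothetical silo 2
    con.execute("CREATE TABLE police.incident (id INTEGER, kind TEXT, district TEXT)")
    con.execute("CREATE TABLE tourism.office (id INTEGER, name TEXT, district TEXT)")
    con.executemany("INSERT INTO police.incident VALUES (?, ?, ?)",
                    [(1, "roadworks", "Centre"), (2, "accident", "North")])
    con.executemany("INSERT INTO tourism.office VALUES (?, ?, ?)",
                    [(10, "Main office", "Centre")])
    # Unified logical schema over both silos; a TEMP view may reference attached databases.
    con.execute("""
        CREATE TEMP VIEW district_summary AS
        SELECT d.district,
               (SELECT COUNT(*) FROM police.incident i WHERE i.district = d.district) AS incidents,
               (SELECT COUNT(*) FROM tourism.office o WHERE o.district = d.district) AS offices
        FROM (SELECT district FROM police.incident
              UNION SELECT district FROM tourism.office) AS d
    """)
    return con


def predefined_operation(con, district):
    """Data-service style: callers can only invoke this one pre-packaged query."""
    return con.execute("SELECT incidents FROM district_summary WHERE district = ?",
                       (district,)).fetchone()[0]


con = build_virtual_db()
# Virtual-database style: any component may send its own SQL to the unified view.
rows = con.execute("SELECT district, incidents, offices FROM district_summary "
                   "ORDER BY district").fetchall()
print(rows)  # [('Centre', 1, 1), ('North', 1, 0)]
print(predefined_operation(con, "Centre"))  # 1
```

The second query shape (incidents and offices together, ordered by district) is not available through predefined_operation. A client of a pure data service would need a new service to be published for it, whereas a client of the virtual database only writes a different SQL statement.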

4.2.3 D2RQ
D2RQ is open source software, released under the Apache License, that allows access via the SPARQL protocol and query language to data contained in a traditional relational database. The D2RQ system is rather simple and consists of three main modules: D2RQ Engine, D2R Server and a mapping file, the D2RQ Mapping File.


D2RQ Engine uses the Mapping File to convert all (or part) of the contents of a database into RDF. It then exposes them through APIs that can be integrated into popular Semantic Web frameworks such as Jena and Sesame. The RDF graph produced by D2RQ Engine is also passed to the D2R Server component, which builds a SPARQL endpoint from which semantic applications can retrieve RDF data in real time. The D2RQ Mapping File is written in the D2RQ Mapping Language, a declarative language for mapping relational database schemas to RDF vocabularies and OWL ontologies. It can be produced automatically from the relational database tables, or it can be defined on the basis of a vocabulary or an ontology, either standard or custom. Users can therefore map data either to metadata schemas well known in the Semantic Web, such as FOAF or Dublin Core, or to their own ontology with customized classes and relations. A Jena-based semantic application, through the specific APIs, can easily integrate the knowledge base produced by D2RQ with its own ontology, perform reasoning and expose a new SPARQL endpoint containing the newly inferred information. Other products also provide RDF/SPARQL access to relational databases: Revelytix Spyder, Virtuoso RDF Views, Virtuoso Sponger, Sparqlify, Triplify, SquirrelRDF and METAmorphoses. Unfortunately, the two most interesting ones, Revelytix Spyder and Virtuoso RDF Views, have licensing problems: Revelytix Spyder is free but not open source, whereas the Virtuoso RDF Views mechanism for connecting to an external database is part of the proprietary version of Virtuoso Server and is not provided by Virtuoso Open Source Edition. Triplify and METAmorphoses both RDFize data, but they do not provide SPARQL endpoint functionality. Virtuoso Sponger extracts RDF data from HTML pages, CSV, Atom, RSS, OO documents, etc.
This would be ideal when a business application has a proprietary database that cannot be accessed directly but has a Web front end; however, this scenario is not expected in the Open-DAI project. Sparqlify supports only PostgreSQL databases. SquirrelRDF provides a tool that creates only a rough mapping of a database schema: the naive RDB-to-RDF mapping described by Tim Berners-Lee in the document “Relational Databases on the Semantic Web”, which does not take an ontology into account. The main reasons why we chose D2RQ over similar products are the following.

• It automatically generates a SPARQL endpoint instead of merely transforming data into RDF.
• It integrates natively with Jena, the most popular framework for developing Semantic Web applications.
• It is “true” open source. We want to avoid, where possible, “free” or “community” versions that retain restrictions on usage or modification, and to favour products licensed under one of the common open source licenses: GPL, LGPL, BSD, Apache, Mozilla Public License, etc.
• It is the most popular and reliable solution, created by one of the best-known research groups in the Semantic Web field (Christian Bizer’s group at Freie Universität Berlin).
• It is very lightweight, with no nonessential features or enhancements intended mainly to add commercial appeal.
• It is easy to use, also thanks to its very well-written documentation.
• It supports update operations on the data source using SPARQL 1.1.
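For comparison with what a D2RQ Mapping File provides, the following sketch shows the naive RDB-to-RDF mapping attributed above to SquirrelRDF. Each row becomes a resource identified by the table name and primary key, and each non-NULL column becomes a property whose URI is derived mechanically from the column name, with no ontology involved. A D2RQ mapping would instead let these rows be described with FOAF, Dublin Core or a project ontology. The base URI and the sample row are hypothetical, and this is not D2RQ’s code or mapping language. The output is N-Triples text.

```python
from urllib.parse import quote

BASE = "http://example.org/opendai/"  # hypothetical base URI for generated resources
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape a value for use as an N-Triples plain literal."""
    return (str(value).replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n"))


def row_to_ntriples(table, key, row):
    """Naive mapping: one resource per row, one property per non-NULL column.

    Class and property URIs are derived mechanically from table and column
    names; nothing is mapped to an existing vocabulary or ontology.
    """
    subject = f"<{BASE}{table}/{quote(str(row[key]), safe='')}>"
    triples = [f"{subject} <{RDF_TYPE}> <{BASE}vocab/{table}> ."]
    for column, value in row.items():
        if value is None:  # SQL NULL: emit no triple
            continue
        triples.append(f'{subject} <{BASE}vocab/{table}#{column}> "{escape_literal(value)}" .')
    return triples


row = {"id": 10, "name": "Main office", "district": None}
print("\n".join(row_to_ntriples("office", "id", row)))
```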