Dockerizing a multi-component Open Data app
Athens Docker Meetup, June 2016
Dimitris Negkas, Stergios Tsiafoulis
Description and Scope
LinkedEconomy (http://linkedeconomy.org/) is a publicly available web platform and linked data repository.
Its scope is to transform, curate, aggregate, interlink and publish economic data in machine-readable format, to enable:
o citizens' awareness
o research with unprecedented data
o evidence-based policy
Data Sources
Sources currently used:
Transparency – DIAVGEIA
Central Electronic Registry of Public Procurement - E-Procurement
National Strategic Reference Framework (NSRF)
Central Market of Thessaloniki (CMT)
e-Prices
Fuel Prices
Municipality of Athens, Municipality of Thessaloniki
Government of Australia
Data growth
We use OpenLink Virtuoso (OLV) to store nearly 1B triples from 15 different sources.
We host 27 datasets from 15 organizations in CKAN.
The data volume grows correspondingly each month.
Data processing
Each data source is handled and processed separately, as its available data are not provided uniformly or in machine-readable format.
Diavgeia, NSRF and the observatories for product and fuel prices provide a rich API that can be easily queried to obtain machine-readable data in JSON format.
For E-Procurement, CMT and the Municipalities of Athens and Thessaloniki there is no API available. We have therefore developed a software module that gathers online information in an automated way and stores it in a machine-readable format.
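The two harvesting paths described above can be illustrated with a minimal Python sketch. It is not the project's harvester: the JSON envelope key "decisions" and the HTML table layout are hypothetical, and only the standard library is used. API sources are parsed directly, while sources without an API are scraped from an HTML table into the same record shape.

```python
import json
from html.parser import HTMLParser


def records_from_json(payload):
    """Parse an API response body that is already machine-readable JSON."""
    data = json.loads(payload)
    # Hypothetical envelope: a top-level "decisions" list (not a real API schema).
    return data.get("decisions", [])


class TableScraper(HTMLParser):
    """Collect the cell text of every <tr> row found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows, each a list of cell strings
        self._row = None      # row currently being read, or None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def records_from_html(page):
    """Use the first table row as field names and the rest as records."""
    scraper = TableScraper()
    scraper.feed(page)
    scraper.close()
    if not scraper.rows:
        return []
    header, *body = scraper.rows
    return [dict(zip(header, row)) for row in body]
```

Both functions return a list of dictionaries, so later processing steps do not need to know whether a record came from an API or from a scraped page.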
General Architecture
Process model
Open economic data related to public budgeting, spending and prices are characterized by high volume, velocity, variety and veracity.
We have to build custom components under the common logic of transforming static data into linked open data streams.
Process model: Nucleus
The nucleus of our approach is semantic modelling, data enrichment and interconnections.
Data are stored in raw (as harvested from the sources), RDF and JSON formats.
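To make the three storage formats concrete, here is a small Python sketch that keeps a harvested record as raw text, as normalized JSON and as RDF in N-Triples syntax. The namespace http://example.org/ and the field names are assumptions for illustration only; the project's actual ontology and modelling are not shown in this deck. Record ids and keys are assumed to be safe to embed in an IRI.

```python
import json

BASE = "http://example.org/"  # hypothetical namespace, not the project's vocabulary


def escape_literal(value):
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_ntriples(record, id_field="id"):
    """Emit one triple per non-empty field, all as plain string literals."""
    subject = f"<{BASE}resource/{record[id_field]}>"
    triples = []
    for key, value in record.items():
        if key == id_field or value is None:
            continue
        predicate = f"<{BASE}ontology/{key}>"
        literal = escape_literal(str(value))
        triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


def store_all_formats(raw):
    """Return the raw, JSON and RDF representations of one harvested record."""
    record = json.loads(raw)
    return {
        "raw": raw,                                   # exactly as harvested
        "json": json.dumps(record, sort_keys=True),   # normalized JSON
        "rdf": "\n".join(record_to_ntriples(record)), # N-Triples
    }
```

Typed literals and links to other resources, which are what semantic enrichment and interconnection would add, are deliberately left out of this sketch.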
Process model: Data distribution
Enriched data are distributed through five channels:
1. Data dumps (CKAN)
2. SPARQL queries
3. Web
4. Social media
5. Structured inputs to Business Intelligence (BI) systems
Additionally, data can be further analysed and exchanged with relevant platforms (e.g. SPARQL to R).
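As an illustration of the SPARQL channel, the following Python sketch builds a query URL for a Virtuoso endpoint and fetches the result bindings. The endpoint address http://localhost:8890/sparql is an assumption (Virtuoso's usual endpoint path on the port published elsewhere in this deck), and the query is a generic placeholder rather than one of the project's queries.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of a locally running Virtuoso instance.
ENDPOINT = "http://localhost:8890/sparql"


def build_sparql_url(query, endpoint=ENDPOINT,
                     result_format="application/sparql-results+json"):
    """Encode a SPARQL query as an HTTP GET URL."""
    params = {"query": query, "format": result_format}
    return f"{endpoint}?{urlencode(params)}"


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the list of result bindings (needs a live endpoint)."""
    with urlopen(build_sparql_url(query, endpoint)) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # Placeholder query: list a few triples from the store.
    for row in run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 5"):
        print(row)
```

The same URL can be requested from R or any other HTTP client, which is one way the exchange with other platforms could work.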
Process model: Validation and messenger
The validation component runs throughout the whole process in order to safeguard high data quality by detecting errors.
The messaging component works as an internal messaging and alert system for all components.
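How validation and messaging might interact can be sketched in a few lines of Python. The rules below (required fields "id" and "amount", non-negative numeric amounts) and the Messenger class are hypothetical; the deck does not describe the actual checks or the alert transport.

```python
from dataclasses import dataclass, field


@dataclass
class Messenger:
    """Collects alerts in memory; a real system might send e-mail or chat messages."""
    alerts: list = field(default_factory=list)

    def notify(self, component, message):
        self.alerts.append(f"[{component}] {message}")


def validate(record, required=("id", "amount")):
    """Return a list of human-readable problems found in one record."""
    errors = [f"missing field '{name}'" for name in required
              if record.get(name) in (None, "")]
    amount = record.get("amount")
    if amount not in (None, ""):
        try:
            if float(amount) < 0:
                errors.append("negative amount")
        except (TypeError, ValueError):
            errors.append(f"non-numeric amount {amount!r}")
    return errors


def run_stage(component, records, messenger):
    """Pass valid records on and report invalid ones through the messenger."""
    valid = []
    for record in records:
        errors = validate(record)
        if errors:
            problems = "; ".join(errors)
            messenger.notify(component, f"record {record.get('id', '?')}: {problems}")
        else:
            valid.append(record)
    return valid
```

Calling run_stage after every processing step, with the step's name as component, is one way to have validation run throughout the whole process.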
Infrastructure
Host      Functionalities / Components       Services / Data sources
VM1       linkedeconomy.org                  apache, php, mysql, drupal
VM2       SPARQL endpoint, demo site         OLV, apache, php, mysql, drupal
VM3       Harvester                          CouchDB, Lucene, apache, mysql / CKAN (Greek Datasets)
VM4       Harvester, Messenger               mysql, LinkedEconomy dropbox
VM5       Storage - Secondary triplestore    CouchDB, OLV, CouchDB-Lucene, docker
VM6       Harvester                          apache, php, mysql, drupal / CKAN (Foreign Datasets)
VM7       SPARQL endpoint                    OLV (Foreign graphs)
VM8       Management                         JIRA, mysql, tomcat
VM9       Dashboard                          front-end, CMS, INSPINIA
VM10      System administration              VPN, firewalls, etc.
Physical  Storage - Core triplestore         OLV (Greek graphs)
As core infrastructure we use ~okeanos, which is an established cloud-based service provided for the Greek research and academic community.
Application System
Small Applications: Java, PHP and UNIX scripts
Di@vgeia
KHMDHS
Virtuoso
CouchDB
Drupal
MySql
ePrices
CKAN
fuelPrices
QGIS
Dockerize the System
Di@vgeia
KHMDHS
ePrices
Virtuoso
Drupal
MySql
QGIS Desktop
CouchDB
QGIS Server
Small Applications
CKAN
Docker MySQL
version: '2'
services:
  mysql:
    build: ./mysql-docker/5.6
    container_name: eLodDrupalmySQL
    volumes:
      - /mysql_drupal:/var/lib/mysql
    environment:
      - MYSQL_DATABASE=drupalelod
      - MYSQL_ROOT_PASSWORD=eLodmysqlpass
    restart: on-failure
volumes: keeps the database files on the host. Save your data!
build: builds the image from your local directory.
restart: do not use the “always” policy in your development environment!
Docker Drupal
  drupal:
    build: ./docker-drupal
    command:
      - /start.sh
    depends_on:
      - mysql
    container_name: eLodDrupal
    #image: eLodDrupal
    ports:
      - "8081:80"
    volumes:
      - "/data_drupal:/var/www/html"
    links:
      - "mysql"
    environment:
      - MYSQL_DATABASE=drupalelod
      - MYSQL_USER=root
      - MYSQL_PASSWORD=eLodmysqlpass
      - DRUPAL_ADMIN_PW=eLODDR
      - DRUPAL_ADMIN=admin
      - MYSQL_HOST=eLodDrupalmySQL
      - [email protected]
    restart: on-failure
depends_on: starts this service only after the MySQL service has been started (Compose does not wait for MySQL itself to be ready).
links: links the container with the MySQL container.
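Start order alone does not guarantee that the database already accepts connections when Drupal comes up. A common workaround is to poll the database port before starting the application. The helper below is a minimal Python sketch of that idea, not part of the eLOD repository. The host name eLodDrupalmySQL is the container name from the compose file, and 3306 is the default MySQL port, assumed here because the compose file does not override it.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, host not resolvable yet, or connect timed out.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical use inside the Drupal container before running /start.sh.
    ready = wait_for_port("eLodDrupalmySQL", 3306)
    print("MySQL reachable" if ready else "MySQL not reachable, giving up")
```

An open port is only an approximation of readiness: the server may still be initializing its data directory, so the application should also retry its first database connection.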
Docker Virtuoso
  virtuoso:
    build: ./docker-virtuoso
    container_name: eLodVirtuoso
    ports:
      - "8890:8890"
    volumes:
      - /virtuoso/db:/var/lib/virtuoso/db
    environment:
      - DBA_PASSWORD=eLodVir
      - SPARQL_UPDATE=true
      - DEFAULT_GRAPH=http://localhost:8890/DAV
    restart: on-failure
Docker QGIS
  qgisdesktop:
    #image: kartoza/qgis-desktop:2.14
    build: ./qgis-desktop/2.14
    hostname: qgis-server
    volumes:
      # Wherever you want to mount your data from
      - ./gis:/gis
      # Unix socket for X11
      - "/tmp/.X11-unix:/tmp/.X11-unix"
    links:
      - db:db
    environment:
      - DISPLAY=unix:1
    command: /usr/bin/qgis
Build the system
Clone the repository from GitHub: https://github.com/stetsiafoulis/eLOD
Create the host directories that will be mounted as data volumes.
Run docker-compose up -d and that’s it!
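The last two build steps can be scripted. The following Python sketch creates the host directories and then starts the stack. The directory list is taken from the absolute volume paths visible in the compose fragments in this deck and may be incomplete. The root parameter is a hypothetical convenience for trying the script in a scratch location; with any root other than "/", the volume paths in the compose file would have to be changed to match.

```python
import subprocess
from pathlib import Path

# Host paths that appear as bind mounts in the compose fragments of this deck.
VOLUME_DIRS = ["/mysql_drupal", "/data_drupal", "/virtuoso/db"]


def prepare_dirs(root="/", dirs=VOLUME_DIRS):
    """Create each volume directory under root and return the created paths."""
    created = []
    for name in dirs:
        path = Path(root) / name.lstrip("/")
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def start_stack(project_dir):
    """Run 'docker-compose up -d' in the cloned repository directory."""
    subprocess.run(["docker-compose", "up", "-d"], cwd=project_dir, check=True)


if __name__ == "__main__":
    prepare_dirs()       # creating directories under "/" usually needs root privileges
    start_stack("eLOD")  # assumes the repository was cloned into ./eLOD
```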
Why Docker?
o Portable
o Lightweight
o Move to different cloud infrastructures and to physical servers
o Run on virtual machines for development and testing
o Easily scale
o Easy delivery and deployment
o Run anywhere (regardless of host distro, physical, cloud or not)
o Run anything
Scaling per Source
[Diagram: containers for Di@vgeia, KHMDHS, Virtuoso, Drupal, MySql, QGIS Desktop, CouchDB, QGIS Server and Small Applications]
Next Steps - Swarm
Virtuoso
Drupal
MySql
CouchDB
QGIS Server
o Cluster management
o Scaling
o State reconciliation
o Multi-host networking
o Service discovery
o Load balancing
Appendix - Data Sources links
LinkedEconomy: http://linkedeconomy.org/
Sources currently used:
Transparency - DIAVGEIA: https://diavgeia.gov.gr
Central Electronic Registry of Public Procurement - E-Procurement (KHMDHS): http://www.eprocurement.gov.gr
National Strategic Reference Framework (NSRF): https://www.espa.gr/en
Central Market of Thessaloniki (CMT): http://www.kath.gr/
e-Prices: http://www.e-prices.gr/
Fuel Prices: http://www.fuelprices.gr/
Municipality of Athens: https://www.cityofathens.gr/khe/proypologismos
Municipality of Thessaloniki: http://www.thessaloniki.gr/portal/page/portal/DioikitikesYpiresies/GenDnsiDioikOikonYpiresion/DnsiDiafanEksipirDimoton/TmimaDiafaneias/AnoiktiDdiathesiDedomenon/DimosiefsiEktelesisProipologismou/ektelesi-proypologismou
Government of Australia: http://data.gov.au/