Stetl for INSPIRE Data Transformation
INSPIRE Transformation with Stetl - A lightweight Python Framework for Geospatial ETL
Just van den Broecke
EuroGeographics - KEN Workshop
Paris, Oct 8, 2013
www.justobjects.nl
About Me
Independent Open Source Geospatial Professional
Secretary OSGeo Dutch Local Chapter
Member of the Dutch OpenGeoGroep
Just van den Broecke
[email protected]
www.justobjects.nl
We have a Problem
The Rich GML Problem
Rich GML = Complex Mess
INSPIRE Dutch National Datasets
Germany: AFIS-ALKIS-ATKIS
UK: OS Mastermap
“Semi GML” e.g. Dutch Addresses & Buildings (BAG)
Arbitrary Nesting
The Street Name!
A Street Element in an INSPIRE Annex I Address..
Complex Model
Transformations
100+ MB GML Files
Millions of Objects
10s of Millions of <Elements>
Multiple Transformation Steps
Solution is Spatial ETL
But How? (with FOSS)
FOSS ETL - DIY? Maybe
FOSS ETL - High Level
FOSS ETL - Lower Level
Each powerful individually but cannot do the entire ETL
ogr2ogr
FOSS ETL - How to Combine?
[Figure: tool logos + ogr2ogr = ?]
Example - 2011 Kadaster ESDIN
http://inspire.kademo.nl/doc/design-etl.html
Good ideas, but hard to scale and reuse. Need a framework.
FOSS ETL : Add Python to Equation
[Figure: the same tools plus Python = ? and, on the next slide, = Stetl]
Stetl = Simple Streaming Spatial Speedy ETL
[Diagram labels: GML1, GML2, Stetl]
From Barrels of GML to Maps
From Local National Data to INSPIRE DL Services
[Diagram labels: Source <GML> (Dutch Addresses + Buildings), NLExtract, Stetl, deegree, deegree blobstore, WFS, INSPIRE <GML>, Atom Feed, INSPIRE Addresses]
Stetl Concepts
Process Chain
Input -> Filter -> Filter -> Output
Source Target
Process Chain
Input -> Filter -> Filter -> Output
gml
Example: GML to PostGIS
Reader -> ogr2ogr
gml
Example: INSPIRE Model Transform
ogr2ogr -> XSLT -> Writer
gml
Simple Features
Complex Features
Example: deegree Store
ogr2ogr -> XSLT -> deegree Writer
Or via WFS-T
Process Chain - How?
Input -> Filters -> Output
Example: XML to Shape
XML Input -> XSLT Filter -> ogr2ogr Output
Example: XML to Shape
The Source
Example: XML to Shape
XML Input
Example: XML to Shape
XML Input -> XSLT Filter
Example: XML to Shape
Prepare XSLT Script
Example: XML to Shape
XSLT GML Output
Example: XML to Shape
XML Input -> XSLT Filter -> ogr2ogr Output
Example: XML to Shape
The Stetl Config File
Process Chain
XML Input -> XSLT Filter -> ogr2ogr Output
Running Stetl
stetl -c etl.cfg
Result Shapefile viewed in QGIS
Installing Stetl
via PyPI
Deps
• GDAL + Python bindings
• lxml (XML processing)
• psycopg2 (Postgres)
sudo pip install stetl
Speed: Streaming
Input -> Filter -> Output
gml
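Streaming means a large GML file is handed through the chain one element at a time instead of being loaded whole. A minimal sketch of that idea, using the standard-library `xml.etree.ElementTree.iterparse` rather than Stetl's own input classes; the function name `stream_elements` and the sample document are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def stream_elements(source, local_tag):
    """Yield each element whose local name is local_tag, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Tags look like "{namespace}localname"; compare the local part only.
        if elem.tag.rsplit("}", 1)[-1] == local_tag:
            yield elem
            # Drop the subtree once the consumer has finished with it.
            elem.clear()


# Tiny stand-in for a multi-hundred-megabyte GML file (hypothetical content).
SAMPLE = b"""<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember><city><name>Amsterdam</name></city></gml:featureMember>
  <gml:featureMember><city><name>Paris</name></city></gml:featureMember>
</gml:FeatureCollection>"""

names = [
    member.findtext("city/name")
    for member in stream_elements(io.BytesIO(SAMPLE), "featureMember")
]
print(names)
```

Because each element is cleared after it has been consumed, memory use stays roughly proportional to one feature rather than to the whole file.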
Speed: Going Native
Input -> Filter -> Output
gml
Stetl calls native C libs/progs (e.g. ogr2ogr)
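"Going native" amounts to letting Python orchestrate while compiled tools do the heavy lifting. The sketch below shows one way such a call to ogr2ogr could be wrapped with `subprocess`; the helper names and file paths are hypothetical and this is not Stetl's actual output component. Only the widely documented `-f` and `-overwrite` options are used.

```python
import shutil
import subprocess


def build_ogr2ogr_cmd(src, dst, out_format="ESRI Shapefile", overwrite=True):
    """Return the ogr2ogr argument list; ogr2ogr expects destination before source."""
    cmd = ["ogr2ogr", "-f", out_format]
    if overwrite:
        cmd.append("-overwrite")
    cmd += [dst, src]
    return cmd


def run_ogr2ogr(src, dst, **options):
    """Run ogr2ogr as a child process, or return None when it is not installed."""
    if shutil.which("ogr2ogr") is None:
        return None
    return subprocess.run(build_ogr2ogr_cmd(src, dst, **options), check=True)


# Hypothetical paths; only the command line is built here, nothing is executed.
cmd = build_ogr2ogr_cmd("output/cities.gml", "output/cities.shp")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with format names that contain spaces, such as "ESRI Shapefile".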
Example Components
Input        Filters           Output
XMLFile      XSLT              GMLFile
ogr2ogr      XMLAssembler      ogr2ogr
LineStream   XMLValidator      WFS-T
deegree*     FeatureExtractor  deegree*
YourInput    YourFilter        YourOutput
Example: XsltFilter Python
from util import Util, etree
from filter import Filter
from packet import FORMAT

log = Util.get_log("xsltfilter")

class XsltFilter(Filter):
    # Constructor
    def __init__(self, configdict, section):
        Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

        self.xslt_file_path = self.cfg.get('script')
        self.xslt_file = open(self.xslt_file_path, 'r')
        # Parse XSLT file only once
        self.xslt_doc = etree.parse(self.xslt_file)
        self.xslt_obj = etree.XSLT(self.xslt_doc)
        self.xslt_file.close()

    def invoke(self, packet):
        if packet.data is None:
            return packet
        return self.transform(packet)

    def transform(self, packet):
        packet.data = self.xslt_obj(packet.data)
        log.info("XSLT Transform OK")
        return packet
Your Own Components
Step 1 - Define Class
# Imports and logger as in the XsltFilter example
from util import Util
from filter import Filter
from packet import FORMAT

log = Util.get_log("myfilter")

class MyFilter(Filter):
    # Constructor
    def __init__(self, configdict, section):
        Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

    def invoke(self, packet):
        log.info("CALLING MyFilter OK!!!!")
        return packet
Step 2 - Config Class
[etl]
chains = input_xml_file|my_filter|output_std

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

# My custom component
[my_filter]
class = my.myfilter.MyFilter

[output_std]
class = outputs.standardoutput.StandardXmlOutput
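To make the two steps concrete without installing Stetl, the following self-contained sketch imitates the pattern: every component exposes `invoke(packet)` and a chain simply passes one packet through all components until the input signals the end of the stream. All classes here (`Packet`, `Component`, `ListInput`, `UpperCaseFilter`, `CollectOutput`) are simplified stand-ins invented for illustration, not Stetl's real base classes.

```python
class Packet:
    """Stand-in for the unit passed between components: data plus a status flag."""

    def __init__(self, data=None):
        self.data = data
        self.end_of_stream = False


class Component:
    """Stand-in base class: a component receives a packet and returns one."""

    def invoke(self, packet):
        return packet


class ListInput(Component):
    """Emits one list item per call and flags the end of the stream on the last."""

    def __init__(self, items):
        self.items = list(items)

    def invoke(self, packet):
        packet.data = self.items.pop(0) if self.items else None
        packet.end_of_stream = not self.items
        return packet


class UpperCaseFilter(Component):
    """The custom step: transform packet.data, then pass the packet on."""

    def invoke(self, packet):
        if packet.data is not None:
            packet.data = packet.data.upper()
        return packet


class CollectOutput(Component):
    """Collects every non-empty packet it receives."""

    def __init__(self):
        self.collected = []

    def invoke(self, packet):
        if packet.data is not None:
            self.collected.append(packet.data)
        return packet


def run_chain(components):
    """Push fresh packets through the chain until the input is exhausted."""
    while True:
        packet = Packet()
        for component in components:
            packet = component.invoke(packet)
        if packet.end_of_stream:
            break


output = CollectOutput()
run_chain([ListInput(["amsterdam", "paris"]), UpperCaseFilter(), output])
print(output.collected)
```

The custom filter only has to implement `invoke`; which components run, and in what order, is decided by the chain definition rather than by the filter itself.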
Data Structures
• Components exchange Packets
• Packet contains data and status
• Data formats, e.g.:
   xml_line_stream
   etree_doc
   etree_element (feature)
   etree_element_array
   string
   any
   ..
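Since each component declares the data format it consumes and produces, a chain can be checked for mismatches before any data flows. The sketch below illustrates such a check with the format names from the list above; `Packet`, `ComponentSpec` and `check_chain` are hypothetical names, and treating `any` as compatible with everything is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

ANY = "any"


@dataclass
class Packet:
    """Data plus status, tagged with the format of the data it carries."""
    data: object = None
    format: Optional[str] = None
    end_of_stream: bool = False


@dataclass(frozen=True)
class ComponentSpec:
    """What a component accepts and emits; None where it does not apply."""
    name: str
    consumes: Optional[str]
    produces: Optional[str]


def check_chain(specs):
    """Return a list of format mismatches between neighbouring components."""
    problems = []
    for left, right in zip(specs, specs[1:]):
        if ANY in (left.produces, right.consumes):
            continue
        if left.produces != right.consumes:
            problems.append(
                f"{left.name} produces {left.produces}, "
                f"but {right.name} consumes {right.consumes}"
            )
    return problems


chain = [
    ComponentSpec("input_xml_file", None, "etree_doc"),
    ComponentSpec("my_filter", "etree_doc", "etree_doc"),
    ComponentSpec("output_std", "etree_doc", None),
]
broken = [chain[0], ComponentSpec("line_filter", "xml_line_stream", "etree_doc"), chain[2]]

print(check_chain(chain))
print(check_chain(broken))
```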
deegree Integration
• Input: DeegreeBlobstoreInput
• Output: DeegreeBlobstoreOutput, DeegreeFSLoaderOutput, WFSTOutput
Cases - The Netherlands
• INSPIRE Download Services
   publish to deegree store (WFS)
   generate GML files (for Atom Feed)
• National GML Datasets
   GML to PostGIS (Top10NL, BGT)
[etl]
chains = input_sql_pre|schema_name_filter|output_postgres,
         input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
         input_sql_post|schema_name_filter|output_postgres

# Pre SQL file inputs to be executed
[input_sql_pre]
class = inputs.fileinput.StringFileInput
file_path = sql/drop-tables.sql,sql/create-schema.sql

# Post SQL file inputs to be executed
[input_sql_post]
class = inputs.fileinput.StringFileInput
file_path = sql/delete-duplicates.sql

# Generic filter to substitute Python-format string values like {schema} in string
[schema_name_filter]
class = filters.stringfilter.StringSubstitutionFilter
# format args {schema} is schema name
format_args = schema:{schema}

[output_postgres]
class = outputs.dboutput.PostgresDbOutput
database = {database}
host = {host}
port = {port}
user = {user}
password = {password}
schema = {schema}

# The source input file(s) from dir and produce gml:featureMember elements
[input_big_gml_files]
class = inputs.fileinput.XmlElementStreamerFileInput
file_path = {gml_files}
element_tags = featureMember
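The `chains` value in this config holds several chains separated by commas, each a `|`-separated list of section names. A rough sketch of how such a value could be split with the standard-library `configparser`; this is an illustration, not Stetl's actual loader, and `parse_chains` is a hypothetical helper.

```python
import configparser

# Excerpt of the config above, enough to show the chain syntax.
CONFIG_TEXT = """
[etl]
chains = input_sql_pre|schema_name_filter|output_postgres,
         input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
         input_sql_post|schema_name_filter|output_postgres

[input_sql_pre]
class = inputs.fileinput.StringFileInput
file_path = sql/drop-tables.sql,sql/create-schema.sql
"""


def parse_chains(config_text):
    """Return a list of chains, each a list of component section names."""
    # Interpolation is off so values such as {schema} or % signs stay untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    raw = parser.get("etl", "chains")
    chains = []
    for chain in raw.split(","):
        names = [name.strip() for name in chain.split("|")]
        if any(names):
            chains.append(names)
    return chains


chains = parse_chains(CONFIG_TEXT)
for number, chain in enumerate(chains, start=1):
    print(number, " -> ".join(chain))
```

Each name in a chain refers to a config section whose `class` entry says which component to instantiate, so the same section (for example `output_postgres`) can be reused in several chains.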
Top10NL Extract
Parameter Substitution
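Parameter substitution keeps environment-specific values such as `{database}` or `{schema}` out of the config file so the same file can be reused per dataset or server. The sketch below shows the mechanism with plain Python string formatting; the parameter values and helper names are assumptions, and how Stetl itself receives the values is not covered here.

```python
from string import Formatter

# Excerpt of the config above with its {name} placeholders.
TEMPLATE = """[output_postgres]
class = outputs.dboutput.PostgresDbOutput
database = {database}
host = {host}
port = {port}
schema = {schema}
"""


def placeholder_names(template):
    """Return the distinct {name} fields used in the template, in order."""
    seen = []
    for _literal, field, _spec, _conversion in Formatter().parse(template):
        if field and field not in seen:
            seen.append(field)
    return seen


def substitute(template, params):
    """Fill in all placeholders; fail early and clearly if one is missing."""
    missing = [name for name in placeholder_names(template) if name not in params]
    if missing:
        raise KeyError("missing parameters: " + ", ".join(missing))
    return template.format(**params)


# Hypothetical values for a local test database.
params = {"database": "top10nl", "host": "localhost", "port": "5432", "schema": "public"}
filled = substitute(TEMPLATE, params)
print(filled)
```

Checking for missing parameters before formatting gives one clear error that lists every missing name, instead of failing on the first one.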
Top10NL+BAG (Dutch Topo + Buildings)
BGT - Dutch Large Scale Topo
Cases - INSPIRE Transforms
• Simple: Dutch Admin Borders to AU
• Advanced: Dutch Addresses to AD
INSPIRE - XSLT STRUCTURE
Theme CP
   Local CP GML to INSPIRE SpatialDataset
   Local CP GML to INSPIRE GML
   Generate CP INSPIRE GML
Theme AU
   Local AU GML to INSPIRE SpatialDataset
   Local AU GML to INSPIRE GML
   Generate AU INSPIRE GML
Theme GN
   Local GN GML to INSPIRE SpatialDataset
   Local GN GML to INSPIRE GML
   Generate GN INSPIRE GML
Reusable XSLT Scripts (Called by All)
Locally Specific XSL
Generic XSL
XSLT Template Call
XSLT - 3 MAIN STEPS/SCRIPTS
1. Generate Spatial Dataset GML Container (specific)
2. Extract data values from local OGR simple feature data (specific)
3. Call XSLT template per Theme Feature type (generic)
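The split between locally specific and generic code in these three steps can be mirrored in plain Python. The deck implements the steps in XSLT; the sketch below swaps in `xml.etree.ElementTree` purely to show the division of labour. The element names (`SpatialDataset`, `member`, `AdministrativeUnit`), the record layout and all function names are simplified assumptions, not the real INSPIRE schema or the actual scripts.

```python
import xml.etree.ElementTree as ET

# Hypothetical local records, standing in for OGR simple features.
LOCAL_RECORDS = [
    {"code": "001", "naam": "Utrecht"},
    {"code": "002", "naam": "Amsterdam"},
]


def make_container():
    """Step 1 (locally specific): create the dataset container."""
    return ET.Element("SpatialDataset")


def extract_values(record):
    """Step 2 (locally specific): map local field names to generic parameters."""
    return {"unit_id": record["code"], "name": record["naam"]}


def administrative_unit(unit_id, name):
    """Step 3 (generic): build one feature from parameters, like a called template."""
    feature = ET.Element("AdministrativeUnit", {"id": unit_id})
    ET.SubElement(feature, "name").text = name
    return feature


def transform(records):
    """Run the three steps for every local record."""
    dataset = make_container()
    for record in records:
        member = ET.SubElement(dataset, "member")
        member.append(administrative_unit(**extract_values(record)))
    return dataset


dataset = transform(LOCAL_RECORDS)
print(ET.tostring(dataset, encoding="unicode"))
```

Only the first two functions know anything about the local data; the third takes neutral parameters, which is what allows it to be shared across countries and datasets.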
XSLT AU - STEP 1
XSLT AU - STEP 2
XSLT AU - STEP 3
XSLT - REUSE
STETL CONFIG
STETL CONFIG AD
Case: INSPIRE DL Services - Dutch Addresses
[Diagram labels: Source <GML> (Dutch Addresses + Buildings), NLExtract, Stetl, deegree, deegree blobstore, WFS, INSPIRE <GML>, Atom Feed, INSPIRE Addresses]
Other Uses (Geocoder etc)
Project Status - Sept 21, 2013
• v1.0.4 installable via PyPI
• Documentation on www.stetl.org
• Real world transforms done
• Seeking feedback, support and contributors
Rich GML Problem Solved?
Thank You !
www.stetl.org
github.com/justb4/stetl