NEW APPLICATIONS OF CIM TO DATA ANALYTICS
European CIM User Group, Amsterdam.
EDF Research & Development Division.
Friday, June 3rd 2016.
THE OBJECTIVE
§ How can Data Analytics improve the business of a DSO?
§ What is Data Analytics?
§ The Data Science Process ("Doing Data Science", Cathy O'Neil & Rachel Schutt).
New applications of CIM to Data Analytics | June 2016
THE PROJECT
§ Explore the potential of Data Analytics solutions that are relevant for a DSO.
  ¨ For data collection, storage & cleaning.
  ¨ For statistical models.
  ¨ For visualization.
§ Initiation of a Data Analytics Platform to examine these solutions.
§ The main components of the platform so far.
  ¨ A triplestore (Stardog) to store & process the knowledge of the electrical network.
  ¨ A graph database (Neo4j) to store & process the topology of the network.
  ¨ A data historian (OpenTSDB) to store & process the measurements of the network.
  ¨ Some visualization modules.
  ¨ And one common language for the whole platform: the CIM.
§ Partnership with IREQ (Research Institute of Hydro-Québec) on the semantic web.
THE DATA ANALYTICS PLATFORM
[Diagram: architecture of the platform. Sources: real world & DSO referentials (GIS, SCADA, OMS, WMS, MMS, …) covering network operations, asset management, work management, metering, weather forecast, customer support, … Stages: collect, process, clean; statistical models; visualize. Components: triplestore (Stardog), graph database (Neo4j), data historian (OpenTSDB), data visualization.]
THE TRIPLESTORE
THE TRIPLESTORE - PRINCIPLES
§ Stores & processes the knowledge of the electrical network.
  ¨ A technology from the Resource Description Framework (RDF) & the semantic web.
  ¨ Stores the complete network: equipment, assets, locations, organisations...
  ¨ No data model, just triples: subject, predicate, object.
  ¨ But a vocabulary (or ontology): we use the CIM.
  ¨ 1/30th of the French MV network with CIM ~ 7 million triples.
  ¨ Stardog, like most triplestores, can process billions of triples.
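As a minimal sketch of the triple model (not the platform's actual import code), the Python snippet below represents two statements about a substation as (subject, predicate, object) tuples and serializes them as N-Triples-style lines. The CIM namespace URI, the `http://example.org/` base URI, the substation name and the helper `to_ntriples` are illustrative assumptions.

```python
# Illustrative only: the namespace and base URI below are assumptions.
CIM = "http://iec.ch/TC57/2013/CIM-schema-cim16#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/network/"  # hypothetical base URI


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines.

    Objects starting with "http" are written as IRIs, anything else as a
    plain string literal (backslashes and double quotes are escaped).
    """
    lines = []
    for s, p, o in triples:
        if o.startswith("http"):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


substation = BASE + "scada_ids/sub/9351_0_20608036"
triples = [
    (substation, RDF_TYPE, CIM + "Substation"),
    (substation, CIM + "IdentifiedObject.name", "Example substation"),
]
print(to_ntriples(triples))
```

Every fact about the network becomes one such line, which is why no schema is needed beyond the shared vocabulary.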
THE TRIPLESTORE - BENEFITS
§ Many data references in the real world.
  ¨ Various data models & formats.
  ¨ The same equipment in the real world can be referenced more than once.
§ A unique data reference in our platform.
  ¨ A unique vocabulary (CIM) & a unique format (triples, in Turtle for import).
  ¨ Triplestores have facilities to clean the data (sameAs).
  ¨ Triplestores have facilities to manage the upgrades of the network (named graphs).
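The sameAs-based cleaning can be pictured as grouping identifiers that denote the same real-world equipment. The sketch below approximates that idea in plain Python with a union-find structure; it does not use Stardog or its reasoning engine, and the identifiers (including the `tr` and `sw` short names) are made-up examples.

```python
def same_as_clusters(pairs):
    """Group identifiers connected by sameAs statements (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the two groups named by each sameAs pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each group under its root.
    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical sameAs statements between source-specific identifiers.
pairs = [
    ("scada_ids/tr/T1", "gis_ids/tr/4711"),
    ("gis_ids/tr/4711", "gdo_codes/tr/ABC"),
    ("scada_ids/sw/S9", "gis_ids/sw/0042"),
]
for cluster in same_as_clusters(pairs):
    print(cluster)
```

Each printed cluster corresponds to one piece of equipment that was referenced under several identifiers.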
[Diagram: a HV/MV transformer in the real world is referenced in several sources (MV network in the SCADA, MV network in the GIS, LV network in the GIS). Each source is converted to CIM (MV network in CIM, LV network in CIM) and loaded into the triplestore (Stardog), where sameAs links the duplicate references.]
THE TRIPLESTORE – CIM BENEFITS
§ An example of the CIM benefits: equipment naming.
  ¨ In the real world, there are various representations of the network.
  ¨ The same equipment can be identified by multiple identifiers & names.
  ¨ The CIM classes Name & NameType are particularly relevant to assign multiple identifiers & names to an IdentifiedObject.
  ¨ We build the mRID of each IdentifiedObject from the NameType & the Name.
[Diagram: an IdentifiedObject with four Name instances (name = [SITR], [NOM], [GCO], [SIG]) and four NameType instances:
  NameType name = "scada_names", description = "SCADA names"
  NameType name = "scada_ids", description = "SCADA identifiers"
  NameType name = "gdo_codes", description = "GDO codes"
  NameType name = "gis_ids", description = "GIS identifiers"
  IdentifiedObject mrid = [NameType.name]/[CIM short name]/[Name.name]]
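The mRID pattern [NameType.name]/[CIM short name]/[Name.name] can be sketched as a pair of small helpers. The function names `build_mrid` and `parse_mrid` are hypothetical, and the rule that no part may contain "/" is an assumption made here so that the identifier can be split back unambiguously; the example value is the one used in the Cypher query of this deck.

```python
def build_mrid(name_type: str, cim_short_name: str, name: str) -> str:
    """Compose an mRID as [NameType.name]/[CIM short name]/[Name.name]."""
    parts = (name_type, cim_short_name, name)
    # Assumption of this sketch: parts are non-empty and contain no "/".
    if any(not part or "/" in part for part in parts):
        raise ValueError("each part must be non-empty and must not contain '/'")
    return "/".join(parts)


def parse_mrid(mrid: str):
    """Split an mRID back into (NameType.name, CIM short name, Name.name)."""
    name_type, cim_short_name, name = mrid.split("/")
    return name_type, cim_short_name, name


mrid = build_mrid("scada_ids", "sub", "9351_0_20608036")
print(mrid)
print(parse_mrid(mrid))
```

Because the NameType is part of the identifier, the same substation known under a GIS identifier would get a different mRID, and the two can then be reconciled with sameAs.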
THE GRAPH DATABASE
THE GRAPH DATABASE - PRINCIPLE
§ Stores & processes the topology of the electrical network.
  ¨ A technology from the NoSQL & Big Data trend.
  ¨ No data model, just nodes & relations.
  ¨ Nodes can have labels to describe their roles: we use the CIM.
  ¨ Nodes and relations can have properties: we use the CIM.
  ¨ 1/100th of the French MV & LV network with CIM ~ 300,000 nodes.
THE GRAPH DATABASE - IMPORT
§ The import of the network into the graph database.
  ¨ Nodes represent CIM equipment (conducting equipment & containers), but not only…
  ¨ The connectivity of the CIM (ConnectivityNode & Terminal) is replaced by relations.
  ¨ Only relevant classes & attributes of the CIM are used to label & detail the nodes.
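One way to picture the replacement of ConnectivityNode & Terminal by relations is sketched below. The deck does not describe the actual import rules, so this is an assumption: each Terminal is reduced to an (equipment, connectivity node) pair, and two pieces of equipment become related when they share a ConnectivityNode. All identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def terminals_to_relations(terminals):
    """Replace the Terminal/ConnectivityNode indirection by direct relations.

    `terminals` is an iterable of (equipment_id, connectivity_node_id)
    pairs, one per CIM Terminal. Two pieces of equipment are related when
    they each have a terminal on the same ConnectivityNode.
    """
    by_node = defaultdict(set)
    for equipment_id, node_id in terminals:
        by_node[node_id].add(equipment_id)

    relations = set()
    for equipment_ids in by_node.values():
        for a, b in combinations(sorted(equipment_ids), 2):
            relations.add((a, b))
    return sorted(relations)


# Hypothetical terminals: a switch, a junction and a consumer in a row.
terminals = [
    ("switch_1", "cn_A"),
    ("junction_1", "cn_A"),
    ("junction_1", "cn_B"),
    ("consumer_1", "cn_B"),
]
print(terminals_to_relations(terminals))
```

In this sketch a ConnectivityNode shared by three or more terminals yields a relation between every pair of equipment; a real import might instead keep an explicit node for such junction points.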
[Diagram: example graph whose nodes are labelled Substation, Switch, EnergyConsumer, Switch and Junction.]
THE GRAPH DATABASE - BENEFITS
§ Neo4j can process the graph of the electrical network.
  ¨ With the Cypher query language.
    • A Cypher query that returns the EnergyConsumers contained in a Substation:
      MATCH (s:Substation)-[:CONTAINS]->(e:EnergyConsumer)
      WHERE s.mrid = 'scada_ids/sub/9351_0_20608036'
      RETURN e
  ¨ With traversals in Java for more complex & specific queries.
    • To get the equipment powered by a feeder.
    • To get the transfer feeders of a feeder.
    • To find topological islands in the network.
    • To simulate contingencies.
    • To evaluate load flows.
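The traversal use cases can be illustrated outside Neo4j. The sketch below is plain Python, not the Neo4j traversal API and not the authors' Java code: it walks an adjacency map breadth-first without crossing open switches, which is enough to list the equipment reachable from a feeder and to split the network into topological islands. Node names and the set of open switches are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build a symmetric adjacency map from undirected (a, b) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def reachable(adjacency, start, open_switches=frozenset()):
    """Breadth-first traversal that does not cross open switches."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in open_switches:
            continue  # an open switch is reached but not crossed
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def islands(adjacency, open_switches=frozenset()):
    """Return the topological islands, ignoring open switches."""
    remaining = set(adjacency) - set(open_switches)
    result = []
    while remaining:
        start = min(remaining)  # deterministic starting point
        component = reachable(adjacency, start, open_switches) - set(open_switches)
        result.append(component)
        remaining -= component
    return result


edges = [
    ("feeder_1", "sw_1"), ("sw_1", "load_1"), ("load_1", "sw_2"),
    ("sw_2", "load_2"), ("feeder_2", "load_2"),
]
adjacency = build_adjacency(edges)
open_switches = {"sw_2"}  # hypothetical normally-open point

powered = reachable(adjacency, "feeder_1", open_switches) - open_switches
print(sorted(powered))
print(len(islands(adjacency, open_switches)))
```

Closing `sw_2` in this toy network (passing an empty set of open switches) merges the two islands, which is the kind of what-if a contingency simulation relies on.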
THE DATA HISTORIAN
THE DATA HISTORIAN - PRINCIPLE
§ Stores & processes time series measured in the electrical network.
  ¨ OpenTSDB is a data historian layer on top of Hadoop HBase.
  ¨ Imports raw files, reads & writes through a REST API.
§ But no real time series were available.
  ¨ We needed to simulate time series with contingency simulation in Neo4j.
  ¨ Our simulation tool generates CIM DiscreteMeasurement messages.
  ¨ These are converted & written through the REST API.
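The conversion step can be sketched as follows. OpenTSDB's HTTP API accepts data points on `/api/put` as JSON objects with `metric`, `timestamp`, `value` and `tags` fields; everything else here is an assumption for illustration: the shape of the simulated measurement dictionary, the metric name `cim.discrete.value` and the choice of the mRID as the only tag. Nothing is sent over the network.

```python
import json


def to_opentsdb_datapoint(measurement, metric="cim.discrete.value"):
    """Build one data point in the JSON shape accepted by /api/put.

    `measurement` is a hypothetical dict with the keys "mrid",
    "timestamp" (Unix seconds) and "value"; it stands in for a simulated
    CIM discrete measurement message.
    """
    return {
        "metric": metric,  # assumed metric name
        "timestamp": int(measurement["timestamp"]),
        "value": measurement["value"],
        "tags": {"mrid": measurement["mrid"]},
    }


# Hypothetical simulated switch position (1 = closed) at a given time.
measurement = {
    "mrid": "scada_ids/sub/9351_0_20608036",
    "timestamp": 1464912000,
    "value": 1,
}
payload = json.dumps([to_opentsdb_datapoint(measurement)])
print(payload)
```

The resulting JSON array would be the body of a POST to `/api/put` on the OpenTSDB server.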
[Diagram: simulation tool, graph database (Neo4j), triplestore (Stardog) and data historian (OpenTSDB); the flow into the historian is labelled CIM DiscreteMeasurements.]
DATA VISUALIZATION
§ Visualization modules for operational projects of the French DSO.
CONCLUSION, LIMITATIONS & NEXT STEPS
CONCLUSION
§ The CIM is relevant for data collection.
  ¨ Data Analytics requires the use of many heterogeneous data sources (real world).
  ¨ The CIM ensures a single & consistent representation of heterogeneous data.
§ The CIM is relevant for data processing.
  ¨ The model is flexible enough to be adapted to different kinds of storage.
    • For the complete description of the network in the triplestore.
    • Just for the description of the topology in the graph database.
  ¨ The CIM proposes a single & consistent representation of the MV and LV network.
§ The CIM is relevant for data cleaning.
  ¨ The CIM naming classes are necessary to activate the sameAs property in the triplestore.
§ But no real time series were available for our platform.
  ¨ Is the CIM also relevant for time series processing?
NEXT STEPS
§ The Data Science Profile ("Doing Data Science", Cathy O'Neil & Rachel Schutt).
  ¨ So far, we have mainly addressed Computer Science, Data Viz & Domain Expertise.
  ¨ In the next steps, we need to address Statistics & Machine Learning.
  ¨ Is the CIM also relevant for these domains?