A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Hadoop Summit 2016, San José, June 30th
Marie-Luce PICARD, EDF R&D
Jean-Marc RANGOD, EDF-DPNT
Christophe SALPERWYCK, EDF R&D
Special thanks to Raphaël QUERCIA (EDF-DTG), Carole MAI and Amandine PIERROT (EDF R&D)


Page 2: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Outline
1. A FEW WORDS ABOUT EDF
2. CONTEXT AND OBJECTIVES
3. A DATA LAKE FOR A NUCLEAR FLEET
4. DATA SCIENCE ALGORITHMS FOR OPTIMIZING OPERATIONS
5. A DATA LAB IN PROGRESS
6. AS A CONCLUSION

Photo credits: Brice Richard (Flickr), KC Tan Photography (Flickr)


Page 4: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

EDF: A GLOBAL LEADER IN ELECTRICITY

EDF: AN EFFICIENT, RESPONSIBLE ELECTRICITY COMPANY AND THE CHAMPION OF LOW-CARBON GROWTH

All electricity-related activities
• Generation
• Transmission & Distribution
• Trading and Sales & Marketing
• Energy services

Key figures (as of 2015)
• Electricity generation: 623.5 TWh
• €72.9 billion in sales
• 38.5 million customers
• 158,161 employees worldwide
• 84.7% of generation does not emit CO2
• 2014 investments: €4.5 billion

Page 5: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

NUCLEAR

WORLD’S LEADING OPERATOR, EXCELLENT PERFORMANCE IN FRANCE
• 72.9 GW installed capacity, 54% of the Group’s net generation capacity
• 477.7 TWh generated, 77% of the Group’s output
• 58 reactors operated in France, 15 in the UK
• 3 EPR under construction: 1 in Flamanville (France), 2 in Taishan (China)
• 2 EPR in project phase

• OSART safety audit: 17 best practices identified by IAEA
• France: best generation performance for six years
• UK: world record for safety in the workplace
• China: strengthened cooperation agreement with CNNC

Page 6: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet
Page 7: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

R&D KEY FIGURES

Page 8: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

EDF LAB PARIS-SACLAY

• 8 research departments
• 4 exceptional buildings
• 1 outstanding test hall
• 1500 workstations
• Unique equipment, innovative communication tools
• Diverse areas of expertise
• Plenty of collaborative spaces
• Scientific partnerships with actors of Paris-Saclay

Page 9: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Main Big Data related challenges for EDF

Power Generation
• Process monitoring and condition-based maintenance from sensors
• Power generation forecasting for renewables

Energy management
• Load forecasting
• Balancing and optimizing generation and consumption (using smart metering information, including renewables)

Electrical networks
• Smart Grid operations (local)
• Condition-based maintenance

Customers and sales
• New services to customers using smart-metering data
• Smart Homes, Smart Building, Smart Cities management related to energy


Page 11: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Operations and maintenance of the nuclear fleet

The maintenance policy of the EDF generation fleet is optimized to ensure the reliability and safety of equipment and systems while strengthening our competitiveness:
• Better diagnosis, improved performance and availability
• Better use of data and documents, so far stored in data silos

More broadly, the IT teams and projects aim to:
• Strengthen the performance of operations and maintenance through a global fleet approach
• Simplify the Industrial Information System architecture
• Improve and develop the way we use our data
• Accumulate and archive data over time

… while reducing costs

Page 12: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Voluminous and heterogeneous data … stored in data silos

One DB per nuclear site, gathering data from sensors. Use of Data Historians.

Source: Wikipedia

Focus on data:

High volume
• Data is stored for up to 40-60 years (lifetime of the plant)
• SCADA data can be sampled every 20 to 40 ms (but mostly every few seconds)
• Around 10,000 sensors per plant

Variety
• Data is heterogeneous
• Time series, images, documents
• Various data sources

The current systems (historians) do not allow many concurrent accesses, and their SLAs are quite poor.

Page 13: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

A Data Lake for the nuclear fleet

ESPADON: the Data Lake for the nuclear fleet

One DB per nuclear site, gathering data from sensors. Use of Data Historians.

Source: Wikipedia

© M. Caraveo, Hadoop cluster NOE data center


Page 15: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

A data lake for the nuclear fleet: big picture

[Architecture diagram]
• Data sources: Historian (SCADA), files (chemical information), files (dosimetry), …
• Storage: Hadoop cluster (ESPADON Data Lake)
• Access paths: ODBC, Web Services
• Uses: e-monitoring application, viz, interactive queries and reporting, reports

© M. Caraveo, Hadoop cluster NOE data center

Page 16: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Zoom on data
• 4 generations of plants, but a high level of normalization of data and sensors (for example, use of trigrams to identify elementary systems)
• Two main types of sensors: ANA (analog) and TOR (state events)
• Time series

Volume
• For the POC, 10 plants, 2 years: about 20 billion points
• Target (59 plants): 15 TB of data (all plants, whole lifecycle)

Metric, global   Date                       Value   Quality
BU2ABP177MT-     2015-04-30T22:05:00.000Z   156.6   Good/M
BU2ABP177MT-     2015-04-30T22:06:00.000Z   156.4   Good/M
BU2ABP177MT-     2015-04-30T22:07:00.000Z   156.2   Good/M
BU2ABP177MT-     2015-04-30T22:08:00.000Z   156.0   Good
BU2ABP177MT-     2015-04-30T22:09:00.000Z   156.2   Good/M
BU2ABP177MT-     2015-04-30T22:10:00.000Z   156.4   Good/M
BU2ABP177MT-     2015-04-30T22:12:00.000Z   156.7   Good/M
BU2ABP177MT-     2015-04-30T22:14:00.000Z   157.1   Good
BU2ABP177MT-     2015-04-30T22:15:00.000Z   157.3   Good
BU2ABP177MT-     2015-04-30T22:16:00.000Z   157.5   Good
BU2ABP177MT-     2015-04-30T22:19:00.000Z   157.3   Good/M
BU2ABP177MT-     2015-04-30T22:20:00.000Z   157.1   Good/M
BU2ABP177MT-     2015-04-30T22:21:00.000Z   157.3   Good/M
BU2ABP177MT-     2015-04-30T22:22:00.000Z   157.1   Good/M
BU2ABP177MT-     2015-04-30T22:24:00.000Z   156.9   Good/M
BU2ABP177MT-     2015-04-30T22:27:00.000Z   157.1   Good/M
BU2ABP177MT-     2015-04-30T22:28:00.000Z   157.3   Good/M
BU2ABP177MT-     2015-04-30T22:29:00.000Z   157.5   Good/M
BU2ABP177MT-     2015-04-30T22:30:00.000Z   157.7   Good/M
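
The sample records above are not perfectly regular: some minutes have no point (22:11 and 22:13, for instance). The slides do not say whether such gaps come from historian compression or from missing data. As a minimal Python sketch, assuming whitespace-separated records in the column order shown (metric, date, value, quality), the records could be parsed and the gaps listed as follows. The names `Point`, `parse_point` and `missing_minutes` are illustrative and not part of ESPADON.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Point(NamedTuple):
    metric: str
    ts: datetime
    value: float
    quality: str


def parse_point(line: str) -> Point:
    """Parse one 'metric date value quality' record (assumed whitespace-separated)."""
    metric, date, value, quality = line.split()
    # The trailing 'Z' (UTC) is matched literally to stay within the standard library.
    ts = datetime.strptime(date, "%Y-%m-%dT%H:%M:%S.%fZ")
    return Point(metric, ts, float(value), quality)


def missing_minutes(points):
    """Return the one-minute timestamps absent between consecutive points."""
    step = timedelta(minutes=1)
    gaps = []
    for prev, cur in zip(points, points[1:]):
        t = prev.ts + step
        while t < cur.ts:
            gaps.append(t)
            t += step
    return gaps


# Four records copied from the slide's sample
SAMPLE = """\
BU2ABP177MT- 2015-04-30T22:09:00.000Z 156.2 Good/M
BU2ABP177MT- 2015-04-30T22:10:00.000Z 156.4 Good/M
BU2ABP177MT- 2015-04-30T22:12:00.000Z 156.7 Good/M
BU2ABP177MT- 2015-04-30T22:14:00.000Z 157.1 Good
"""

points = [parse_point(line) for line in SAMPLE.splitlines()]
gaps = missing_minutes(points)
print([g.strftime("%H:%M") for g in gaps])  # ['22:11', '22:13']
```

Handling such gaps (interpolation, carrying the last value forward, or leaving them empty) is a pre-processing decision that depends on how the historian produced the data.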

Page 17: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Data model

Use of HBase and Phoenix
• Distributed key/value store
• Allows model updates (evolving normalization requirements, new indicators, … new plants)
• Phoenix for SQL compliance + BI tools

Tables
• 3 tables: DDT, ANA, TOR
• Rowkey: <sensorid, timestamp> (queries mainly consider one or several sensors over a period of time)
• Sequential storage; split into HFiles and HRegions according to the plant unit

ANA table
Key                                 Column family   Column   Value          Phoenix type
m (concat(metriqueid, timestamp))   0               v        H_ValeurANA    Float
                                                    q        H_QualitéANA   Char(10)
                                                    n        H_NiveauxANA   Varchar(10)

TOR table
Key                                 Column family   Column   Value          Phoenix type
m (concat(metriqueid, timestamp))   0               v        H_ValeurTOR    Varchar(10)
                                                    q        H_QualiteTOR   Char(10)
                                                    n        H_NiveauxTOR   Varchar(10)
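
The point of a <sensorid, timestamp> rowkey is that byte-wise key order matches (sensor, time) order, so one sensor's readings for a period form a contiguous key range. The sketch below illustrates that idea in plain Python. It is not Phoenix's or ESPADON's actual serialization: the fixed id width, the zero padding, the millisecond big-endian timestamp, and the sensor ids are all assumptions made for the example.

```python
import struct
from bisect import bisect_left
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
SENSOR_ID_WIDTH = 16  # assumed fixed width; the slides do not give one


def make_rowkey(sensor_id: str, ts: datetime) -> bytes:
    """Concatenate a padded sensor id and a big-endian millisecond timestamp."""
    sid = sensor_id.encode("ascii")
    if len(sid) > SENSOR_ID_WIDTH:
        raise ValueError("sensor id longer than the assumed key width")
    millis = (ts - EPOCH) // timedelta(milliseconds=1)
    return sid.ljust(SENSOR_ID_WIDTH, b"\x00") + struct.pack(">q", millis)


def scan(sorted_keys, sensor_id, start, end):
    """Return the keys of one sensor with start <= ts < end, like a range scan."""
    lo = bisect_left(sorted_keys, make_rowkey(sensor_id, start))
    hi = bisect_left(sorted_keys, make_rowkey(sensor_id, end))
    return sorted_keys[lo:hi]


t0 = datetime(2015, 4, 30, 22, 0)
keys = sorted(
    make_rowkey(sensor, t0 + timedelta(minutes=m))
    for sensor in ("SENSOR_B", "SENSOR_A")  # hypothetical sensor ids
    for m in range(10)
)
hits = scan(keys, "SENSOR_A", t0 + timedelta(minutes=2), t0 + timedelta(minutes=5))
print(len(hits))  # 3 keys: minutes 2, 3 and 4 of SENSOR_A
```

Because the sensor id comes first, a query for one sensor and one period touches a single contiguous slice of the sorted keys, which matches the access pattern described above.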

Page 18: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Validation and performance evaluation

POC validation
• Upload of historical data; queries / analyses
• Existing functions: viz, reports, services
• Data injection: SCADA for the whole fleet, integration of other sources of data

Results
• 6 weeks (estimated) needed to upload historical data from 59 plants
• Queries for validating the model:
  • Use of JMeter for simulating load
  • With or without insertion workload
  • Less than about 1 second to draw a curve for a selected month
• Integration of an existing GUI for viz (done within a few days)
• Validation of specific calculations within reports
• ODBC link for a specific e-monitoring application
• Integration of various sources of (structured) data into the data lake
• ‘Real-time’ insertion of data (micro-batch):
  • Up to 2M points / s
  • Very low latency between insertion and availability (< 10 s)

Phoenix query (ANA):

SELECT MIN(v), MAX(v),
       FIRST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
       LAST_VALUE(v) WITHIN GROUP (ORDER BY ts ASC),
       TO_CHAR(ts, 'dd') AS day,
       TO_CHAR(ts, 'HH') AS hour,
       TO_CHAR(ts, 'mm') AS minute,
       COUNT(*) AS cnt
FROM ORLI_ANA
WHERE m = ?
  AND ts > CURRENT_TIME() - 1   -- last 24h
  AND ts < CURRENT_TIME()
GROUP BY day, hour, minute
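
To make the aggregation in the Phoenix query concrete, here is a minimal pure-Python sketch of the same per-minute summary (min, max, first, last, count). It runs on in-memory tuples rather than on Phoenix, and the sub-minute sample points are invented for illustration.

```python
from datetime import datetime


def aggregate_per_minute(points):
    """Group (ts, value) pairs by (day, hour, minute) and summarise each group."""
    buckets = {}
    for ts, v in sorted(points):                 # mirrors ORDER BY ts ASC
        key = (ts.day, ts.hour, ts.minute)       # mirrors GROUP BY day, hour, minute
        b = buckets.setdefault(
            key, {"min": v, "max": v, "first": v, "last": v, "cnt": 0}
        )
        b["min"] = min(b["min"], v)
        b["max"] = max(b["max"], v)
        b["last"] = v                            # latest value seen so far
        b["cnt"] += 1
    return buckets


# Hypothetical samples for one metric
points = [
    (datetime(2015, 4, 30, 22, 5, 0), 156.6),
    (datetime(2015, 4, 30, 22, 5, 20), 156.9),
    (datetime(2015, 4, 30, 22, 5, 40), 156.4),
    (datetime(2015, 4, 30, 22, 6, 10), 156.2),
]
result = aggregate_per_minute(points)
print(result[(30, 22, 5)])
# {'min': 156.4, 'max': 156.9, 'first': 156.6, 'last': 156.4, 'cnt': 3}
```

Reducing each minute to four values plus a count is what keeps a plotted curve cheap to draw: the number of rows returned depends on the time window, not on the raw sampling rate.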


Page 20: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Added value of data science algorithms on heterogeneous data: operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet

Active and reactive power are indicators of the stress on alternators, and therefore of their wear

• ~ 50 plants
• 20 years of data
• 10 min interval data

• Phoenix queries make it possible to select plants and periods of time
• Compute and show reactive power per day or per hour of the day
• More detailed analysis
• Fleet level analysis
• Interactive queries
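
The "reactive power per hour of the day" view amounts to a group-and-average over timestamps. The following is a minimal Python illustration on invented 10-minute readings; the function name `mean_by_hour` and the Mvar values are assumptions, not EDF code or data.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def mean_by_hour(samples):
    """Average (timestamp, value) samples by hour of the day (0-23)."""
    by_hour = defaultdict(list)
    for ts, value in samples:
        by_hour[ts.hour].append(value)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Hypothetical 10-minute reactive power readings (Mvar) over two hours
start = datetime(2015, 4, 30, 22, 0)
samples = [(start + timedelta(minutes=10 * i), 100.0 + i) for i in range(12)]
profile = mean_by_hour(samples)
print(profile)  # {22: 102.5, 23: 108.5}
```

The same grouping applied to readings from several plants would give the fleet-level comparison mentioned in the list above.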

Page 21: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Added value of data science algorithms on heterogeneous data: operations and maintenance can be better optimized through data analytics run on data coming from the whole fleet

Monitoring and control of contractual agreements when network frequency varies (plants have to contribute to the global balance)

• Pattern matching
• Response time for different plants
• Different levels of analysis: by plant, by generation, global
• Generic approach implemented for any kind of pattern
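
One possible reading of the pattern-matching step is: detect when the grid frequency drops below a threshold, then measure how long a plant takes to raise its output. The sketch below follows that reading. The nominal frequency (50 Hz), the deadband, the required power increase, and all sample values are assumptions chosen for illustration; the slides do not describe the actual patterns or contractual thresholds.

```python
def find_events(freq, nominal=50.0, deadband=0.05):
    """Return the indices where frequency first drops below nominal - deadband."""
    events = []
    below = False
    for i, f in enumerate(freq):
        is_below = f < nominal - deadband
        if is_below and not below:      # rising edge of the "under-frequency" state
            events.append(i)
        below = is_below
    return events


def response_time(power, start, min_increase, step_s):
    """Seconds until power has risen by min_increase since index start, else None."""
    base = power[start]
    for i in range(start, len(power)):
        if power[i] - base >= min_increase:
            return (i - start) * step_s
    return None


# Hypothetical 1-second samples: grid frequency (Hz) and plant output (MW)
freq = [50.0, 50.01, 49.9, 49.88, 49.96, 50.0, 50.0, 49.93, 49.99, 50.0]
power = [900, 900, 900, 905, 915, 925, 925, 925, 930, 940]

events = find_events(freq)
times = [response_time(power, e, min_increase=20, step_s=1) for e in events]
print(events, times)  # [2, 7] [3, None]
```

Here the plant answers the first frequency drop within 3 seconds, while the second event has no qualifying response inside the sample window. Swapping `find_events` for another detector is one way such an approach could be made generic across patterns.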

Page 22: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Added value of data science algorithms on heterogeneous data

Prediction of plant cooling according to the quality of the incoming water

• Correlations?
• According to the plants
• Use of GAM (generalized additive model) models
• Integration of two internal sources + external data
• Better understanding
• // Work in progress //

Page 23: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Integration of data science and visualization: architecture

[Architecture diagram] Components: Hadoop cluster, REST web service (VM), browser

Page 24: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Integration of data science: a global approach

Pre-processing
• Data quality
• Sampling
• Synchronization
• …

Selection and queries
• Threshold
• Pattern matching
• Period of time
• …

Analysis and data science
• Reporting
• Exploratory analysis (distribution …)
• Modelling
• …
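
The three stages above can be sketched end to end on toy data. This is a minimal Python illustration, assuming that synchronization means sampling irregular series on a common regular grid with the last observation carried forward; that choice, the helper names and the sensor values are assumptions rather than the approach actually implemented at EDF.

```python
from bisect import bisect_right
from statistics import mean


def resample_locf(series, grid):
    """Sample a time-sorted [(t, value)] series on a grid, carrying the last
    observation forward (None before the first point)."""
    times = [t for t, _ in series]
    out = []
    for g in grid:
        i = bisect_right(times, g)
        out.append(series[i - 1][1] if i else None)
    return out


def select_above(grid, values, threshold):
    """Selection step: keep grid times whose value exceeds a threshold."""
    return [t for t, v in zip(grid, values) if v is not None and v > threshold]


# Hypothetical sensors, times in seconds
temperature = [(0, 20.0), (7, 21.0), (19, 25.0), (31, 24.0)]
pressure = [(3, 1.0), (12, 1.2), (28, 1.1)]
grid = list(range(0, 40, 10))                           # 0, 10, 20, 30

temp_sync = resample_locf(temperature, grid)            # pre-processing
pres_sync = resample_locf(pressure, grid)
hot = select_above(grid, temp_sync, 22.0)               # selection and queries
summary = mean(v for v in temp_sync if v is not None)   # analysis

print(temp_sync)   # [20.0, 21.0, 25.0, 25.0]
print(pres_sync)   # [None, 1.0, 1.2, 1.1]
print(hot)         # [20, 30]
print(summary)     # 22.75
```

Once both sensors share the same grid, they can be compared point by point, which is the prerequisite for the correlation and modelling work in the analysis stage.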


Page 26: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

A Data Lab in progress: a team, an approach … and some questions

Objectives: bring value from data analytics

Issues:
• Skills and organization (between entities)
• Architecture:
  • Operational Hadoop cluster and loads (use of a multitenant enterprise cluster)
  • Other loads (data science)
  • Data prep within Hadoop + edge machine for data science (Spark, R, Python)
• How to quantify value
• Development costs and maintenance
• How to industrialize

Source: Xebia


Page 28: A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear fleet

Takeaways

A Data Lake for our nuclear fleet
• In progress: industrialization and decommissioning of Historian applications
• Great reduction of licensing costs

A Data Lab under construction
• POCs showing the added value of data science algorithms, predictive maintenance
• In the context of fleet renovation for plant life extension (major overhaul program): operations & maintenance, generation cost optimization
• Remaining issues: skills, organization, technical architecture, quantifying value

Perspectives and technical issues
• Data lakes and labs for other fleets (thermal plants, hydro, renewables)
• Scalable time-series analytics (synchronization, missing data …)
• Handling heterogeneous data (textual, images, graphs …)
• IoT platform
