From Zero to Data Flow in Hours with Apache NiFi

Copyright © 2016, Schlumberger, All rights reserved.

From Zero to Data Flow in Hours with Apache NiFi

Hadoop Summit – San Jose 2016

Chris Herrera, Schlumberger

Transcript of From Zero to Data Flow in Hours with Apache NiFi

Page 1: From Zero to Data Flow in Hours with Apache NiFi

From Zero to Data Flow in Hours with Apache NiFi

Hadoop Summit – San Jose 2016

Chris Herrera, Schlumberger

Page 2: From Zero to Data Flow in Hours with Apache NiFi

Agenda

• Why is composable data flow important to the drilling industry

• Current State of the System

• The Breaking Point to the new system

• An unexpected workflow in testing

• How are we using it today

• What’s Next

Page 3: From Zero to Data Flow in Hours with Apache NiFi

Legal Notices

This presentation is for informational purposes only. STATEMENTS AND OPINIONS EXPRESSED IN THIS PRESENTATION ARE THOSE OF THE PRESENTER AND DO NOT REFLECT THE OPINIONS OF SCHLUMBERGER. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY REPRESENTATIONS AND/OR WARRANTIES EXPRESS OR IMPLIED. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY RESPONSIBILITY FOR THE CONTENT, ACCURACY, AND/OR COMPLETENESS OF THE INFORMATION IN this presentation. This presentation, and any recordings or reproductions in various media formats, including, without limitation, print, audio, and video, is the copyrighted work of Schlumberger, and Schlumberger hereby retains all intellectual property and/or proprietary rights related thereto. Schlumberger and the Schlumberger logo are trademarks of Schlumberger in the U.S. and/or other countries. Other names and brands referenced in this presentation are the trademarks of their respective owners, and any references thereto are not endorsements or approvals.

Page 4: From Zero to Data Flow in Hours with Apache NiFi

Introduction

• 2 years managing product development and innovation teams working on real-time data ingestion and delivery

• 5 years of experience in the Hadoop ecosystem

• 11 years of experience with various aspects of the oilfield (operational and technical)

Chris Herrera, Schlumberger

Page 5: From Zero to Data Flow in Hours with Apache NiFi

The Major Components of a Drilling Project

Wireline

Measurement / Logging While Drilling

Mud logging

Fluids

Completions

Cementing

Rig

• Several contractors are brought in to develop and complete the well

• May comprise one company or, more often, many

• Each brings its own system, often without a central repository of data

• May have decent cell connectivity, or be deep in the middle of a jungle with only 128k of high-latency bandwidth

Page 6: From Zero to Data Flow in Hours with Apache NiFi

Where Does This Data Need to Go?

RT Server

Operational Support

Client Monitoring

Processing and Print Centers

Page 7: From Zero to Data Flow in Hours with Apache NiFi

Workflow of Data During and Post Operations

[Diagram. Lanes: Acquisition, Data Server, Processing Center]

Steps shown: Classification & Labelling, Quality Control, Classification, Quality Control, Hosting, QC & Labelling, Conversion, Data Delivery, KPI & Reporting, Processing, Acq

Roles shown: Sales and Job Planning, Data Processor, Customer, Manager, Client Data Delivery, Sales, Field Engineer

Page 8: From Zero to Data Flow in Hours with Apache NiFi

What Does This Mean In A Data Sense

Input

• DLIS

• LAS 1.2, 2.0, 3.0

• WITS Level 0, Level 1, Level 2

• CSV

• Profibus

• Modbus

Output

• CSV

• PDS

• LAS 1.2, 2.0, 3.0

• DLIS

RT Server

Page 9: From Zero to Data Flow in Hours with Apache NiFi

What Does This Mean in a Volume Sense

• ~9000 users / month

• ~10 files / minute

• ~480 data queries / sec

• ~3050 wells / month

Page 10: From Zero to Data Flow in Hours with Apache NiFi

A Quick(ish) Note On The Importance of Data Provenance

[Chart. Labels: Context, Fidelity, Time; Acquisition - Field; Interpretation - Office]

• Need to retain the fidelity throughout the flow.

Page 11: From Zero to Data Flow in Hours with Apache NiFi

Typical Data Problems / Concerns

• What is the time zone of the data we are receiving – one day UTC...

• “Ahh, I see you did not implement that part of the standard...”

• Wait, why are you sending data at 5 times the sampling rate of the sensor...

• I did not get the memo that you were changing your data model today...

• Governmental / Client data residency concerns
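The sampling-rate complaint above lends itself to an automated check. The following is a minimal sketch, not part of the original deck: it compares the typical spacing of arriving timestamps against the sensor's nominal sampling period. The function name `observed_rate_ratio` and the millisecond units are assumptions made for illustration.

```python
from statistics import median


def observed_rate_ratio(timestamps_ms, sensor_period_ms):
    """Return how many times faster than the sensor period the samples arrive.

    Hypothetical helper: a ratio near 1.0 means data arrives at the sensor's
    own rate; a ratio near 5.0 means five samples per real measurement.
    """
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two timestamps")
    # Spacing between consecutive samples.
    intervals = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    typical = median(intervals)
    if typical <= 0:
        raise ValueError("median interval must be positive")
    return sensor_period_ms / typical


# Assumed example: the sensor samples once per second (1000 ms),
# but the feed delivers a value every 200 ms.
arrivals_ms = [0, 200, 400, 600, 800, 1000]
ratio = observed_rate_ratio(arrivals_ms, sensor_period_ms=1000)
print(f"data arrives at {ratio:.1f}x the sensor sampling rate")
```

Using the median rather than the mean keeps a single late or early packet from skewing the estimate.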

Page 12: From Zero to Data Flow in Hours with Apache NiFi

Current Solution…

• 100+ Man Years of effort over 14 years

• ~2,000,000+ lines of code

• Extreme barrier to entry for workflow changes

• Very little understanding of what happened to the data

Input

• DLIS

• LAS 1.2, 2.0, 3.0

• WITS Level 0, Level 1, Level 2

• CSV

• Profibus

• Modbus

Output

• CSV

• PDS

• LAS 1.2, 2.0, 3.0

• DLIS

RT Server

Page 13: From Zero to Data Flow in Hours with Apache NiFi

We Needed A Simpler – Maintainable Solution…

Page 14: From Zero to Data Flow in Hours with Apache NiFi

The Original Plan…

RabbitMQ

DLIS Parser

ETP Endpoint

LAS Parser

Data Writer

{} DB

Event Publisher

Node.js

What About:

• Data cleansing

• Routing

• The ability to debug what has gone wrong

• TIME (estimated 6 man months)

Page 15: From Zero to Data Flow in Hours with Apache NiFi

How does NiFi fit into the equation?

• Knowing where data came from is crucial to real-time decision making, and is often missing

• The ability to visualize the data flow at a granular level aids in troubleshooting and operational understanding

• With several processors already available, there is a low barrier to entry when it comes to data flow creation

Page 16: From Zero to Data Flow in Hours with Apache NiFi

Enter NiFi…

Processor Creation

Data Flow Creation

Creation Play…

10 man hours

ETP

WITSML 1.3.1.1 / 1.4.1.1

LAS 1.2 / 2.0

1 man day

Page 17: From Zero to Data Flow in Hours with Apache NiFi

Prototype Setup

Data Source Processor Input

Data Cleansing

Data Enrichment

{ } Repo

Data Storage

Put Data

2 Man Days

• Append Well Name

• Append Client Name

• Append Run Name

• Append Pass Name

Process Group: GetUpdate

Process Group: Fix Time Zone, Remove Absent Indexes

Data Cleansing

Routing
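The cleansing and enrichment stages of the prototype can be sketched in plain Python. This is an illustration only, not the NiFi processors from the talk. The record layout, the field names (`time`, `depth`, `well`, `client`, `run`, `pass`), the `-999.25` absent marker, and the reading of "Remove Absent Indexes" as "drop rows whose index value is missing" are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumed marker for an absent value; a real feed may use something else.
ABSENT = -999.25


def remove_absent_indexes(records, index_key="depth"):
    """Drop records whose index value is missing or equals the absent marker."""
    return [
        r for r in records
        if r.get(index_key) is not None and r[index_key] != ABSENT
    ]


def fix_time_zone(record, source_utc_offset_hours):
    """Treat the naive timestamp as local time at a known offset; convert to UTC."""
    local = datetime.fromisoformat(record["time"])
    source_tz = timezone(timedelta(hours=source_utc_offset_hours))
    utc = local.replace(tzinfo=source_tz).astimezone(timezone.utc)
    return {**record, "time": utc.isoformat()}


def enrich(record, context):
    """Append well, client, run, and pass names to the record."""
    return {**record, **context}


def run_flow(records, context, source_utc_offset_hours):
    """Cleansing first, then enrichment, mirroring the stages on the slide."""
    cleansed = remove_absent_indexes(records)
    cleansed = [fix_time_zone(r, source_utc_offset_hours) for r in cleansed]
    return [enrich(r, context) for r in cleansed]


# Hypothetical sample data.
raw = [
    {"time": "2016-06-28T07:00:00", "depth": 1500.0, "gr": 85.2},
    {"time": "2016-06-28T07:00:01", "depth": ABSENT, "gr": 84.9},
    {"time": "2016-06-28T07:00:02", "depth": None, "gr": 84.1},
]
context = {"well": "WELL-A", "client": "CLIENT-X", "run": "RUN-1", "pass": "PASS-1"}
flow_output = run_flow(raw, context, source_utc_offset_hours=-6)
print(flow_output)
```

Each stage is a pure function from records to records, which is roughly the shape a processor or process group takes in a data flow: stages can be reordered or replaced without touching the others.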

Page 18: From Zero to Data Flow in Hours with Apache NiFi

What About Testing!

Page 19: From Zero to Data Flow in Hours with Apache NiFi

Testing Landscape Today

2.2 TB Test Data

• 22 Applications

• 14 Different formats of data

• Data of questionable quality

• Stored on a file share

Effort

• 0.5 man effort / sprint on maintenance

• 2 weeks to perform a full test

Page 20: From Zero to Data Flow in Hours with Apache NiFi

Step 1: Data Set Curation – Creating the Set of Reference

LAS 1.2, 2.0, 3.0

WITS Level 0, Level 1, Level 2

CSV

Clean Test Data Set

2.2 TB Test Data

6 Hours

Page 21: From Zero to Data Flow in Hours with Apache NiFi

Docker

Step 2: Immediate Test Harness

Clean Test Data Set

• Step 1: Need data

• Step 2: docker pull xxx.xxx.xxx.xxx:xxxx/flowTest

• Step 3: Add put processor

• Step 4: Start dataflow

From: 2 weeks to set up a test to:

Page 22: From Zero to Data Flow in Hours with Apache NiFi

• Docker

Step 3: Immediate Live Data Testing

Production RT System

Processor Input

Testing Processor Group

Anonymize Data

• Significantly cuts down time to test application against real data

• Especially in brownfield applications

• Brings a level of confidence to the project that otherwise would be missing.
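The "Anonymize Data" step can be sketched as keyed pseudonymization: identifying fields are replaced with stable tokens so that live records stay joinable in tests without exposing real names. This is a sketch under stated assumptions, not the processor used in the talk. The field names, the 12-character token length, and the key handling are hypothetical.

```python
import hashlib
import hmac

# Hypothetical names of the fields that identify a well or a client.
SENSITIVE_FIELDS = ("well", "client")


def pseudonymize(value, secret_key):
    """Map a value to a stable token using HMAC-SHA256 with a secret key."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def anonymize_record(record, secret_key, fields=SENSITIVE_FIELDS):
    """Return a copy of the record with the sensitive fields replaced by tokens."""
    out = dict(record)
    for field in fields:
        if field in out:
            out[field] = pseudonymize(str(out[field]), secret_key)
    return out


# Assumed example; in practice the key would come from a secret store.
key = b"example-key-not-for-production"
original = {"well": "WELL-A", "client": "CLIENT-X", "depth": 1500.0, "gr": 85.2}
masked = anonymize_record(original, key)
print(masked)
```

A keyed hash is used rather than a plain hash so that someone who knows the list of well names cannot rebuild the mapping without the key. Measurement values pass through unchanged, which keeps the test data realistic.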

Page 23: From Zero to Data Flow in Hours with Apache NiFi

Next Steps

Page 24: From Zero to Data Flow in Hours with Apache NiFi

Use Cases to be Explored for MiNiFi – Rig Data Ingestion with Provenance

RT Server

• Understanding the chain of custody from sensor to user

• Tracking the provenance of the data as it traverses through the system
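One way to picture a chain of custody is an append-only log in which every event records a hash of the data it handled and a hash of the previous event. The sketch below is a generic illustration of that idea. It is not how NiFi or MiNiFi store provenance, and the actor and action names are hypothetical.

```python
import hashlib
import json


def _hash_body(body):
    """Hash an event body deterministically (keys sorted before serializing)."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_event(chain, actor, action, payload):
    """Return a new chain with one more custody event linked to the last one."""
    body = {
        "actor": actor,
        "action": action,
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "prev_hash": chain[-1]["hash"] if chain else "",
    }
    return chain + [{**body, "hash": _hash_body(body)}]


def verify(chain):
    """Check that every event is intact and linked to its predecessor."""
    prev = ""
    for event in chain:
        body = {k: v for k, v in event.items() if k != "hash"}
        if event["prev_hash"] != prev or event["hash"] != _hash_body(body):
            return False
        prev = event["hash"]
    return True


# Hypothetical path from sensor to user.
chain = []
chain = append_event(chain, "rig-sensor", "ACQUIRE", b"depth=1500.0,gr=85.2")
chain = append_event(chain, "rig-agent", "SEND", b"depth=1500.0,gr=85.2")
chain = append_event(chain, "rt-server", "RECEIVE", b"depth=1500.0,gr=85.2")

# Altering any earlier event breaks verification.
tampered = [dict(e) for e in chain]
tampered[0]["actor"] = "someone-else"

print(verify(chain), verify(tampered))
```

Because each event embeds the hash of the one before it, a user at the end of the chain can confirm that no step between sensor and delivery was altered or dropped.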

Page 25: From Zero to Data Flow in Hours with Apache NiFi

Thank You! Questions?