Unlock the Power of Big Data - info.talend.com _TalendConnect… · The “Data Lake” Value •...

Unlock the Power of Big Data: Data Lake Industrialization Pierrick Condette Crédit Agricole Consumer Finance IT Marketing & BigData Lead France Jean-François Guilmard Accenture Big Data Lead France


Page 1

Unlock the Power of Big Data: Data Lake Industrialization

Pierrick Condette

Crédit Agricole Consumer Finance

IT Marketing & BigData Lead France

Jean-François Guilmard

Accenture

Big Data Lead France

Page 2

Crédit Agricole Consumer Finance

A key player in European consumer finance

Part of Crédit Agricole S.A. group

French market leader

Page 3

A Digital Transformation Plan: “CA CF 3.0”

Our Digital Transformation

CA CF IT Transformation Principles

CA CF Datalake illustration

A Journey: deploy a BigData platform

A Major Ambition: becoming a Data Centric company

• Twitter

• INSEE

• etc.

• Eulerian

• Dynatrace

• 1000mercis

Page 4

The “Data Lake” Value

• Centralize company data and extract value from it to meet the business needs and expectations of all CA CF departments

• Ease and accelerate project delivery in the Operational, Decision-support (BI) and Data Science domains

• Separate platform (no data exchanges)

• Integration & ML industrialization planned in 2018

For Streaming:

POC in progress, first usage in 2018

Page 5

The “Data Lake” Program

Infrastructure

Industrialization

Business Projects

More than 20 projects identified:

8 live (e.g. website KPI monitoring, 360° customer view, regulatory calculation, DMP data integration, reseller support)

More than 10 in progress (client segmentation, real-time reporting, loan subscription via social media, data science lab, BI decommissioning)

Mixed SCRUM Team (Architects, Data Engineers, Ops)

• Paris = 4 people

• Nantes (France) = 4-6 people

• Mauritius = 6-8 people

[Timeline figure: phases Planning & Definition, Infrastructure, Industrialization, Business Projects; time axis Winter 2016, Summer 2016, Fall 2016, Winter 2017, Summer 2017, Fall 2017; milestone Major Go-Live]

Building a common asset base: policies, procedures, reference data, standard reusable components, normalized data, etc.

Provided by a specialized group entity

MapR Converged Data Platform, HPE Vertica, Elasticsearch

Page 6

Why TALEND?

HISTORY

• Already used as an ETL for operational data transformations (Mainframe, etc.)

• Positive internal feedback, existing guidelines and qualified administrators

BENEFITS

✓ Ease the internal team's transition from BI to Big Data

✓ Simplify, standardize, then accelerate the creation of distributed processing jobs (component-based logic)

✓ Enable industrialization and consistency: a common workspace for all projects and developers where policies and best practices are applied (e.g. naming rules, pre- & post-jobs, joblet reuse, etc.)

✓ Ease testing and deployment across environments thanks to Talend Administration Center (TAC)

✓ Foster client ownership and job maintenance thanks to job readability

Page 7

TALEND Spark Job Examples (1/2)

1. Read the Hive table to get “Contracts” over 24 months (filter on date)

2. Partition the 900 million Contract rows by Client

3. Aggregate by Client and calculate KPIs

4. Write the results into MapR-DB

The MapR table is truncated by a shell script before the job is launched

SPARK Tuning (config file)

Client Segmentation (Monthly)

• 13 similar jobs for pre-KPI calculation + 3 final jobs to generate the two ‘X’ & ‘Y’ axes before Segmentation (‘X’ & ‘Y’ crossing)
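The four numbered steps of this job can be sketched in plain Python. This is only an illustration of the filter / group-by-client / aggregate / write logic, not the Talend-generated Spark code: the field names (`client_id`, `signed_on`, `amount`), the two KPIs and the dict standing in for the MapR-DB table are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def client_kpis(contracts, as_of, window_months=24):
    """Filter contracts to the window, group them by client, aggregate KPIs."""
    # Step 1: keep contracts signed within the last `window_months` months.
    recent = [c for c in contracts
              if 0 <= months_between(c["signed_on"], as_of) < window_months]

    # Step 2: partition rows by client (a Spark job would shuffle by key here).
    by_client = defaultdict(list)
    for c in recent:
        by_client[c["client_id"]].append(c)

    # Step 3: aggregate per client; these two KPIs are illustrative only.
    return {
        client_id: {
            "contract_count": len(rows),
            "total_amount": sum(r["amount"] for r in rows),
        }
        for client_id, rows in by_client.items()
    }


def write_results(table, kpis):
    """Step 4: empty the target table, then write one row per client."""
    table.clear()  # on the slide, the truncate runs in a shell script before the job
    table.update(kpis)


# Hypothetical sample data.
contracts = [
    {"client_id": "C1", "signed_on": date(2017, 3, 10), "amount": 5000},
    {"client_id": "C1", "signed_on": date(2016, 11, 2), "amount": 1200},
    {"client_id": "C2", "signed_on": date(2014, 1, 15), "amount": 900},  # outside window
    {"client_id": "C3", "signed_on": date(2017, 6, 1), "amount": 300},
]
kpi_table = {"stale": {}}  # dict standing in for the MapR-DB table
write_results(kpi_table, client_kpis(contracts, as_of=date(2017, 9, 1)))
```

Truncating before writing keeps the monthly run idempotent: rerunning the job replaces the previous month's rows instead of mixing old and new clients.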

Page 8

TALEND Spark Job Examples (2/2)

1. Read the MapR-DB table to get the JSON files (filter on rowkey; Hive not required)

2. Extract the JSON data (based on a Joblet, the same for Full & Delta)

3. Transform the data and write it into Vertica (JDBC)

4. Manage rejections

5. If OK, delete the rowkey from the MapR-DB source table (custom Java code)

IDD3.0: Portefeuille (every 5min)

• Delta & Full Jobs to structure raw information (JSON file) for a business application (should evolve to streaming logic)
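The Delta job's control flow (read, extract, write, reject, then delete the source rowkey only on success) can be sketched in plain Python. Everything concrete here is an assumption for illustration: the JSON field names, the dict standing in for the MapR-DB source table, the list standing in for the Vertica target, and the choice to leave rejected rows in the source table.

```python
import json


def extract_record(raw_json):
    """Shared extraction step (the slide uses one Joblet for Full and Delta)."""
    doc = json.loads(raw_json)
    # Field names are hypothetical; a missing field raises KeyError.
    return {"contract_id": doc["contract_id"], "balance": float(doc["balance"])}


def run_delta(source, target, rejects):
    """Process every pending row; delete a rowkey only after a successful write."""
    for rowkey in list(source):  # copy the keys because rows are deleted in the loop
        try:
            record = extract_record(source[rowkey])
        except (ValueError, KeyError, TypeError) as err:
            # Step 4: record the rejection and leave the row in the source table.
            rejects.append((rowkey, repr(err)))
            continue
        target.append(record)  # Step 3: stand-in for the JDBC insert into Vertica
        del source[rowkey]     # Step 5: acknowledge by deleting the rowkey


# Hypothetical sample data: one valid row, one malformed, one incomplete.
source_table = {
    "rk1": '{"contract_id": "A1", "balance": "120.5"}',
    "rk2": "not json",
    "rk3": '{"contract_id": "A3"}',
}
vertica_rows, rejected = [], []
run_delta(source_table, vertica_rows, rejected)
```

Deleting the rowkey last means a failure before the write leaves the row in place, so the next 5-minute run picks it up again.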


Page 9

Lessons Learned: 4 Success Factors

Invest in technical expertise from the start

• Base architecture decisions on prototypes

• Support complex developments

• Analyze performance, then tune and optimize processing jobs

Build a common asset base

• Ensure consistency between projects (technical, process, data)

• Reduce delivery time and costs

• Ease maintenance and supervision

Organize a multidisciplinary core team to promote co-working and knowledge sharing

• Build and maintain internal skills in the complex and evolving Big Data ecosystem

• Ease the integration process thanks to direct access to internal resources (key POCs, legacy docs, etc.)

Prioritize business projects to grow at the right pace

• MVP approach led by a PO / Scrum Master pair

• Deliver quick value to selected business projects

• Maintain a positive dynamic based on proven business results
