
CIGNEX Datamatics Confidential www.cignex.com

Big Data Offerings by CIGNEX Datamatics

December 2012

Presented by

Name: Munwar Shariff

Email: [email protected]

Title: CTO


BIG DATA CASE STUDIES


Patent Search – Situational Analysis using Big Data

• Client: a leading global manufacturing company operating in chemicals, plastics and catalysts

• Challenges Faced

– Proprietary enterprise search engine developed on Oracle Database and IBM FileNet

– Information on over 80 million patents (patent family, facet, proximity, highlight, etc.) stored in various repositories

– Slow, non-scalable and expensive

– Assorted interfaces to access data

• Solution: Hadoop, Solr and Neo4j based platform

– Synchronization layer: interfaces with REST web services

– Persistence layer: HBase/Hadoop

– Indexing layer: Solr search

– Neo4j to handle patent family calculation

– Migrated millions of records to Apache Hadoop using a CD HBase Hook for Liferay

• Client Benefits

– 10x increase in search performance

  • Average search time reduced to 6.8 ms (from 70 ms)

  • Document throughput increased to 62/second (from 6/second)

– 20x reduction in TCO

  • Replaced expensive IBM FileNet and Oracle DB based infrastructure with open source tools

  • Use of commodity hardware
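The slide assigns patent family calculation to Neo4j but does not say how families are computed. A minimal sketch, assuming a family is the set of patents linked directly or transitively through shared priority applications (an extended-family style definition), treats the task as finding connected components of a patent/priority graph. It uses an in-memory union-find in Python rather than Neo4j, and all patent and priority identifiers are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, node: str) -> str:
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root
        while self.parent[node] != root:
            self.parent[node], node = root, self.parent[node]
        return root

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def patent_families(priority_claims: dict[str, list[str]]) -> list[set[str]]:
    """Group patents that share priority applications, directly or transitively."""
    uf = UnionFind()
    for patent, priorities in priority_claims.items():
        uf.find(patent)  # register patents that claim no priority at all
        for priority in priorities:
            # Prefix priority IDs so they cannot collide with patent numbers
            uf.union(patent, "PRIORITY:" + priority)
    groups: dict[str, set[str]] = defaultdict(set)
    for patent in priority_claims:
        groups[uf.find(patent)].add(patent)
    return sorted(groups.values(), key=min)


# Hypothetical patents and the priority applications they claim
claims = {
    "US-A": ["P1"],
    "EP-B": ["P1", "P2"],
    "JP-C": ["P2"],
    "US-D": ["P3"],
    "CN-E": [],
}
families = patent_families(claims)
print(families)
```

Here US-A, EP-B and JP-C form one family even though US-A and JP-C share no priority directly, because EP-B links P1 and P2; US-D and CN-E each stand alone. A graph database expresses the same idea as a traversal over priority relationships.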


Patent Search – Situational Analysis using Big Data (architecture diagram)

[Diagram components: Client Applications; Repository Server; API Server; Status Server; Authentication Server; Queue Manager; Models; Locks; Processors; Controllers; Services Status; Specialized Service Providers; Persistence (MongoDB & Neo4j); Indexing Engines; External Data & Applications; External Indexes]


Managing high volume data feeds through MongoDB

• Client: a leading provider of a software-based CMS that manages the entire video publishing lifecycle

• Challenges Faced

– 30 million INSERTs/hour

– 10 million UPDATEs/hour

– High-volume data growth: ~150 GB/day and ~5 TB/month

– Concurrent high-volume CRUD operations in real time

– Poor performance of READ queries

– Difficulty identifying the right shard keys, indexes and cluster configuration

• Solution

– Identify collections that require sharding, and ideal shard keys for write distribution and query isolation

– Identify collections that require indexes to speed up reads

– Scientific cluster sizing and optimal hardware recommendations

– Data archival options for better utilization of the cluster

– Performance tuning tips around collection design

– Performance benchmarking of the suggested shard keys and indexes through load testing

• Client Benefits

– 5x improved write/update performance through sharding

– Better utilization of existing infrastructure in the cluster configuration (mongos and arbiters on the app servers)

– Automated performance tuning scripts for testing the recommended approach
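The sharding recommendation hinges on choosing a shard key that spreads writes. A minimal sketch of why, assuming a monotonically increasing key (for example an insert timestamp) on range-partitioned chunks versus a hashed key: with range partitioning every new insert lands in the last chunk, so one shard absorbs all the writes, while hashing spreads them evenly. The four-shard layout, chunk bounds and key values are hypothetical, and `hashed_shard` uses MD5 modulo the shard count as a simplified stand-in for MongoDB's hashed sharding.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical: four shards, as labelled in the architecture diagram


def range_shard(key: int, upper_bounds: list) -> int:
    """Range partitioning: a key goes to the first chunk whose exclusive upper bound exceeds it."""
    for shard, upper in enumerate(upper_bounds):
        if key < upper:
            return shard
    return len(upper_bounds)  # keys above the last bound fall into the final chunk


def hashed_shard(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Hash partitioning: place keys by a hash of their value."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical chunk bounds splitting an existing key range 0..999,999 into four ranges
upper_bounds = [250_000, 500_000, 750_000]

# New inserts carry ever-increasing keys, e.g. insert timestamps
new_keys = range(1_000_000, 1_100_000)

range_counts = Counter(range_shard(k, upper_bounds) for k in new_keys)
hashed_counts = Counter(hashed_shard(k) for k in new_keys)

print("range-partitioned:", dict(range_counts))  # every insert hits shard 3
print("hashed:", dict(hashed_counts))            # roughly 25,000 per shard
```

The trade-off is query isolation: a hashed key scatters adjacent values, so range queries on that key must visit every shard, which is why write distribution and query isolation are weighed together when picking a key.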


Managing high volume data feeds through MongoDB – Solution Architecture

[Diagram: Load Balancer → App Tier (App Servers, each running a mongos router) → Data Tier: Shards 1 to 4, each a Replica Set of mongod Primary, mongod Secondary and mongod Arbiter; Config Servers (three mongod instances). Arrows: "Routed requests from mongos to shards" and "Routed for non-sharded collections".]
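The diagram distinguishes requests that mongos routes to specific shards. A minimal sketch of targeted versus scatter-gather routing, assuming a hypothetical chunk map of range-based chunks: when a query filter includes the shard key, the router can send it to the one shard owning that range; otherwise it must broadcast to every shard. This is a Python model of the idea, not mongos code.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import Optional

# Hypothetical chunk map: inclusive lower bound of each chunk's shard-key range
# (starting at -infinity so every key falls in some chunk) and the owning shard
CHUNK_LOWER_BOUNDS = [float("-inf"), 250_000, 500_000, 750_000]
CHUNK_OWNERS = ["shard1", "shard2", "shard3", "shard4"]


def route(shard_key_value: Optional[int]) -> list[str]:
    """Return the shards a mongos-like router would contact for a query."""
    if shard_key_value is None:
        # No shard key in the query filter: scatter-gather to every shard
        return sorted(set(CHUNK_OWNERS))
    # Targeted query: find the chunk whose range contains the key
    chunk = bisect_right(CHUNK_LOWER_BOUNDS, shard_key_value) - 1
    return [CHUNK_OWNERS[chunk]]


print(route(600_000))  # ['shard3']
print(route(None))     # ['shard1', 'shard2', 'shard3', 'shard4']
```

Targeted queries touch one replica set, so their cost stays flat as shards are added, whereas scatter-gather queries grow with the shard count.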


Real-time intelligence for fleet management & worksites

• Client: a leading provider of advanced location-based solutions

• Challenges Faced

– Data feeds of varying formats and sizes from different devices on the sites

– ~5 million inserts/day from ~200,000 devices

– Improve the performance of READs that run every hour

– Handle disaster recovery across data centers in multiple geographies

– 24x7 support

• Solution

– Overall health check of the system, with recommendations

– Efficient indexes based on read patterns

– Robust disaster recovery and failover plan covering different scenarios

– Multi-data-center deployment planning

– Disaster recovery and failover testing

– MongoDB Monitoring Service (MMS) setup for cluster administration and maintenance

• Client Benefits

– 2x improved performance through the right indexes

– 24x7 support of the MongoDB cluster with instant response to issues and 99% uptime

– Real-time, instant intelligence on key monitoring metrics
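"Efficient indexes based on read patterns" usually means matching the index key order to the queries. A minimal sketch, assuming the dominant read is "all pings from one device in a time window": a compound index on (device ID, timestamp) turns that read into an equality lookup followed by a range scan over sorted entries. The class is a toy in-memory model with hypothetical field names and data; in MongoDB the equivalent would be a compound index on the two fields.

```python
from __future__ import annotations

from bisect import bisect_left, insort
from collections import defaultdict


class CompoundIndex:
    """Toy model of a compound index on (device_id, timestamp)."""

    def __init__(self) -> None:
        # device_id -> list of (timestamp, doc_id) kept sorted by timestamp
        self._entries: dict[str, list[tuple[int, int]]] = defaultdict(list)

    def insert(self, device_id: str, timestamp: int, doc_id: int) -> None:
        insort(self._entries[device_id], (timestamp, doc_id))

    def time_range(self, device_id: str, start: int, end: int) -> list[int]:
        """Doc IDs for one device with start <= timestamp < end."""
        entries = self._entries.get(device_id, [])
        # Equality on device_id selected the list; binary search bounds the range.
        # (start,) sorts before any (start, doc_id), so lo is the first timestamp >= start.
        lo = bisect_left(entries, (start,))
        hi = bisect_left(entries, (end,))
        return [doc_id for _, doc_id in entries[lo:hi]]


index = CompoundIndex()
pings = [("truck-17", 0, 1), ("truck-17", 60, 2), ("truck-42", 30, 3),
         ("truck-17", 3600, 4), ("truck-17", 3660, 5)]
for device, ts, doc in pings:
    index.insert(device, ts, doc)

print(index.time_range("truck-17", 3600, 7200))  # [4, 5]
```

Putting the equality field first matters: an index ordered (timestamp, device ID) would have to scan every device's pings in the window and filter afterwards.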


Real-time intelligence for fleet management & worksites – Solution Architecture

[Diagram: Connected GPS Devices → Load Balancer → App Servers → Replica Set 1 (Primary, Secondary, Secondary) deployed across DC-1 and DC-2]
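The disaster recovery plan depends on where replica set members sit, because a replica set can elect a primary only while a strict majority of its voting members can reach each other. A minimal sketch of that check for a single data center outage, assuming one vote per member; the member names and placements are hypothetical, not the client's actual layout.

```python
from __future__ import annotations


def can_elect_primary(placement: dict[str, str], failed_dc: str) -> bool:
    """True if members outside failed_dc still form a strict majority of voters.

    placement maps member name -> data center; each member is assumed to have one vote.
    """
    total_votes = len(placement)
    surviving_votes = sum(1 for dc in placement.values() if dc != failed_dc)
    return surviving_votes > total_votes // 2


def outage_report(placement: dict[str, str]) -> dict[str, bool]:
    """Check every data center in the placement as a possible single-site failure."""
    return {dc: can_elect_primary(placement, dc) for dc in sorted(set(placement.values()))}


# Hypothetical: two members in DC-1, one in DC-2
two_sites = {"member-1": "DC-1", "member-2": "DC-1", "member-3": "DC-2"}
# Hypothetical: a third site hosting an arbiter, which votes but holds no data
three_sites = {"member-1": "DC-1", "member-2": "DC-2", "arbiter": "DC-3"}

print(outage_report(two_sites))    # {'DC-1': False, 'DC-2': True}
print(outage_report(three_sites))  # {'DC-1': True, 'DC-2': True, 'DC-3': True}
```

In the two-site layout, losing DC-1 leaves the member in DC-2 unable to become primary, which is why multi-data-center plans often place a voting member in a third location.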


Hadoop based Log Processing & Analysis

• Client: a global IT services company

• Challenges Faced

– The existing RDBMS solution could not aggregate and manage the large volumes of unstructured logs generated by different systems

  • Lack of control over collection and manipulation of log files due to high volume

  • Adding a new log cluster to the existing system was difficult and slowed down system performance

  • Huge maintenance costs from investments in high-end storage

• Solution

– Log processing and analysis

  • Apache Flume: distributed system for aggregating streaming data

  • HDFS: primary Hadoop storage system

  • MapReduce: programming model for processing large amounts of data in parallel

  • Sqoop: efficient bulk transfer of data between Hadoop and structured data stores

  • Pentaho: open source data integration

• Client Benefits

– Seamless aggregation and archival of log files, regardless of the environment that generated them

  • IT team gained a 360-degree view of employee usage patterns

  • Rich user interface, accessible from mobile devices and tablets

  • Cost advantage from not depending on high-end storage networks
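The MapReduce step turns raw log lines into daily summaries. A minimal in-process sketch of the map, shuffle and reduce phases, assuming a hypothetical log format of `<ISO timestamp> <source> <message>` and counting events per day and source. On the cluster, Hadoop runs these phases in parallel across HDFS blocks; this single-process Python model only shows the data flow.

```python
from __future__ import annotations

from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[tuple[str, str], int]]:
    """Emit ((date, source), 1) for a well-formed line; skip malformed ones."""
    parts = line.split(maxsplit=2)
    if len(parts) < 2:
        return
    timestamp, source = parts[0], parts[1]
    yield (timestamp[:10], source), 1


def shuffle(pairs: Iterable[tuple[tuple[str, str], int]]):
    """Sort by key and group, as Hadoop does between the map and reduce phases."""
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))


def reduce_phase(key: tuple[str, str], values: Iterable[int]) -> tuple[tuple[str, str], int]:
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[tuple[str, str], int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(
        reduce_phase(key, (count for _, count in group))
        for key, group in shuffle(mapped)
    )


logs = [
    "2012-12-01T10:00:01 mail postfix: delivered",
    "2012-12-01T10:05:12 firewall DROP tcp 10.0.0.5",
    "2012-12-01T11:00:00 mail postfix: deferred",
    "2012-12-02T09:00:00 mail postfix: delivered",
    "garbage",
]
summary = run_job(logs)
print(summary)
# {('2012-12-01', 'firewall'): 1, ('2012-12-01', 'mail'): 2, ('2012-12-02', 'mail'): 1}
```

Because the reducer only sums counts, it is associative and could also serve as a combiner to shrink map output before the shuffle.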


Hadoop based Log Processing & Analysis – Pipeline

Data Sources: Mail Logs, Server Syslogs, Web Logs, Firewall Logs, VoIP Call Logs

Collection and Analytics (Scheduler):

1) Fetch logs from the servers to HDFS using Flume

2) Run MapReduce on the logs collected daily and generate a summary in HDFS

3) Export the summary from HDFS to MySQL using Sqoop

4) Generate reports on the Dashboard using Pentaho on MySQL

Reporting: Dashboard
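Steps 3 and 4 move the small daily summary out of HDFS into a relational store that the reporting tool queries. A minimal sketch of that hand-off, using Python's built-in `sqlite3` in memory as a stand-in for MySQL (the pipeline itself uses Sqoop for the export and Pentaho for the reports); the table name, columns and summary values are hypothetical.

```python
import sqlite3

# Hypothetical daily summary keyed by (date, log source), as a MapReduce job might produce
summary = {
    ("2012-12-01", "firewall"): 1,
    ("2012-12-01", "mail"): 2,
    ("2012-12-02", "mail"): 1,
}

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL reporting database
conn.execute(
    "CREATE TABLE log_summary ("
    " log_date TEXT NOT NULL, source TEXT NOT NULL, events INTEGER NOT NULL,"
    " PRIMARY KEY (log_date, source))"
)
# Export step: load the summary rows into the reporting table
conn.executemany(
    "INSERT INTO log_summary (log_date, source, events) VALUES (?, ?, ?)",
    [(day, source, events) for (day, source), events in summary.items()],
)
conn.commit()

# Report step: the kind of aggregate a dashboard would chart
report = conn.execute(
    "SELECT source, SUM(events) FROM log_summary GROUP BY source ORDER BY source"
).fetchall()
print(report)  # [('firewall', 1), ('mail', 3)]
```

Keeping only aggregates in the relational store is what lets a conventional database serve the dashboard while the raw logs stay in HDFS.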


Hadoop based Log Processing & Analysis (Pentaho Reporting for Mobile Devices)


BIG DATA SOLUTIONS


BigArchive - Enterprise Scale Archival Solution

• Scalable, distributed repository for archiving large numbers of documents of many types

• Low cost and high performance: uses open standards and open source technologies such as MongoDB, Solr and Apache Tika

• Dynamically captures content and metadata from documents at load time, stores them in MongoDB and indexes them in Solr

• Provides enterprise search and high-performance retrieval of documents

• REST-based API that interoperates with custom client applications built on Java, PHP or .NET
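The load-time flow described above (capture content and metadata, store, index) can be modelled in a few lines. A minimal sketch, assuming plain-text documents: a dictionary stands in for the MongoDB store, an inverted index stands in for Solr, and UTF-8 decoding stands in for Apache Tika's text extraction. The class and method names are hypothetical, not BigArchive's API.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Archive:
    # Stand-in for the MongoDB document store: doc_id -> {"metadata", "content"}
    documents: dict = field(default_factory=dict)
    # Stand-in for the Solr index: lower-cased token -> set of doc_ids
    index: dict = field(default_factory=lambda: defaultdict(set))

    def ingest(self, filename: str, raw: bytes) -> str:
        """Capture metadata and text at load time, store the document, index its text."""
        doc_id = hashlib.sha1(raw).hexdigest()  # content-derived ID
        text = raw.decode("utf-8", errors="replace")  # Tika would extract text from PDFs, Office files, etc.
        metadata = {"filename": filename, "size_bytes": len(raw)}
        self.documents[doc_id] = {"metadata": metadata, "content": raw}
        for token in set(text.lower().split()):
            self.index[token].add(doc_id)
        return doc_id

    def search(self, term: str) -> list[dict]:
        """Return metadata of documents containing the term."""
        hits = self.index.get(term.lower(), set())
        return [self.documents[doc_id]["metadata"] for doc_id in sorted(hits)]


archive = Archive()
archive.ingest("q4-report.txt", b"Quarterly revenue report")
archive.ingest("memo.txt", b"Revenue memo for the board")
print([m["filename"] for m in archive.search("revenue")])
```

Separating the store from the index mirrors the MongoDB/Solr split: search returns IDs and metadata cheaply, and full content is fetched from the store only when a document is retrieved.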


BigArchive - Architecture

[Diagram layers: User Interface; RESTful Service Layer API (Jersey); Controller (Custom Java + Netty); Repository. Operations: Persist Object, Retrieve Object, Index Metadata, Search Metadata. Web service request/response covering CUD (create, update, delete) and search & retrieval of content & metadata.]
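The RESTful service layer maps web service requests onto the controller's create/update/delete, search and retrieval operations. A minimal sketch of that mapping as a routing table; the URL paths and operation names are hypothetical (the slides do not list BigArchive's endpoints), and the real layer is built with Jersey in Java rather than Python.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical routes: (HTTP method, path pattern, controller operation)
ROUTES = [
    ("POST", re.compile(r"^/documents$"), "persist_object"),                     # create
    ("GET", re.compile(r"^/documents/(?P<doc_id>[^/]+)$"), "retrieve_object"),   # retrieve
    ("PUT", re.compile(r"^/documents/(?P<doc_id>[^/]+)$"), "update_object"),     # update
    ("DELETE", re.compile(r"^/documents/(?P<doc_id>[^/]+)$"), "delete_object"),  # delete
    ("GET", re.compile(r"^/search$"), "search_metadata"),                        # search
]


def dispatch(method: str, path: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (operation, path parameters) for the first matching route, or None.

    path is assumed to exclude any query string.
    """
    for verb, pattern, operation in ROUTES:
        if verb != method:
            continue
        match = pattern.match(path)
        if match:
            return operation, match.groupdict()
    return None


print(dispatch("GET", "/documents/abc123"))    # ('retrieve_object', {'doc_id': 'abc123'})
print(dispatch("PATCH", "/documents/abc123"))  # None
```

Mapping HTTP verbs to operations keeps the API language-neutral, which is what lets Java, PHP and .NET clients share it.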


Mobile Media site with Drupal + MongoDB: Mobile Explosion

• By 2015, at least 60% of information workers will interact with their content applications via a mobile device

• Employees work on proposals and presentations on mobile devices while travelling

• People use digital assets (videos, images) for longer on tablets and mobiles than on desktops


Mobile Media site with Drupal + MongoDB

Mobile Media Site

• Velocity: fast performance, large user base, concurrent CRUD, access through various channels

• Volume: millions of digital assets, variety of content, complexity of data

• User experience: rich UI features, social features, mobile access, fast search

• Scalability: elastic scaling, cost effectiveness, centralized storage, ease of maintenance

• Security & Availability: high availability, automatic failover, user management

• Flexibility & Agility: easy integration, shorter dev cycle, faster deployment, ease of schema design


Big Data Portal with Liferay + MongoDB

• A Big Data portal built with MongoDB and Liferay provides lower TCO and higher ROI to enterprises

• MongoDB gives portals scalability (for huge volumes of content) and flexibility (schema-less content)

• Liferay's rich user interface, content management, security, social and mobile features complement MongoDB's powerful storage features


Big Data Portal with Liferay + MongoDB


Big Data Portal with Liferay + MongoDB: Instantaneous Streaming & Playback for Videos

[Diagram comparing RDBMS and NoSQL (MongoDB) for a 30 MB video.
RDBMS: the video is stored as a single collection; on an incoming request the entire video (30 MB) is loaded into the user device ("Loading…"), causing buffer and playback problems.
NoSQL (MongoDB): the video is stored in collections as 3 MB chunks; on an incoming request individual chunks are loaded, so there are no playback problems.]
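The chunked layout works because a player can fetch only the chunks that cover the bytes it needs next. A minimal sketch using the slide's 3 MB chunk size: split a file into fixed-size chunks and map an inclusive byte range (as in an HTTP `Range` request) to chunk numbers. MongoDB's GridFS stores files in chunks this way, though its default chunk size is 255 KB rather than 3 MB; the function names here are hypothetical.

```python
from __future__ import annotations

CHUNK_SIZE = 3 * 1024 * 1024  # 3 MB, the chunk size used in the slide's illustration


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a file into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunks_for_range(start: int, end: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Chunk numbers covering bytes start..end inclusive."""
    return range(start // chunk_size, end // chunk_size + 1)


def read_range(chunks: list[bytes], start: int, end: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Fetch only the chunks that overlap the range and trim them to the exact bytes."""
    needed = chunks_for_range(start, end, chunk_size)
    joined = b"".join(chunks[n] for n in needed)
    offset = start - needed.start * chunk_size
    return joined[offset:offset + (end - start + 1)]


video_size = 30 * 1024 * 1024  # a 30 MB video, as in the slide
print(len(range(0, video_size, CHUNK_SIZE)))      # 10 chunks
print(list(chunks_for_range(0, CHUNK_SIZE - 1)))  # [0]: playback can start after one chunk
```

Smaller chunks shorten time to first frame but add per-chunk overhead, which is the trade-off behind any chunk-size choice.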


Integrated Business Ecosystem (IBE) Blueprint

Big Data – Integral to CIGNEX Datamatics UXP solution

[Diagram labels: UXP Components (Mobile, Social, Cloud, Rich Experience, Browser friendly, Real time, Contextual); Platform; Integration; Analytics; Portals, E-commerce, CMS; Business Process Management; Business Intelligence; Enterprise Resource Planning; Customer Relationship Management; Enterprise Content Management; Legacy Solutions; Proprietary SW; .NET Systems; CMS Repositories; Databases; Shaping Languages; Metadata; Data Integration; Indexing; Graph Database; EDW; MapReduce]


Thank You. Any Questions?

Making Open Source Work