Boot-strapping a Big Data Solutions Startup: HPCC Systems on Amazon


Page 1: Boot-strapping a Big Data Solutions Startup: HPCC Systems on Amazon

https://clearfunnel.com

Boot-strapping a Big Data Solutions Startup: HPCC Systems on Amazon

Raj Chandrasekaran, CTO & Co-Founder

Oct 12, 2016

Page 2

Who We Are

HPCC Systems in our Journey

Key Innovations

Live Use Cases

The Road Ahead…


Page 3

About Us: Offering an innovative approach to Big Data use cases

ClearFunnel provides custom Big Data Solutions ‘as-a-Service’ for a subscription fee

What we do

• Clients only pay low monthly subscription fees for using the solutions

• We provide custom solutions ‘as-a-Service’ for Big Data and Data Science use cases

• We extensively use HPCC Systems (on a busy day, our clusters grow to as many as 3,300 slave nodes)

• We thrive on solving complex Machine Learning and Big Data use cases

• We host and operate our solutions on AWS

What we are not

• Not a staff augmentation or resource provider

• Not a Systems Integration firm

• Not a technology consulting firm

• Not a technology reseller

Page 4

Unique Benefits

Cost follows Revenue

No Capex in Big Data Talent, Technology, and Infrastructure

Accelerated Product Launches (~5-6 weeks)

Low Subscription (Opex) and Pay-for-Performance

Zero Risk, Try-before-You-Pay

Seamless and Turnkey Solution as-a-Service

Page 5

HPCC Systems in our Start-up journey

Fail-fast, fail-often: Rewriting our platform ‘n’ times with ECL proved to be a game-changer

DevOps on autopilot: True one-click automation of cluster provisioning, HPCC Systems setup, ECL execution, and job monitoring

Simplified data pipeline: A tech stack of HPCC Systems, Bash/Python, and AWS microservices

Near real-time with HPCC Systems! We deliver a 30-second index refresh frequency for a complex IoT analytics use case

Zone of Excellence: Reusable ECL libraries allow us to deliver end-to-end solutions within 5-6 weeks

Complex Machine Learning and NLP: ECL helped us implement complex text analytics, ML, and custom NLP algorithms


Page 6

Key Innovations: Cloud DevOps Automation

Determine Server Type & Quantity based on Memory / CPU / Disk / Network / Load

Bid on the Best Price and Zone

Provision Servers

Auto-detect all Nodes of a Cluster

HPCC Systems Installation and Cluster Config

Start ECL job

Hand off files to another cluster

Auto Shut Down Cluster, if needed

Post-ECL Housekeeping, Sentinel, Alerts
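The first two steps of this flow (sizing the cluster, then bidding on the best price and zone) can be sketched as below. This is a minimal illustration under stated assumptions, not ClearFunnel's implementation: the instance type names, capacities, and hourly price quotes are hypothetical, network and current-load inputs are left out, and prices come from a hard-coded dictionary rather than from any AWS API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str  # hypothetical label, not a real EC2 instance type name
    vcpus: int
    mem_gib: float
    disk_gib: float


@dataclass(frozen=True)
class Workload:
    total_vcpus: int
    total_mem_gib: float
    total_disk_gib: float


def nodes_needed(itype: InstanceType, load: Workload) -> int:
    """Smallest node count whose combined CPU, memory, and disk cover the workload."""
    return max(
        math.ceil(load.total_vcpus / itype.vcpus),
        math.ceil(load.total_mem_gib / itype.mem_gib),
        math.ceil(load.total_disk_gib / itype.disk_gib),
    )


def cheapest_plan(catalog, prices, load):
    """Pick the cheapest (instance type, zone) combination for the workload.

    `prices` maps (type_name, zone) -> hourly price per node (hypothetical quotes).
    Returns (total_hourly_cost, type_name, zone, node_count), or None if no quote matches.
    """
    best = None
    for itype in catalog:
        count = nodes_needed(itype, load)
        for (name, zone), price in prices.items():
            if name != itype.name:
                continue
            plan = (round(count * price, 4), itype.name, zone, count)
            # Tuples compare total cost first, so the lowest-cost plan wins.
            if best is None or plan < best:
                best = plan
    return best


if __name__ == "__main__":
    catalog = [
        InstanceType("mem-large", vcpus=8, mem_gib=64, disk_gib=200),
        InstanceType("cpu-large", vcpus=16, mem_gib=32, disk_gib=200),
    ]
    load = Workload(total_vcpus=64, total_mem_gib=512, total_disk_gib=1600)
    prices = {
        ("mem-large", "zone-a"): 0.20, ("mem-large", "zone-b"): 0.18,
        ("cpu-large", "zone-a"): 0.25, ("cpu-large", "zone-b"): 0.30,
    }
    print(cheapest_plan(catalog, prices, load))
```

With these made-up numbers, the memory-heavy type covers the workload with 8 nodes while the CPU-heavy type needs 16 (memory is its bottleneck), so the sketch selects 8 "mem-large" nodes in "zone-b" at about 1.44 per hour. Sizing per resource and taking the maximum is what lets the choice of server type follow the dominant constraint of the job.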


Page 7

Key Innovations: NLP

• Article Recommendation

• 6 ML Models Committee Optimizer

• Social Signals Integration

• Layered Taxonomy

• Plagiarism and Similarity

• Entity, Keywords, and Topic Extraction

Try out our Entity Extraction Roxie query at

http://tinyurl.com/cf-nlp-test
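The slide names a "Plagiarism and Similarity" component without describing how it works. One common, generic way to flag near-duplicate text is to compare documents by the Jaccard similarity of their word shingles (overlapping k-word sequences). The sketch below assumes that technique; the function names, the shingle size k=3, and the 0.5 threshold are illustrative choices, not ClearFunnel's method or parameters.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word k-grams), lower-cased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single shingle so they can still be compared.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(doc1: str, doc2: str, k: int = 3) -> float:
    """Shingle-based Jaccard similarity of two documents, in [0.0, 1.0]."""
    return jaccard(shingles(doc1, k), shingles(doc2, k))


def likely_copies(docs: dict[str, str], threshold: float = 0.5, k: int = 3):
    """List (id1, id2, score) for every document pair at or above the threshold."""
    ids = sorted(docs)
    sigs = {doc_id: shingles(docs[doc_id], k) for doc_id in ids}
    pairs = []
    for x in range(len(ids)):
        for y in range(x + 1, len(ids)):
            score = jaccard(sigs[ids[x]], sigs[ids[y]])
            if score >= threshold:
                pairs.append((ids[x], ids[y], round(score, 3)))
    return pairs


if __name__ == "__main__":
    docs = {
        "d1": "The quick brown fox jumps over the lazy dog",
        "d2": "The quick brown fox jumps over the lazy cat",
        "d3": "Something else entirely written here",
    }
    print(likely_copies(docs))  # [('d1', 'd2', 0.75)]
```

Comparing every pair is quadratic in the number of documents; at Big Data scale, MinHash with locality-sensitive hashing is a common way to avoid that, though the slide does not say which approach was used.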

Page 8

Key Innovations: Prediction Engine

Key ask of businesses:

Timely identification of genuine prospects

Our Client’s IP:

• Collect global B2B web traffic data

• Algorithms to analyze buying behavior

• Data triangulation to identify companies and contacts currently in a buying cycle

HPCC Systems Implementation:

• End-to-end, automated data processing pipeline

• ETL

• Prediction, Text Analytics and Scoring Engine

• Reporting

• Data Feeds and API

Page 9

Some of our other live use cases powered by HPCC Systems

• IoT and satellite-signal-based global Maritime Domain Awareness and Supply Chain solution

• Advanced Text Analytics and NLP-based medical information solution

• Wrapper that embeds a client’s existing app to provide scale-out and parallelization capabilities

• Streaming topic analysis and signals to drive content marketing for brands

Page 10

The Road Ahead…

01 Consolidate existing capabilities…

• Advanced Text Analytics and NLP

• IoT and stream processing

• Partnering with new businesses / startups to launch their analytics use cases faster than any alternative

02 Build expertise in…

• Image analytics at Big Data scale

• Real-time acoustic analytics

• Operational analytics (IoT / sensors in energy and utilities infrastructure)

03 Operate the largest deployment of HPCC Systems outside of Reed…

• End-to-end, automation-driven management of our growing HPCC Systems footprint (scale to thousands of servers)

• Continue innovating on and extending HPCC Systems capabilities (using inline/embedded coding support, web services interface, command-line integration, homogeneous processing and reporting stack, Cassandra option for Dali, MySQL support, etc.)