Platform for Data Scientists

Binu K, Architect Analytics Platform, www.subex.com


Page 1: Platform for Data Scientists

Platform for Data Scientists

Binu K, Architect Analytics Platform

Page 2: Platform for Data Scientists

Why Platform?


Page 3: Platform for Data Scientists

Data and Analytics

Capture

• Acquire, extract, parse, aggregate

Analyze

• Feature Engineering, Exploratory analysis

Modelling

• Machine learning, Statistics, Optimisation

Analytics Output

• Application to live data - Trends, Prediction

Communication of Results

• Dashboards and Reports

The process & pain areas

Time taken to turn data into insights: a few months


60 – 75%

Credits: Forbes

Page 4: Platform for Data Scientists


Advantages

Automate repeated routine jobs
• Data load
• Preprocessing

Maximum resource utilization
• Scheduling jobs overnight

Focus more on business
• Look at different use cases
• Solution areas

Integrated toolbox
• Combine tools into one environment

Page 5: Platform for Data Scientists


Expectations

Workbench
• Exploratory Data Analysis
• Advanced Modelling
• Distributed Architecture

Bespoke Algorithms
• Customized ML algorithms
• Custom Approaches

Industrialization
• Packaged Analytics

Platform

Page 6: Platform for Data Scientists

Workbench


Page 7: Platform for Data Scientists

Workbench: EDA

Querying capabilities
• Pointed queries
• Aggregations
• Partitioning
• Windowing
• Analytical functions
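As a rough illustration of the partitioning and windowing capabilities listed above, the sketch below runs a window query through Python's built-in `sqlite3` module. The `calls` table, its columns, and the sample rows are hypothetical and not taken from the slides; the query also assumes the bundled SQLite is version 3.25 or later, where window functions are available.

```python
import sqlite3

# Hypothetical call records: (subscriber, day, minutes). Illustrative values only.
ROWS = [
    ("alice", 1, 10.0),
    ("alice", 2, 30.0),
    ("alice", 3, 20.0),
    ("bob", 1, 5.0),
    ("bob", 2, 15.0),
]


def running_minutes(rows):
    """Return (subscriber, day, minutes, running_total, rank_in_subscriber) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE calls (subscriber TEXT, day INTEGER, minutes REAL)")
        con.executemany("INSERT INTO calls VALUES (?, ?, ?)", rows)
        query = """
            SELECT subscriber, day, minutes,
                   -- windowing: cumulative minutes per subscriber, ordered by day
                   SUM(minutes) OVER (PARTITION BY subscriber ORDER BY day)
                       AS running_total,
                   -- analytical function: rank of each day by usage within a subscriber
                   RANK() OVER (PARTITION BY subscriber ORDER BY minutes DESC)
                       AS rank_in_subscriber
            FROM calls
            ORDER BY subscriber, day
        """
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in running_minutes(ROWS):
        print(row)
```

`PARTITION BY` keeps each subscriber's rows in their own window, so the running total restarts for every subscriber without a separate query per partition.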

Descriptive Stats
• Univariate analysis
• Bivariate analysis

Predictive Modeling
• Building and testing
• Ensemble
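The descriptive statistics items above can be sketched with the Python standard library alone. The function names `univariate_summary` and `pearson_correlation` are illustrative, not part of any platform API, and Pearson correlation is just one possible bivariate measure chosen here as an assumption.

```python
import math
import statistics


def univariate_summary(values):
    """Basic single-variable summary: centre, spread, and range."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    usage = [1, 2, 3, 4, 5]        # hypothetical metric
    revenue = [2, 4, 6, 8, 10]     # hypothetical metric
    print(univariate_summary(usage))
    print(pearson_correlation(usage, revenue))
```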

Page 8: Platform for Data Scientists

Bespoke Algorithms


Page 9: Platform for Data Scientists


Customization

• Decision Trees/Random Forests
  • Handling categorical values
  • Identify top reason
  • Custom node labelling

• K-Means
  • Weighted distance
  • Geospatial distance: Haversine distance

• Social Network Analysis
  • Build call network
  • Community detection
  • Influencer identification
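For the geospatial K-Means customization above, a minimal sketch of the Haversine distance and of a K-Means style assignment step that uses it in place of Euclidean distance might look like this. The mean Earth radius of 6371 km and the helper name `nearest_centroid` are assumptions for illustration; the slides do not show the actual implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_centroid(point, centroids):
    """Assignment step only: index of the centroid closest to point by Haversine distance."""
    return min(range(len(centroids)),
               key=lambda i: haversine_km(*point, *centroids[i]))


if __name__ == "__main__":
    bangalore = (12.97, 77.59)
    centroids = [(28.61, 77.21), (13.08, 80.27)]  # roughly Delhi and Chennai
    print(nearest_centroid(bangalore, centroids))
```

Only the assignment step is shown. Updating centroids by averaging latitudes and longitudes is an approximation on a sphere, so a production version would likely need more care there.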

Domain & scale
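The social network analysis items on this slide (build call network, community detection, influencer identification) can be sketched in a simplified form as follows. This is an assumption-heavy illustration: the `(caller, callee)` record shape is hypothetical, degree (number of distinct contacts) stands in for an influence score, and connected components are used as a crude stand-in for real community detection.

```python
from collections import defaultdict


def build_call_network(calls):
    """Undirected adjacency map: subscriber -> {neighbour: number of calls}."""
    graph = defaultdict(lambda: defaultdict(int))
    for caller, callee in calls:
        if caller == callee:
            continue  # ignore self-calls
        graph[caller][callee] += 1
        graph[callee][caller] += 1
    return graph


def top_influencers(graph, k=1):
    """Rank subscribers by number of distinct contacts; ties broken by name."""
    return sorted(graph, key=lambda node: (-len(graph[node]), node))[:k]


def connected_components(graph):
    """Group subscribers that are reachable from one another through calls."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(n for n in graph[node] if n not in seen)
        components.append(component)
    return components


if __name__ == "__main__":
    calls = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("x", "y")]
    network = build_call_network(calls)
    print(top_influencers(network, k=2))
    print(connected_components(network))
```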

Page 10: Platform for Data Scientists

Packaged Analytics


Page 11: Platform for Data Scientists

Pareto Analysis: Example

Objective

Select a limited subset of causes that produces a significant share of the overall effect. Two comparable metrics with unbalanced magnitudes of cause and effect are identified.

Samples

• Smartphones constitute 27% of all handsets but contribute 95% of all mobile traffic

• 75% of the revenue is generated from 15% of distinct rate plans

• 10% of distinct problem areas are responsible for 83% of total complaints

Use cases

Can be used to identify the impact of a causal metric on an outcome metric.
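One way to sketch the Pareto analysis described above is to rank causes by their effect and take the smallest leading group that reaches a target share of the total. The function name `pareto_subset`, the 80% default target, and the complaint counts below are all hypothetical; they are not figures from the slides.

```python
def pareto_subset(effect_by_cause, target_share=0.8):
    """Find the smallest set of top causes reaching target_share of the total effect.

    Returns (selected_causes, share_of_causes, share_of_effect).
    """
    total = sum(effect_by_cause.values())
    if total <= 0:
        raise ValueError("total effect must be positive")
    # Largest effects first, so the selected set is as small as possible.
    ranked = sorted(effect_by_cause.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for cause, effect in ranked:
        if running / total >= target_share:
            break
        selected.append(cause)
        running += effect
    return selected, len(selected) / len(ranked), running / total


if __name__ == "__main__":
    # Hypothetical complaint counts per problem area.
    complaints = {"billing": 50, "network": 30, "handset": 10, "roaming": 5, "other": 5}
    causes, cause_share, effect_share = pareto_subset(complaints)
    print(causes, cause_share, effect_share)
```

The two returned shares correspond to the two comparable metrics in the objective: the fraction of causes selected and the fraction of the outcome they explain.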

Page 12: Platform for Data Scientists

ROC® Analytics & Insights: Data Flow

Streaming & Batch Sources
• Structured: ROC FMS, ROC RA, ROC PS etc.
• Unstructured: Logs, Tweets, DPI, Mobile App, ERP etc.

Profiler: Domain Guided Analytics

Analytical Engine: Distributed ML and Statistical Techniques

Self Learning: Continuous Feedback for Periodic Improvement

Signal Hub

Domain and Analytical Inputs

Daily Profiles: Profile for a day

Profile Manager

Master Profile: Profile from many days

Pareto Analysis, AP2, AP3, AP4, AP5, many more…

Machine Learning & Statistics Libraries (MLlib, scikit-learn etc.)

Page 13: Platform for Data Scientists


Recipe for Success

Regardless of what some software vendor advertisements may claim, you can’t just purchase some Analytics software, install it, sit back, and watch it solve all your problems.

The right combination of domain knowledge (business acumen) and analytics is required to solve any business problem.

“There is a tendency of solving one’s problems by means of much equipment rather than thought.”

Alan Turing

Page 14: Platform for Data Scientists


ROC® Insights: Technologies

• Data Ingestion
• Data Storage
• Modelling/Profiler
• Reporting

Page 15: Platform for Data Scientists


Thank You



Page 16: Platform for Data Scientists

Techomics Architecture