7/28/2019 Satyam OpenAnalytics NYC
BIG DATA ANALYTICS & PITFALLS TO AVOID
Dr. Satyam Priyadarshy
June 17, 2013 New York City
BIG DATA Buzz - Should Business Care?
Big Data future is bright. Organizations that can effectively leverage Big Data without sinking in the Big Data Hole will realize additional business value, a loyal customer base and increased profits.
2.5 exabytes of new data generated per day
What we know?
A top business priority
Big opportunities available
Everyone is talking about it
But...
Emerging technology helps
Adds value definitely
Definition and leverage are not clear
Big challenges for companies
The path to execute is less understood
Realization is complex but getting easier
Expertise is in demand but supply is short
BIG DATA - 7 Vs that describe it
VELOCITY: Moving away from batch processing to real-time addition of massive data for near real-time analysis
VARIETY: Structured and unstructured data, e.g. POS data, sensor data, transaction data, call center data, supply chain data, new media data, etc.
VERACITY: Reliability and predictability of not-so-precise data types, e.g. sentiment data, weather data and its impact on business.
VOLUME: The ever-growing data, from terabytes to petabytes to zettabytes
VALUE: Unless value is realized, Big Data is just a Big Hole
VIRTUAL: Data resides in virtual environments, e.g. POS, private and public clouds, geo-located, inside and outside firewalls
VARIATION: No single configuration of the other 6 Vs fits everyone. There is variation for each business.
Big Data definition is evolving. The origin of the term dates back to 1990. Typically 4 Vs defined Big Data, but I strongly recommend the 7 Vs that describe Big Data. (Source: chiefknowledgeguru.com)
80% of data generated is unstructured
KARMA matters
Knowledge: Business, Technology, People Strategy; Big Data Sources, Lifecycle
Re-invest based on actions
Action: Scalable Architecture, Infrastructure, Tools & Technology, Resources; Mining the Big Data with a targeted and open mind to find Gold and other items
Recognition: Revenue by Selling New Insights; Increase Profit Margins; Add new features to products & services
Market: Grow Share; Customer Centricity
Advance: Innovate with the help of Big Analytics; Gather even more Big Data and keep going through this cycle
KARMA SCORE is calculated using the maturity level of these capabilities
Big Math and Big Analytics:
Parallel Processing, API, Query, Reporting
Data Mining, Analytics, Pattern, Statistics
Machine Learning, Inference predictions
Tools, Technologies, Human Resources
Big Value, Big Actions:
Service to support business: Data, Information, Knowledge, Process
Presentation: Visualization, Mobility, Collaboration, Exploration
Actions: Improve Product/Services, Grow Revenue/Profits, Agility
Big Data:
Collection of Raw Data, Structured & Unstructured, Discovery, Staging
Extract, Load, Transform
Data Connectors, Access, Use, Move
Data Storage: Hadoop, NoSQL, Key-value, MPP, In-memory, blobs, etc.
Data Governance and Management:
Policy, Privacy, Security, Metadata, Risk, Total cost of ownership, Access control
Data Lifecycle, Data Assets, SLA, ROI, ROA, Data Quality
Physical Store, Virtual Storage, Encryption, Masking, Archive, Disaster Recovery
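The deck does not give a formula for the KARMA score, so the following is only an illustrative sketch. It assumes each capability is rated on a hypothetical 1-5 maturity scale, that the four capability groups carry equal weight, and that the result is rescaled to 0-100. The function name `karma_score`, the capability keys and the example ratings are all invented for illustration.

```python
from statistics import mean

# Hypothetical ratings on an assumed 1-5 maturity scale (1 = ad hoc, 5 = optimized).
# The four group names come from the slide; the keys and numbers are invented.
EXAMPLE_MATURITY = {
    "Big Data": {"collection": 3, "etl": 4, "connectors": 2, "storage": 3},
    "Big Math and Big Analytics": {"processing": 3, "mining": 2, "machine_learning": 1, "tools_people": 2},
    "Big Value, Big Actions": {"service": 2, "presentation": 3, "actions": 1},
    "Data Governance and Management": {"policy": 4, "lifecycle": 3, "storage_security": 3},
}


def karma_score(maturity, max_level=5):
    """Return (score on a 0-100 scale, mean maturity per group).

    Assumes equal weight for every group, which the deck does not specify.
    """
    group_means = {group: mean(levels.values()) for group, levels in maturity.items()}
    overall = mean(group_means.values())
    # Rescale from the 1..max_level range to 0..100.
    score = 100 * (overall - 1) / (max_level - 1)
    return round(score, 1), group_means


if __name__ == "__main__":
    score, per_group = karma_score(EXAMPLE_MATURITY)
    print(f"KARMA score: {score}")
    for group, level in per_group.items():
        print(f"  {group}: {level:.2f}")
```

Reporting the per-group means alongside the single number keeps the weakest capability group visible, which is arguably more useful for planning than the headline score.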
Whatever your KARMA Score is, one can leverage Big Data eventually.
The Great Enabler is the OPEN SOURCE Revolution of the last decade or so.
In a Zoo In an Open Environment
OPEN SOURCE Creates a HAPPY, FLOURISHING Environment
Open Source Key Characteristics
FREE (*)
NOT CAGED, NOT BLACK BOX
MODIFICATIONS ALLOWED
MODIFIED VERSIONS REDISTRIBUTABLE
LIVES IN HARMONY WITH OTHERS
Open Source BIG DATA PLAYERS
THESE TOOLS ENABLE YOU TO DIG THE GOLD IN BIG DATA (This is not a comprehensive list of tools/technologies)
ACTION for finding the GOLD
PROBLEM SOLVING
OPERATIONAL
STRATEGIC
FUTURISTIC
Basic Analytics
Advanced Analytics
Holistic Analytics
GO FOR THE GOLD
ADDRESSES Current Concerns
Reduce Costs
Eliminate Issues
ADDRESSES GROWTH
Customer Centric
Easily Incorporate New Data
Innovation Related
Emerging Trends Adoption
BIG DATA,
BIG MATH,
BIG ANALYTICS
Descriptive Statistics
Inferential Statistics
THAT'S A GOLD MINE
WHAT'S IN A GOLD MINE?
Gold Suite: Gold, Arsenic, Mercury, Tungsten, Silver
Base Suite: Copper, Lead, Zinc, Bismuth, Cadmium, Molybdenum, Silver
Iron-Manganese Suite: Iron, Manganese, Cobalt, Nickel, Yttrium
TO GET GOLD ONE HAS TO DIG DEEPER
IF YOU FOUND SILVER WHILE DIGGING FOR GOLD, WHAT WOULD YOU DO?
CASE STUDY: DDoS Attack
PROBLEM:
DNS servers are persistently attacked to create DDoS attacks. Can we predict?
CHALLENGES:
7+ TB / day
Varied formats based on request and type of attacks
Hadoop-based data storage
BIG ANALYTICS, APPROACH:
Hive / MapR queries and R for statistical analysis
Interconnection of data with known data sources for identification
Tableau and (open source D3.js and Ploticus) for visualization
Iteratively optimized queries for speed
THE GOLD, KNOWLEDGE:
Source of attacks identified after integrating
Distributed targets
Multiple attack types
Slow performance over binary data sets
ACTIONS:
A step closer to solution, but requires more work to get it near real-time for actionable insights.
Feedback loop to known datasets to enhance the predictability and performance
RECOGNITION:
45 days later
It's Science, not BI
CASE STUDY: DDoS Attack Pattern Based Study
[Figure: two scatter plots with axes Traffic Volume and Unique Z Ratio. Panel titles: "Single Day - Outlier Events - 10K Size :: Zones Hit from Multiple Sources" and "Single Day - Outlier Events - 2K Size :: Zones Hit from Multiple Sources". Point labels: ABC.TLD, SB, GOLD.TLD. Annotation: AFTER DIGGING FURTHER.]
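The case study used Hive queries and R; the deck shows neither the queries nor the statistics. As a rough stand-in, the sketch below uses plain Python to aggregate query records per zone and flag zones whose traffic volume is an outlier, using a modified z-score based on the median absolute deviation. The record layout `(source_ip, zone)`, the field `unique_ratio` (distinct sources divided by query count), and the 3.5 threshold are assumptions for illustration and are not necessarily what the slide's "Unique Z Ratio" measures.

```python
from collections import defaultdict
from statistics import median


def zone_stats(records):
    """Aggregate (source_ip, zone) records into per-zone volume and distinct-source counts."""
    volume = defaultdict(int)
    sources = defaultdict(set)
    for source_ip, zone in records:
        volume[zone] += 1
        sources[zone].add(source_ip)
    return {
        zone: {
            "volume": volume[zone],
            "unique_sources": len(sources[zone]),
            # Assumed metric: share of queries that come from distinct sources.
            "unique_ratio": len(sources[zone]) / volume[zone],
        }
        for zone in volume
    }


def flag_outliers(stats, threshold=3.5):
    """Return zones whose volume has a modified z-score above the threshold."""
    volumes = [s["volume"] for s in stats.values()]
    med = median(volumes)
    mad = median([abs(v - med) for v in volumes])
    if mad == 0:
        return []  # no spread in the data, nothing can be flagged
    return sorted(
        zone for zone, s in stats.items()
        if 0.6745 * (s["volume"] - med) / mad > threshold
    )
```

A median-based score is used here because a single attacked zone can inflate the mean and standard deviation enough to hide itself; the median and MAD are far less sensitive to that.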
BIG DATA PITFALLS
Big Data can help MOST BUSINESSES
PITFALL: Lack of knowledge: Tools, Data Science
HOW TO OVERCOME: Expert, Education, Execution
PITFALL: Too Much Data. Initially most of it was discarded
HOW TO OVERCOME: Deploy Hadoop clusters with cheap storage and store with the best possible compression
PITFALL: Executives Not Sure
HOW TO OVERCOME: Education, Best Practices and Insights after mining and finding useful patterns initially
PITFALL: Belief that Big Data has all the answers
HOW TO OVERCOME: The Whole Mine is NOT GOLD. Show insights and coach
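"Best possible compression" depends on the data, so it is worth measuring rather than guessing. The sketch below compares three codecs from the Python standard library on a sample of raw bytes. The helper names `compression_ratios` and `best_codec` are hypothetical, and these stdlib codecs only stand in for whatever codecs a given Hadoop cluster is configured with.

```python
import bz2
import gzip
import lzma


def compression_ratios(raw: bytes) -> dict:
    """Return original_size / compressed_size for each codec (higher is better)."""
    codecs = {"gzip": gzip.compress, "bz2": bz2.compress, "lzma": lzma.compress}
    return {name: len(raw) / len(compress(raw)) for name, compress in codecs.items()}


def best_codec(raw: bytes) -> str:
    """Name of the codec with the highest compression ratio on this sample."""
    ratios = compression_ratios(raw)
    return max(ratios, key=ratios.get)


if __name__ == "__main__":
    # Invented, highly repetitive log sample; real logs will compress less well.
    sample = b"2013-06-17 query example.tld A 192.0.2.1\n" * 10000
    for name, ratio in compression_ratios(sample).items():
        print(f"{name}: {ratio:.1f}x")
    print("best:", best_codec(sample))
```

Ratio is only one side of the trade-off: a codec that compresses better usually costs more CPU time, which matters when the same data is read repeatedly during analysis.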
BIG DATA PITFALLS
Big Data can help MOST BUSINESSES
PITFALL: Silo Culture
HOW TO OVERCOME: Devastating for companies. Single Source of Truth is Key to Success
PITFALL: Multiple copies of same data in different formats
HOW TO OVERCOME: Keep Raw Data (along with DR site), Transform during Analysis
PITFALL: Well Established Enterprise Data Warehouse
HOW TO OVERCOME: Keep it for Simple, Operational Analytics, Augment with Big Data for Innovation and Future Growth
PITFALL: Intuition Based Culture
HOW TO OVERCOME: Can only focus on Gold; if you find Silver and other precious metal, you miss the mark. Show Insights and Move On To Gold
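"Keep raw data, transform during analysis" can be pictured as follows: the raw lines are stored once and never rewritten, and each analysis applies its own parsing and normalization when it reads them. This is a minimal sketch assuming JSON-lines input; the field names `zone` and `bytes` and the functions `parse_event` and `bytes_per_zone` are hypothetical.

```python
import json


def parse_event(line: str) -> dict:
    """Transform one raw line at analysis time; the stored line is left untouched."""
    record = json.loads(line)
    return {
        "zone": record["zone"].lower(),   # normalize case on read
        "bytes": int(record["bytes"]),    # tolerate numbers stored as strings
    }


def bytes_per_zone(raw_lines) -> dict:
    """Total bytes per zone, computed directly from the raw lines."""
    totals = {}
    for line in raw_lines:
        event = parse_event(line)
        totals[event["zone"]] = totals.get(event["zone"], 0) + event["bytes"]
    return totals
```

Because the normalization lives in the reader rather than in a second stored copy, a later analysis can parse the same raw lines differently without creating another format of the same data.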
Simple way to see some Big Data Challenges
1st:
Data acquisition
Storage
Processing
2nd:
Data transport & dissemination
Data management & curation
Big Analytics Tools, Technology, Know-How
3rd:
Privacy, Security and Disaster Recovery
Technical/Scientific Talent
Cost of all of the above
KARMA matters
Knowledge: Business, Technology, People Strategy; Big Data Sources, Lifecycle
Re-invest based on actions
Action: Scalable Architecture, Infrastructure, Tools & Technology, Resources; Mining the Big Data with a targeted and open mind to find Gold and other items
Recognition: Revenue by Selling New Insights; Increase Profit Margins; Add new features to products & services
Market: Grow Share; Customer Centricity
Advance: Innovate with the help of Big Analytics; Gather even more Big Data and keep going through this cycle
THANK YOU
UNDERSTAND YOUR BIG DATA KARMA SCORE AND understand the Big Picture, the Direction, and LEAD
Helps Build Strong Foundation
Focus on OUR MOST VALUED CUSTOMERS
INCREASE PROFITABILITY
Appendix
The Pitfalls for Adopting Big Data
The Big Data definition of 4 Vs (Velocity, Volume, Variety, Veracity) is incomplete.
The belief that Big Data solves everything for everyone.
Big Data abounds, but its dimensions are yet to be understood
The Loudest Often Wins (LOW) or the highest paid person's opinion (HIPPO) prevails
That a data-driven approach trumps intuition is a hard nut to crack. Really!!
Data for Data's Sake
Talent Gap
Data, Data Everywhere
Infighting
Aiming Too High
Reference: Wall Street Journal, March 11, 2013, page R4
Brief History of Analytics
Timeline axis: 1890, 1920, 1950, 1980, 2010
Time Management (by Frederick Winslow Taylor)
Zero Defects Analysis and Pacing of Assembly Line (Ford)
Statistical Process Control (Walter Shewhart)
Operational Research Popularized (Royal Air Force)
Social Network Analysis
Business Intelligence term coined (H. P. Luhn)
Artificial Intelligence (John McCarthy)
Exploratory Data Analysis - visualization (John Tukey)
Business Intelligence Popularized (Gartner)
Expert Systems (using AI)
The Visual Display of Quantitative Information (Edward Tufte)
Data Mining (part of AI) and Web analytics
Big Analytics
DEFINITIONS of Analytics for Business
ANALYTICS: Any data-driven process that provides insights
ADVANCED ANALYTICS: Helps in understanding cause-effect relationships, predicting future events, and choosing the best possible action
BIG ANALYTICS FOR BUSINESS: Relevant for the business, provides actionable insights for increasing revenue/profit, measures value, and leverages Big Data.