Big Data Concept - M-Culture
Big Data – Fundamental Concept
“Big Data should be driven by Business Needs, not Technology”
Nimal Manuel, Partner, McKinsey & Company
Shilpa Aggarwal, Associate Principal, McKinsey & Company
Capturing business value is the crucial outcome of Big Data.
[Diagram: reference architecture, labels listed by zone]
Data in Motion, Data at Rest, Data in Many Forms
Information Ingestion and Integration: Data Integration, Data Federation, Data Quality, Data Streams
Landing Area, Analytics Zone and Archive (Hadoop): Raw Data, Structured Data, Unstructured Data, Text Analytics, Data Mining, Entity Analytics, Machine Learning
Real-time Analytics: Video/Audio, Network/Sensor, Entity Analytics, Predictive, Stream Processing
Exploration, Integrated Warehouse & Marts Zone: Discovery, Deep Reflection, Operational, Predictive
Master Data Management: Matching and Linking, Stewardship, Reference Data
Information Governance, Security & Business Continuity
Business Intelligence, Data Exploration & Visualization, Predictive Analytics
Big Data Infrastructure
Programs, Agencies, Researchers, Administrators, Others, Internal
Customer, Call Center, Social CRM
Big Data & Analytics – Core Functions & Services
Credit to IBM Reference Architecture
Big Data Analytics – Technology Platform Hadoop & Open Sources
Example of Big Data Platform Architecture
Refer to NigelTebbutt1’s cone-tm-digital-marketing-principles-pdf
Big Data Analytics – Technology Platform Hadoop & Open Sources
Hadoop Component Stack
Refer to NigelTebbutt1’s cone-tm-digital-marketing-principles-pdf
Big Data & Analytics – Core Functionalities Technology Perspective
Source Systems: Structured & Unstructured Content (Big Data Content)
Trusted Information Layer: Establish, manage, share and deliver information that is accurate, complete, in context and insightful.
- Data Management
- Data Integration
- Data Virtualization
- Data Quality
- Data Security & Privacy
- Data Governance
Components: Data Quality; Data Security; EDW, GIS, Data Lake; Data Virtualization; ETL/ELT; Data on Demand; Information Sharing
Analytics Layer: Intelligence, descriptive and predictive analytics against structured, semi-structured and unstructured information
Components: Descriptive, Predictive, Prescriptive & Cognitive Analytics; Streams Analytics; Persistent Relationship Awareness; Content & Sentiment Analytics; Analysis Repository
Visual Analysis & Collaboration: Operational dashboards, workflow, case adjudication
Components: Workflow & Case Management; Visualization & Link Analysis; Operational Dashboards; Information Exchange
Credit to IBM Reference Architecture
Big Data Analytics Tool - R Model (example)
• What is R? – Open Source Data Analysis Software
  R Language (supports procedural constructs, e.g. if-then-else)
  R Engine, R Library, PMML package for R (Predictive Model Markup Language)
  Open for integration: SAS, SPSS, Excel, SQL Server, Oracle, …
• R-Model Development
  Stats, Math, Data Science
  Big Data Statistics in R
  Distributed Computing on Hadoop
  Advanced Analytics
• R-Model Development examples
  Linear Regression, Logistic Regression, Multiple Regression
  ANOVA, ROC Curve
  Principal Components Analysis (PCA)
  Decision Trees, Random Forests
  Support Vector Machines
  Neural Networks
  Markov Chain Monte Carlo
  Social Network Modeling
  Geo Location
  Face Recognition
  etc.
Note: R vs. Python: The two are similar. Python is object-oriented with easy-to-understand syntax; R's functionality was developed with statisticians in mind and offers strong data visualization capabilities.
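As an illustration of the first model type listed above, the following is a minimal sketch of simple linear regression fitted by ordinary least squares. It is written in Python using only the standard library; in R the same fit would typically be done with lm(). The function names and the sample numbers are hypothetical and serve only to show the calculation.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) pair to a new x value."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical illustration data: y is exactly 2 * x + 1.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [3.0, 5.0, 7.0, 9.0, 11.0]

model = fit_simple_linear_regression(x_values, y_values)
forecast = predict(model, 6.0)
```

Because the sample data lie exactly on a line, the fit recovers that line; with real data the same code returns the best-fitting line in the least-squares sense.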
Data Visualization with R Example – cowplot CRAN package (ggplot2 add-on)
The Comprehensive R Archive Network (CRAN) is the main repository for R packages.
Ministry of Culture – Big Data Sources Digital Technology for Heritage Culture
Canadian Museum for Human Rights - Intangible Collections
Transmedia Storytelling: watching films, playing games, reading texts, observing artefacts, being immersed in mixed-media environments, …
Ministry of Culture – Big Data Sources Digital Technology for Heritage Culture
Smartphone + GPS + compass feature + high-speed wireless network
AR/MR (augmented reality / mixed reality)
Social Engagement & Text Analytics
• With Social Engagement & Text Analytics, we can find out:
  who was involved
  the number of check-ins
  the polarity (positive or negative sentiment) of the messages
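The three outputs listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post records, the field names (user, place, checkin, text) and the tiny word lists are all hypothetical, and real text analytics would use a proper sentiment model instead of a hand-made lexicon.

```python
from collections import Counter

# Hypothetical mini-lexicon; a real system would use a trained sentiment model.
POSITIVE_WORDS = {"love", "great", "beautiful", "amazing"}
NEGATIVE_WORDS = {"boring", "crowded", "bad", "dirty"}


def polarity(message):
    """Label a message positive, negative or neutral by counting lexicon hits."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(posts):
    """Return who was involved, check-ins per place, and message polarities."""
    users = sorted({p["user"] for p in posts})
    checkins = Counter(p["place"] for p in posts if p.get("checkin"))
    polarities = Counter(polarity(p["text"]) for p in posts)
    return {
        "users": users,
        "checkins": dict(checkins),
        "polarities": dict(polarities),
    }


# Hypothetical sample posts, invented for illustration only.
posts = [
    {"user": "ann", "place": "National Museum", "checkin": True,
     "text": "Love the new exhibit, amazing!"},
    {"user": "bob", "place": "National Museum", "checkin": True,
     "text": "Too crowded and boring."},
    {"user": "ann", "place": "Old Town", "checkin": False,
     "text": "Walking around."},
]

summary = summarize(posts)
```

The summary dictionary holds one entry per question: the list of distinct users, the check-in count per place, and the count of messages per polarity label.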
Centralized Data Storage & Analysis
• CMS – Content Management System
  Enterprise Search Engine
  Archiving Engine
  Open Data APIs
• Web Content Analysis
  Retain audience
  Understand how the site measures up against others
  Know where the content needs improvement