Gilbane Boston 2011 big data
![Page 1: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/1.jpg)
Get Ready for Big Data
Peter O'Kelly, Principal Analyst, O'Kelly Associates
Hadley Reynolds, Managing Director, Next Era Research
Kathleen Reidy, Senior Analyst, 451 Research
Wednesday, November 30, 2011, 2:40 – 4:00
![Page 2: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/2.jpg)
Agenda
• Big data in context
• Big structured data
• Big unstructured data
• Big opportunities and risks
• Q&A
![Page 3: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/3.jpg)
Big Data in Context
• What is “big data”?
  – Unhelpfully, both “big data” and “NoSQL,” generally considered a key part of the big data wave, are defined more in terms of what they’re not than what they are
  – A typical big data definition (Wikipedia):
    • “[…] datasets that grow so large that they become awkward to work with using on-hand database management tools”
![Page 4: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/4.jpg)
Big Data in Context
• With thanks to the Business SOA blog:
  – “[…] describe Big Data in the same way that the Hitchhikers Guide to the Galaxy described space:
    – ‘Space,’ it says, ‘is big. Really big. You just won't believe how vastly, hugely, mindbogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space, listen...’”
![Page 5: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/5.jpg)
Big Data in Context
• Why is big data a big deal now?
  – Commodity hardware and the Internet
    • Capability and price/performance curves that continue to defy all economic “laws”
    • Also facilitating compelling cloud services
  – Maturation and uptake of open source software, e.g., Hadoop
    • Powerful and often no- or low-cost
  – IT market
    • Enthusiasm for “NoSQL” systems
    • Frustration with incumbent information management vendors
  – Useful new data sources/resources, e.g., social network activity graphs, the “Internet of things,” sensor networks…
  – Competitive and compliance imperatives
![Page 6: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/6.jpg)
Big Data in Context
• A big data reality check
  – “Mindbogglingly”-scale information management is not new
    • Consider, e.g., VLDB, multi-billion document repositories, and the World Wide Web…
  – What is new and compelling
    • The combination of market dynamics producing new capability and price/performance curves
    • Cloud
      – No deep capital investment required to get started
      – Cloud-based information resources
    • Some innovative marketing, suggesting
      – Self-proclaimed next-generation big data systems are magical and revolutionary
      – Deployed systems are obsolete and wasteful
![Page 7: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/7.jpg)
A Big-Picture Framework
• A digital information item dichotomy
  – Resources (~unstructured information)
    • Digital artifacts optimized to convey stories
      – Organized in terms of narrative, hierarchy, and sequence
    • Examples: books, magazines, documents (e.g., PDF, Word), Web pages, XBRL documents, video, hypertext…
  – Relations (~structured information)
    • Application-independent descriptions of real-world things and relationships
    • Examples: business domain databases, e.g., customer, sales, HR…
![Page 8: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/8.jpg)
A Big-Picture Framework
[Figure: items placed between “Resource” and “Relation.” Labels: Word docs, DITA docs, XBRL docs, PDF docs, Operational db, Desktop db, Streaming db]
![Page 9: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/9.jpg)
A Big-Picture Framework
| | Resources | Relations |
|---|---|---|
| Conceptual | Resources and links | Entities, attributes, relationships, and identifiers |
| Logical | Model: hypertext; Language: XQuery (ideally) | Model: extended relational; Language: SQL |

Physical (both columns): Indexing (e.g., scalar data types, XML, full-text), locking and isolation levels, federation, replication, in-memory databases, columnar storage, table spaces, caching, and more
![Page 10: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/10.jpg)
Agenda
• Big data in context
• Big structured data
• Big unstructured data
• Big opportunities and risks
• Q&A
![Page 11: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/11.jpg)
Big Structured Data
• NoSQL
• Hadoop
• RDBMS reconsidered
• Back to the bigger picture
![Page 12: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/12.jpg)
NoSQL
• No clear consensus on what “NoSQL” means
  – Started with what it’s against, not what it’s about
    • And often finds a receptive audience due to frustration with RDBMS business-as-usual
  – The “NoSQL” meme is a moving target
    • Initially implied “Just say ‘no’ to SQL”
    • Later quietly redefined as “Not Only SQL”
    • What may be next: “New Opportunities for SQL”
      – I.e., some developers may reconsider the value of SQL and RDBMSs, after hitting NoSQL limitations
![Page 14: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/14.jpg)
NoSQL Perspectives
• The “NoSQL” meme confusingly conflates
  – Document database requirements
    • Best served by XML DBMS (XDBMS)
  – Physical model decisions on which only DBAs and systems architects should focus
    • And which are more complementary than competitive with RDBMS/XDBMS
  – Object databases, which have floundered for decades
    • But with which some application developers are nonetheless enamored, for minimized “impedance mismatch,” despite significant information management compromises
  – Semantic models
    • Also more complementary than competitive with RDBMS/XDBMS
![Page 15: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/15.jpg)
Hadoop
• Hadoop is often considered central to big data
  – Originating with Google’s MapReduce architecture, Apache Hadoop is an open source architecture for distributed processing on networks of commodity hardware
• Commercial application domains include (from Wikipedia)
  – Log and/or clickstream analysis of various kinds
  – Marketing analytics
  – Machine learning and/or sophisticated data mining
  – Image processing
  – Processing of XML messages
  – Web crawling and/or text processing
  – General archiving, including of relational/tabular data, e.g. for compliance
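As a rough illustration of the MapReduce model that the Hadoop slide refers to, the following single-process Python sketch counts page views in a tiny clickstream log. It does not use Hadoop or any Hadoop API; the log format, the sample records, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are all assumptions made up for this example. A real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical clickstream records in the form "user_id<TAB>url".
SAMPLE_LOG = [
    "u1\t/home",
    "u2\t/home",
    "u1\t/pricing",
    "u3\t/home",
    "u2\t/pricing",
    "u1\t/home",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit one (url, 1) pair per log line, as a mapper would."""
    _user, url = line.split("\t")
    yield url, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key; a framework does this between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single total."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence within one process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


page_views = run_job(SAMPLE_LOG)
print(page_views)
```

The point of the split is that each mapper call looks at one record and each reducer call looks at one key, so both steps can be distributed over commodity hardware without shared state.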
![Page 16: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/16.jpg)
Hadoop
• Hadoop is popular and rapidly evolving
  – Most leading information management vendors, including Microsoft, have embraced Hadoop
  – There is now a Hadoop ecosystem
![Page 17: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/17.jpg)
RDBMS Reconsidered
• RDBMS incumbents appear to be under siege, with
  – IT frustration with RDBMS business-as-usual
    • Counterproductive RDBMS vendor policies and attitudes
    • DBA modus operandi often seen as excessively conservative
  – Conventional wisdom about RDBMS limitations for, e.g.,
    • “Web scale”
    • “Agility”
    • The application/database “impedance mismatch”
  – The advent of open source and/or specialized DBMSs
    • E.g., MySQL is the M in the “LAMP stack”
    • “The end of the one-size-fits-all DBMS era”
![Page 18: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/18.jpg)
RDBMS Reconsidered
• An RDBMS reality check
  – Leading RDBMS products and open source initiatives are very powerful and flexible
    • And will continue to evolve, e.g., with the mainstream deployment of massive-memory servers and solid state disk (SSD) storage
  – And they continue to expand
    • E.g., in-database processing, with, for example, analytics engines running within DBMS kernels
  – But the RDBMS incumbents nonetheless face unprecedented challenges
    • Which sometimes resonate with frustrated architects and developers because of negative experiences that have more to do with how RDBMSs were used than with what RDBMSs can effectively address
![Page 19: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/19.jpg)
RDBMS in the Big-Picture Framework
| | Resources | Relations |
|---|---|---|
| Conceptual | Resources and links | Entities, attributes, relationships, and identifiers |
| Logical | Model: hypertext; Language: XQuery | Model: extended relational; Language: SQL |

Physical (both columns): Indexing (e.g., scalar data types, XML, full-text), locking and isolation levels, federation, replication, in-memory databases, columnar storage, table spaces, caching, and more
![Page 20: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/20.jpg)
RDBMS Reconsidered
• A Forrester big data reality check (from “Stay Alert To Database Technology Innovation,” 11/19/2010):
  – “For 90% of BI use cases, which are often less than 50 terabytes in size, relational databases still are good enough” (p. 4)
  – “Traditional relational databases are still good enough for the majority of transactional use cases” (p. 5)
![Page 21: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/21.jpg)
Back to the Bigger Picture
• Compared with traditional enterprise data management, big data is
  – Essentially a collection of specialized physical models for very large, analysis-oriented data management
  – Expanding to encompass resources as well as relations
  – More about the potential for displacing expensive and closed/proprietary distributed processing alternatives than displacing RDBMS or XDBMS
![Page 22: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/22.jpg)
Structured Big Data: Recap
• Substantive, sustainable, and synergistic
  – RDBMS
  – XDBMS
  – Hadoop
  – The cloud as an information management platform
• Vaguely defined, transitory, and over-hyped
  – NoSQL
![Page 23: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/23.jpg)
Agenda
• Big data in context
• Big structured data
• Big unstructured data
• Big opportunities and risks
• Q&A
![Page 24: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/24.jpg)
Big Unstructured Data
• Finding Facts about Data – IDC/EMC
• Patterns for Unstructured Big Data
• How-to issues – who will know?
![Page 25: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/25.jpg)
http://www.emc.com/leadership/programs/digital-universe.htm
![Page 26: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/26.jpg)
![Page 27: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/27.jpg)
![Page 28: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/28.jpg)
![Page 29: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/29.jpg)
![Page 30: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/30.jpg)
![Page 31: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/31.jpg)
![Page 32: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/32.jpg)
![Page 33: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/33.jpg)
![Page 34: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/34.jpg)
Facebook:
• 800M users
• 500M visitors/day
• $100B potential value @ IPO
![Page 35: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/35.jpg)
http://inmaps.linkedinlabs.com/
![Page 36: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/36.jpg)
Unstructured Big Data Patterns
• Search
• Social
• Mobile
• Online Activities/Digital Marketing
• Inquiry/Detection – Connecting Dots
• Question Answering
![Page 37: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/37.jpg)
Mobile Adds:
• Location data points
• Voice searches
• Siri questions
• App history profile
• Browse history profile
• Search history profile
• Past purchase profile
• Camera-generated outputs/inputs
• Coupon delivery & merchandising
• Friends' locations
• Social search
• Local ad-match algo opportunities
![Page 38: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/38.jpg)
![Page 39: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/39.jpg)
Online Activities/Digital Marketing
![Page 40: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/40.jpg)
• Inquiry/Detection – Connecting Dots
  – Intelligence
  – Law Enforcement
  – Fraud Detection (Government, Financial, Health, …)
  – eDiscovery
![Page 41: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/41.jpg)
Social Media Monitoring
![Page 42: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/42.jpg)
Question Answering
![Page 43: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/43.jpg)
Question Answering Beyond Jeopardy
![Page 44: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/44.jpg)
Twitter Analytics Questions
• What can we tell about a user from their tweets?
  – from the tweets of those they follow?
  – from the tweets of their followers?
  – from the ratio of followers/following
• What graph structures lead to successful networks?
• User reputation?
• Sentiment analysis?
• What features get a tweet retweeted?
  – How deep is the retweet tree?
• Long term duplicate detection
• Machine learning
• Language detection
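Two of the questions above (the followers/following ratio and the depth of a retweet tree) reduce to small graph computations. The Python sketch below is one possible way to express them; the record layout (`RETWEET_OF`), the sample tweet ids, and both function names are hypothetical and are not taken from any Twitter API.

```python
from typing import Dict, List, Optional

# Hypothetical retweet records: tweet id -> id of the tweet it retweets
# (None marks the original tweet).
RETWEET_OF: Dict[str, Optional[str]] = {
    "t1": None,
    "t2": "t1",
    "t3": "t1",
    "t4": "t2",
    "t5": "t4",
}


def retweet_tree_depth(retweet_of: Dict[str, Optional[str]], root: str) -> int:
    """Return the number of retweet hops on the longest chain below `root`."""
    # Invert the parent pointers into a parent -> children adjacency map.
    children: Dict[str, List[str]] = {}
    for tweet, parent in retweet_of.items():
        if parent is not None:
            children.setdefault(parent, []).append(tweet)

    # Breadth-first walk, one tree level per iteration.
    depth = 0
    frontier = [root]
    while True:
        next_frontier = [c for t in frontier for c in children.get(t, [])]
        if not next_frontier:
            return depth
        depth += 1
        frontier = next_frontier


def follower_ratio(followers: int, following: int) -> Optional[float]:
    """Followers divided by following; None when the account follows nobody."""
    return followers / following if following else None


print(retweet_tree_depth(RETWEET_OF, "t1"))
print(follower_ratio(300, 150))
```

At the scale the slide implies, the same logic would have to run over a distributed graph rather than an in-memory dictionary.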
![Page 45: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/45.jpg)
![Page 46: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/46.jpg)
http://www.mckinsey.com/en/Features/Big_Data.aspx
![Page 47: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/47.jpg)
Agenda
• Big data in context
• Big structured data
• Big unstructured data
• Big opportunities and risks
• Q&A
![Page 48: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/48.jpg)
Big Data Opportunities
• Improved visibility and insights
  – Can explore previously impractical questions
• Real-time analytics
  – Less dependence on “dead data”
• Blur the boundaries between structured and unstructured information
  – Unified views of resources and relations
• Consolidation
  – Reduce the number of moving parts in your infrastructure
    • Along with related licensing and maintenance expenses
• Compliance – capture and maintain data & records previously beyond the firm's capabilities
![Page 49: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/49.jpg)
Big Data Risks
• The potential for an ever-expanding set of information silos
  – Critical to relentlessly focus on minimized redundancy and optimized integration
• GIGO (garbage in, garbage out) at super-scale
  – Dramatic improvements in capabilities and price/performance provide new opportunities for self-inflicted damage, for organizations that don’t model or query effectively
• Cognitive overreach
  – The potential for information workers to create nonsensical queries based on poorly-designed and/or misunderstood information models
• Skills gaps create competitive disadvantages
![Page 51: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/51.jpg)
Database market landscape
[Figure: map of database products by category; the placement of products within categories is not recoverable from the extracted text.]
Category labels: Relational, Non-relational, Operational, Analytic, -as-a-Service, NewSQL, NoSQL, Document, Graph, Key value, Big tables, Data Grid/Cache
Products and vendors shown: Oracle, IBM DB2, SQL Server, PostgreSQL, MySQL, Ingres, SAP Sybase ASE, Hadoop, Teradata, Netezza, JustOne, EMC Greenplum, Aster Data, ParAccel, HP Vertica, SimpleDB, Amazon RDS, Xeround, Calpont, GenieDB, VoltDB, ScalArc, Lotus Notes, CouchDB, MongoDB, Objectivity, MarkLogic, InterSystems, Versant, Progress, McObject, HBase, Hypertable, Redis, Riak, Voldemort, BerkeleyDB, Membrain, InfiniteGraph, Neo4J, GraphDB, App Engine Datastore, Clustrix, Schooner MySQL, Tokutek, Akiban, CodeFutures, Continuent, ScaleBase, Translattice, SQL Azure, FathomDB, EnterpriseDB, Database.com, Infobright, SAP Sybase IQ, IBM InfoSphere, NimbusDB, VectorWise, HandlerSocket, Cassandra, Cloudant, Memcached, IBM eXtreme Scale, Oracle Coherence, GigaSpaces, Terracotta, GridGain, ScaleOut, VMware GemFire, CloudTran, InfiniSpan, Couchbase, RavenDB, Drizzle, Piccolo, Dryad, Hadapt, MapR, Brisk, MySQL Cluster
![Page 52: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/52.jpg)
Big Data Complexity Continuum
[Figure: applications plotted by time horizon against number and complexity of technologies; positions are not recoverable from the extracted text.]
Vertical axis: Number & Complexity of Technologies
Horizontal axis: Time Horizon (Historic, Current (Monitor), Future (Predict))
Labels shown: eCommerce, IDC 2005, Sentiment extraction, Speech to text, Intelligent Machines, Log Analysis, Predictions, Relationship Detection, Pattern Detection, Influence Networks, Brand monitoring, Climate Modeling And Prediction, Trend Analytics, Reputation management, Voice of Customer, Gov’t Intelligence Applications, Data mining, Medical diagnostics, Fraud Detection, Web search, Ad Targeting Retargeting
![Page 53: Gilbane Boston 2011 big data](https://reader033.fdocuments.us/reader033/viewer/2022060110/555a5ed1d8b42ae7218b45a6/html5/thumbnails/53.jpg)
Big Data Characteristics
[Figure, © IDC. Labels: Big Data; Volume, Velocity, Variety/Complexity, Value]