Using Data Virtualization to Integrate With Big Data
The Role of Data Virtualization in a World of Big Data
June 6, 2012
Mark Madsen
Information Management Through Human History
New technology development (innovation)
creates
New methods to cope (maturation)
creates
New information scale and availability (saturation)
creates…
Copyright Third Nature, Inc.
Big Data
You keep using that word. I do not think it means what you think it means.
What makes data “big”?
Hierarchical structures
Nested structures
Encoded values
Non‐standard (for a database) types
Deep structure
Very large amounts
Human authored text
“Big” is better defined as “complex” or “hard to manage”.
You could store this data in the data warehouse but…
Old database technology has so many problems
“Big Data”
New technology has so many problems
Reality is multiple data stores and platforms
Separate, purpose-built databases and processing systems for different types of data and query / computing workloads are the norm for information delivery. Data flows between most of these environments.
[Figure: architecture diagram with areas labeled BI, Reporting, Dashboards; Data Warehouse; and Source Environments (Databases, Documents, Flat Files, XML, Queues, ERP, Applications). Each data store is drawn as a small sample table of employee rows.]
Example “big data”: Web tracking data
USER_ID 301212631165031
SESSION_ID 590387153892659
VISIT_DATE 1/10/2010 0:00
SESSION_START_DATE 1:41:44 AM
PAGE_VIEW_DATE 1/10/2010 9:59
DESTINATION_URL https://www.phisherking.com/gifts/store/LogonForm?mmc=link‐src‐email‐_‐m100109‐_‐44IOJ1‐_‐shop&langId=‐1&storeId=1055&URL=BECGiftListItemDisplay
REFERRAL_NAME Direct
REFERRAL_URL ‐
PAGE_ID PROD_24259_CARD
REL_PRODUCTS PROD_24654_CARD, PROD_3648_FLOWERS
SITE_LOCATION_NAME VALENTINE'S DAY MICROSITE
SITE_LOCATION_ID SHOP‐BY‐HOLIDAY VALENTINES DAY
IP_ADDRESS 67.189.110.179
BROWSER_OS_NAME MOZILLA/4.0 (COMPATIBLE; MSIE 7.0; AOL 9.0; WINDOWS NT 5.1; TRIDENT/4.0; GTB6; .NET CLR 1.1.4322)
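The sample record shows why such data is "complex" rather than merely large: the destination URL carries encoded values in its query string, and REL_PRODUCTS packs a list into one field. The following is a minimal Python sketch of unpacking both. It assumes that "-_-" separates the segments of the mmc code (a guess from the sample, not something the slides state), it uses plain ASCII hyphens in place of the typographic ones in the transcript, and the function name decode_page_view is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Values copied from the sample record, with ASCII hyphens substituted.
DESTINATION_URL = (
    "https://www.phisherking.com/gifts/store/LogonForm"
    "?mmc=link-src-email-_-m100109-_-44IOJ1-_-shop"
    "&langId=-1&storeId=1055&URL=BECGiftListItemDisplay"
)
REL_PRODUCTS = "PROD_24654_CARD, PROD_3648_FLOWERS"


def decode_page_view(destination_url: str, rel_products: str) -> dict:
    """Unpack the values hidden inside two fields of a page-view record."""
    parts = urlsplit(destination_url)
    # parse_qs returns a list per key; keep the first value of each.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    # Assumption: "-_-" is the separator inside the mmc tracking code.
    campaign = query["mmc"].split("-_-") if "mmc" in query else []
    # REL_PRODUCTS is a comma-separated list stored in a single column.
    products = [p.strip() for p in rel_products.split(",") if p.strip()]
    return {
        "host": parts.netloc,
        "path": parts.path,
        "query": query,
        "campaign_segments": campaign,
        "related_products": products,
    }


if __name__ == "__main__":
    decoded = decode_page_view(DESTINATION_URL, REL_PRODUCTS)
    for field, value in decoded.items():
        print(field, "=", value)
```

Even after this decoding, the result still holds only identifiers such as store and product IDs, which is the problem the next slide raises.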
The event stream contains IDs, but no reference data…
Reference data is also known as dimensions or master data. This isn’t an OLTP database: no reference data is available from the source.
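To make the gap concrete, here is a small Python sketch of what an analyst has to do before the events mean anything: look up each ID in reference data that lives somewhere else. The in-memory SQLite database stands in for that other system, and the product table, its columns and the category values are invented for illustration; only the IDs come from the sample record.

```python
import sqlite3

# Page-view events as they arrive in the stream: identifiers only.
events = [
    {"user_id": 301212631165031, "page_id": "PROD_24259_CARD"},
    {"user_id": 301212631165031, "page_id": "PROD_3648_FLOWERS"},
    {"user_id": 301212631165031, "page_id": "PROD_UNKNOWN"},
]


def build_reference_db() -> sqlite3.Connection:
    """Stand-in for the system that owns the reference (dimension) data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (page_id TEXT PRIMARY KEY, category TEXT)")
    # Hypothetical reference rows; the categories are made up.
    conn.executemany(
        "INSERT INTO product VALUES (?, ?)",
        [("PROD_24259_CARD", "Cards"), ("PROD_3648_FLOWERS", "Flowers")],
    )
    return conn


def enrich(event_rows, conn):
    """Attach the product category to each event, or None when no match exists."""
    enriched = []
    for event in event_rows:
        row = conn.execute(
            "SELECT category FROM product WHERE page_id = ?", (event["page_id"],)
        ).fetchone()
        enriched.append({**event, "category": row[0] if row else None})
    return enriched


if __name__ == "__main__":
    for record in enrich(events, build_reference_db()):
        print(record)
```

The third event has no match in the reference table and comes back with a category of None, the kind of gap an analyst needs to see rather than silently lose.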
It would be logical to keep all the data in one place.
The typical situation for analysts
“I need that data now.”
“It will take 6 months.”
There are two architectural approaches to facilitating analysis, depending on where the analyst works in the environment:
1. Back end integration: for analysts working within the big data (BD) environment, reaching out from that environment to get the other data needed to make sense of the information.
2. Front end integration: for analysts working in a more conventional BI / analysis environment, reaching in to the BD environment from other tools.
Solution: copy the data into Hadoop?
Just load it from the DW, if it’s there. Otherwise, dump and load the data from the sources.
Great for one-time analysis, but what if you need to do it again next week, or need current values on a regular basis?
You can build custom extracts from each source. But…
• Poor tool support
• Problem of on-demand / current values
• Minimal data management possible in the Hadoop environment
• The analyst waits
[Diagram labels: OLTP Sources, Data warehouse]
Alternative: data virtualization to enable access
A data virtualization layer can be used to make other sources (OLTP, the data warehouse) appear locally accessible to the analyst or Hadoop programmer. Then, two choices are possible:
▪ extract the data and load it into the local environment
▪ access it dynamically from within the environment
[Diagram labels: OLTP Sources, Data warehouse]
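The difference between the two choices can be sketched in a few lines of Python. This is an illustration only: VirtualTable is a hypothetical stand-in for a table exposed by a data virtualization layer, not the API of any product, and the customer rows are invented.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class VirtualTable:
    """Hypothetical stand-in for a remote table exposed through a DV layer."""

    def __init__(self, fetch: Callable[[], List[Row]]):
        self._fetch = fetch  # callable that reads the underlying source

    def extract(self) -> List[Row]:
        """Choice 1: copy the rows now, for loading into the local environment."""
        return [dict(row) for row in self._fetch()]

    def rows(self) -> List[Row]:
        """Choice 2: read through to the source every time it is called."""
        return self._fetch()


# Invented source data standing in for an OLTP table.
source: List[Row] = [{"customer_id": 1, "status": "active"}]
customers = VirtualTable(lambda: source)

snapshot = customers.extract()   # extract and load: a point-in-time copy
source[0]["status"] = "closed"   # the source system changes afterwards

if __name__ == "__main__":
    print("snapshot:", snapshot[0]["status"])
    print("dynamic: ", customers.rows()[0]["status"])
```

The extracted copy keeps the value it had at load time, while dynamic access returns the current one. That is the trade-off behind the earlier question about needing current values on a regular basis.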
Alternative: data virtualization to bridge stores
A data virtualization layer can be used to bridge the database and big data environments, hiding the back end complexities.
This allows access to raw or processed data from Hadoop alongside data from other environments, with several benefits: no reliance on limited Hive connectors, no client‐side data merging, and no difficult metadata layer integrations.
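As a rough illustration of "no client-side data merging", the Python sketch below runs one query that joins data from two separate stores through a single connection. It uses SQLite's ATTACH DATABASE purely as a toy stand-in for a virtualization layer: the main database plays the big data store, the attached dw database plays the warehouse, and all table names, columns and rows are invented.

```python
import sqlite3


def build_layer() -> sqlite3.Connection:
    """One connection fronting two separate stores (toy stand-in for a DV layer)."""
    conn = sqlite3.connect(":memory:")                 # "main": the big data store
    conn.execute("ATTACH DATABASE ':memory:' AS dw")   # "dw": the warehouse
    conn.execute("CREATE TABLE main.page_views (user_id INTEGER, page_id TEXT)")
    conn.execute(
        "CREATE TABLE dw.customers (user_id INTEGER PRIMARY KEY, segment TEXT)"
    )
    # Invented sample rows.
    conn.executemany(
        "INSERT INTO main.page_views VALUES (?, ?)",
        [(1, "PROD_24259_CARD"), (1, "PROD_3648_FLOWERS"), (2, "PROD_24259_CARD")],
    )
    conn.executemany(
        "INSERT INTO dw.customers VALUES (?, ?)",
        [(1, "gift buyers"), (2, "browsers")],
    )
    return conn


def views_by_segment(conn: sqlite3.Connection):
    """The join happens inside the layer; the client only receives the result."""
    return conn.execute(
        """
        SELECT c.segment, COUNT(*) AS page_views
        FROM main.page_views AS v
        JOIN dw.customers AS c ON c.user_id = v.user_id
        GROUP BY c.segment
        ORDER BY c.segment
        """
    ).fetchall()


if __name__ == "__main__":
    for segment, count in views_by_segment(build_layer()):
        print(segment, count)
```

The client issues a single query and never pulls both data sets down to merge them itself. A real layer would also have to handle remote query pushdown, security and latency, none of which this toy shows.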
Data virtualization can simplify access across the entire data environment, “big” or not
DV also enables shared metadata across environments, avoiding both the cost of model integration and the burying of metadata in source code.
[Figure: architecture diagram with areas labeled BI, Reporting, Dashboards; Data Warehouse; and Source Environments (Databases, Documents, Flat Files, XML, Queues, ERP, Applications), plus a data virtualization layer (front end) and a DV layer (back end). Each data store is drawn as a small sample table of employee rows.]
Bridge the data environment to uses beyond BI
The use cases now include interactive applications, lower-latency data and complex analytics, and they extend beyond read‐only queries.
About the Presenter
Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, analytics and information management. Mark is an award-winning author, architect and former CTO whose work has been featured in numerous industry publications. During his career Mark received awards from the American Productivity & Quality Center, TDWI, Computerworld and the Smithsonian Institute. He is an international speaker, contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.
About Third Nature
Third Nature is a research and consulting firm focused on new and emerging technology and practices in business intelligence, analytics and performance management. If your question is related to BI, analytics, information strategy or data, then you’re in the right place.
Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors.
We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.