Disaster Recovery for the Real-Time Data Warehouse
Disaster Recovery For the Real-Time Data Warehouse:
Replicating and Parallelizing Big Data
What you will learn: 4 strategies
1. Separate operational warehouses from reporting systems
2. Use changed data capture and Big Data replication
3. Implement parallel, active-active data warehouses
4. Maintain a “golden event” warehouse in Hadoop
Confidential & Proprietary
Analytics Have a Measurable Effect
• For the median Fortune 1000 company, a 10% increase in data usability corresponds to $2.01B in annual revenue gains (Source: "Big Data, Big Opportunity," University of Texas at Austin, Sept 2011)
• A “real-time infrastructure” ranks #3 on the CIO’s list of strategies (Source: Gartner)
• Organizations adept at analytics see 1.6x the revenue growth, 2.0x the profit growth, and 2.5x the stock price appreciation of their peers (Source: “Outperforming in a Data-Rich and Hyper-Connected World,” IBM Center for Applied Insights and Economic Intelligence)
Data Warehousing: Now Part of Operations
real-time pricing
real-time marketing
fraud detection
inventory management
customer service
Analytics in Business Operations: Constant, Up-to-the-Minute Access to Big Data
ADVERTISING: Click-stream → Mobile ads
UTILITIES: Energy usage → Power production
INFORMATION TECHNOLOGY: Network Activity → IT Root-Cause
CAPITAL MARKETS: Market Data → Securities Trading
TRANSPORTATION: Traffic & Logistics → Fleet Deployment
TELECOMMUNICATIONS: Call Activity → Capacity Allocation
Expectations have changed
What we need…vs. what we have
Up-Time
  Need: SLAs: 99.999%
  Have: Backup and recovery can take days in the event of an outage or system failure
Real-Time
  Need: Access to information as it happens
  Have: ETL processes can take hours before information is available
Distribution
  Need: Add new applications as the business demands
  Have: Access to the warehouse is tightly controlled; performance bottlenecks of a single database can impact mission-critical systems
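To put the 99.999% up-time need next to recovery times measured in days, the yearly downtime an availability SLA allows can be computed directly. This is a minimal sketch; the function name is illustrative and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability SLA."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Compare a few common SLA levels side by side.
    for sla in (99.9, 99.99, 99.999):
        print(f"{sla}% -> {allowed_downtime_minutes(sla):.2f} minutes per year")
```

Under these assumptions, 99.999% leaves a little over five minutes of downtime per year, so a single recovery that takes days exceeds the budget many times over.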
4 disaster recovery strategies for big data
1. Separate operational warehouses from reporting systems
2. Use changed data capture and Big Data replication
3. Implement parallel, active-active data warehousing
4. Maintain a “golden event” warehouse in Hadoop
1. Separate operations from reporting
[Diagram: application, DB2, Primary Warehouse, Secondary Warehouse, WAN, Operations, Reporting]
Run day-to-day applications in one place. Ad-hoc reporting happens in a separate warehouse.
BENEFIT: Better control over performance
CHALLENGE: Keeping changes in sync
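One way to picture this separation is a small routing table that sends each kind of workload to its own warehouse. The sketch below is illustrative only: the host names are hypothetical, and mapping operations to the primary warehouse and reporting to the secondary is an assumption based on the slide labels.

```python
from enum import Enum


class Workload(Enum):
    OPERATIONAL = "operational"  # day-to-day application queries
    AD_HOC_REPORT = "ad_hoc"     # analyst and reporting queries


# Hypothetical endpoints; replace with real connection details.
WAREHOUSES = {
    Workload.OPERATIONAL: "primary-warehouse.example.internal",
    Workload.AD_HOC_REPORT: "secondary-warehouse.example.internal",
}


def route(workload: Workload) -> str:
    """Return the warehouse that should serve this kind of workload."""
    return WAREHOUSES[workload]


if __name__ == "__main__":
    print(route(Workload.OPERATIONAL))
    print(route(Workload.AD_HOC_REPORT))
```

Because a heavy ad-hoc query never reaches the operational endpoint, it cannot slow down day-to-day applications; the remaining work is keeping the two warehouses in sync.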
2. Changed data capture
[Diagram: application, Primary Cluster, Reporting Cluster, WAN, 1 GB/s, Data Fabric (250 MB/s per box, load-balanced, linearly scalable, built-in persistence)]
Determine what has changed, then replicate it to achieve parity between environments
BENEFIT: Quickly propagate changes to remote sites
CHALLENGE: Identifying changes is difficult. As the volume of data continues to grow, this approach is only a stop-gap.
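The "determine what has changed, then replicate it" step can be sketched by comparing two snapshots of a table and replaying the differences on a remote copy. This is a simplified illustration, not the method the deck's product uses: real changed data capture commonly reads the database transaction log rather than diffing snapshots, and all names here are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Snapshot = Dict[str, Row]      # primary key -> row
Change = Tuple[str, str, Any]  # (operation, key, row or None)


def row_hash(row: Row) -> str:
    """Stable fingerprint of a row, independent of column order."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()


def capture_changes(old: Snapshot, new: Snapshot) -> List[Change]:
    """Compare two snapshots and return the inserts, updates, and deletes."""
    changes: List[Change] = []
    for key, row in new.items():
        if key not in old:
            changes.append(("insert", key, row))
        elif row_hash(row) != row_hash(old[key]):
            changes.append(("update", key, row))
    for key in old:
        if key not in new:
            changes.append(("delete", key, None))
    return changes


def apply_changes(replica: Snapshot, changes: List[Change]) -> None:
    """Replay captured changes on the remote copy to restore parity."""
    for op, key, row in changes:
        if op == "delete":
            replica.pop(key, None)
        else:
            replica[key] = dict(row)


if __name__ == "__main__":
    before = {"1": {"sku": "A", "qty": 5}, "2": {"sku": "B", "qty": 1}}
    after = {"1": {"sku": "A", "qty": 4}, "3": {"sku": "C", "qty": 9}}
    replica = {k: dict(v) for k, v in before.items()}
    apply_changes(replica, capture_changes(before, after))
    print(replica == after)
```

Only the three changed rows cross the WAN, not the whole table. The snapshot comparison itself still scans every row, which illustrates why identifying changes gets harder as volume grows.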
3. Parallel, active-active data warehousing
[Diagram: Primary Cluster, Reporting Cluster, WAN, 1 GB/s, Data Fabric (250 MB/s per box, load-balanced, linearly scalable, built-in persistence)]
Capture application data streams and load to parallel data warehouses over the WAN
BENEFIT: Multiple warehouses are kept up to date
CHALLENGE: Synchronization of many data streams
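Loading one application stream into several warehouses in parallel can be sketched as a fan-out with a per-warehouse buffer, so a site that is temporarily unreachable catches up in order once it returns. This is a minimal in-memory sketch under stated assumptions: the Warehouse and ParallelLoader classes are hypothetical stand-ins, not a real product API.

```python
from collections import deque
from typing import Deque, Dict, List


class Warehouse:
    """In-memory stand-in for one data warehouse (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.rows: List[dict] = []

    def load(self, event: dict) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.rows.append(event)


class ParallelLoader:
    """Fan each event out to every warehouse, buffering for any that are down."""

    def __init__(self, warehouses: List[Warehouse]) -> None:
        self.warehouses = warehouses
        self.pending: Dict[str, Deque[dict]] = {w.name: deque() for w in warehouses}
        self.next_seq = 0

    def publish(self, payload: dict) -> None:
        # A sequence number gives every warehouse the same event order.
        event = {"seq": self.next_seq, **payload}
        self.next_seq += 1
        for w in self.warehouses:
            self.pending[w.name].append(event)
        self.flush()

    def flush(self) -> None:
        """Deliver buffered events in order; stop at the first failure per warehouse."""
        for w in self.warehouses:
            queue = self.pending[w.name]
            while queue:
                try:
                    w.load(queue[0])
                except ConnectionError:
                    break  # keep the event buffered and retry on the next flush
                queue.popleft()


if __name__ == "__main__":
    primary, remote = Warehouse("primary"), Warehouse("remote")
    loader = ParallelLoader([primary, remote])
    loader.publish({"order": 1})
    remote.online = False          # simulate a WAN outage
    loader.publish({"order": 2})
    remote.online = True
    loader.flush()                 # remote catches up from its buffer
    print(primary.rows == remote.rows)
```

Keeping a separate queue per warehouse means a slow or failed site never blocks loading into the others, which is the property an active-active setup depends on.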
4. “Golden Event” store
[Diagram: application, Data Fabric (250 MB/s per box, load-balanced, linearly scalable, built-in persistence), Golden Event Store, Primary Data Warehouse, Reporting Data Warehouse (Optional), New Apps & Analytics]
Capture raw data and store it in Hadoop
BENEFIT: New analytics are always possible
CHALLENGE: Best practices are only just being developed
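The idea of capturing raw data first and deriving analytics later can be sketched as an append-only event log that is replayed whenever a new question comes up. In this minimal sketch a local JSON-lines file stands in for Hadoop storage, and the class, function, and field names are hypothetical.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Iterator


class GoldenEventStore:
    """Append-only log of raw events, one JSON document per line.

    Assumption: a local file stands in for Hadoop storage.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, event: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def replay(self) -> Iterator[dict]:
        """Yield every stored event in the order it was captured."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)


def clicks_per_page(store: GoldenEventStore) -> Counter:
    """An analytic defined after the fact, computed from the raw events."""
    return Counter(e["page"] for e in store.replay() if e["type"] == "click")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = GoldenEventStore(os.path.join(tmp, "events.jsonl"))
        store.append({"type": "click", "page": "/home"})
        store.append({"type": "view", "page": "/home"})
        store.append({"type": "click", "page": "/pricing"})
        print(clicks_per_page(store))
```

Because the events are stored unmodified, a metric nobody anticipated at capture time can still be computed later, and a lost warehouse can in principle be rebuilt by replaying the log.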
About Tervela Turbo
• New release!
• Capture, share, and distribute data
• Accelerate any of the use cases we discussed today
Big Data Requires Big Data Movement
As companies implement more big data solutions, the need to use high-performance message delivery with those systems will grow.
Gartner: Hype Cycle for Big Data, 2012
Key Features and Benefits of Tervela Turbo
Key Features
Data Capture
• Adapters for top data stores
• Flexible multi-language API
• Real-time acquisition
Data Availability
• Parallel loading
• Large-volume buffering
• Automatic retry
• Data replay
Data Distribution
• Continuous loading
• No disruption with bad consumers
• Warehouses, DBs, Hadoop, etc.
• Web, mobile, custom apps
Key Benefits
Real-Time: Regardless of data volume or number of sources
Reliable: For mission-critical operations that can’t go down
Multi-Platform: Feeds explosion of analytic apps on any platform without disrupting other consumers
Capture, Share, and Distribute
Big Data For Mission-Critical Analytics
www.tervela.com
@tervela
Learn More About Big Data Movement
Access videos, how-to guides, and other educational materials at: tervela.com/datafabric