Building Analytic Apps for SaaS: “Analytics as a Service”
Today’s Presenters
David Potes, Partner Solutions Architect, Big Data, AWS
Mariano Luna, Sr. Manager, Technical Alliances & Cloud, TIBCO Software
Patrick Brown, VP of Digital Marketing, Waggle
Raj Chary, VP Technology/Architecture, Waggle
Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
Amazon Redshift
a lot faster, a lot simpler, a lot cheaper
The Amazon Redshift view of data warehousing
Enterprise: 10x cheaper; easy to provision; higher DBA productivity
Big data: 10x faster; no programming; easily leverage BI tools, Hadoop, machine learning, streaming
SaaS: analysis inline with process flows; pay as you go, grow as you need; managed availability and disaster recovery
Amazon Redshift architecture
Leader node
• Simple SQL endpoint
• Stores metadata
• Optimizes query plan
• Coordinates query execution
Compute nodes
• Local columnar storage
• Parallel/distributed execution of all queries, loads, backups, restores, resizes
Start at just $0.25/hour, grow to 2 PB (compressed)
• DC1: SSD; scale from 160 GB to 326 TB
• DS2: HDD; scale from 2 TB to 2 PB
[Architecture diagram: SQL clients/BI tools connect over JDBC/ODBC to the leader node; the leader node and three compute nodes are linked by 10 GigE (HPC); ingestion, backup, and restore run against Amazon S3, Amazon DynamoDB, and Secure Shell (SSH). Node boxes are labeled 128 GB RAM, 16 TB disk, 16 cores.]
Benefit #1: Amazon Redshift is fast
Dramatically less I/O
Column storage
Data compression
Zone maps
Direct-attached storage
Large data block sizes
analyze compression listing;
  Table  |     Column     | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
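Several columns in this output are assigned delta encodings. As a rough illustration of why that can shrink a column, the Python sketch below stores the first value and then only the difference between consecutive values. The `list_ids` sample data and both function names are hypothetical, and this is not a description of Redshift's on-disk format, which the slides do not cover.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each value as a difference from the previous one."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original values by accumulating the stored differences."""
    values = []
    running_total = 0
    for delta in deltas:
        running_total += delta
        values.append(running_total)
    return values


# Hypothetical, mostly sequential IDs: the differences are far smaller than the values.
list_ids = [1000, 1001, 1002, 1005, 1009]
encoded = delta_encode(list_ids)
decoded = delta_decode(encoded)
```

Small differences fit in fewer bytes than the full values, which is the intuition behind choosing a delta encoding for columns such as sequential IDs.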
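[Zone map figure: a sorted column stored in three blocks, with the minimum and maximum value tracked per block]
Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324)
Block 2: 375 | 393 | 417 | … | 512 | 549 | 623 (min 375, max 623)
Block 3: 637 | 712 | 809 | … | 834 | 921 | 959 (min 637, max 959)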
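The zone maps idea from this slide can be sketched in a few lines of Python: keep the minimum and maximum of each block, and skip any block whose range cannot overlap the query's range. The block contents below reuse only the values visible on the slide (the elided values are left out), and the function names are hypothetical. This is a sketch of the concept, not Redshift's implementation.

```python
from typing import List, Tuple

# Sample blocks of a sorted column, using only the values shown on the slide.
BLOCKS: List[List[int]] = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]


def build_zone_map(blocks: List[List[int]]) -> List[Tuple[int, int]]:
    """Record (min, max) for every block."""
    return [(min(block), max(block)) for block in blocks]


def scan_range(blocks: List[List[int]],
               zone_map: List[Tuple[int, int]],
               lo: int, hi: int) -> Tuple[List[int], int]:
    """Return values in [lo, hi] and how many blocks had to be read."""
    matches: List[int] = []
    blocks_read = 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        if block_max < lo or block_min > hi:
            continue  # the block's range cannot contain a match, so skip the I/O
        blocks_read += 1
        matches.extend(v for v in block if lo <= v <= hi)
    return matches, blocks_read


zone_map = build_zone_map(BLOCKS)
matches, blocks_read = scan_range(BLOCKS, zone_map, 400, 600)
```

For a range of 400 to 600, only the second block overlaps, so one block is read instead of three. The benefit depends on the data being sorted, which is why sort keys matter for this technique.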
Benefit #1: Amazon Redshift is fast
Parallel and distributed
Query
Load
Export
Backup
Restore
Resize
[Diagram: SQL clients/BI tools and S3/EMR/DynamoDB/SSH sources connected to clusters of leader and compute nodes (CN). Node boxes are labeled 128 GB RAM, 16 cores, and either 16 TB or 48 TB disk.]
Benefit #1: Amazon Redshift is fast
Hardware optimized for I/O intensive workloads, 4 GB/sec/node
Enhanced networking, over 1 million packets/sec/node
Choice of storage type, instance size
Regular cadence of autopatched improvements
Benefit #2: Amazon Redshift is inexpensive
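DS2 (HDD)
                   | Price per hour for DW1.XL single node | Effective annual price per TB compressed
-------------------+---------------------------------------+------------------------------------------
On demand          | $0.850                                | $3,725
1-year reservation | $0.500                                | $2,190
3-year reservation | $0.228                                | $999

DC1 (SSD)
                   | Price per hour for DW2.L single node  | Effective annual price per TB compressed
-------------------+---------------------------------------+------------------------------------------
On demand          | $0.250                                | $13,690
1-year reservation | $0.161                                | $8,795
3-year reservation | $0.100                                | $5,500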
Pricing is simple
• Number of nodes x price/hour
• No charge for leader node
• No upfront costs
• Pay as you go
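The pricing rule on this slide is simple enough to check by hand. The sketch below works through it in Python, assuming 2 TB of storage per DW1.XL node and 0.16 TB per DW2.L node (taken from the smallest cluster sizes quoted on the architecture slide) and 8,760 hours per year. The function and constant names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored

# Assumed per-node storage, from the smallest cluster sizes quoted earlier in the deck.
DS2_TB_PER_NODE = 2.0   # DW1.XL (HDD)
DC1_TB_PER_NODE = 0.16  # DW2.L (SSD), i.e. 160 GB


def cluster_cost_per_hour(compute_nodes: int, price_per_node_hour: float) -> float:
    """Number of nodes x price/hour; the leader node is not charged."""
    return compute_nodes * price_per_node_hour


def annual_price_per_tb(price_per_node_hour: float, tb_per_node: float) -> float:
    """Yearly cost of one node divided by the storage that node provides."""
    return price_per_node_hour * HOURS_PER_YEAR / tb_per_node


# On-demand DS2: roughly $3,723 per TB per year, which the slide rounds to $3,725.
ds2_on_demand = annual_price_per_tb(0.850, DS2_TB_PER_NODE)

# On-demand DC1: roughly $13,688 per TB per year, shown as $13,690 on the slide.
dc1_on_demand = annual_price_per_tb(0.250, DC1_TB_PER_NODE)

# A hypothetical four-node DS2 cluster at the on-demand rate.
four_node_hourly = cluster_cost_per_hour(4, 0.850)
```

The reserved-price rows follow the same formula with the lower hourly rates. Small differences from the slide's figures come from rounding of the effective hourly price.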
Benefit #3: Amazon Redshift is fully managed
Continuous/incremental backups
Multiple copies within cluster
Continuous and incremental backups to Amazon S3
Continuous and incremental backups across regions
Streaming restore
[Diagram: compute nodes backing up to Amazon S3 in Region 1 and to Amazon S3 in Region 2.]
Benefit #4: Security is built in
• Load encrypted from Amazon S3
• SSL to secure data in transit
• ECDHE for perfect forward secrecy
• Amazon VPC for network isolation
• Encryption to secure data at rest
– All blocks on disks and in Amazon S3 encrypted
– Block key, cluster key, master key (AES-256)
– On-premises HSM and AWS CloudHSM support
• Audit logging and AWS CloudTrail integration
• SOC 1/2/3, PCI-DSS, FedRAMP, BAA
[Diagram: SQL clients/BI tools connect over JDBC/ODBC to the leader node; the leader node and three compute nodes are linked by 10 GigE (HPC); ingestion, backup, and restore use S3/EMR/DynamoDB/SSH. The diagram is split into a Customer VPC and an Internal VPC. Node boxes are labeled 128 GB RAM, 16 TB disk, 16 cores.]
Benefit #5: We innovate quickly
• Well over 125 new features added since launch
• Release every two weeks
• Automatic patching
Benefit #6: Amazon Redshift is powerful
• Approximate functions
• User-defined functions
• Machine learning
• Data science
Mariano Luna
January 12th, 2017
Building Analytic Apps for SaaS: “Analytics as a Service”
Hello, it’s me
Mariano Luna
Sr. Manager, Technical Alliances & Cloud
TIBCO Software (Houston, TX)
Do you feel like your users are always looking for more?
The Rise of Modern Applications
The Old Way vs. Modern Applications
[Diagram labels: Your app, Analytics, Knowledge worker]
Embedded Analytics
• Put answers in context
• Increase adoption of BI
• Give consumers actionable data
Jaspersoft Overview
An embeddable analytics platform designed for software companies
Jaspersoft Studio
Desktop report designer for JasperReports
JasperReports Server
Powerful BI platform and server
Reports Dashboards Self-service
Jaspersoft ETL
Data integration for improved reporting & analysis
JasperReports
World’s most popular Java reporting library
From Data to Delivery
Data Tier: Connect to your data
Server: Connect to & manage your platform
Output: Create beautiful reports & dashboards
Delivery: Inside any app or process (your app)
Why Jaspersoft for AWS
• OOTB multi-tenant support
• Built to 100% modern web standards
• RESTful web service APIs
• Visualize.js for advanced embedding
• JavaScript/HTML5 UI
• 10 minutes to deploy
• Autodetect for: RDS, Redshift, EMR
• Starts at less than $1/hour
• Pay-as-you-go with no user or data limits
• Save with discounted annual subscriptions
• Autoscaling clusters
• Infinite elastic scalability
• Multi-AZ enterprise deployments
Patrick Brown - Vice President of Marketing
Raj Chary - Vice President of Technology/Architecture
What is Waggle?
Smart, responsive practice
Math and ELA (Grades 2-8)
Provides students the right challenge at the right time
Right Challenge, Right Time
Waggle looks for more than correct answers. Waggle continually analyzes each student’s decisions and progress. That way, students get tougher material right when they’re ready.
Productive Struggle
Waggle motivates students to push themselves forward. How? Through helpful hints, supportive feedback, and achievement badges that build grit and confidence.
Constructive Grouping
Waggle’s insights mean you can easily group students together based on learning needs. All without sacrificing the quality of individual instruction.
Waggle: Product Demo
• Data Creators
– Differentiated learning experience
– Fun and engaging
• Data Visualizers
– Seamless integration with application
– Analytics with a Story
– Actionable Data
With integration deep dives into TIBCO Jaspersoft (Visualize.js) and AWS services (Redshift)
How did we build Waggle?
• Data Modelling
– Lens-based model (distribution keys and sort keys)
– Ask and validate
• Performance Efficiency (compression, load/unload, vacuum/analyze, in-memory processing, WLM)
• Contextual Design
– Prototype and storyboard
– Capture the “Intent”
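The data-modelling bullet mentions distribution keys. As a rough illustration of the idea, the Python sketch below hashes a key column so that all rows sharing a key value land on the same slice, which is what lets joins and aggregations on that key run without moving rows between nodes. The `crc32` hash is a stand-in (the slides do not describe Redshift's hash function), and the `answers` rows, the column names, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def slice_for(key: object, num_slices: int) -> int:
    """Map a distribution-key value to a slice number (illustrative hash only)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows: List[dict], dist_key: str, num_slices: int) -> Dict[int, List[dict]]:
    """Group rows by the slice their distribution-key value hashes to."""
    slices: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        slices[slice_for(row[dist_key], num_slices)].append(row)
    return slices


# Hypothetical student-answer rows, distributed on student_id.
answers = [
    {"student_id": 101, "item": "A", "correct": True},
    {"student_id": 102, "item": "A", "correct": False},
    {"student_id": 101, "item": "B", "correct": True},
    {"student_id": 103, "item": "A", "correct": True},
    {"student_id": 102, "item": "B", "correct": True},
]
placement = distribute(answers, "student_id", num_slices=4)
```

With this layout, every row for a given student sits on one slice, so a per-student report only needs data that is already co-located.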
Amazon Redshift: Data Warehouse Layout
Data sources: APIs, OLTP (S3 COPY)
Write cluster (Redshift, Compute, dw2.large nodes): initial and incremental {processed} data loads; staging; data mart (aggregations)
Read cluster (Redshift, Compute, dw2.large nodes): for serving Jaspersoft reports; data mart (aggregations)
History cluster (Redshift, Density, dw1.xlarge nodes): periodic data snapshots for historical analysis
Data movement between clusters: S3 UnLoad and Load; S3 UnLoad and Load + UPSERTS
Results and Lessons Learned
• Performance Metrics
– Millions of records are processed in <1 minute (LOAD/UNLOAD commands | UPSERTS | S3 COPY command)
– Report queries average < 1 to ~1.5 seconds
– {compression} – gained 20+% efficiencies in data retrieval
• Best Practices
– {sort keys} – lens-based data model: visualize data in a variety of ways
– {commit stats} – Redshift is not a transactional system
– {nested loops} – no Cartesian products, ensure joins are well managed
– {queries that queue} – tune the WLM configuration
– {query runtimes} – faster query means less queuing
– {stats missing} – analyze and vacuum when possible
– {alerts with tables} – monitor to ensure queries are running optimally
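The slides mention UPSERTS without showing how they were done. One commonly used pattern is to load new data into a staging table, delete the target rows that the staging rows replace, and then insert the staging rows, all in one transaction. The sketch below illustrates that pattern with Python's built-in `sqlite3` as a local stand-in for the warehouse; the `scores` and `scores_staging` tables and their columns are hypothetical. On Redshift the staging table would typically be filled with a COPY from S3, and whether Waggle used exactly this approach is not stated in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in for the warehouse connection

# Hypothetical target and staging tables.
conn.execute("CREATE TABLE scores (student_id INTEGER PRIMARY KEY, score INTEGER)")
conn.execute("CREATE TABLE scores_staging (student_id INTEGER, score INTEGER)")

# Existing rows in the target, and a freshly loaded batch in staging.
conn.executemany("INSERT INTO scores VALUES (?, ?)", [(1, 70), (2, 85)])
conn.executemany("INSERT INTO scores_staging VALUES (?, ?)", [(2, 90), (3, 60)])

# Upsert: the connection context manager commits on success and rolls back on error.
with conn:
    # 1. Remove target rows that the new batch replaces.
    conn.execute(
        "DELETE FROM scores "
        "WHERE student_id IN (SELECT student_id FROM scores_staging)"
    )
    # 2. Insert every staged row (updated rows and brand-new rows alike).
    conn.execute("INSERT INTO scores SELECT student_id, score FROM scores_staging")
    # 3. Empty the staging table for the next batch.
    conn.execute("DELETE FROM scores_staging")

result = conn.execute(
    "SELECT student_id, score FROM scores ORDER BY student_id"
).fetchall()
staging_left = conn.execute("SELECT COUNT(*) FROM scores_staging").fetchone()[0]
```

Student 2's score is replaced, student 3 is added, and student 1 is untouched. Doing the delete and insert as set-based statements, rather than row-by-row updates, fits the note above that Redshift is not a transactional system.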
Find TIBCO Jaspersoft on AWS Marketplace
Q & A