KPI Partners: DataStax and Analytics Implementation Methodology
Contact Us: 510.818.9480 | www.kpipartners.com | © KPI Partners Inc.
Start Here
Brian Dominguez | Director of Client Services | KPI Partners
DataStax and Analytics Implementation Methodology
1. KPI is a Silver Level DataStax Partner
2. KPI is a top tier sponsor at Cassandra Summit (September 22-24, 2015, Santa Clara, CA)
3. KPI and its consultants have implemented DataStax at multiple retail and financial services customers
DataStax and KPI Partners
KPI uses the DataStax Implementation Methodology
1. Use Case Requirements for Data Model
2. Security and Encryption Requirements
3. Service Level Agreements
4. Operational Requirements (Monitor and Manage)
5. Search Requirements (DataStax Search)
6. Analytics Requirements (DataStax Analytics)
Step 1: Requirements Phase
1. Key to success: “get the data model right”
2. Leverage what is in place:
   1. Query logs
   2. Define specific Create, Read, Update, and Delete (“CRUD”) requirements
3. DataStax Security
   1. Authentication requirements (e.g. Kerberos, password, SSL, LDAP)
   2. Authorization requirements (e.g. access to a schema, table, or other database components)
4. Encryption
   1. Client application to DataStax (the cluster)
   2. Node-to-node (within the cluster)
Step 1: Requirements Phase – Key-points
5. SLAs
   1. Highly recommended; treat them as a “must have”
   2. Lack of SLAs leads to project failure.
6. Understand that you are building a mission-critical system
   1. Make sure to define operational monitoring and management of the system
7. DataStax Search
   1. Define search requirements
   2. Determine the fields that will be searched on and returned (e.g. multiple search fields or a single search field, faceted results vs. ranked list results)
Step 1: Requirements Phase – Key-points
8. DataStax Analytics
   1. Analytics requirements should be captured at this time.
9. Analytics requirements should incorporate:
   1. statistical algorithms,
   2. required data sources,
   3. data movement/modifications,
   4. security/access,
   5. other analytical requirements, at a level clear enough to enable a thorough design.
Step 1: Requirements Phase – Key-points
1. Data Model Design
2. Data Access Object Design
3. Data Movement Design
4. Operational Design (Management and Monitoring)
5. Search Design
6. Analytics Design
Step 2: Design Phase
1. Data Model Design should clearly include:
   1. Keyspace design (name, replication strategy)
   2. Table design (table names, partition keys, clustering columns if applicable, and physical table properties as necessary, e.g. encryption, bloom filter settings)
   3. Any relationships between tables. Note that joins between tables are not available at the database level in DataStax Enterprise. However, relationships between tables are still important, especially for the application developers.
Step 2: Design Phase – Key-points
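The keyspace and table design items above can be captured as a small, reviewable artifact. The following is a minimal sketch, assuming Python as the scripting language; the keyspace `retail`, the table `orders_by_customer`, the data center name `dc1`, and the helper functions are all hypothetical and only illustrate how name, replication strategy, partition key, and clustering columns map onto CQL DDL.

```python
def keyspace_ddl(name, dc_replication):
    """Render CREATE KEYSPACE with NetworkTopologyStrategy.

    dc_replication maps a data center name to its replication factor.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in dc_replication.items()]
    options = ", ".join(parts)
    return f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = {{{options}}};"


def table_ddl(keyspace, table, columns, partition_key, clustering=()):
    """Render CREATE TABLE from a design document.

    columns:       mapping of column name -> CQL type
    partition_key: list of column names forming the partition key
    clustering:    sequence of (column name, "ASC" | "DESC") pairs
    """
    cols = ",\n    ".join(f"{col} {cql_type}" for col, cql_type in columns.items())
    key_parts = ["(" + ", ".join(partition_key) + ")"]
    key_parts += [col for col, _ in clustering]
    ddl = (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} (\n"
        f"    {cols},\n"
        f"    PRIMARY KEY ({', '.join(key_parts)})\n"
        f")"
    )
    if clustering:
        order = ", ".join(f"{col} {direction}" for col, direction in clustering)
        ddl += f" WITH CLUSTERING ORDER BY ({order})"
    return ddl + ";"


# Hypothetical design: one partition per customer, newest orders first.
KEYSPACE_DDL = keyspace_ddl("retail", {"dc1": 3})
TABLE_DDL = table_ddl(
    "retail",
    "orders_by_customer",
    {"customer_id": "uuid", "order_ts": "timestamp", "order_id": "uuid", "total": "decimal"},
    partition_key=["customer_id"],
    clustering=[("order_ts", "DESC")],
)
```

Keeping the design in this form lets the generated statements be reviewed alongside the query requirements gathered in Step 1 before anything is applied to a cluster.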
2. Projects are more successful when they leverage simple Data Access Objects
   1. Simple Data Access Objects are the best way to encapsulate and abstract data manipulation logic.
   2. This runs counter to the current trend in application development, where projects leverage frameworks (e.g. Hibernate, LINQ, JPA, and other ORMs) to encapsulate, abstract, and represent database components as application objects.
   3. Designing the Data Access Objects up front, as much as possible, helps the application development team as they build out higher-level functionality.
Step 2: Design Phase – Key-points
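A simple Data Access Object of the kind described above might look like the following. This is a sketch under stated assumptions, not the methodology's own code: the `Order` type, the `retail.orders_by_customer` table, and the `OrderDao` class are hypothetical, and the session is assumed to be any object exposing `execute(query, parameters)` with `%s` placeholders, which is how sessions in the DataStax Python driver accept simple statements.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Protocol, Sequence


class Session(Protocol):
    """Assumed minimal interface of a database session."""

    def execute(self, query: str, parameters: Sequence[Any] = ()) -> Iterable[Any]:
        ...


@dataclass(frozen=True)
class Order:
    customer_id: str
    order_id: str
    total: float


class OrderDao:
    """Keeps every CQL statement for orders in one place (hypothetical table)."""

    INSERT = (
        "INSERT INTO retail.orders_by_customer (customer_id, order_id, total) "
        "VALUES (%s, %s, %s)"
    )
    SELECT_BY_CUSTOMER = (
        "SELECT customer_id, order_id, total "
        "FROM retail.orders_by_customer WHERE customer_id = %s"
    )
    DELETE = (
        "DELETE FROM retail.orders_by_customer "
        "WHERE customer_id = %s AND order_id = %s"
    )

    def __init__(self, session: Session) -> None:
        self._session = session

    def save(self, order: Order) -> None:
        # In Cassandra an INSERT is an upsert, so this covers create and update.
        self._session.execute(
            self.INSERT, (order.customer_id, order.order_id, order.total)
        )

    def find_by_customer(self, customer_id: str) -> List[Order]:
        # Rows are assumed to be tuple-like in the selected column order.
        rows = self._session.execute(self.SELECT_BY_CUSTOMER, (customer_id,))
        return [Order(*row) for row in rows]

    def delete(self, customer_id: str, order_id: str) -> None:
        self._session.execute(self.DELETE, (customer_id, order_id))
```

Because the DAO only depends on the small `Session` interface, application code can be tested against an in-memory stand-in, and each method maps directly onto one of the CRUD requirements defined in Step 1.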
3. Data Movement Design is essential to your success
   1. Batch and real-time data integration between systems
   2. ETL, Change Data Capture, data pipelines, etc.
   3. Data types, transformation logic, error handling, look-ups, and data normalization should be clearly documented.
Step 2: Design Phase – Key-point
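To illustrate what documenting data types, transformation logic, error handling, and look-ups can mean in practice, here is a minimal batch-transform sketch. The record layout, the `STORE_LOOKUP` reference data, and the function names are assumptions made for the example; a real pipeline would follow the project's own data movement design.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical look-up table mapping source store codes to display names.
STORE_LOOKUP = {"SF01": "San Francisco", "NY02": "New York"}


def transform(record):
    """Convert one raw source record (all string values) into typed columns."""
    return {
        "order_id": record["order_id"].strip(),
        # Normalize the code before the look-up; unknown codes raise KeyError.
        "store_name": STORE_LOOKUP[record["store_code"].strip().upper()],
        "total": Decimal(record["total"]),
        "order_ts": datetime.strptime(record["order_ts"], "%Y-%m-%d %H:%M:%S"),
    }


def run_batch(records):
    """Transform a batch, routing bad records to a reject list instead of failing."""
    loaded, rejected = [], []
    for record in records:
        try:
            loaded.append(transform(record))
        except (KeyError, ValueError, InvalidOperation) as exc:
            rejected.append({"record": record, "error": type(exc).__name__})
    return loaded, rejected
```

Routing rejected records to a separate list keeps one malformed row from stopping the batch and gives operations a concrete artifact to review.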
4. Operational Design
   1. Document the tooling and techniques used to deploy new nodes, configure and upgrade nodes in the cluster, back up and restore data, monitor the cluster (including OpsCenter use), run repairs, raise alerts, and handle disaster management processes.
   2. KPI recommends using a "playbook" approach to Operational Design.
Step 2: Design Phase – Key-points
5. Search Design
   1. Incorporate items such as searchable terms, returned terms, tokenizers, filters, and multi-document search terms.
6. DataStax Analytics Design
   1. Determine which Analytics components will be leveraged in the solution.
Step 2: Design Phase – Key-points
1. Infrastructure
2. Deployment and Configuration Management
3. Software Components (Data Model and Application)
4. Unit Testing of Components
Step 3: Implementation Phase
1. Application Development – use Agile or Waterfall methodology as desired by your organization
2. Deployment and Configuration Management Mechanism
   1. Key in a distributed system is the need to automate as much as possible.
   2. Tools such as OpsCenter, Docker, Vagrant, Chef, and Puppet should be leveraged.
3. Unit Testing of Components
   1. Testing is more complex with distributed systems than with single-node systems.
   2. Specific defects, such as race conditions, are only observed "at scale".
   3. Unit testing should be executed over a small cluster that contains more than a single node.
   4. Tools such as ccm can be used by developers to automate the process of quickly launching test clusters as part of a unit test.
Step 3: Implementation Phase - Key-points
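As a rough illustration of launching a multi-node test cluster from a test harness, the sketch below wraps the `ccm create` and `ccm remove` commands in a context manager. The helper names, the cluster name, and the Cassandra version string are hypothetical; the sketch assumes `ccm` is installed and on the PATH, and the `runner` parameter exists only so the command sequence can be checked without actually starting a cluster.

```python
import subprocess
from contextlib import contextmanager


def ccm_create_command(cluster_name, cassandra_version, node_count):
    """Build the ccm invocation that creates and starts a local cluster."""
    if node_count < 2:
        # The point of the exercise is to test against more than one node.
        raise ValueError("use at least two nodes for distributed unit tests")
    return [
        "ccm", "create", cluster_name,
        "-v", cassandra_version,   # Cassandra version to download and run
        "-n", str(node_count),     # number of local nodes
        "-s",                      # start the cluster once created
    ]


def ccm_remove_command(cluster_name):
    """Build the ccm invocation that stops and deletes the cluster."""
    return ["ccm", "remove", cluster_name]


@contextmanager
def local_cluster(cluster_name, cassandra_version, nodes=3,
                  runner=subprocess.check_call):
    """Create a cluster for the duration of a test, then always clean up."""
    runner(ccm_create_command(cluster_name, cassandra_version, nodes))
    try:
        yield cluster_name
    finally:
        runner(ccm_remove_command(cluster_name))
```

A test would then wrap its body in `with local_cluster("unit", "2.1.9"):` (the version is an example value) and connect to the local nodes inside the block.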
1. Defect tracking (JIRA, Issue Log)
2. Operational readiness checklist completed
Step 4: Pre-Production Testing Phase
1. Critical to enable the project team to identify actual issues “at scale” prior to going to production
2. Plan a minimum two-week period in which the application runs at production scale.
3. It may take several iterations of configuration, code changes, and refactoring to enable full execution.
Step 4: Pre-production Testing Phase - Key-points
4. Operational Readiness Checklist
   1. Replace a downed node and a dead seed node
   2. Configure and execute repair (within the GC grace period, gc_grace_seconds)
   3. Add a node to a cluster
   4. Replace a downed Data Center
   5. Add a Data Center to the cluster
   6. Decommission a node
   7. Restore a backup
   8. At a Cluster Level and Per Node Level, report on errors, throughput, latency, resource saturation, bottlenecks, compactions, flushes, and health
Step 4: Pre-production Testing Phase - Key-points
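The checklist item on running repair within the GC grace period lends itself to a simple automated check. The sketch below flags nodes whose last successful repair is older than a chosen fraction of `gc_grace_seconds`. The function name, the node names, and the 20% safety margin are assumptions for illustration; 864000 seconds (10 days) is Cassandra's default `gc_grace_seconds`, and how the last repair time is obtained per node is left to the project's own tooling.

```python
from datetime import datetime, timedelta

# Cassandra's default gc_grace_seconds (10 days); tables may override it.
DEFAULT_GC_GRACE_SECONDS = 864000


def overdue_repairs(last_repair_by_node, now,
                    gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS,
                    safety_margin=0.2):
    """Return the sorted names of nodes whose repair is due.

    last_repair_by_node: mapping of node name -> datetime of last completed repair
    safety_margin:       fraction of the grace period reserved as headroom, so
                         repairs are flagged before the hard deadline is reached
    """
    deadline = timedelta(seconds=gc_grace_seconds * (1 - safety_margin))
    return sorted(
        node
        for node, last_repair in last_repair_by_node.items()
        if now - last_repair > deadline
    )
```

A check like this can feed the alerting defined in the Operational Design, so a missed repair is noticed before deleted data has a chance to reappear.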
Highlight the normal, operational mode of an application built on DataStax Enterprise.
Prepare for all eventualities, and address them by adding nodes to expand the system's capacity when needed.
Scale with DataStax Enterprise.
Step 5: Scale and Enhancements
Reference Architecture – On premise
Tableau via ODBC
R for Visualization (Spark Analytics)
Reference Architecture – Cloud
Tableau via ODBC
R for Visualization (Spark Analytics)
Next Steps
DataStax Representative
• DataStax Pricing
• DataStax Demo
KPI Partners
• Schedule a Lunch & Learn
• Free 1 Hour DataStax Assessment Call
Contact Brian Dominguez
Who To Contact?
KPI PARTNERS, Booth 111
September 22-24