KPI Partners: DataStax and Analytics Implementation Methodology


Transcript of KPI Partners: DataStax and Analytics Implementation Methodology

Page 1: KPI Partners: DataStax and Analytics Implementation Methodology

DataStax and Analytics Implementation Methodology

Brian Dominguez | Director of Client Services | KPI Partners

Start Here

Contact Us: 510.818.9480 | www.kpipartners.com | © KPI Partners Inc.

Page 2: KPI Partners: DataStax and Analytics Implementation Methodology

Page 3: KPI Partners: DataStax and Analytics Implementation Methodology

DataStax and KPI Partners

1. KPI is a Silver Level DataStax Partner
2. KPI is a top-tier sponsor at Cassandra Summit (September 22-24, 2015, Santa Clara, CA)
3. KPI and its consultants have implemented DataStax at multiple retail and financial services customers

Page 4: KPI Partners: DataStax and Analytics Implementation Methodology

KPI uses the DataStax Implementation Methodology

Page 5: KPI Partners: DataStax and Analytics Implementation Methodology

Step 1: Requirements Phase

1. Use Case Requirements for Data Model
2. Security and Encryption Requirements
3. Service Level Agreements
4. Operational Requirements (Monitor and Manage)
5. Search Requirements (DataStax Search)
6. Analytics Requirements (DataStax Analytics)

Page 6: KPI Partners: DataStax and Analytics Implementation Methodology

Step 1: Requirements Phase – Key-points

1. Key to success: “get the data model right”
2. Leverage what is in place:
   1. Query logs
   2. Define specific Create, Read, Update, and Delete (“CRUD”) requirements
3. DataStax Security
   1. Authentication requirements (e.g. Kerberos, password, SSL, LDAP, etc.)
   2. Authorization requirements (e.g. access to a schema, table, or other database component)
4. Encryption
   1. Client application to DataStax (the cluster)
   2. Node-to-node (between nodes within the cluster)
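As an illustration of the “leverage what is in place” point, existing query logs can be summarized into per-table CRUD counts before any data model is drawn up. The sketch below is a minimal example, not part of the KPI methodology itself: the function name, the sample statements, and the simplified regular expression (which ignores joins and subqueries) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Map the leading SQL keyword of a statement to its CRUD category.
CRUD_BY_KEYWORD = {
    "INSERT": "create",
    "SELECT": "read",
    "UPDATE": "update",
    "DELETE": "delete",
}

# Simplified: captures the first table name after FROM, INTO, or UPDATE.
TABLE_PATTERN = re.compile(r"\b(?:FROM|INTO|UPDATE)\s+([\w.]+)", re.IGNORECASE)


def summarize_query_log(statements):
    """Count CRUD operations per table from an iterable of SQL strings."""
    counts = Counter()
    for statement in statements:
        words = statement.strip().split(None, 1)
        if not words:
            continue  # blank line in the log
        operation = CRUD_BY_KEYWORD.get(words[0].upper())
        match = TABLE_PATTERN.search(statement)
        if operation is None or match is None:
            continue  # DDL or anything else this sketch cannot classify
        counts[(match.group(1).lower(), operation)] += 1
    return counts


# Hypothetical log extract from the system being replaced.
sample_log = [
    "SELECT * FROM orders WHERE customer_id = 42",
    "SELECT total FROM orders WHERE order_id = 7",
    "INSERT INTO orders (order_id, customer_id) VALUES (8, 42)",
    "UPDATE customers SET email = 'a@example.com' WHERE customer_id = 42",
    "DELETE FROM carts WHERE cart_id = 3",
]
summary = summarize_query_log(sample_log)
for (table, operation), count in sorted(summary.items()):
    print(f"{table:10s} {operation:7s} {count}")
```

A summary like this shows which tables are read-heavy and which columns appear in lookups, which is the input the data model design needs.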

Page 7: KPI Partners: DataStax and Analytics Implementation Methodology

Step 1: Requirements Phase – Key-points

5. SLAs
   1. Highly recommended: a “must have”
   2. A lack of SLAs leads to project failure.
6. Understand you are building a mission-critical system
   1. Make sure to define operational monitoring and management of the system
7. DataStax Search
   1. Define Search Requirements
   2. Determine the fields that will be searched on and returned (e.g. multiple search fields or a single search field, the use of faceted results vs. ranked list results, etc.)
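An SLA is only useful if it can be checked. One common way to state a latency SLA is as a percentile target, for example “99% of reads complete within 20 ms”. The sketch below assumes that form; the function names, the nearest-rank percentile method, and the sample latencies are illustrative choices, not something the slides prescribe.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_sla(latencies_ms, pct, target_ms):
    """True when the pct-th percentile latency is within the target."""
    return percentile(latencies_ms, pct) <= target_ms


# Hypothetical read latencies (milliseconds) from a test run.
read_latencies_ms = [3, 4, 4, 5, 5, 6, 7, 9, 12, 48]

print("p90:", percentile(read_latencies_ms, 90))
print("p99:", percentile(read_latencies_ms, 99))
print("p99 within 20 ms:", meets_sla(read_latencies_ms, 99, 20))
```

Writing the SLA down in this form during requirements makes it straightforward to reuse the same check later in pre-production testing.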

Page 8: KPI Partners: DataStax and Analytics Implementation Methodology

Step 1: Requirements Phase – Key-points

8. DataStax Analytics
   1. Analytics requirements should be captured at this time.
9. Analytics requirements should incorporate:
   1. statistical algorithms,
   2. required data sources,
   3. data movement/modifications,
   4. security/access,
   5. other analytical requirements, at a clear enough level to enable a thorough design.

Page 9: KPI Partners: DataStax and Analytics Implementation Methodology

Step 2: Design Phase

1. Data Model Design
2. Data Access Object Design
3. Data Movement Design
4. Operational Design (Management and Monitoring)
5. Search Design
6. Analytics Design

Page 10: KPI Partners: DataStax and Analytics Implementation Methodology

Step 2: Design Phase – Key-points

1. Data Model Design should clearly include:
   1. Keyspace Design (Replication Strategy, Name)
   2. Table Design (Table Names, Partition Keys, Clustering Columns (if applicable), and physical table properties as necessary (e.g. encryption, bloom filter settings, etc.))
   3. Any relationships between tables. Note that joining tables within DataStax Enterprise is not technically feasible. However, relationships between tables are still important, especially for the application developers.
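The design elements listed above (keyspace name and replication strategy, table name, partition key, clustering columns) map directly onto CQL DDL. The following sketch renders such a design as `CREATE KEYSPACE` and `CREATE TABLE` statements. The keyspace `retail`, the table `orders_by_customer`, the data center name `dc1`, and the helper names are hypothetical examples, not taken from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class TableDesign:
    keyspace: str
    name: str
    columns: dict  # column name -> CQL type, in declaration order
    partition_key: list
    clustering_columns: list = field(default_factory=list)


def keyspace_ddl(name, replicas_per_dc):
    """Render CREATE KEYSPACE using NetworkTopologyStrategy with one replica count per data center."""
    dcs = "".join(f", '{dc}': {count}" for dc, count in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy'{dcs}}};"
    )


def table_ddl(table):
    """Render CREATE TABLE with the partition key in its own parentheses, followed by clustering columns."""
    cols = ", ".join(f"{col} {cql_type}" for col, cql_type in table.columns.items())
    key = "(" + ", ".join(table.partition_key) + ")"
    if table.clustering_columns:
        key += ", " + ", ".join(table.clustering_columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table.keyspace}.{table.name} "
        f"({cols}, PRIMARY KEY ({key}));"
    )


# Hypothetical design: all orders for one customer live in one partition,
# ordered by time within that partition.
orders = TableDesign(
    keyspace="retail",
    name="orders_by_customer",
    columns={
        "customer_id": "uuid",
        "order_time": "timestamp",
        "order_id": "uuid",
        "total": "decimal",
    },
    partition_key=["customer_id"],
    clustering_columns=["order_time", "order_id"],
)

print(keyspace_ddl("retail", {"dc1": 3}))
print(table_ddl(orders))
```

Keeping the design in a structured form like this also gives application developers one place to look up the relationships between tables, since the database will not join them.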

Page 11: KPI Partners: DataStax and Analytics Implementation Methodology

Step 2: Design Phase – Key-points

2. Projects are more successful when they leverage simple Data Access Objects
   1. Simple Data Access Objects are the best way to encapsulate and abstract data manipulation logic.
   2. This is opposed to the current trend in application development, where projects leverage frameworks to encapsulate, abstract, and represent database components as application objects (e.g. Hibernate, LINQ, JPA, and other ORMs).
   3. Designing the Data Access Object up front, as much as possible, will help the application development team as they build out higher-level functionality.
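A simple Data Access Object in this sense is just a small class that owns the statements for one table and exposes intent-named methods. The sketch below is one possible shape, assuming a session object with an `execute(statement, parameters)` method (the DataStax Python driver's `Session.execute` has this form and uses `%s` placeholders for simple statements). The `shop.orders` table, the `OrderDao` class, and the in-memory `FakeSession` are hypothetical and exist only so the example runs without a cluster.

```python
class OrderDao:
    """Keeps every CQL statement for the (hypothetical) shop.orders table in one place."""

    INSERT = "INSERT INTO shop.orders (customer_id, order_id, total) VALUES (%s, %s, %s)"
    SELECT = "SELECT order_id, total FROM shop.orders WHERE customer_id = %s"

    def __init__(self, session):
        # Any object with execute(statement, parameters) works here.
        self._session = session

    def add_order(self, customer_id, order_id, total):
        self._session.execute(self.INSERT, (customer_id, order_id, total))

    def orders_for(self, customer_id):
        return list(self._session.execute(self.SELECT, (customer_id,)))


class FakeSession:
    """In-memory stand-in for a driver session, used only for this illustration."""

    def __init__(self):
        self.rows = []

    def execute(self, statement, parameters=()):
        if statement.startswith("INSERT"):
            self.rows.append(tuple(parameters))
            return []
        # Treat everything else as the SELECT by customer_id.
        return [
            (order_id, total)
            for customer_id, order_id, total in self.rows
            if customer_id == parameters[0]
        ]


dao = OrderDao(FakeSession())
dao.add_order("c1", "o1", 10)
dao.add_order("c2", "o2", 5)
print(dao.orders_for("c1"))
```

Because the application only calls `add_order` and `orders_for`, the statements and the table layout behind them can change without touching higher-level code.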

Page 12: KPI Partners: DataStax and Analytics Implementation Methodology

Step 2: Design Phase – Key-point

3. Data Movement Design is essential to your success
   1. Batch and real-time data integration between systems
   2. ETL, Change Data Capture, data pipelines, etc.
   3. Data types, transformation logic, error handling, look-ups, and data normalization should be clearly documented.
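To make the documentation items above concrete, the sketch below shows one transformation step that covers data types, a look-up, normalization, and error handling by routing bad records to a reject list. The record layout, the `COUNTRY_LOOKUP` table, and the function names are assumptions made for illustration only.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical look-up table that normalizes country names to two-letter codes.
COUNTRY_LOOKUP = {"USA": "US", "United States": "US", "Deutschland": "DE"}


def transform(record):
    """Convert one source record to target types; raises on data that cannot be loaded."""
    country = record["country"].strip()
    code = COUNTRY_LOOKUP.get(country, country.upper())
    if len(code) != 2:
        raise ValueError(f"unknown country: {country!r}")
    return {
        "id": int(record["id"]),
        "amount": Decimal(record["amount"]),
        "country": code,
    }


def run_batch(records):
    """Transform every record, collecting failures with their reason instead of aborting the batch."""
    loaded, rejected = [], []
    for record in records:
        try:
            loaded.append(transform(record))
        except (KeyError, ValueError, InvalidOperation) as exc:
            rejected.append((record, repr(exc)))
    return loaded, rejected


# Hypothetical source rows: one clean, one with a bad amount, one with an unknown country.
source_rows = [
    {"id": "1", "amount": "19.99", "country": "USA"},
    {"id": "2", "amount": "n/a", "country": "de"},
    {"id": "3", "amount": "5", "country": "Atlantis"},
]
loaded, rejected = run_batch(source_rows)
print(len(loaded), "loaded;", len(rejected), "rejected")
```

The reject list is the part worth documenting most carefully: it defines what happens to records that fail, which batch and real-time pipelines both need to agree on.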

Page 13: KPI Partners: DataStax and Analytics Implementation Methodology

Step 2: Design Phase – Key-points

4. Operational Design
   1. Document the tooling and techniques used for:
      1. deploying new nodes, configuring and upgrading nodes in the cluster, backup and restore operations, cluster monitoring, OpsCenter use, repairs, alerting, disaster management processes, etc.
   2. KPI recommends using a “playbook” approach to Operational Design.

Page 14: KPI Partners: DataStax and Analytics Implementation Methodology

Step 2: Design Phase – Key-points

5. Search Design
   1. Incorporate items such as:
      1. searchable terms, returned terms, tokenizers, filters, multi-document search terms, etc.
6. DataStax Analytics Design
   1. Determine which Analytics components will be leveraged in the solution.
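For readers unfamiliar with the tokenizer and filter terms in the Search Design item: a search analyzer typically splits text into tokens and then passes them through a chain of filters before indexing. The following is a toy, pure-Python illustration of that idea. It is not DataStax Search or Solr configuration, and the stop-word list and function names are assumptions.

```python
import re

# Hypothetical stop-word list; a real design would document its own.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def tokenize(text):
    """Split text into runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase_filter(tokens):
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text, filters=(lowercase_filter, stop_filter)):
    """Run the tokenizer, then each filter in order."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The Red Running-Shoes of 2015"))
```

The order of the filters matters: here the stop-word filter only works because lowercasing runs first, which is the kind of detail a search design should record.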

Page 15: KPI Partners: DataStax and Analytics Implementation Methodology

Step 3: Implementation Phase

1. Infrastructure
2. Deployment and Configuration Management
3. Software Components (Data Model and Application)
4. Unit Testing of Components

Page 16: KPI Partners: DataStax and Analytics Implementation Methodology

Step 3: Implementation Phase - Key-points

1. Application Development: use Agile or Waterfall methodology, as desired by your organization
2. Deployment and Configuration Management Mechanism
   1. Key in a distributed system is the need to automate as much as possible
   2. OpsCenter, Docker, Vagrant, Chef, Puppet, etc. should be leveraged.
3. Unit Testing of Components
   1. More complex with distributed systems than with single-node systems.
   2. Specific defects, such as race conditions, are only observed “at scale”.
   3. Unit testing should be executed over a small cluster that contains more than a single node.
   4. Tools such as ccm can be used by developers to automate the process of quickly launching test clusters as part of a unit test.
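As a rough sketch of the ccm point, a test harness can build the commands that create and tear down a small multi-node cluster around a test run. The example below only constructs the command lines; it assumes ccm's `create` subcommand with its `-v` (version), `-n` (node count), and `-s` (start) options and its `remove` subcommand. The cluster name and the Cassandra version shown are hypothetical, and the helper names are not part of ccm.

```python
def ccm_create_command(cluster_name, cassandra_version, node_count):
    """Command line to create and start a local test cluster with ccm."""
    if node_count < 2:
        # The slide's advice: test on more than a single node.
        raise ValueError("use more than one node so distributed behaviour is exercised")
    return [
        "ccm", "create", cluster_name,
        "-v", cassandra_version,
        "-n", str(node_count),
        "-s",
    ]


def ccm_remove_command(cluster_name):
    """Command line to remove the test cluster afterwards."""
    return ["ccm", "remove", cluster_name]


# Hypothetical cluster name and version.
setup = ccm_create_command("unit_test_cluster", "2.1.9", 3)
teardown = ccm_remove_command("unit_test_cluster")
print(" ".join(setup))
print(" ".join(teardown))
```

In a real test suite these lists could be passed to `subprocess.run(..., check=True)` from the setup and teardown hooks, so that every run starts from a fresh cluster.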

Page 17: KPI Partners: DataStax and Analytics Implementation Methodology

Step 4: Pre-Production Testing Phase

1. Defect tracking (JIRA, Issue Log)
2. Operational readiness checklist completed

Page 18: KPI Partners: DataStax and Analytics Implementation Methodology

Step 4: Pre-production Testing Phase - Key-points

1. Critical to enable the project team to identify actual issues “at scale” prior to going to production
2. Allow a minimum two-week period during which the application runs at production scale.
3. It may take several iterations of configuration, code change, and refactoring to enable full execution

Page 19: KPI Partners: DataStax and Analytics Implementation Methodology

Step 4: Pre-production Testing Phase - Key-points

4. Operational Readiness Checklist
   1. Replace a downed node and a dead seed node
   2. Configure and execute repair (within the GC grace period, gc_grace_seconds)
   3. Add a node to a cluster
   4. Replace a downed Data Center
   5. Add a Data Center to the cluster
   6. Decommission a node
   7. Restore a backup
   8. At a Cluster Level and Per Node Level, report on errors, throughput, latency, resource saturation, bottlenecks, compactions, flushes, and health
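The repair item on the checklist carries a timing constraint: each node should complete a repair within the table's `gc_grace_seconds` window (864000 seconds, i.e. 10 days, by default in Cassandra), otherwise deleted data can reappear. The sketch below checks a planned schedule against that window. The function name and the example schedules are assumptions, and treating the cycle as interval plus duration is a deliberately conservative simplification.

```python
SECONDS_PER_DAY = 86_400
DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)


def repair_schedule_is_safe(
    repair_interval_days,
    repair_duration_days,
    gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS,
):
    """True when a full repair cycle (wait + run time) finishes inside the gc_grace window."""
    cycle_seconds = (repair_interval_days + repair_duration_days) * SECONDS_PER_DAY
    return cycle_seconds < gc_grace_seconds


# Hypothetical schedules: weekly repairs that take a day, versus repairs every ten days.
print(repair_schedule_is_safe(7, 1))
print(repair_schedule_is_safe(10, 1))
```

If the check fails, the options are to repair more often, shorten repair run time, or raise `gc_grace_seconds` on the affected tables, each of which belongs in the operational playbook.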

Page 20: KPI Partners: DataStax and Analytics Implementation Methodology

Step 5: Scale and Enhancements

Highlight the normal, operational mode of an application built on DataStax Enterprise.

Prepare for all eventualities, and address them by adding nodes to expand system capacity when needed.

Scale with DataStax Enterprise.
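Deciding when to add nodes comes down to simple arithmetic on data volume and replication. The sketch below estimates a node count from the raw data size, the replication factor, and per-node disk capacity. All figures are hypothetical, and the 50% target disk utilization is an assumed planning margin (headroom for compaction and growth), not a number from the presentation.

```python
import math


def nodes_required(raw_data_gb, replication_factor, usable_gb_per_node, target_utilization=0.5):
    """Estimate the node count needed to hold all replicas at the target disk utilization."""
    total_gb = raw_data_gb * replication_factor
    by_capacity = math.ceil(total_gb / (usable_gb_per_node * target_utilization))
    # A cluster needs at least as many nodes as the replication factor.
    return max(replication_factor, by_capacity)


# Hypothetical growth scenario: 4 TB of raw data today, 6 TB expected next year.
current_nodes = nodes_required(raw_data_gb=4_000, replication_factor=3, usable_gb_per_node=1_000)
future_nodes = nodes_required(raw_data_gb=6_000, replication_factor=3, usable_gb_per_node=1_000)
print("nodes today:", current_nodes)
print("nodes to add:", future_nodes - current_nodes)
```

This covers disk capacity only; throughput and latency targets from the SLAs may call for more nodes than the storage estimate alone.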

Page 21: KPI Partners: DataStax and Analytics Implementation Methodology

Reference Architecture – On-Premises

Tableau via ODBC

R for Visualization (Spark Analytics)

Page 22: KPI Partners: DataStax and Analytics Implementation Methodology

Reference Architecture – Cloud

Tableau via ODBC

R for Visualization (Spark Analytics)

Page 23: KPI Partners: DataStax and Analytics Implementation Methodology

Next Steps

Who To Contact?

DataStax Representative
• DataStax Pricing
• DataStax Demo

KPI Partners
• Schedule a Lunch & Learn
• Free 1 Hour DataStax Assessment Call

Contact Brian Dominguez

[email protected]

or [email protected]

Page 24: KPI Partners: DataStax and Analytics Implementation Methodology

KPI PARTNERS

Booth 111

September 22-24

Page 25: KPI Partners: DataStax and Analytics Implementation Methodology