Is Your Project in Trouble on System Performance?


Page 1: Is Your Project in Trouble on System Performance?


May 1–4, 2017

©2017 [Copyright Charles Chow] 1

SATURN 2017


Is Your Project in Trouble on System Performance?

Charles Chow

May 2017

Page 2: Is Your Project in Trouble on System Performance?


Agenda

Why do so many projects have system performance issues?

How to salvage a troubled project with performance issues?

Performance and Storage Trade-offs by example

Lessons learned on architectural impact on performance

Q&A

Page 3: Is Your Project in Trouble on System Performance?


Why do so many projects have system performance issues?

• Lack of non-functional performance requirements at an early stage of architectural design

• Lack of a user-centric approach in detailed design and implementation

• Performance testing overlooked at the start of System Integration Testing

Page 4: Is Your Project in Trouble on System Performance?


Most common causes of system performance issues

Lack of Network / Server Capacity

Unoptimized and/or Undersized Databases

Unmanaged Growth of Data

Poor Resource Utilization

Poor Code Quality

Peak User Load or Traffic Spikes

Page 5: Is Your Project in Trouble on System Performance?


How to salvage a troubled project with performance issues?

Application / Service Stack Monitoring Metrics*

Application Layer

• Online Transaction Metrics: Response Time, Throughput, Transaction Pass/Fail Rate, Error Rate, Distribution

• Batch Transaction Metrics: Records Processed and Execution Time

Server Layer

• Server Metrics: HTTP Server, App Server, Database Server, Other Servers

Platform Layer

• Platform Metrics: Cloud Services, OS, Storage, etc.

* Metrics available may vary by vendor’s willingness to allow for monitoring.

• Identify the root causes by monitoring key metrics at each layer of the application stack

• Conduct performance testing to analyze the performance of the overall application

• Collaborate with business stakeholders on performance issue resolution
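As an illustration of the application-layer online transaction metrics (response time, throughput, error rate), the following is a minimal sketch in Python. The `Transaction` record shape, the `summarize` helper, and the nearest-rank percentile are assumptions made for this example; they are not taken from the project or from any particular monitoring tool.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Transaction:
    """One online transaction sample (hypothetical record shape)."""
    name: str
    duration_s: float  # response time in seconds
    ok: bool           # True if the transaction passed


def percentile(values, pct):
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def summarize(samples, window_s):
    """Summarize response time, throughput, and error rate for one time window."""
    durations = [t.duration_s for t in samples]
    failures = sum(1 for t in samples if not t.ok)
    return {
        "avg_response_s": mean(durations),
        "p95_response_s": percentile(durations, 95),
        "throughput_per_s": len(samples) / window_s,
        "error_rate": failures / len(samples),
    }
```

Collecting the same summary per layer and per time window makes it easier to see which layer degrades first as load increases.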

Page 6: Is Your Project in Trouble on System Performance?


Performance and Storage Trade-offs by example

Background

• This is a web portal project for a customized solution on Sales Opportunities

• The architecture was built on top of an existing mobile application

• The client had already invested over $2 million in the solution

• A Quality Attribute Workshop (QAW) was not conducted

• Performance was not defined as a critical non-functional requirement in the early phase

[Architecture diagram; labels: On-Premise Client Applications, Hosted Applications, 3rd Party Providers, EDW Postgres, Nightly reload (archive old tables and create new copies), OAM, iOS App, Portal, SAML SSO Login provider, SFDC Application, User Notes, API web services, Frontend Proxy Server, DB I/O]

Page 7: Is Your Project in Trouble on System Performance?


Performance and Storage Trade-offs by example

Problem Statement

• Currently, the API queries the PostgreSQL database directly to return results to the Portal front end

• These calls are especially expensive because of data volume and data propagation

• Timeouts occur during UAT when tested with 50 or more concurrent users

• The client wants to keep the current architecture and tech stack, and the go-live date is 4 weeks away

• Options for database tuning, code optimization, and hardware capacity increases have been exhausted

Proposed Solution

• Introduce Solr into the architecture as an indexing service, moving data aggregation and sorting to Solr

• Target indexing at the data sets that take the longest to return because of complex calculations

• Have the API services query the Solr index for results instead of querying the PostgreSQL database directly

• Serve results from a pre-generated index on the Solr instance hosted on AWS
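To make the switch from direct PostgreSQL queries to Solr lookups more concrete, here is a rough sketch of how an API service might build a request against Solr's `/select` handler so that filtering and sorting happen inside the index. The host, the core name `opportunities`, and the field names `account_id` and `opportunity_value` are hypothetical; the slides do not describe the actual schema or endpoint.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, account_id, rows=20):
    """Build a Solr /select URL that filters and sorts inside the index.

    The core and field names are placeholders for illustration only.
    """
    params = {
        "q": "*:*",                        # match all documents
        "fq": f"account_id:{account_id}",  # filter query (hypothetical field)
        "sort": "opportunity_value desc",  # sorting is done by Solr, not the API
        "rows": rows,                      # page size
        "wt": "json",                      # response format
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host; the real Solr instance was hosted on AWS.
    print(build_solr_select_url("http://localhost:8983", "opportunities", 42))
```

Because the index is pre-generated, the expensive aggregation cost is paid at indexing time rather than on every user request. The trade-off is additional storage for the index and results that are only as fresh as the last index build.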

Page 8: Is Your Project in Trouble on System Performance?


Performance and Storage Trade-offs by example

Updated Architecture

[Architecture diagram; labels: On-Premise Client Applications, Hosted Applications, 3rd Party Providers, EDW Postgres, Nightly reload (archive old tables and create new copies), OAM, Portal, iOS App, SFDC Application, User Notes, API web services, Frontend Proxy Server, DB I/O, SAML SSO Login provider]

Page 9: Is Your Project in Trouble on System Performance?


Performance and Storage Trade-offs by example

Performance Testing: Defined and simulated the user “day in the life” flow through the different functionalities of the Portal, based on user role, with a sustained load of 100 users.

Scenarios and weightages for each scenario:

| User Group | Number of scenarios | Weightage per user group | Weightage per scenario in user group |
|---|---|---|---|
| User Group 1 | 3 | 45% | Scenario 1: 45%, Scenario 2: 20%, Scenario 3: 35% |
| User Group 2 | 3 | 35% | Scenario 1: 40%, Scenario 2: 40%, Scenario 3: 20% |
| User Group 3 | 2 | 10% | Scenario 1: 75%, Scenario 2: 25% |
| User Group 4 | 2 | 10% | Scenario 1: 60%, Scenario 2: 40% |
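The two levels of weightage above combine multiplicatively: a scenario's share of the total load is its user group's weight times its weight within the group. The sketch below works this out in Python; the dictionary layout and function name are choices made for this example, not part of the original test plan.

```python
# Weightages copied from the scenario table: (group weight, [scenario weights]).
USER_GROUPS = {
    "User Group 1": (0.45, [0.45, 0.20, 0.35]),
    "User Group 2": (0.35, [0.40, 0.40, 0.20]),
    "User Group 3": (0.10, [0.75, 0.25]),
    "User Group 4": (0.10, [0.60, 0.40]),
}


def scenario_shares(groups):
    """Return {(group, scenario_number): share of the total load}."""
    shares = {}
    for group, (group_weight, scenario_weights) in groups.items():
        for number, weight in enumerate(scenario_weights, start=1):
            shares[(group, number)] = group_weight * weight
    return shares


if __name__ == "__main__":
    total_users = 100  # sustained load used in the test
    for (group, number), share in scenario_shares(USER_GROUPS).items():
        print(f"{group}, Scenario {number}: {share * total_users:.2f} users on average")
```

The per-scenario figures are averages and need not be whole numbers; a load tool would typically assign whole virtual users or pick scenarios at random according to these shares.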

| Parameter | Configuration | Details |
|---|---|---|
| Duration | 50 minutes | Ramped up from 0 to 100 virtual users (threads) in 10 minutes, sustained load for 30 minutes, and ramped down from 100 to 0 in 10 minutes |
| User think time | 4-5 seconds / 10-12 seconds | Simulated wait time between steps in scenarios, as per user behavior |
| Data volume (open opportunities) | 17 million to 33 million | Simulated increase in opportunities based on estimated growth |
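The 50-minute duration breaks down into a 10-minute ramp-up, a 30-minute sustained phase, and a 10-minute ramp-down. The following sketch computes the target number of virtual users at a given minute. It assumes the ramps are linear, which the slides do not state explicitly, and the function name is made up for this example.

```python
def active_users(minute, peak=100, ramp_up=10, sustain=30, ramp_down=10):
    """Target number of virtual users at a given minute of the test.

    Assumes linear ramp-up and ramp-down (an assumption, not from the slides).
    """
    total = ramp_up + sustain + ramp_down
    if minute < 0 or minute > total:
        return 0
    if minute < ramp_up:
        return round(peak * minute / ramp_up)
    if minute <= ramp_up + sustain:
        return peak
    return round(peak * (total - minute) / ramp_down)


if __name__ == "__main__":
    for m in (0, 5, 10, 40, 45, 50):
        print(m, active_users(m))
```

Only the 30 minutes at peak load represent steady-state behavior, so response times measured during the ramps are usually analyzed separately.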

Major Performance Activities:

| Action | Accomplishments |
|---|---|
| Implemented Solr indexing | Resolved the major performance issue |
| Query and API tuning | Resolved the specific functional area performance issue |
| Load testing | Captured Redis cache issue and mitigated future production issue |
| Database environment tuning | Identified database connection pooling issue and resolved scalability issue |

Page 10: Is Your Project in Trouble on System Performance?


Performance and Storage Trade-offs by example

Performance result: Average response time for user actions in each scenario was recorded to be within the SLA.

• User actions in all page loads except the performance dashboard: average response time was below 3 seconds

• User actions in the performance dashboard: average response time was below 3 seconds, with spikes averaging 4.2 seconds

[Charts: Home Page Load; Opportunity Search Results; My Scorecard Page Load; My Account Dashboard (drill into top/bottom value)]
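A check of this kind can be automated per user action. The sketch below compares each action's average response time with a threshold and also reports the worst sample, so that spikes stay visible even when the average passes. It assumes the SLA threshold is 3 seconds, which the results imply but do not state outright. The function name is hypothetical, and the sample values in the usage section are made up for illustration; they are not measurements from the project.

```python
from statistics import mean

SLA_SECONDS = 3.0  # assumed threshold, inferred from the reported results


def check_sla(samples_by_action, sla_s=SLA_SECONDS):
    """Return per-action average, worst sample, and whether the average meets the SLA."""
    report = {}
    for action, samples in samples_by_action.items():
        avg = mean(samples)
        report[action] = {
            "avg_s": avg,
            "max_s": max(samples),
            "within_sla": avg < sla_s,
        }
    return report


if __name__ == "__main__":
    # Illustrative response times in seconds (not real test data).
    demo = {
        "Home Page Load": [2.0, 2.5, 3.0],
        "Performance Dashboard": [2.0, 2.5, 4.0],
    }
    for action, row in check_sla(demo).items():
        print(action, row)
```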

Page 11: Is Your Project in Trouble on System Performance?


Questions?

Thank You