Architecting Mission Critical Applications: Don't forget the Instrumentation!

Slide 1
WWW.EQUINITI.COM
SHARE REGISTRATION + RETAIL INVESTOR SERVICES + EMPLOYEE BENEFITS
Architecting Mission Critical Applications: Don't forget the Instrumentation!
Mike Jolliffe, Chief Technology Officer

Slide 2: Overview of Equiniti
+ Market leader in UK share registration services
+ Partnering around 57% of the FTSE 100 and 40% of the FTSE 250
+ Manage over 24 million shareholder accounts
+ Offices in London, Worthing, Birmingham, Bristol, Edinburgh and Jersey
+ Separated from Lloyds TSB Group on 1 October 2007 after 50 years
+ Emphasis on growth

Slide 3: What makes an application Mission Critical?
A business dependency on that application for core business activity. Tests such as these may help you determine the importance of the application:
+ Would the business survive a major outage of this application?
+ Is there a need for high-9s availability over a normal processing period?
+ Are there regulatory drivers for the application's availability, such as Crest settlement capability?

Slide 4: The Goals of this Presentation
+ To highlight that, whilst there is focus on infrastructure availability, application stability does not always receive the same degree of attention
+ To make the case for investing design effort in planning for application failures
+ To show that understanding the users' perspective on the application's behaviour helps both the successful development of the application and all of the ongoing support effort

Slide 5: Our Mission Critical Application
For Equiniti, a system we call Sirius is at the core of what we do for our Share Registration and Employee Share Scheme clients. This is our Mission Critical Application.
+ Started in 2003 and live from Q1 2006, as a £40m project to re-engineer our processes and replace our aging OpenVMS systems. This was always going to be more than a straight application rewrite.
+ The development was led by Accenture, with HP and Microsoft providing hardware and software resources respectively.
+ To date this represents over 140,000 man-days of effort, producing over 2m lines of code and 2,500 classes. We continue to extend the system with new capabilities as we expand as a business; this is not a static application!
+ Integrated custom workflow, imaging and real-time work prioritisation.
+ The technology stack is Windows Server 2003 R2, .NET Framework 3.0 and SQL Server 2005. We started out on Framework 1.1 and SQL Server 2000.

Slide 6: What makes a Mission Critical Application successful
Application design characteristics such as:
+ Componentisation / abstraction, delivering clearly defined interfaces and service boundaries
+ Interoperability through those services with other internal or external systems, e.g. integration with call centre or website technologies, so that functionality already developed for one channel can be re-used through other delivery mechanisms
+ Flexibility and adaptability of the application itself to changing business needs
Infrastructure design and non-functional characteristics such as:
+ Design for availability
+ Design for scalability
+ Processing performance
+ Data integrity, i.e. no committed transactions may be lost as part of a system failure

Slide 7: So you have a standard application, e.g. Sirius
A classic 3-tier design: Web (any channel), App, Database.
+ Web: C# / ASP.NET UI for internal business users. The internal UI is just one channel; any channel can use the same web services. Takes 1.4 million hits per day on average.
+ App: C# classes for business logic, with data access via Genome (an ORM, Object Relational Mapping, tool). The application exposes WCF-based web services to be consumed by different channels; over 2,500 classes each provide methods to achieve specific business functions.
+ Database: SQL Server 2005, 2 TB. Some tables are partitioned for size; some are partitioned to support data deletion.

Slide 8: Sirius is physically deployed like this
[Diagram: users connect through NLB to multiple Web and App servers; the database runs on an Active/Passive cluster with SAN storage]

Slide 9: Sirius is physically deployed like this
[Diagram]

Slide 10: Sirius is physically deployed like this
[Diagram: Data Centre 1 and Data Centre 2 linked by a resilient high-speed fibre network, with synchronously mirrored SAN storage and NLB; a 3rd location provides triangulation for DR. Caption: "Mission Accomplished!"]

Slide 11: Is the Application as resilient as the Infrastructure?
Applications must be architected to be as resilient as the infrastructure, and to highlight when they fail and what caused the failure.
+ Do you architect the basic needs of fault diagnosis into the application from the outset?
+ You measure infrastructure resilience by the time to recover from an outage, if the outage is even detectable by the end user. Do you do that for your application?
+ It is not about writing endless logs (although the quality of log entries on a failure is important).
+ It is about instrumentation in your application that tells you in near-real-time what's happening.
+ For a successful application you need to be able to:
  + Detect that there is a problem (before the users flood the service desk with calls)
  + Restart failed services and/or recover from the damage that might have been done by a failed process
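Slide 3's "high-9s" target and Slide 11's time-to-recover measure can be turned into a concrete downtime budget. The Python sketch below is minimal and assumes that availability means uptime divided by the length of the processing period. The function names and the hypothetical processing year of 250 twelve-hour days are illustrations, not Sirius figures.

```python
def availability_from_nines(nines: int) -> float:
    """Availability fraction for a number of nines, e.g. 3 -> 0.999."""
    return 1.0 - 10.0 ** (-nines)


def downtime_budget_minutes(availability: float, period_hours: float) -> float:
    """Maximum total outage, in minutes, that still meets `availability` over the period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return period_hours * 60.0 * (1.0 - availability)


def measured_availability(outage_minutes: float, period_hours: float) -> float:
    """Availability actually achieved, given the total outage observed in the period."""
    return 1.0 - outage_minutes / (period_hours * 60.0)


if __name__ == "__main__":
    # Hypothetical processing period: 250 working days of 12 hours each.
    processing_hours = 250 * 12
    for n in (2, 3, 4):
        budget = downtime_budget_minutes(availability_from_nines(n), processing_hours)
        print(f"{n} nines over {processing_hours} h: {budget:.1f} minutes of downtime")
```

Over that period the budgets come to roughly 1,800, 180 and 18 minutes. At four nines, a single failure that takes 20 minutes to notice and fix therefore breaks the target on its own. This is why detection cannot wait for users to call the service desk.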
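The two abilities at the end of Slide 11 (detect, then restart) can be sketched as a small watchdog. This is a hypothetical Python illustration, not how Sirius does it. `probe` stands for any synthetic check, for example a test call to one of the application's web services. `restart` and `alert` stand for whatever service-restart and monitoring hooks your environment provides. The class name and the default threshold of three failures are assumptions.

```python
import time
from typing import Callable, Optional


class Watchdog:
    """Hypothetical watchdog: restart a service after N consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], restart: Callable[[], None],
                 alert: Callable[[str], None], failure_threshold: int = 3) -> None:
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self.probe = probe
        self.restart = restart
        self.alert = alert
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def check_once(self) -> bool:
        """Run one probe and return True if healthy; restart after N failures in a row."""
        try:
            healthy = bool(self.probe())
        except Exception:  # a probe that throws counts as a failed check
            healthy = False
        if healthy:
            self.consecutive_failures = 0
            return True
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.alert(f"unhealthy for {self.consecutive_failures} consecutive checks; restarting")
            self.restart()
            self.consecutive_failures = 0
        return False

    def run(self, interval_s: float, max_checks: Optional[int] = None) -> None:
        """Probe every `interval_s` seconds, forever if `max_checks` is None."""
        checks = 0
        while max_checks is None or checks < max_checks:
            self.check_once()
            checks += 1
            time.sleep(interval_s)
```

Requiring several consecutive failures is a deliberate trade-off. It avoids restarting a service because of a single transient blip, such as a brief timeout during a cluster failover. The cost is that detection slows by a few probe intervals.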
Slide 12: The End User Perspective
+ Users see system availability as their ability to use the application, which is not the same as the infrastructure being up and running.
+ It means that the application must be up, running, and performing fast enough for them to get their work done.
+ Before you start architecting the solution, make sure you understand what your users expect you to achieve in terms of availability and performance, because to them these tend to be one and the same thing. If the application runs slowly, that can be almost as bad as it not being available at all.
+ Determine ahead of development what the impacts of failure will be; this helps drive the right architectural and non-functional requirements for a mission critical application:
  + The cost of downtime: lost revenue (£s)
  + Reputational damage: financial impact (£s)
  + Regulatory breaches and potential financial penalties (£s)

Slide 13: Steps to take in the application
+ As architects, you must consider from the start how errors will be handled within the application, and ensure that development standards reflect your decisions.
+ Developers must implement proper error trapping, and make informed decisions about how they raise each error and its degree of criticality:
  + Should retries be coded in the application (for example, for a timeout during a cluster failover)?
  + Should the error be raised to the calling process, or written to the Event Log, to allow a graceful failure?
+ What goes into the Event Log must be meaningful and complete:
  + A unique error description and number; this allows tools such as System Center to pick up the error.
  + Have pre-defined actions configured in System Center wherever the corrective action is clear from the error code. Treat changes to these actions as part of future code releases, so that they are deployed with any application patches that might change the recovery action.
  + Ensure precise details of the failing component are recorded, including the call stack.

Slide 14: Steps to take in the application
+ For Windows Services especially, build in support for monitoring the service using WMI.
+ Over and above any monitoring tool output, consider what reports you can provide to Service Management that will give early warning of problems such as performance degradation.
+ In addition to any application-specific tables you can analyse, some other great sources of information come for free:
  + IIS logs: load them into a database every 60 seconds via a SQL Agent job and the Log Parser tool, to get a picture of interactive page performance
  + SQL Server 2005 Dynamic Management Views, for query performance and resource utilisation
  + Infrastructure performance data from Perfmon or WMI calls, to show hotspots as they occur
+ These types of report tell you about the application, but they also tell you how your users make use of it. This feeds into planning for infrastructure, support and enhancements.

Slide 15: Sirius status reports: End user performance
An analysis of the IIS web logs from each web server, imported into a database and displayed via Reporting Services.

Slide 16: Sirius status reports: End user performance
Combining the interactive response times with a graph of background processes allows performance dips to be correlated with the tasks that may be causing them, and hence allows better scheduling.

Slide 17: Sirius status reports: Performance by Transaction
Analysis of performance by page name is used to highlight the pages that fall outside performance expectations, and allows development resource to be prioritised on tuning those processes. This report supports drilling down through multiple levels to see specific details.

Slide 18: Final Thoughts
+ Plan for the application failing in the same way that we already plan for hardware and networks failing. Get a framework for error management in place and document the big scenarios; you won't catch all of the smaller ones in design. Then tune the error management process during testing.
+ Architect in the needs of the support teams who will have to diagnose and fix application failures. If they don't have the information they need recorded by the failure event, the time to rectify the problem is greatly extended.
+ Before you start architecting the solution, make sure you understand what your users expect you to achieve in terms of availability and performance, and all the consequences of not achieving these requirements.
+ Share your findings about application usage with the business end users; this can help them change their work patterns, process flows etc. to maximise the system's potential.

Slide 19: References
+ System Center Operations Manager home page: Http://www.microsoft.com/systemcenter/operationsmanager/en/us/default.aspx
+ SQL Server Reporting Services, IIS Reports starter pack: http://www.microsoft.com/downloads/details.aspx?FamilyID=2805D337-14C7-40E3-820B-E7EE653C68C0&displaylang=en
+ Shareview, the Shareholder & Investor portal: http://www.Shareview.co.uk
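The error-handling decisions on Slide 13 (retry transient failures, then raise a uniquely numbered error that records the failing component and call stack) can be illustrated with a short sketch. The deck's stack is C#/.NET, so this Python version is only a stand-in for the pattern. The error number, the `AppError` class and the logger name are hypothetical. Standard `logging` stands in for the Windows Event Log. On Windows, Python's `logging.handlers.NTEventLogHandler` (which requires pywin32) can route the same records there.

```python
import logging
import time
import traceback
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger("sirius.example")  # hypothetical logger name

# Hypothetical error catalogue entry: one unique number per documented failure
# scenario, so a monitoring rule can key a pre-defined corrective action on it.
ERR_DATASTORE_UNAVAILABLE = 41001


class AppError(Exception):
    """Application error carrying a unique, catalogued error number."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"E{code}: {message}")
        self.code = code


def call_with_retry(op: Callable[[], T], *, component: str, attempts: int = 3,
                    base_delay_s: float = 0.5,
                    error_code: int = ERR_DATASTORE_UNAVAILABLE,
                    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `op` on transient errors, e.g. a timeout during a cluster failover.

    Exceptions not listed in `retry_on` propagate immediately. Once retries are
    exhausted, log the error number, failing component and call stack, then
    raise AppError chained to the original exception.
    """
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except retry_on as exc:
            if attempt == attempts:
                log.error("E%d in %s after %d attempts: %s\n%s",
                          error_code, component, attempts, exc, traceback.format_exc())
                raise AppError(error_code,
                               f"{component} failed after {attempts} attempts") from exc
            delay = base_delay_s * 2 ** (attempt - 1)  # exponential backoff
            log.warning("transient failure in %s (attempt %d/%d), retrying in %.1fs",
                        component, attempt, attempts, delay)
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

Injecting `sleep` keeps the backoff testable. Keeping the error number a parameter means each call site can report its own catalogued scenario rather than a generic failure.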
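Slides 14 and 17 describe loading IIS logs every 60 seconds with Log Parser and a SQL Agent job, then reporting performance by page name through Reporting Services. The Python sketch below covers only the aggregation step, on log lines already read into memory. It assumes the W3C extended log format with the `cs-uri-stem` and `time-taken` fields enabled; IIS records `time-taken` in milliseconds. Flagging a page by its 90th-percentile response time against a fixed threshold is a hypothetical rule, not Equiniti's.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def page_times(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Group W3C-format IIS log entries by page, collecting time-taken (ms) values.

    Column positions come from the most recent `#Fields:` directive, because the
    set of logged fields is configurable per site.
    """
    fields: List[str] = []
    times: Dict[str, List[int]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # other directives, blank lines, or data before any #Fields
        row = dict(zip(fields, line.split()))
        taken = row.get("time-taken", "")
        if "cs-uri-stem" in row and taken.isdigit():
            # IIS URL paths are case-insensitive, so normalise the page name
            times[row["cs-uri-stem"].lower()].append(int(taken))
    return dict(times)


def percentile(values: List[int], p: float) -> int:
    """Nearest-rank percentile of a non-empty list, for 0 < p <= 1."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


def slow_pages(times: Dict[str, List[int]], threshold_ms: int,
               p: float = 0.9) -> List[Tuple[str, int, int]]:
    """(page, percentile_ms, request_count) for pages above the threshold, worst first."""
    stats = [(page, percentile(ts, p), len(ts)) for page, ts in times.items()]
    flagged = [s for s in stats if s[1] > threshold_ms]
    return sorted(flagged, key=lambda s: s[1], reverse=True)
```

A percentile is used rather than a mean so that a page which is usually fast but regularly stalls for some users still stands out. Those occasional stalls are the delays users report as "the system is slow".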