SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Copyright © 2016 Splunk Inc. Splunk for Spine 2 Ramen Sen Lead Systems Engineer Health and Social Care Information Centre

Transcript of SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Page 1: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Copyright © 2016 Splunk Inc.

Splunk for Spine 2

Ramen Sen

Lead Systems Engineer

Health and Social Care Information Centre

Page 2: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

About HSCIC

The national provider of information, data & IT systems for commissioners, analysts & clinicians in health and social care.

HSCIC is an executive non-departmental public body, sponsored by the Department of Health.

Page 3: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

About NHS SPINE CORE

The Spine supports the NHS in the exchange of information across national and local NHS systems. It connects clinicians and patients to essential national services including:

Electronic Prescription Service

Summary Care Record

Child Protection - Information Sharing

Page 4: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

NHS SPINE CORE in numbers…

Handles 6 billion messages every year

Connects over 28,000 health care IT systems in 21,000 organisations

Holds over 500 million records and documents

Peak daily volume is ~42 million transactions

Indexes over a billion events a day
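To put these figures on a common scale, the short sketch below converts them to daily and per-second rates. It is only back-of-the-envelope arithmetic on the numbers quoted on this slide; it assumes a 365-day year, treats "messages" and "transactions" as comparable units, and takes "over a billion" as exactly one billion, none of which the slide states.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted on the slide.
messages_per_year = 6_000_000_000      # 6 billion messages every year
peak_daily_transactions = 42_000_000   # peak daily volume, ~42 million
events_per_day = 1_000_000_000         # "over a billion", used as a lower bound

# Assumption: a 365-day year, and messages comparable to transactions.
avg_daily_messages = messages_per_year / 365
peak_to_average = peak_daily_transactions / avg_daily_messages

# Rates averaged over a full 24 hours; real traffic will be burstier.
avg_events_per_second = events_per_day / SECONDS_PER_DAY
peak_day_transactions_per_second = peak_daily_transactions / SECONDS_PER_DAY

print(f"Average messages per day:        {avg_daily_messages:,.0f}")
print(f"Peak day vs average day:         {peak_to_average:.2f}x")
print(f"Indexed events per second (avg): {avg_events_per_second:,.0f}")
print(f"Peak-day transactions per second: {peak_day_transactions_per_second:,.0f}")
```

Under those assumptions the peak day carries roughly two and a half times the average daily message volume, and the indexing tier absorbs on the order of ten thousand events per second on average.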

Page 5: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

How We Got Started

2003/2004: Contracts awarded for Spine 1

July 2011: Began developing Spine 2 in-house; text-log index and analysis tool chosen (Splunk)

July 2013: Spine 2 went into external testing with health software suppliers

August 2014: Successful transition to Spine 2

Today: Indexing over a billion events a day

Page 6: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Requirements for event indexing and search:

Index bespoke application logs as well as product logs (NGINX, Riak, etc.).

Time-based reporting with both matrix and chart output.

Support for inexpert users
• Form-based, user-driven generation of reports and dashboards
• Google-style query language for power users

Page 7: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Requirements for event indexing and search:

Support for expert users
• Includes transaction linking and API support for app building.

Scalability: on the order of 400 GB a day, with on the order of 1,000 reports every hour.

Horizontal scalability on commodity hardware.

Security (authentication and authorization control).
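As a rough illustration of what the scalability requirement above implies, the sketch below turns the daily volume and hourly report count into sustained rates. It assumes decimal units (1 GB = 1000 MB) and perfectly even load. The average event size at the end additionally assumes that the 400 GB figure and the "billion events a day" quoted elsewhere in the talk describe the same workload, which the slides do not say.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600

# Order-of-magnitude figures from the scalability requirement.
daily_volume_gb = 400      # roughly 400 GB indexed per day
reports_per_hour = 1_000   # roughly 1,000 reports per hour

# Assumption: decimal units and load spread evenly over the day.
sustained_mb_per_second = daily_volume_gb * 1000 / SECONDS_PER_DAY
seconds_between_reports = SECONDS_PER_HOUR / reports_per_hour

# Assumption: the 400 GB/day maps onto one billion events/day
# (a figure taken from a different slide; the talk does not link them).
events_per_day = 1_000_000_000
avg_event_bytes = daily_volume_gb * 1_000_000_000 / events_per_day

print(f"Sustained ingest: {sustained_mb_per_second:.2f} MB/s")
print(f"One report every {seconds_between_reports:.1f} s")
print(f"Average event size: {avg_event_bytes:.0f} bytes")
```

Averaged over a day the ingest rate is modest, a few megabytes per second; the harder part of the requirement is likely the concurrent search load from a report starting every few seconds, which is where horizontal scalability matters.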

Page 8: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Platform Performance Monitoring Architecture

[Architecture diagram. Surviving labels: Live (A, B) and Reference (A, B); 200 servers; search heads; indexers; non-sensitive, non-patient data; access to security & audit logs; role-based authentication; 2-factor authentication]

Page 9: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

We Have a Lot of Use Cases…

Reporting for business programs

SLA/performance reporting

24/7 operational monitoring

Performance/scalability monitoring

Non-functional test monitoring

Deployment monitoring

Incident investigation

Trend analysis

H/W, OS and process monitoring

Security monitoring

Audit logging

Page 10: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

So I’m Going to Focus on Three…

Reporting for business programs

SLA/performance reporting

24/7 operational monitoring

Performance/scalability monitoring

Non-functional test monitoring

Deployment monitoring

Incident investigation

Trend analysis

H/W, OS and process monitoring

Security monitoring

Audit logging

Page 11: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

24/7 Operational Monitoring

Page 12: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

24/7 Operational Monitoring

Page 13: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Performance monitoring

Page 14: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Incident Investigation – Business

Identify and resolve external incidents where a message doesn’t go through (e.g. GP record transfer or electronic prescription).

Page 15: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Incident Investigation – Business

Identify and resolve external incidents where a message has a processing problem.

Page 16: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Top Tips

Page 17: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Find the ‘transaction in a haystack’ with unique transaction IDs

For all Spine application log points we log:

• Log Level
• Log Reference
• Process
• Internal ID

This allows us to trace a single message journey through the entire system, across all hosts
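The idea of following one message by its ID can be sketched in a few lines of Python. This is not the Spine logging format or a Splunk search: the key=value layout, the field names (`level`, `logReference`, `process`, `internalID`, `host`), the host names and the sample values are all invented for illustration. Only the four logged items come from the slide.

```python
import re

# Hypothetical log lines from several hosts. Layout and values are invented;
# each line carries the four items named on the slide plus a host name.
RAW_LINES = [
    "2016-05-10T09:00:00.120 host=web01 level=INFO logReference=MSG0001 process=router internalID=tx-42",
    "2016-05-10T09:00:00.090 host=web02 level=INFO logReference=MSG0001 process=router internalID=tx-43",
    "2016-05-10T09:00:00.450 host=db03 level=ERROR logReference=MSG0900 process=store internalID=tx-42",
    "2016-05-10T09:00:00.310 host=app07 level=INFO logReference=MSG0105 process=validator internalID=tx-42",
]

PAIR = re.compile(r"(\w+)=(\S+)")


def parse(line: str) -> dict:
    """Split a line into its leading timestamp and its key=value fields."""
    timestamp, _, rest = line.partition(" ")
    fields = dict(PAIR.findall(rest))
    fields["timestamp"] = timestamp
    return fields


def trace(lines, internal_id):
    """Return every event carrying internal_id, ordered by timestamp."""
    events = (parse(line) for line in lines)
    journey = [e for e in events if e.get("internalID") == internal_id]
    # ISO 8601 timestamps in one fixed format sort correctly as strings.
    return sorted(journey, key=lambda e: e["timestamp"])


journey = trace(RAW_LINES, "tx-42")
for event in journey:
    print(event["timestamp"], event["host"], event["process"],
          event["level"], event["logReference"])
```

Because every log point carries the same ID, the events for one message can be pulled out of interleaved traffic from many hosts and read in order, which shows where in the journey the message failed.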

Page 18: SplunkLive! London 2016 - HSCIC / NHS Digital / Spine 2

Thank you!
