ITIL Service Design Package - DHCP
Copyright: Duncan J. Potter
1st May 2015
SERVICE DESIGN PACKAGE
Resilient, Wide Area Network, Managed-DHCP Service
This document represents a sample of work by the author carried out in 2010.
Content has been updated to conform to ITIL v3 2011 Service Design standards.
For consistency of presentation, original content has been reformatted to the
common theme used for all work samples by the author.
Content has been vetted to ensure information contained within cannot present a
security risk to any institution in which it originated. Institutional identifiers
have been obscured or removed. Sensitive or security-related technical
information (e.g. IP numbers) has either been removed or, where retention is
required for clarity, substituted with dummy information.
Contents
Summary
1. Requirements
1.1 Business Requirements
1.1.1 Background
1.1.2 Service Changes Required
1.2 Service Applicability
1.3 Service Contacts
2. Service Design
2.1 Service Functional Requirements
2.1.1 Current Functionality, v2
2.1.2 Drivers for functionality change
2.1.3 Formal specification of functionality (utility) requirements for v3
2.1.4 Formal specification of warranty requirements for v3
2.1.5 Planned Outcomes for v3
2.2 Service Level Requirements
2.2.1 Availability
2.2.2 Capacity
2.2.3 Security
2.2.4 Operational Level Requirements
2.3 Service and Operational Management Requirements for v3
2.4 Service Design and Topology
2.4.1 Service Definition
2.4.2 Service Cost
2.4.3 Using the Service
2.4.4 Value Proposition
2.4.5 Service Model
2.4.6 Service Options
2.4.7 Components & Infrastructure v3
2.4.8 Supporting Services
2.4.9 Operational Procedures
2.4.10 Component-Specific Documentation
3. Organizational readiness assessment
3.1 Business Benefit
3.2 Financial Assessment
3.3 Technical Assessment
3.4 Resource Assessment
3.5 Organisational Assessment
4. Service lifecycle plan
4.1 Service Programme
4.1.1 Management, coordination and integration with any other projects
4.1.2 Management of risks and issues
4.1.3 Scope, objectives and components of the service
4.1.4 Skills, competences, roles and responsibilities
4.1.5 Processes required
4.1.6 Interfaces and dependencies with other services
4.1.7 Management of teams, resources, tools, technology, budgets, facilities required
4.1.8 Management of suppliers and contracts
4.1.9 Progress reports, reviews and revision of the programme and plans
4.1.10 Communication plans and training plans
4.1.11 Timescales, deliverables, targets and quality targets for each stage
4.2 Service Transition Plan
4.2.1 Build policy, plans and requirements
4.2.2 Transition Strategy
4.2.3 Build Policy
4.2.4 Testing policy, plans and requirements
4.2.6 Deployment policy, release policy, plans and requirements
4.3 Service Operational Acceptance Plan
4.4 Service Acceptance Criteria
Appendix 1 Component-Specific Documentation
Appendix 2 Risk Register
SUMMARY
Service Name: Managed DHCP
Provider: Managed Desktop Group, Computing Service
Type: Supporting
Supported Services: Managed Desktop, Managed Power
Service History: v3 Resilient Linux architecture, DHCP Relay-based
v2 Mixed NetWare / Novell OES architecture, VLAN-based
v1 Novell-only architecture, VLAN-based
This document: v2 to v3 update, July-November 2010
Overview: Major update driven by architectural change in the wide
area network (WAN) which the service relies upon.
1. REQUIREMENTS
1.1 Business Requirements
The initial agreed and documented business requirements.
1.1.1 Background
The University operates a Wide Area Network (WAN) across the city servicing
geographically dispersed departments and colleges. A University Computing Service
programme to upgrade the resiliency of the WAN is currently underway, ensuring each site
(college or university department) is served by both a primary and secondary router to guard
against failure of either one.
The University Computing Service also provides a Managed Desktop service comprising
1,800+ network entities (PCs, Macs, printers) on 47 WAN sites including 18 departments, 22
Colleges and 3 Computing Service teaching labs. The Managed Desktop service itself depends
upon a further, supporting network service, the Managed DHCP (Dynamic Host
Configuration Protocol) service, which provides IP numbers to network devices, facilitates
remote imaging and also provides information used by a Managed Power service. The
Managed Power service safely powers down both managed and unmanaged PCs across the
WAN, both overnight and after a specified amount of idle time.
1.1.2 Service Changes Required
The current Managed DHCP service architecture is incompatible with the new WAN
architecture. To sustain operation of the Managed Desktop and Managed Power services, a
compliant DHCP service requires to be introduced prior to completion of the WAN resiliency-
upgrade project.
1.2 Service Applicability
This defines how and where the service would be used. This could reference business, customer
and user requirements for internal services.
The new DHCP service will be established and operated by the Managed Desktop Group
within the Computing Service in support of the Managed Desktop service which they also
provide.
Service-related infrastructure will reside within the central Operations Room of the central
Computing Service. This excludes routers which are considered part of the network
infrastructure, and which are strategically located throughout the WAN.
1.3 Service Contacts
The business contacts, customer contacts and other stakeholders in the service.
Business Contacts - University Computing Service
Service owner: Head of Systems, Managed Desktop Group
Service design: Head of Systems, Managed Desktop Group
Service operation: Applications Team, Managed Desktop Group
Service Issues: Helpdesk
Service Enquiries: Head of Systems, Managed Desktop Group
Technical Advisors - University Computing Service
Networking: Head, Network Services Group
Windows: Managed Desktop Group
Linux: UNIX Support Group
Apple: Mac Support Group
Customer Contacts: Site Computing Officers, Colleges and University Departments
WAN Programme liaison: Head, Network Services Group, University Computing Service
2. SERVICE DESIGN
2.1 Service Functional Requirements
The changed functionality (utility) of the new or changed service, including its planned outcomes
and deliverables, in a formally agreed statement of requirements (SoR).
As a very large change in both architecture and functionality is required, the current
service architecture is described first, in summary and then in more detail, to put the
required changes in context.
The architecture is, necessarily, somewhat complex but is greatly simplified in the new
service design.
2.1.1 Current Functionality, v2
Summary
The Managed DHCP service offers the following functionality:
i. Provides IP configuration information to Managed Desktop service Intel PCs
and network printers
ii. Facilitates PXE-based imaging of Intel PCs, returning the location of
intermediate boot files hosted on Windows servers within the Computing
Service and downloaded via the TFTP protocol
iii. Provides printers with the location of their configuration files; these are
hosted on Windows servers within both the Computing Service and sites
themselves, downloaded via the TFTP protocol
The service underpins all 47 Managed Desktop sites.
It operates via 6 DHCP server processes within the University Computing Service,
with each process serving an average of 8 sites. Each process resides on a separate
physical server which also performs other duties.
Apple desktops within the Managed Desktop service receive both their IP & image
configuration from a separate Apple NetBoot server.
Detailed
The network aspects of the service are described here because major functionality changes
will take place in this area.
To service a DHCP broadcast request originating from hosts on any particular site, a
DHCP server requires to be on the same VLAN (network broadcast domain) as that
site.
Each site has its own, unique numerical VLAN ID and all VLANs are routed across
the WAN. As VLANs are broadcast-domain entities, the DHCP broadcast mechanism
whereby requests/assignment of IP configuration information is made, is therefore
also operable across the WAN.
For legacy reasons, each group of sites is handled by a specific DHCP server.
There are 6 servers in all, 5 running NetWare 6.5 & 1 running Novell OES.
DHCP information itself is stored within Novell eDirectory.
After passing through the WAN, VLAN traffic from any particular group of sites
passes through a series of switches within the Computing Service.
o Some switches pass traffic from every site VLAN to Windows and Apple
servers to facilitate PC and Mac imaging operations.
o Other switches pass only VLAN traffic from the sites which a DHCP server
supports to that server, so as to minimise network traffic.
Figure 1. Architecture of the current, v2 Managed DHCP service
2.1.2 Drivers for functionality change
Primary
The VLAN-enabled broadcast-forwarding mechanism underpinning the current
DHCP service, v2, will cease to be available as the WAN Resiliency Upgrade
programme will remove all VLAN traffic from the WAN by December 2010.
DHCP and imaging for Apple Macs require to be handled by the new Managed DHCP
service, as the separate Apple NetBoot server is scheduled for retirement.
Secondary
The current service, v2:
Requires in-depth network configuration
(which admittedly then remains largely static for the duration of the service lifetime)
Is spread over servers having different operating systems with different management
interfaces
Lacks centralised monitoring
Lacks resiliency against infrastructure failure or service configuration error
2.1.3 Formal specification of functionality (utility) requirements for v3
The updated service, v3, should facilitate:
Provision of IP information to hosts (PC, Mac, printer) on Managed Desktop service
sites via the Dynamic Host Configuration protocol (DHCP)
PXE-based imaging for Managed Desktop Intel PCs to permit remote deployment of
operating systems and pre-configured applications, referred to here as desktop
images
NetBoot [1]-based imaging for Managed Desktop Apple PCs, permitting remote
deployment of operating systems and pre-configured applications, with the further
potential to host Mac image files
Operation of the Managed Power service by publishing DHCP host configuration
information via LDAP
Configuration of network printers in Managed Desktop sites by
o returning printer configuration-file location information
o hosting printer configuration files
o serving printer configuration files
2.1.4 Formal specification of warranty requirements for v3
Simple architecture, easing operation and support
Centralised monitoring and diagnostics, reducing issue resolution time
High resiliency against
o infrastructure component failures
o service configuration errors
Point-in-time recovery to previous DHCP configurations for each site
Ease of operation - service use, configuration, backup
[1] NetBoot uses the Boot Server Discovery Protocol (BSDP) to communicate network boot image options
between clients and servers. BSDP is implemented within the Vendor Options of DHCP as defined in
RFC 2132.
2.1.5 Planned Outcomes for v3
A resilient Managed DHCP service entailing
Establishment of a service architecture and associated infrastructure, meeting the
formal requirements of functionality and warranty
Removal of Managed Desktop site VLAN information from Computing Service
network switches and of DHCP processes from Novell servers hosting the current
service
Configuration of WAN routers (by Network Services), adding IP Helper addresses of
new DHCP servers as a replacement for VLAN-based DHCP broadcast forwarding
Documentation of the new service and training of staff required to maintain &
support its operation
2.2 Service Level Requirements
The service level requirements (SLR), representing the desired warranty of the service for a new
or changed service. Once specific service level targets have been agreed and validated, the revised
or new service level agreement (SLA), including service and quality targets.
2.2.1 Availability
Service Availability Target: 24/7 x 365 is highly desirable
Service Availability - Mandatory: normal University working days and hours
Formally supported operational hours: normal University working days and hours
Maintenance windows: service functionality should not be degraded during
formally-supported operational hours
2.2.2 Capacity
The service should easily accommodate increases in usage of the Managed Desktop
service which it primarily supports, with a minimum initial capacity sufficient to
accommodate 100% growth of the current Managed Desktop service.
2.2.3 Security
Physical access to Computing Service infrastructure elements will, as far as possible,
be confined to Computing Service personnel. Where elements of such infrastructure
reside on Managed Desktop site locations (e.g. routers, switches) they should be
protected from physical access by unauthorised personnel.
Network-access to service infrastructure and associated computer-processes
comprising the service should be restricted to a) hosts requiring such access as part of
normal service operation and b) Computing Service personnel / hosts.
2.2.4 Operational Level Requirements
The Managed DHCP service is itself dependent upon proper functioning of:
The University WAN, whose operation is managed by Network Services group of the
Computing Service
Local area networks (LANs), whose operation is managed by staff within
o Network Services Group
o Managed Desktop Group
o Site computing staff (Colleges, University Departments)
The VMware vSphere environment, operated by the Managed Desktop Group, which
hosts all virtual servers used by the service.
2.3 Service and Operational Management Requirements for v3
Management requirements to manage the new or changed service and its components, including
all supporting services and agreements, control, operation, monitoring, measuring and reporting.
The service falls within the remit of the Managed Desktop Group, Computing Service, to
operate, monitor, measure and improve upon.
Operational management requirements are limited to adding a periodic check of the service’s
web interface to routine monitoring rotas of Computing Service operations staff.
2.4 Service Design and Topology
2.4.1 Service Definition
The Managed DHCP service is a core supporting network service underpinning
The Managed Desktop service
The Managed Power service
Service functionality is formally specified in the Service Functional Requirements section
earlier, but in summary the Managed DHCP service facilitates:
Centralised IP configuration of 1,800+ network entities (Intel/Apple PCs, printers) on
47 WAN sites including 18 departments, 22 Colleges and 3 Computing Service
teaching labs.
PC imaging operations
Network printer configuration
Access to host PC information required by the Managed Power service
2.4.2 Service Cost
The cost of the service is absorbed as part of the Managed Desktop service.
There is no charge to customers for the Managed Power service, this being an added bonus
arising from information contained within the Managed DHCP service.
2.4.3 Using the Service
The service will be operated by the Managed Desktop Group within the University
Computing Service, in close liaison with Computing Officers of Managed Desktop sites.
Site computing officers should
Provide a range of IP numbers for Managed Desktop instances at their sites to the
Managed Desktop Group who will configure the service
Have access to service debugging facilities
Issues arising from use of the service should be communicated to the Helpdesk, University
Computing Service.
Questions about the service may be directed to the Helpdesk above or, if preferred, the
Applications Team / Head of Systems within the Managed Desktop Group.
2.4.4 Value Proposition
The Managed DHCP service provides value to its customers and users when
Managed Desktop PCs and network printers are provided with IP configuration
information remotely
Managed Desktop PCs can be remotely deployed with a pre-designed image
comprising OS and configured applications
Both Managed Desktop and unmanaged PCs registered by site computing officers can
be powered down at night and after specified periods of inactivity by the Managed
Power service (which uses Managed DHCP service information), reducing electricity
costs
... without need of time-consuming, on-site, per-device, manual entry of static IP
configuration information by site computing staff or University Computing Service.
2.4.5 Service Model
For ease of reference, the v3 service model is divided into three distinct categories:
1. Primary Functionality - how hosts are assigned IP information by the service
2. Configuration - how the service handles changes to host information
3. Diagnostics - how the service presents diagnostic information
Before each is presented, a very brief overview of the service architecture is deemed helpful.
The architecture of the new, v3 service is straightforward:
a) Each WAN site is served by two Cisco routers, each configured (via Cisco's IP Helper
mechanism) to forward DHCP broadcast traffic to two central DHCP servers.
b) Each DHCP server supports the entire Managed Desktop service, holding IP
information for every host on every site. Both servers will offer IP information to the
host and the host will choose which one to accept.
c) The DHCP protocol also supports provision of additional information to hosts:
o a printer may request the location of its configuration file
o an Intel or Apple PC may request the location of its boot kernel image server
Each DHCP server receiving such a request will respond with the required
information, if such is configured.
Printer configuration files and PC boot kernel images are stored on servers,
accessed via the TFTP protocol. Here, a Windows Server hosts PC boot kernel
images and both Linux DHCP servers host Apple PC and printer
configuration files.
d) The DHCP servers are Linux-based virtual machines. They forward all DHCP service
logs to a new, central Syslog server, which stores them in a MySQL database for
speed of access. The Syslog server is also a Linux-based virtual machine.
e) A Web server presents service diagnostic information, querying the Syslog
server and receiving information on the service's running status directly from
the DHCP servers. The web server is also a Linux-based virtual machine.
Figure 2: Architecture of the new, v3 Managed DHCP service
With reference to the above architecture, the first service model - primary functionality - is
presented in Figure 3 next.
Figure 3: Service Model - v3 Primary Functionality
Service Model - v3 Primary Functionality
Note that in the process model illustrated in Figure 3, a Windows/Linux PC boots and is then
provided with IP configuration information by the primary DHCP server, then it requests
and receives the location of the next server which stores its boot-kernel image.
An Apple PC boot would follow an identical process, but obtain its boot-kernel image
from the DHCP servers (which also run TFTP services)
A network printer would follow an identical process, but obtain its printer
configuration file from the Linux DHCP server.
Principles of Operation
BOOT and DISCOVER
A Managed Desktop site host (Intel / Apple PC or network printer) boots.
If the host has IP information then the boot process continues, otherwise it issues a
DHCP Discover broadcast packet.
SWITCHING and ROUTING
Ethernet switches on the site will pass the Discover broadcast to one or other of the site's
primary and secondary routers.
Appropriate site switches require to have the VLAN on which Managed Desktop
hosts reside added to their switch-switch and switch-router VLAN-trunk routes.
If the VLAN information is correctly configured the packet is forwarded along ports
tagged with the designated VLAN, else it is dropped.
Whichever site router receives the Discover packet, it recognises the broadcast as belonging
to DHCP and forwards it to both the Primary and Secondary DHCP servers.
To succeed:
Each site router requires to have IP Helper correctly configured else the packet is
dropped.
Appropriate Computing Service switches, including VMware virtual switches, require
to have the VLAN on which the DHCP servers reside added to their switch-switch
VLAN-trunk routes and on appropriate switch-server VLAN-trunk ports.
If the VLAN information is correctly configured the packet is forwarded along ports
tagged with the designated VLAN, else it is dropped.
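The relay step described above can be sketched as a small Python model. This is an illustration only, not router configuration: the `DhcpPacket` class, the `relay` function and all addresses are hypothetical. It assumes standard DHCP relay behaviour, in which the router records its own interface address in the packet's `giaddr` field and sends one unicast copy to each configured helper address.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class DhcpPacket:
    message_type: str          # e.g. "DISCOVER"
    client_mac: str
    giaddr: str = "0.0.0.0"    # relay address; unset on the host's own broadcast


def relay(packet: DhcpPacket, router_ip: str,
          helpers: List[str]) -> List[Tuple[str, DhcpPacket]]:
    """Return (destination, packet) pairs, one per helper address.

    With no helper address configured the list is empty,
    i.e. the broadcast is dropped at the router.
    """
    if packet.giaddr == "0.0.0.0":
        # Stamp the relay address so servers know which site subnet is asking.
        packet = replace(packet, giaddr=router_ip)
    return [(helper, packet) for helper in helpers]


# Dummy addresses (documentation ranges), standing in for the two DHCP servers.
HELPERS = ["192.0.2.10", "192.0.2.11"]
discover = DhcpPacket("DISCOVER", "00:11:22:33:44:55")
forwarded = relay(discover, router_ip="198.51.100.1", helpers=HELPERS)
dropped = relay(discover, router_ip="198.51.100.1", helpers=[])
```

Because each site has two routers and each router lists both servers, any single router or server failure still leaves a path for the Discover packet.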
OFFER and REQUEST
Both DHCP servers receive the Discover broadcast and both issue an Offer response to the
host.
In order for the Offer packet to reach the host, the networking conditions under
Switching and Routing must be satisfied
The host receives both Offers and decides which to accept (usually the first Offer
received) by issuing a Request packet to one of the DHCP servers.
In order for the Request packet to reach the server, the networking conditions under
Switching and Routing must be satisfied
ACKNOWLEDGE
Upon receipt of the Request packet, the DHCP server issues a formal Acknowledge packet
and internally records that it is serving the host with IP information for a leased period of
time.
In order for the Acknowledge packet to reach the host, the networking conditions
under Switching and Routing must be satisfied
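The Offer, Request and Acknowledge exchange with two servers can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: the `DhcpServer` class and `obtain_lease` function are hypothetical, both servers hold the same host data (as the architecture overview states), and the host accepts the first Offer to arrive.

```python
from typing import Dict, List, Optional


class DhcpServer:
    def __init__(self, name: str, reservations: Dict[str, str]):
        self.name = name
        self.reservations = reservations    # MAC -> IP, identical on both servers
        self.leases: Dict[str, str] = {}    # MAC -> IP this server is serving

    def offer(self, mac: str) -> Optional[dict]:
        """Answer a Discover with an Offer, if the host is configured."""
        ip = self.reservations.get(mac)
        return {"server": self.name, "ip": ip} if ip else None

    def acknowledge(self, mac: str, ip: str) -> str:
        """Answer a Request: record the lease and return an Acknowledge."""
        self.leases[mac] = ip
        return "ACK"


def obtain_lease(mac: str, servers: List[DhcpServer]) -> Optional[dict]:
    """`servers` is listed in the order in which their Offers reach the host."""
    offers = []
    for server in servers:
        offer = server.offer(mac)
        if offer is not None:
            offers.append((server, offer))
    if not offers:
        return None                          # no server knows this host
    chosen_server, chosen = offers[0]        # host accepts the first Offer
    chosen_server.acknowledge(mac, chosen["ip"])
    return chosen


hosts = {"00:11:22:33:44:55": "198.51.100.20"}   # dummy host entry
primary = DhcpServer("primary", dict(hosts))
secondary = DhcpServer("secondary", dict(hosts))

# The secondary's Offer happens to arrive first in this run.
lease = obtain_lease("00:11:22:33:44:55", [secondary, primary])
# If the primary is down, only the secondary answers.
lease_primary_down = obtain_lease("00:11:22:33:44:55", [secondary])
```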
LOG
Every DHCP-related request (Discover, Offer, Request, Acknowledge) is logged by each DHCP
server to a (virtual) Syslog server. A MySQL database, optimised for DHCP information
logging, stores this information which is captured via an rsyslog daemon.
All TFTP server instances, be they running on Linux or Windows servers, also log to the
syslog server.
A Web server acts as interface to this information, which is covered in the Service
Diagnostics service model area.
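To illustrate why a database helps with this volume of log data, the sketch below parses log lines into fields and stores them in an indexed table, so that the history of a single host can be retrieved quickly. Everything here is an assumption made for illustration: the log line format is hypothetical, the table layout is invented, and Python's built-in `sqlite3` is used only as a stand-in for the MySQL database named above.

```python
import re
import sqlite3

# Hypothetical line format: "<timestamp> <server> <event> <client MAC>"
LINE = re.compile(
    r"^(?P<ts>\S+) (?P<server>\S+) "
    r"(?P<event>DISCOVER|OFFER|REQUEST|ACK) (?P<mac>[0-9a-f:]{17})$"
)


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dhcp_log (ts TEXT, server TEXT, event TEXT, mac TEXT)")
    # An index on the MAC address keeps per-host lookups fast.
    conn.execute("CREATE INDEX idx_dhcp_log_mac ON dhcp_log (mac)")
    return conn


def store(conn: sqlite3.Connection, line: str) -> bool:
    """Insert one log line; return False if the line is not DHCP-related."""
    match = LINE.match(line)
    if match is None:
        return False
    conn.execute(
        "INSERT INTO dhcp_log VALUES (?, ?, ?, ?)",
        (match.group("ts"), match.group("server"),
         match.group("event"), match.group("mac")),
    )
    return True


def history(conn: sqlite3.Connection, mac: str) -> list:
    """All logged events for one host, oldest first."""
    rows = conn.execute(
        "SELECT event, server FROM dhcp_log WHERE mac = ? ORDER BY ts", (mac,))
    return rows.fetchall()


conn = open_store()
lines = [
    "2010-11-01T09:00:00 dhcp1 DISCOVER 00:11:22:33:44:55",
    "2010-11-01T09:00:01 dhcp1 OFFER 00:11:22:33:44:55",
    "2010-11-01T09:00:02 dhcp1 REQUEST 00:11:22:33:44:55",
    "2010-11-01T09:00:03 dhcp1 ACK 00:11:22:33:44:55",
    "an unrelated log line",
]
stored = [store(conn, line) for line in lines]
events = history(conn, "00:11:22:33:44:55")
```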
The service model for configuration follows.
Figure 4: Service Model - v3 Configuration
Service Model - v3 Configuration
Layout and Administration of the DHCP configuration files
At server process start-up, the DHCP server process reads a single, master configuration file
which itself includes other configuration files. Changes in configuration are only enacted after
the DHCP server process is restarted.
With so many hosts spread over many sites, ease of administration is a core Managed DHCP
Service requirement. To this end, the main DHCP process configuration text file is short; its
include statements incorporate separate configuration files for each site's subnet, PCs, Macs
and printers. There are thus 4 x 47 configuration files, each type being stored in its own
directory (so 4 directories). This also facilitates rapid location of the relevant configuration
file to change.
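The layout above can be illustrated by generating the include statements of the master file. This is a sketch under assumptions: the source does not name the DHCP server software or its directory layout, so the `include "...";` syntax, the `/etc/dhcp` base path, the directory names and the site names are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import List

# One directory per type of configuration file (names are placeholders).
KINDS = ["subnets", "pcs", "macs", "printers"]


def include_lines(sites: List[str], base: str = "/etc/dhcp") -> List[str]:
    """One include statement per (file type, site) pair."""
    lines = []
    for kind in KINDS:
        for site in sorted(sites):
            path = PurePosixPath(base) / kind / f"{site}.conf"
            lines.append(f'include "{path}";')
    return lines


sites = [f"site{n:02d}" for n in range(1, 48)]   # 47 placeholder site names
master = include_lines(sites)
```

With this arrangement, a change to one site's printers touches exactly one small file, and its path can be derived from the site name and the file type.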
Principles of Operation
All changes to configuration files should be made on the Primary DHCP server.
If a configuration file change is detected, customised web search pages are generated
for service debugging and an LDAP tree configuration file is generated.
If the Primary DHCP service restarts successfully, the changed configuration files
and files generated above are mirrored (via rsync) to the Secondary DHCP server; the
secondary DHCP server is then restarted (from the Primary via SSH).
If the Primary DHCP service fails to start, the Secondary DHCP server continues to
provide the Managed DHCP service whilst the Primary is down. Therefore human
checking of service status should be conducted prior to, and after, any configuration
file change.
The operational steps described above are covered below in a little more detail.
BOOT
The Primary DHCP server boots and starts the DHCP server process, reading its
configuration file (which includes the 4 x 47 smaller configuration files).
LOOP
Monitor the configuration files on the Primary DHCP server for changes, every 2
minutes.
o When a file is modified, its timestamp changes; this is then compared to the
timestamp of log files generated when a change was last detected
Should a file have changed, before restarting the DHCP service, generate:
o a web page stating which configuration file changed with a link to the file
itself; if the service fails to start it is likely an error was made in this file
o a change log for the subnet, PC, Mac or printer, depending which type of
configuration file was changed; the timestamp of this file is used in the first
step above
o customised search/debug web pages for each host (more on this later)
o an LDAP tree configuration file containing information about every host in
the service
o logs related to every step of this cycle, with associated date/time information,
and running status of the DHCP service itself; these will assist with in-depth
debugging should such be required
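The timestamp comparison in the LOOP step can be sketched as follows. The function name, file names and directory layout are hypothetical, and the original service uses shell and Perl scripts rather than Python. A scheduler would call a check like this every 2 minutes; the demonstration sets file timestamps explicitly so that the outcome does not depend on the clock.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def changed_files(config_dir: Path, marker: Path) -> List[Path]:
    """Configuration files modified more recently than the marker file.

    The marker stands for the change log written when a change was last
    detected; if it does not exist yet, every file counts as changed.
    """
    last_seen = marker.stat().st_mtime if marker.exists() else 0.0
    return sorted(p for p in config_dir.rglob("*.conf")
                  if p.stat().st_mtime > last_seen)


with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "pcs"
    conf.mkdir()
    marker = Path(tmp) / "pcs-change.log"

    # site01 was last edited before the marker was written, site02 after it.
    for name, mtime in [("site01.conf", 900), ("site02.conf", 1100)]:
        path = conf / name
        path.write_text("# placeholder\n")
        os.utime(path, (mtime, mtime))
    marker.write_text("last change detected\n")
    os.utime(marker, (1000, 1000))

    detected = [p.name for p in changed_files(conf, marker)]
```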
RESTART
Restart the Primary DHCP service to re-read its changed configuration files
MIRROR
If the restart succeeded:
o Mirror the configuration files to the secondary DHCP server and restart the
DHCP server process there
o Mirror the customised search/debug web pages to the Managed DHCP web
server
o Refresh the LDAP service, reading the new LDAP tree configuration file
Whether the restart succeeded or failed
o Mirror all logs to the Managed DHCP web server; this will provide an easy-
to-use web interface to valuable debugging information
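The cycle above can be illustrated with a minimal Python sketch. It is not the service's actual implementation, which the text describes as shell and PERL scripts. The function names `changed_files` and `run_cycle`, and the use of a single marker file standing in for the change log, are assumptions made for illustration. The restart and mirror actions are passed in as callables so the sketch stays independent of any particular DHCP daemon, rsync or SSH invocation.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def changed_files(config_files: Iterable[Path], marker: Path) -> List[Path]:
    """Return the configuration files modified more recently than the marker file.

    The marker stands in for the change log whose timestamp records when a
    change was last detected (hypothetical simplification).
    """
    last_seen = marker.stat().st_mtime if marker.exists() else 0.0
    return [f for f in config_files if f.stat().st_mtime > last_seen]


def run_cycle(config_files: Iterable[Path], marker: Path,
              restart_primary: Callable[[], bool],
              mirror_to_secondary: Callable[[], None],
              log: Callable[[str], None]) -> bool:
    """Run one LOOP -> RESTART -> MIRROR pass; return True if a restart was attempted."""
    changed = changed_files(config_files, marker)
    if not changed:
        return False
    for f in changed:
        log(f"changed: {f}")
    marker.touch()  # record the detection time before restarting
    restarted = restart_primary()
    log(f"primary restart {'succeeded' if restarted else 'failed'}")
    if restarted:
        # Only a configuration that started cleanly is replicated.
        mirror_to_secondary()
        log("mirrored to secondary")
    return True
```

In use, `run_cycle` would be called every 2 minutes by a scheduler. Because mirroring happens only after a successful restart, an erroneous configuration file never reaches the secondary server.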
Point-in-time Configuration Recovery
A key warranty specification was point-in-time recovery of site DHCP configuration
information. Not shown in Figure 4, a configuration backup script runs daily (via the cron
scheduler) and performs a complete copy of all site DHCP configuration files to a sub-
directory on the DHCP server. This permits rapid restoration of both site subnet and site
host information with a high-level of granularity.
As site configuration files are text files, they take very little disk space. A full month of daily
backups is kept, the backup task itself running at 6am.
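The daily backup with one month of retention can be sketched as follows. This is an illustration only: the function name `backup_configs`, the dated sub-directory naming (`YYYY-MM-DD`) and the 30-day default are assumptions, not details taken from the service's own backup script.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path


def backup_configs(config_dir: Path, backup_root: Path, today: date,
                   keep_days: int = 30) -> Path:
    """Copy config_dir to backup_root/YYYY-MM-DD and prune older dated copies.

    Keeps the most recent keep_days daily copies (assumed retention policy).
    """
    target = backup_root / today.isoformat()
    shutil.copytree(config_dir, target, dirs_exist_ok=True)

    cutoff = today - timedelta(days=keep_days)
    for entry in backup_root.iterdir():
        try:
            stamp = date.fromisoformat(entry.name)
        except ValueError:
            continue  # not one of the dated backup directories
        if entry.is_dir() and stamp <= cutoff:
            shutil.rmtree(entry)
    return target
```

Restoring a site to a given day then amounts to copying the relevant files back from the sub-directory bearing that date.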
The service model for diagnostics follows.
Service Model - v3 Diagnostics
Figure 5: Service Model - v3 Diagnostics
Illustrating the relationship between the diagnostics web interface and the scripts which
power it, all stemming from a site's DHCP configuration files. A text description follows.
Principles of Operation
Formal service warranty requirements include centralised monitoring and diagnostics.
Q1: How is monitoring and diagnostic information recorded?
Shell scripts and PERL scripts involved in Service Configuration record the outcomes
of their actions by writing small, text-based log files. These are transferred to the web
server; some may include HTML/PHP mark-up to assist with viewing.
The DHCP server process itself logs all transactions to the Syslog server (see footnote 2).
Q2: How is monitoring and diagnostic information presented?
For diagnostic purposes, one has available:
A small number (~12) of script-generated, text-based log files
Very large numbers (~ hundreds of thousands per week) of DHCP-related Syslog
database records
A web interface presents this information very simply, as follows:
A master HTML diagnostics page presents to the user an alphabetical list of
Managed Desktop service sites
o When selected, each site presents links specifying: PC, Mac or Printer records
o When selected, each link presents an alphabetical list of the site's PCs, Macs
or Printers by their hostname, showing MAC address and the assigned IP
number.
o When a host is selected, the Syslog database server is queried and host results
are returned, processed for readability, and displayed, typically within a few seconds
o Additional service health and diagnostic information is included via links to
the text-based log files (some with HTML mark-up) generated by the scripts
shown in Figure 4, permitting in-depth debugging.
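The per-host lookup described above amounts to filtering DHCP log messages by a host's MAC address. The sketch below illustrates that step on a plain list of messages rather than on the MySQL database itself. The message layout is an assumption (ISC-dhcpd-style lines such as `DHCPACK on <ip> to <mac> (<hostname>) via <interface>`), and `host_events` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Matches colon-separated MAC addresses in either letter case.
MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)


def host_events(messages: Iterable[str], mac: str) -> List[str]:
    """Return the messages that mention the given MAC address, in input order."""
    wanted = mac.lower()
    return [
        m for m in messages
        if wanted in (found.lower() for found in MAC_RE.findall(m))
    ]
```

For example, given the messages `DHCPDISCOVER from 00:11:22:33:44:55 via eth0` and `DHCPACK on 192.0.2.11 to aa:bb:cc:dd:ee:ff (pc-02) via eth0`, a lookup for `00:11:22:33:44:55` returns only the first.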
2.4.6 Service Options
The Managed DHCP service is a supporting / enabling service and has no options.
Footnote 2: To accomplish this, the local Syslog process on each DHCP server (virtual machine) must be
configured to forward events to the rSyslog (remote Syslog) service on the Syslog/database server
where these are finally logged in a MySQL database.
2.4.7 Components & Infrastructure v3
Figure 6: Components and Infrastructure v3
2.4.8 Supporting Services
The Managed DHCP service is dependent upon correct functioning of the following services
and infrastructure provided by the Managed Desktop Group:
VMware vSphere virtual server service
Network switch infrastructure
Dependencies outside the Managed Desktop Group which are owned by both the Computing
Service and Managed Desktop sites themselves:
Computing Service core Ethernet switch and WAN router infrastructure
Host Ethernet / NIC cards (sites only)
2.4.9 Operational Procedures
This section covers
Standard operational procedures
Maintenance procedures
Standard Operational Procedures
CH1. Changes to DHCP host IP configuration information
Managed Desktop site contacts, usually computing officers, will notify an
administrator in the Managed Desktop Group that a host has changed and
provide configuration information. This may be directly via telephone or email or
via the central Help Desk.
Administrative staff within the Managed Desktop Group will then
o check the DHCP service is running via its diagnostic web interface
o update the relevant configuration file on the Primary DHCP server
o wait at least 2 minutes for the automated detection cycle to detect the change
o verify the change has been implemented by checking the diagnostic web
interface
o reboot the host and again use the diagnostic web interface to check correct
assignment of IP information
CH2. Changes to printer configuration information
As with CH1, except that site contacts provide the change in printer configuration
information
Administrative staff within the Managed Desktop Group then
o Locate the printer's configuration file, which may reside on either of the
virtual DHCP servers and will be found under the TFTP root directory
o Update the printer configuration file
o Reboot the network printer
o Use the diagnostic web interface to check correct assignment of the
configuration file via TFTP download
o Liaise with the site contact to ensure the required change has been enabled.
Maintenance Procedures
The following maintenance procedures should be performed by administrative staff within
the Managed Desktop Group.
MT1. Virtual server health-check
The virtual Linux servers at the heart of the Managed DHCP service are each configured to
be fully-automated in their service provision and maintenance tasks. Specifically:
After a change to any DHCP configuration file, the Primary DHCP server recreates
all necessary search web pages and LDAP representations of this information, and
restarts the service. If the restart is successful, this information is replicated to the
secondary DHCP server.
All DHCP records are exported weekly from the Syslog database by a shell script.
The database is then entirely cleared and optimised, the latter compacting the
physical database file after the removal of records.
Weekly Actions:
Check each virtual server has sufficient disk capacity; this is especially important
for the Syslog MySQL database server
Check the Syslog MySQL database automated backup is being performed
Check the DHCP servers are generating in-depth debug log files
Check DHCP service functionality by performing a dummy update on a
configuration file; saving the file will update its timestamp and should trigger the
service refresh cycle
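Two of the weekly actions above lend themselves to a short sketch: checking disk capacity and performing a dummy update on a configuration file. The helper names `disk_free_fraction` and `dummy_update` are hypothetical, and any alert threshold is left to the administrator since the source specifies none.

```python
import os
import shutil
from pathlib import Path


def disk_free_fraction(path: Path) -> float:
    """Return the fraction (0.0 to 1.0) of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0


def dummy_update(config_file: Path) -> float:
    """Set the file's timestamps to the current time without altering its contents.

    Returns the new modification time. This mimics saving the file unchanged,
    which should be enough to trigger the refresh cycle.
    """
    os.utime(config_file, None)
    return config_file.stat().st_mtime
```

After calling `dummy_update`, the administrator would wait at least 2 minutes and then confirm via the diagnostic web interface that the refresh cycle ran.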
MT2. Virtual DHCP server backup
Monthly actions:
Notify administrators in the Managed Desktop Group that a maintenance period
is scheduled during which no changes should be made
Shut down the primary virtual DHCP server. The secondary server will handle all
DHCP requests
Copy the virtual server configuration and disk files to an offline location. See the
vSphere service documentation for the approved backup procedures
Restart the primary DHCP server
Repeat this process for the secondary DHCP server
MT3. Virtual Syslog Database server backup
This server has no service resiliency other than vSphere's vMotion feature which will restart
the server on another ESX host should the host the virtual server resides on lose contact
with the network.
Nonetheless, it may be periodically shut down after normal service hours or during a
scheduled maintenance session which should typically take no longer than 15 minutes.
Monthly actions
Notify administrators in the Managed Desktop Group that a maintenance period
is scheduled during which no changes should be made
Notify Managed Desktop site contacts that the Web interface to the service
diagnostics will be unavailable during the maintenance period
Shut down the virtual Syslog server
Copy the virtual server configuration and disk files to an offline location. See the
vSphere service documentation for the approved backup procedures
Restart the virtual Syslog server
MT4. Ethernet switch configuration backup
Ethernet switches used by the service fall under the control of:
Managed Desktop Group, Computing Service
Network Services, Computing Service
Site computing contacts
Monthly actions
It is important that a backup exists of each switch's configuration and that this is
kept up to date following any changes made to the switch. As these are
infrequent, a monthly cycle is considered appropriate. For Cisco switches using
IOS, the mechanism is detailed under the "component-specific documentation" in
Appendix 1.
2.4.10 Component-Specific Documentation
This information has been moved to Appendix 1.
3. ORGANIZATIONAL READINESS ASSESSMENT
‘Organizational readiness assessment’ report and plan, including: business benefit, financial
assessment, technical assessment, resource assessment and organizational assessment, together
with details of all new skills, competences, capabilities required of the service provider
organization, its suppliers, supporting services and contracts
This is the business case that outlines the expected benefits and costs plus assessments of the
required resources and capabilities of the service provider.
3.1 Business Benefit
This critical supporting service primarily facilitates the commercially-provided
Managed Desktop Service provided by the Computing Service which is a high-
visibility, high-value service to the organisation, supporting research and teaching
needs for 30,000 registered users across the University.
The Managed DHCP service facilitates the Managed Power service which has saved
Colleges and Departments many hundreds of thousands of pounds annually by
remotely powering down inactive Intel PCs overnight. A widely acknowledged, high
financial saving is thus achieved, of particular value to smaller departments.
Sustaining the Managed DHCP service therefore presents itself as a very high-
priority imperative for the Computing Service.
3.2 Financial Assessment
As Open Source software is used to deliver the solution, development and operation
of the new service incurs no explicit financial cost. Staff and infrastructure
component costs are covered by existing services and Computing Service funding.
As cited above, the service supports the Managed Desktop Service; income from that
service funds two (vital) posts within the Managed Desktop Group.
3.3 Technical Assessment
All required infrastructure for the new service is available.
The new service architecture is much simpler than the former service and is built
around commonly used components and commonly understood scripting languages
and technologies, easing further development and issue resolution.
3.4 Resource Assessment
Infrastructure resources required are sufficient and available for immediate use.
Human resources do present an issue. As the project has stalled for some years due
to a commitment by the former lead developer to host the service on a favoured
technological platform, technical development must be assigned to another individual
to meet the 5 month design-transition-operation timescale remaining.
The Managed Desktop Group does have sufficiently skilled individuals capable of
developing the new service; however, all are heavily engaged in other mission-critical
projects. Given the short timescale and unavailability of other capable individuals,
the work requires – somewhat unusually – to be carried out by the Head of Systems,
Managed Desktop Group, directly.
3.5 Organisational Assessment
Development activities will take place within the Computing Service. As the existing
service architecture is incompatible with the new WAN architecture, there is strong
commitment to introduce the new service. This arises from the high financial cost of
the WAN Resiliency Upgrade programme and continued risk of WAN-wide
disruption from using the current architecture.
Strong willingness to collaborate on technical matters requiring to be addressed in
the new service exists among stakeholders.
Expert-level technical knowledge is available (within the Computing Service) on all
components, computer services and technologies required to design, transition and
operate the service.
4. SERVICE LIFECYCLE PLAN
4.1 Service Programme
An overall programme or plan covering all stages of the lifecycle of the service, including the
timescales and phasing, for the transition, operation and subsequent improvement of the new
service.
4.1.1 Management, coordination and integration with any other projects
Service Design / Service Transition
The new Managed DHCP service will be implemented as a project; the project
manager is the Head of Systems, Managed Desktop Group and project
administration will be kept to an absolute minimum given the time constraints on
implementation.
The project manager will coordinate with the manager of the WAN Resiliency Upgrade
programme, which will see progressive removal of Managed Desktop site VLANs from
the WAN on a site-by-site basis.
4.1.2 Management of risks and issues
Risks will be identified and classified by the project manager and recorded in a risk
register; the risk register may be found in Appendix 2.
4.1.3 Scope, objectives and components of the service
Scope
The Managed DHCP Service is a key supporting service of the Managed Desktop
Service offered to all Schools and Colleges within the University, operated across the
WAN spanning the town, and currently having 1,800+ managed Intel PCs
(Windows/Linux), Apple Macs and network printers, primarily for student access.
Objectives
Service objectives are automated provision of IP configuration information to
Managed Desktop host PCs and printers; this also facilitates remote PC image
deployment and printer configuration.
Key Components
Key components of the service may be found in Figure 6.
4.1.4 Skills, competences, roles and responsibilities
Head of Systems, Managed Desktop Group, Computing Service
o Responsibilities - Service Design / Transition
project manager - service architect - lead technical developer
communication of project status
communication with site contacts on configuration of site
infrastructure & co-ordination of site migration to new service
o Responsibilities - Service Operation
technical advisor / trainer on service to Managed Desktop Group
o Skills Required - Service Design / Transition
project management PRINCE2
technical administration Linux - NetWare - vSphere - Cisco
scripting PERL - BASH
configuration MySQL - SQL- Apache - LDAP - DHCP
web development HTML - PHP - CSS
Managed Desktop Group, Computing Service
o Responsibilities - Service Design / Transition
configuration of Intel PCs PXE boot environment in support of PC
image deployment
acceptance testing of Intel PC (Windows) imaging operations
acceptance testing of printer configuration file deployment
o Responsibilities - Service Operation
operate and improve all processes within the Managed DHCP service
Managed Desktop Site Contacts
o Responsibilities - Service Design / Transition
coordinating with project manager, configure site Ethernet switches
(where applicable)
inform site users of potential service disruption during transition
activities
test host receipt of IP configuration information and printer
configuration files, providing feedback to the project manager
o Responsibilities - Service Operation
apprise Managed Desktop Service administrators of changes to site
host information e.g. new printer, new PC, changed Ethernet card
etc.
UNIX Support Group, Computing Service
o Responsibilities - Service Design / Transition
acceptance testing of Linux Managed Desktop Service environment (see footnote 3)
o Responsibilities - Service Operation
assist with Managed DHCP Service issue resolution relating to the
Managed Desktop: Linux environment
Mac Support Group, Computing Service
o Responsibilities - Service Design / Transition
provision of technical advice on NetBoot / Boot Service Discovery
Protocol & Mac imaging functionality
testing of Mac imaging operations, including acceptance testing
end-of-life transition for Mac NetBoot servers replaced by the new
service
o Responsibilities - Service Operation
assist with Managed DHCP Service issue resolution relating to the
Managed Desktop: Mac environment
Network Services Group, Computing Service
o Responsibilities - Service Design / Transition
provision of technical advice on advanced DHCP configuration, WAN
and Ethernet switch configurations
WAN configuration, specifically configuration of DHCP Relay and
removal of VLAN information
service transition testing during VLAN removal
programme management: WAN Resiliency Upgrade
Footnote 3: Whilst the Managed Desktop Service is provided by the Managed Desktop Group, the Linux image for
dual-booting Intel PCs is provided and configured by the UNIX Support Group.
o Responsibilities - Service Operation
assist with Managed DHCP Service technical advice and issue
resolution, especially relating to WAN and Ethernet switch
configuration
4.1.5 Processes required
These are listed under Operational Procedures earlier.
4.1.6 Interfaces and dependencies with other services
The Managed DHCP Service is provided by the Managed Desktop Group and has the
following interfaces with other services:
Interface with: Managed Desktop Service
Windows environment
Image / applications provided by Managed Desktop Group
Linux environment
Image / applications provided by UNIX Support Group
Mac environment
Image / applications provided by Mac Support Group
Interface with: Managed Power Service
Operated by Desktop Services Group
Interface with: Managed Print Service
Operated by Desktop Services Group
Service Dependencies
Dependency: VMware Virtual Server environment
Description: Hosts the virtual servers used by the service
Operated by: Managed Desktop Group
Dependency: Ethernet switch environment, Managed Desktop Group
Description: Facilitates DHCP network traffic flow within Computing Service
Operated by: Managed Desktop Group
Dependency: WAN Router environment
Description: Facilitates DHCP network traffic flow across the WAN
Operated by: Network Support Group
Dependency: Ethernet switch / router environment, Managed Desktop sites
Description: Facilitates DHCP network traffic flow within Managed Desktop sites
Operated by: Managed Desktop Site contacts
4.1.7 Management of teams, resources, tools, technology, budgets,
facilities required
The Project Manager is also Head of Systems of the Managed Desktop Group, which
provides the service and the majority of dependency services and is responsible for the
majority of service infrastructure components; normal group management procedures
apply.
4.1.8 Management of suppliers and contracts
None required for this project; Open Source software is used for all virtual server OS
and processes; Microsoft Windows server and (dependency) VMware vSphere
components are licensed within the Managed Desktop Group; all necessary
infrastructure elements are in place.
4.1.9 Progress reports, reviews and revision of the programme and plans
The short time window for project completion mandates that formal progress
reporting will be confined to affirmation of key stage completion via normal internal
communication channels.
Reporting of progress shall, however, be communicated at least twice weekly to the
Programme Manager, WAN Resiliency Upgrade programme.
4.1.10 Communication plans and training plans
Communication Plan
The project manager will apprise key stakeholders (Desktop Service Group
administrators, WAN Resiliency Upgrade Programme Manager) of progress on a site-
by-site basis, as sites are migrated to the new service.
Training Plan
The Service Design Package will be made available on the (restricted-access)
Computing Services Wiki.
Desktop Services Group VMware vSphere administrators will be apprised of the
function of the new virtual servers and service backup/recovery procedures.
Desktop Services Group administrators involved in DHCP administration will
initially engage in hands-on training sessions on low-impact sites (e.g. Computing
Service Managed Desktop sites) to gain familiarity with the service.
4.1.11 Timescales, deliverables, targets and quality targets for each stage
Figure 7: Gantt chart showing project stages and timescale
The project to move from v2 to v3 of the Managed DHCP service will comprise 7 stages, each
described and shown in the Gantt chart below, between July and November 2010.
Stage 1: Target - Establish a functional development environment
Objective
Establish a Linux DHCP service, gaining experience in configuration &
administration sufficient to provide advanced functionality required to respond
to Intel PXE and Mac BSDP protocol requests, facilitating host imaging.
Deliverables
A Linux-based, development (virtual) DHCP Server
Basic DHCP functionality e.g. IP configuration provided to test Intel PCs / Macs
/ Printers.
Advanced DHCP functionality e.g. appropriate responses to PXE / BSDP
provided to test Intel PCs / Macs / Printers.
Stage 2: Target - Migrate DHCP records
Objective
Convert the current Managed Desktop Service DHCP records to format required
by new service
Deliverables
Site DHCP-record import scripts to read DHCP records in the proprietary Netware 6.5
format and the LDAP-based Novell OES format.
Site DHCP subnet & host configuration files, generated by the above.
Stage 3: Target - Establish manually-configured resiliency, enhance security
Objective
Bring online a further DHCP server to act as a secondary server, replicating
information from the primary, with security measures configured.
Deliverables
Secondary, virtual Linux-based DHCP server.
Documented, secure mechanism to replicate - manually at this stage -
configuration files between the original (primary) and secondary DHCP server.
Documented and applied server and service security configuration.
Stage 4: Target - Trial deployment of new service
Objective
Migrate a pilot Managed Desktop site (within the Computing Service) to the new
service, gain experience from - and resolve - issues encountered, build confidence
in new service among stakeholders.
Deliverables
Fully-functional DHCP environment supporting site host IP configuration
provision, PXE/BSDP imaging activities and printer configuration file provision.
Stage 5: Target - Establish service automation & advanced debugging
functionality
Objective
Elevate service functionality to fully-automatic detection of changes to DHCP
host configuration information, with automatic replication of information
between primary / secondary servers, also providing in-depth debugging
functionality.
Deliverables
Documented, automated service functionality and resiliency mechanisms
including scripts facilitating detection of configuration file changes & scripts
facilitating synchronisation of DHCP host information between primary and
secondary servers
An easily navigable web-interface to service health / host debugging information
LDAP interface to service information & scripts used to generate such
Stage 6: Target - Migrate remaining Managed Desktop sites to new service
Objective
Complete the migration of all sites to the new service
Deliverables
DHCP subnet & host configuration files for each site
Documented and backed-up Ethernet switch configurations for Managed
Desktop Services switches, with site VLANs removed
Documented backup and recovery-mechanism for Ethernet switch configurations
Documented DHCP Relay information for site computing contacts for popular
routers e.g. Cisco IOS with IP Helper
Stage 7: Target - Remove all original DHCP service components from
infrastructure
Objective
Remove unnecessary components and configurations from infrastructure
delivering the former, v2 service
Deliverables
Updated VLAN configurations from each Ethernet switch in the Computing
Service, removing site VLANs from port & trunk port settings
Updated Ethernet switch configurations from Novell servers, unbinding site
VLANs
Remove Novell OES servers from operation
4.2 Service Transition Plan
4.2.1 Build policy, plans and requirements
Overall transition strategy, objectives, policy, risk assessment and plans including: build policy,
plans and requirements, including service and component build plans, specifications, control and
environments, technology, tools, processes, methods and mechanisms, including all platforms.
(Note that the text in grey above is the formal ITIL recommended content for this document;
only essential elements have been included below).
4.2.2 Transition Strategy
The new v3 service -
o will be deployed on a pilot site within the Computing Service, for service
operation stakeholders to gain confidence in using the new service
o will then be deployed one WAN-based Managed Desktop site at a time
Site VLAN information central to operation of the current v2 service will be
retained (and/or backed up as technically necessary) on participating Ethernet
switches and Novell DHCP servers until the new v3 service is operational
Site DHCP information on current v2 service hosts will be retained until the new
v3 service is operational to permit fallback in case of a disaster scenario
4.2.3 Build Policy
Servers will (wherever possible) be deployed on the VMware vSphere virtual
server environment due to the high levels of resiliency this platform offers.
4.2.4 Testing policy, plans and requirements
Testing policy, plans and requirements, including test environments, technology, tools, processes,
methods and mechanisms. Testing must include: functional testing, component testing, including
all suppliers, contracts and externally provided supporting products and services; user acceptance
and usability testing; system compatibility and integration testing; service and component
performance and capacity testing; resilience and continuity testing; failure, alarm and event
categorization, processing and testing; service and component, security and integrity testing;
logistics, release and distribution testing; management testing, including control, monitoring,
measuring and reporting, together with backup, recovery and all batch scheduling and processing
The Managed DHCP service itself comprises a relatively small number of elements:
1. Linux virtual servers
2. Processes running on those servers (Apache, DHCP, MySQL)
3. Information about the DHCP Linux service (configured hosts, service status, logs)
4. Transfers of information between servers
5. Shell/PERL scripts to control the sequence of desired activities
4.2.5 Testing Policy
All elements of the service (1-5 above) will pass repeated, thorough testing prior to
deployment on any Managed Desktop Site.
Key aspects of the service to be tested are:
1. Basic service functionality – serving IP information to hosts
2. Advanced service functionality – e.g. facilitating imaging
3. Logging – a functional web interface to the rsyslog database information
4. Resiliency – against typical failure scenarios
5. Security – e.g. servers and Linux services are properly secured
Test Details
1. Basic Service Functionality Testing
o Test: Hosts reliably receive IP configuration information
2. Advanced Service Functionality Testing
o Test: Intel PC PXE-imaging operations reliably succeed
o Test: Mac imaging operations reliably succeed
o Test: Printer configuration files are downloaded and used
o Test: Transfer of information (folder synchronization) between Linux hosts
succeeds; success/failure of all such transfers is logged
o Test: Secure, remote commands (ssh) between Linux hosts succeed;
success/failure of all such commands is logged
o Test: MySQL database optimization scripts run on schedule (via cron, results
logged) and database is observed to be compacted / results returned faster
afterwards
o Test: LDAP host information is successfully retrieved by the Managed Power
service
3. Logging Testing
o Test: DHCP records are observed in local Syslog files of each DHCP server
o Test: DHCP records are observed in the MySQL database of rsyslog server
o Test: Custom web-search pages are generated after each and every DHCP
configuration file change
o Test: Web Searches return current information on host DHCP logs
o Test: LDAP tree information is refreshed after each and every DHCP
configuration file change
o Test: Induced failure of the primary DHCP service via configuration file error
will result in the web interface showing – in red – the failure of the primary
server
4. Resiliency Testing
o Test: Induced failure of vSphere host results in vMotion of hosted virtual servers
to another vSphere server (this could be considered a vSphere service test but is
included for completeness)
o Test: Induced failure of the primary DHCP service via configuration file error
will result in non-replication of (erroneous) configuration information to the
secondary server
o Test: Simulated failure of one cable in all Ethernet switch VLAN trunk route
ports (see footnote 4) upon which the service relies has no effect on service functionality
5. Security Testing
o Test: All TCP/IP access (see footnote 5) to Linux virtual servers from hosts on
unauthorized subnets is denied
o Test: The Linux Firewall will disallow connections from any remote hosts to
disallowed services (see footnote 6); specifically, Telnet access to any Linux virtual
server from any host on any subnet is denied
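A security test of this kind can be partly automated by attempting a TCP connection and confirming that it is refused. The sketch below is one possible check, not the test procedure actually used; `tcp_port_open` is a hypothetical helper, and the 2-second timeout is an arbitrary assumption.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, False otherwise.

    A refused, filtered or timed-out connection is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from a host on an unauthorized subnet, the Telnet test passes when `tcp_port_open(server, 23)` returns False for every Linux virtual server.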
4.2.6 Deployment policy, release policy, plans and requirements
Deployment policy, release policy, plans and requirements, including logistics, deployment,
staging, deployment environments, cultural change, organizational change, technology, tools,
processes, approach, methods and mechanisms, including all platforms, knowledge, skill and
competence transfer and development, supplier and contract transition, data migration and
conversion.
Development Policy
The v3 Managed DHCP service will be developed on a small number of hosts (PCs,
Macs, Printers) within the Computing Service to verify reliable, basic functionality. This
will be followed by advanced functionality development.
Deployment Policy
A pilot deployment on a single Managed Desktop site (within the control of the
Computing Service) will take place after advanced functionality has been developed and
reliability / security testing has been passed. Competence in using the service will be
developed by Managed Desktop Group administrators during this stage.
Footnote 4: Only site VLAN information is removed from Ethernet switches; non-WAN, Computing Service VLAN
information is necessarily retained as part of normal switch configuration.
Footnote 5: Linux virtual server services are secured via TCP Wrappers (hosts.allow, hosts.deny).
Footnote 6: On servers hosting the DHCP service for example, typically only that Linux service is permitted to
run by the Firewall.
Pending successful and reliable operation, the service will then be deployed on a single,
remote Managed Desktop volunteer site located across the WAN. This will be the first
fully realistic, live, WAN-based service deployment.
Pending successful and reliable operation on the volunteer site, the service will then be
rapidly deployed – due to the tight project time window – to all Managed Desktop sites.
All Site DHCP host information will be converted from Netware/Novell OES format to
the format used by the Linux DHCP service during this time.
4.3 Service Operational Acceptance Plan
Overall operational strategy, objectives, policy, risk assessment and plans including: Interface
and dependency management and planning; Events, reports, service issues, including all changes,
releases, resolved incidents, problems and known errors, included within the service; and any
errors, issues or non-conformances within the new service; Final service acceptance
General Computing Service Policy (for reference)
Services provided by the Computing Service should, wherever possible, provide
interfaces to assess their operational status / health (e.g. up, degraded, down).
Service Operational Monitoring
The Managed DHCP service web interface will be added to the schedule of services to
be periodically monitored by Computing Service central operations staff.
Additionally, daily monitoring of the service health will become a responsibility of
administrators within the Managed Desktop Group.
Service Issues
Problems and issues will be reported to the Managed Desktop Group for resolution
either directly or via the Computing Service Helpdesk.
The primary service architect & technical developer will act as advisor in issue
resolution and retain principal responsibility for issue resolution throughout the
lifetime of the service.
Known issues and their workarounds (if known) will be added to the Managed DHCP
service page of the Computing Service Wiki.
4.4 Service Acceptance Criteria
Development and use of service acceptance criteria for progression through each stage of the
service lifecycle, including: All environments; Guarantee and pilot criteria and periods.
The service acceptance criteria (SAC) comprise a set of criteria used to ensure that a service
meets its expected functionality and quality and that the service provider is ready to deliver the
new service once it has been deployed.
Service Transition
The service will be accepted for transition to live operation upon fulfillment of the acceptance
criteria checklist below.
Service functionality (utility), both basic and advanced, has been verified -
o in a development environment
o in the pilot site within the Computing Service
o in the first WAN-based Managed Desktop site
Service warranty elements have been successfully verified -
o in the pilot site within the Computing Service
o in the first WAN-based Managed Desktop site
Service operation stakeholders have -
o been briefed on the new service design, components, functionality, resiliency
mechanisms and processes involved
o been briefed on functionality and warranty testing results
o had hands-on training sessions in accessing & configuring the service,
recovering from configuration-errors, and are confident in use of the service
Service deployment schedule has been made known to, and agreed by, all
stakeholders. In particular -
o customers (site Computing representatives) have been informed of the need
for the change in service, apprised of the anticipated migration date of their
site to the new service, and are aware of potential transition issues and known
actions to remedy common issues
o Computing Service operations staff have been informed of the transition to
live operation schedule and are monitoring the service health via the web
interface
Appendix 1 – Component Specific Documentation 40
APPENDIX 1 COMPONENT-SPECIFIC DOCUMENTATION
A1.1 Component: Computing Service Cisco Intermediate / End Switches
Configuration Backup
[Reminder: Security-sensitive information has been replaced by dummy data for this sample document]
Many Ethernet switches have resilient pathways; however, some legacy switches may not.
Recovery from hardware failure of an older, non-resilient switch is accomplished most quickly
via replacement with an identical model, of which many are available within the Computing
Service. The configuration of each Cisco switch would then require restoration.
To accomplish configuration restoration, a backup must first be made. Follow the steps below
in which a switch configuration is backed up to a central Linux server and simply reverse the
copy order to restore a configuration.
Quick Backup facts
Backups are made to the TFTP service of Linux General Database VM
The TFTP service is intentionally disabled
The Linux Firewall, intentionally, does not permit the backup traffic
Enabling the Backup Service on Linux
1. Enable the TFTP Service on the Linux Server which will receive the switch backup.
The service is disabled by default for security and will require to be opened for only a
few seconds:
Yast2 – Network Services – Network Services xinetd – Toggle the tftpd service from OFF to ON
2. Open the Firewall for UDP traffic. The sequence below illustrates how to permit any
host to access the Linux server via UDP; you are advised to restrict the rule to the
specific network of the switch you are backing up:
Yast2 – Security and Users – Firewall
Custom Rules …Add…
Source Network: 0/0
Protocol: UDP
Destination Port: leave blank
Source Port: leave blank
3. From a Windows host, use PuTTY to SSH to the switch (e.g. myswitch1.myorg.uk)
4. Login to the switch and replicate the following, where the IP address is that of the
Linux general database VM on which the TFTP service is running:
myswitch1>enable
Password:
myswitch1#copy running-config tftp:
Address or name of remote host []? 177.135.13.14
Destination filename [myswitch1-confg]?
!!
3644 bytes copied in 2.173 secs (1822 bytes/sec)
myswitch1#
5. Should the transfer take more than a couple of seconds, restart the xinetd service on
the Linux TFTP host:
bash> service xinetd restart
6. Copy the switch configuration file to a safe location and remove the original:
bash> cp /tftpboot/* /home/someuser/cisco-cfg
bash> chown -R someuser:users /home/someuser/cisco-cfg
bash> rm /tftpboot/*
7. Now undo step 1 above, toggling TFTP from ON to OFF.
8. Undo step 2 above, deleting the Firewall rule that was added.
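Step 6 can be wrapped in a small script so that each backup lands in a dated folder and the TFTP directory is emptied only when the copy has succeeded. This is a minimal sketch using the paths shown above; the function name and the dated-folder layout are assumptions, not part of the original procedure.

```shell
#!/bin/bash
# Hypothetical helper: move switch configurations out of the TFTP directory
# into a dated archive folder and print the folder used.
# Usage: archive_switch_configs SRC_DIR DEST_DIR
archive_switch_configs() {
    local src="$1" dest="$2" stamp target f
    stamp="$(date +%Y-%m-%d)"
    target="$dest/$stamp"

    mkdir -p "$target" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue      # nothing to do if the directory is empty
        # remove the original only after a successful copy
        cp -p "$f" "$target/" || return 1
        rm -f "$f" || return 1
    done
    echo "$target"
}

# Example (paths as in step 6):
# archive_switch_configs /tftpboot /home/someuser/cisco-cfg
```

Keeping one folder per day makes it straightforward to find the last known-good configuration when a switch has to be replaced.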
Show switch VLAN Configuration Information
Under Windows, use PuTTY to SSH to the switch of interest.
Show summary information about VLANs
Switch1>show vlan brief
Show information about VLAN Trunking Protocol (VTP)
Switch1>show vtp status
Show a summary of interfaces; these show at a glance which ports are connected to what.
Switch1>enable
Switch1#show interfaces summary
Removing VLANs from trunk ports of Cisco IOS switches
In this example, trunk port 4/0/1 of a multi-switch stack requires to have VLAN 805
removed.
First, check the status of the port:
Switch1>enable
Switch1#show int Te 4/0/1 status
Port Name Status Vlan Duplex Speed Type
Te4/0/1 route-centr-4 connected trunk full 10G 10GBase-LR
Switch1#
Next, remove VLAN 805:
Switch1>enable
Switch1#config t
Switch1(config)#int Te 4/0/1
Switch1(config-if)#switchport mode trunk
Switch1(config-if)#switchport trunk allowed vlan remove 805
Switch1(config-if)#end
Switch1#copy running-config startup-config
Reversal of the above, if required, can be accomplished via the following command sequence:
Switch1>enable
Switch1#config t
Switch1(config)#int Te 4/0/1
Switch1(config-if)#switchport mode trunk
Switch1(config-if)#switchport trunk allowed vlan add 805
Switch1(config-if)#end
Switch1#copy running-config startup-config
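Because the removal and its reversal differ only in one keyword, the command sequence can be generated rather than typed, which reduces the chance of editing the wrong VLAN or port. A minimal sketch follows: the function name is hypothetical, and the output is meant to be reviewed and pasted into a privileged (enable) session, not sent to a switch automatically.

```shell
#!/bin/bash
# Hypothetical helper: print the IOS commands that add or remove one VLAN
# on a trunk port. Usage: trunk_vlan_commands add|remove PORT VLAN_ID
trunk_vlan_commands() {
    local action="$1" port="$2" vlan="$3"

    case "$action" in
        add|remove) ;;
        *) echo "action must be 'add' or 'remove'" >&2; return 1 ;;
    esac
    case "$vlan" in
        ''|*[!0-9]*) echo "VLAN must be a number" >&2; return 1 ;;
    esac
    # Valid VLAN IDs are 1-4094
    if [ "$vlan" -lt 1 ] || [ "$vlan" -gt 4094 ]; then
        echo "VLAN must be between 1 and 4094" >&2
        return 1
    fi

    printf '%s\n' \
        "configure terminal" \
        "interface $port" \
        "switchport mode trunk" \
        "switchport trunk allowed vlan $action $vlan" \
        "end" \
        "copy running-config startup-config"
}

# Example: the removal shown above
trunk_vlan_commands remove "Te 4/0/1" 805
```

Calling the same function with add instead of remove yields the reversal sequence.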
Be aware that the VLAN Trunking Protocol (VTP) on a Cisco switch can operate in several
modes, including server and client. If in server mode, removing a VLAN will result in the change
being propagated to all switches configured as clients. This may, or may not, be the desired result
of such a change.
You can check on key connections to other switches via
Switch1> show interface status
Port Name Status Vlan Duplex Speed Type
Te4/0/1 route-main-2 connected trunk full 10G 10GBase-LR
Te4/0/2 mgmt-sw3_Te3/0 connected trunk full 10G 10GBase-LR
A1.2 Component: Syslog Virtual Server
[Reminder: All security-sensitive information has been replaced by dummy data for this sample
document]
Securing MySQL
The virtual Linux server is secured via TCP Wrappers. It has no general user access and
only those services strictly necessary for its purpose are enabled.
By default, root doesn’t require a password to access a MySQL server process. This has been
changed so that root does require a password. Additionally, the MySQL process runs as a
normal user (not root), mysqluser.
A MySQL user, mysqlmaint, with privileges to conduct only the maintenance operations is
used to perform the maintenance tasks. The account password is stored in that user’s
~/.my.cnf, which has 400 permissions (user read only).
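For illustration, a client option file of this kind can be created as below. The [client] section with user and password keys is standard MySQL option-file syntax; the function name is hypothetical and the password shown in the example is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: write a MySQL client option file readable only by
# its owner. Usage: write_mysql_client_cnf FILE USER PASSWORD
write_mysql_client_cnf() {
    local file="$1" user="$2" password="$3"

    # Create the file inside a subshell with a restrictive umask so it is
    # never world-readable, even briefly
    (
        umask 077
        printf '[client]\nuser=%s\npassword=%s\n' "$user" "$password" > "$file"
    ) || return 1
    chmod 400 "$file"    # owner read only
}

# Example:
# write_mysql_client_cnf "$HOME/.my.cnf" mysqlmaint 'p-removed'
```

With such a file in place, the maintenance commands can be run as that user without a password appearing on the command line or in the crontab.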
Configuring MySQL for Syslog
[Reminder: All security-sensitive information has been replaced by dummy data for this sample
document]
First, create the database to hold syslog information and a user through which rsyslog
accesses this database:
mysql> CREATE DATABASE Syslog;
mysql> GRANT ALL ON Syslog.* TO usyslog@localhost IDENTIFIED BY 'p-removed';
This also creates MySQL user 'usyslog' which has password 'p-removed' and can only access
the Syslog.* database from localhost.
Now create the tables required inside Syslog database by loading the script that comes with
the rsyslog package:
bash> mysql </home/someuser/rsyslog/plugins/createSyslogDB.sql
That's the database setup complete. It has all the tables required to hold syslog information
passed to it by the rsyslog daemon.
Database Optimisation: maintenance tasks and scripts
[Reminder: All security-sensitive information has been replaced by dummy data for this sample
document]
The Syslog server MySQL database contains many system event records other than
those generated by the DHCP service which, itself, generates many tens of thousands of
entries per day.
Various maintenance operations run periodically (via cron) to -
purge the database of non-DHCP records
optimise the database following a purge
backup the database, storing one week's traffic for offline debugging purposes
To store the scripts required, create a folder /var/lib/mysql/scripts and assign ownership to
mysqluser – the user account under which the MySQL service runs:
bash> chown -R mysqluser /var/lib/mysql/scripts
bash> chgrp -R mysqluser /var/lib/mysql/scripts
Creating a Syslog Archive Script
We want a database table capable of holding older Syslog database information. Copy the
original Syslog database setup script and rename it as follows:
bash> cd /home/someuser/rsyslog/plugins/
bash> cp createSyslogDB.sql /var/lib/mysql/scripts/createSyslogArchiveTable.sql
Edit the createSyslogArchiveTable.sql script so the first two lines read as follows:
bash> vi /var/lib/mysql/scripts/createSyslogArchiveTable.sql
USE Syslog;
CREATE TABLE EventsArchive
and run the MySQL script to create the table:
bash> /usr/bin/mysql -h localhost -u mysqlmaint < /var/lib/mysql/scripts/createSyslogArchiveTable.sql
Now that the archive table is present, we can create a script to populate it with archive
DHCP records as follows:
bash> vi copyDHCPtoarchive.sql
INSERT INTO Syslog.EventsArchive SELECT * FROM Syslog.SystemEvents WHERE Syslogtag='dhcpd:';
Run the script manually to populate the archive table:
bash> /usr/bin/mysql -h localhost -u mysqlmaint < copyDHCPtoarchive.sql
and check that the archive table now contains DHCP records:
mysql> USE Syslog;
mysql> SELECT Syslogtag,Message FROM EventsArchive LIMIT 12;
Now we need to schedule these and other maintenance tasks to run automatically. The idea
is to archive DHCP messages weekly, emptying out the main Syslog database immediately
afterwards.
1. Sunday 4AM - clear out archive table
2. Sunday 4.30AM - optimise archive table (compacts database)
3. Sunday 5AM - create fresh archive of week's DHCP logs
4. Sunday 6AM - delete all System events
5. Sunday 6.30AM - optimise system events table (compacts database)
The cron schedule of maintenance user mysqlmaint is shown below:
# Live: Clear out the Archive MySQL table on Sunday nights before a fresh archive is made, at 4AM
00 04 * * 0 /usr/bin/mysql -h localhost -u mysqlmaint -e "delete from Syslog.EventsArchive" >> /dev/null
# Live: Optimise the archive table (reclaims space)
30 04 * * 0 /usr/bin/mysql -h localhost -u mysqlmaint Syslog -e "optimize table Syslog.EventsArchive" >> /dev/null
# Live: Create a fresh archive of DHCP logs every Sunday at 5AM (from SystemEvents to EventsArchive)
00 05 * * 0 /usr/bin/mysql -h localhost -u mysqlmaint < /var/lib/mysql/scripts/copyDHCPtoarchive.sql >> /dev/null
# Live: Delete contents of Syslog.SystemEvents table every Sunday at 6.00AM
00 06 * * 0 /usr/bin/mysql -h localhost -u mysqlmaint Syslog -e "delete from Syslog.SystemEvents" >> /dev/null
# Live: Optimize Syslog.SystemEvents table at 06:30AM on a Sunday (reclaims space)
30 06 * * 0 /usr/bin/mysql -h localhost -u mysqlmaint Syslog -e "optimize table Syslog.SystemEvents" >> /dev/null
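The five cron entries rely on fixed start times, so a step that fails or overruns does not hold back the next one. One possible alternative, shown here only as a sketch, is a single wrapper script that runs the same statements in order and stops at the first failure. The function name, the script path in the final comment and the MYSQL variable (which allows the client command to be substituted, for example when testing) are assumptions.

```shell
#!/bin/bash
# Hypothetical wrapper: run the weekly Syslog maintenance steps in sequence.
# MYSQL may be overridden; by default it is the client call used in the crontab.
MYSQL="${MYSQL:-/usr/bin/mysql -h localhost -u mysqlmaint}"

weekly_syslog_maintenance() {
    local sql
    for sql in \
        "DELETE FROM Syslog.EventsArchive" \
        "OPTIMIZE TABLE Syslog.EventsArchive" \
        "INSERT INTO Syslog.EventsArchive SELECT * FROM Syslog.SystemEvents WHERE Syslogtag='dhcpd:'" \
        "DELETE FROM Syslog.SystemEvents" \
        "OPTIMIZE TABLE Syslog.SystemEvents"
    do
        # $MYSQL is intentionally unquoted so that its options are split into words
        $MYSQL -e "$sql" > /dev/null || {
            echo "maintenance step failed: $sql" >&2
            return 1
        }
    done
}

# Example crontab entry (Sunday 4AM), assuming a script saved as
# /var/lib/mysql/scripts/weekly_maintenance.sh that calls the function:
# 00 04 * * 0 /var/lib/mysql/scripts/weekly_maintenance.sh
```

The trade-off is that all five steps then share one start time, so the schedule is no longer visible step by step in the crontab.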
For development purposes, faster database responses are achieved by removing records older
than one day. Note that in live operation, the loss of logging information incurred might
hinder debugging activities.
bash> mysql Syslog -e "DELETE FROM Syslog.SystemEvents WHERE ReceivedAt<DATE_SUB(curdate(),INTERVAL 1 DAY)"
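The retention period in the statement above can be made a parameter. The sketch below only builds the SQL text, checking that the number of days is a positive integer before it is placed in the statement; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print a DELETE statement that removes SystemEvents
# records older than the given number of days.
purge_sql() {
    local days="$1"
    case "$days" in
        ''|*[!0-9]*) echo "days must be a positive integer" >&2; return 1 ;;
    esac
    if [ "$days" -lt 1 ]; then
        echo "days must be a positive integer" >&2
        return 1
    fi
    printf 'DELETE FROM Syslog.SystemEvents WHERE ReceivedAt<DATE_SUB(curdate(),INTERVAL %s DAY)\n' "$days"
}

# Example: keep one day of records, as in the command above
# mysql Syslog -e "$(purge_sql 1)"
purge_sql 1
```

Rejecting anything that is not a plain number keeps stray text out of a statement that deletes data.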
Further Optimising MySQL - create indexes
The rsyslog database will store a very large number of records and will be significantly faster
if indices are established. Create the script /var/lib/mysql/scripts/create_indices.sql as follows
(this requires to be done only once).
bash> vi /var/lib/mysql/scripts/create_indices.sql
USE Syslog;
CREATE INDEX index_DeviceReportedTime ON SystemEvents(DeviceReportedTime);
CREATE INDEX index_FromHost ON SystemEvents(FromHost);
CREATE INDEX index_ReceivedAt ON SystemEvents(ReceivedAt);
CREATE INDEX index_SysLogTag ON SystemEvents(SysLogTag);
CREATE INDEX index_Facility ON SystemEvents(Facility);
CREATE INDEX index_Priority ON SystemEvents(Priority);
CREATE INDEX index_DeviceReportedTime ON EventsArchive(DeviceReportedTime);
CREATE INDEX index_FromHost ON EventsArchive(FromHost);
CREATE INDEX index_ReceivedAt ON EventsArchive(ReceivedAt);
CREATE INDEX index_SysLogTag ON EventsArchive(SysLogTag);
CREATE INDEX index_Facility ON EventsArchive(Facility);
CREATE INDEX index_Priority ON EventsArchive(Priority);
As before, run the MySQL script to create the indices.
bash> /usr/bin/mysql -h localhost -u mysqlmaint < /var/lib/mysql/scripts/create_indices.sql
Log Rotation
The RSyslog daemon logs to the local /var/log/messages file as well as to the MySQL
database, so we must regularly rotate these logs.
See “How to use logrotate” (http://linuxers.org/howto/howto-use-logrotate-manage-log-files)
for an excellent description of the configuration files below, and also the logrotate man page
(http://linux.die.net/man/8/logrotate), which is very useful.
Key Information
/etc/cron.daily/logrotate is the script which performs the log rotation, run daily as the
location suggests.
The above script reads /etc/logrotate.conf which 'includes' files from /etc/logrotate.d
where individual config files for certain services may be stored
With respect to syslog, the file /etc/logrotate.d/syslog specifies the log rotation for
/var/log/messages
Two logs require special attention
We need to edit the /etc/logrotate.d/syslog config file to rotate daily, keep one previous
log and compress it (as the MySQL database keeps one week's worth of logs)
We need to edit the /etc/logrotate.d/mysql config file as well, since it will have a huge
number of records
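A stanza matching the first requirement (rotate daily, keep one previous log, compress it) might look like the one written by the sketch below. daily, rotate, compress, missingok and notifempty are standard logrotate directives; the function name is hypothetical, and any postrotate action needed to make the syslog daemon reopen its log file is omitted because it depends on the distribution.

```shell
#!/bin/bash
# Hypothetical helper: write a logrotate stanza for /var/log/messages to FILE.
write_syslog_logrotate() {
    local file="$1"
    cat > "$file" <<'EOF'
/var/log/messages {
    daily
    rotate 1
    compress
    missingok
    notifempty
}
EOF
}

# Example: write to a scratch file first, then check it with logrotate's
# debug mode, which reports what would be done without rotating anything:
# write_syslog_logrotate /tmp/syslog.logrotate
# logrotate -d /tmp/syslog.logrotate
```

Trying the stanza in a scratch file before editing /etc/logrotate.d/syslog avoids breaking the daily rotation with a syntax error.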
Configuring the rsyslog daemon
First, specify that the syslog daemon should actually be rsyslog by editing /etc/sysconfig/syslog:
SYSLOG_DAEMON="rsyslog"
Turn off the many MARK statements that Syslog itself generates with the line:
RSYSLOGD_PARAMS="-m 0 -r"
Configure rsyslog via the /etc/rsyslog.conf file, enabling MySQL storage of events and
reception of remote UDP syslog calls:
# Enable MySQL logging to Syslog database
$ModLoad ommysql.so
*.* :ommysql:localhost,Syslog,rsysloguser,thepassword
# Provide UDP syslog reception
$ModLoad imudp
$UDPServerAddress 192.168.88.5
$UDPServerRun 514
# Provide TCP syslog reception
$ModLoad imtcp
# Include config files:
$IncludeConfig /etc/rsyslog.d/*.conf
Once configured, restart the service. Remember that the service name to restart is 'syslog' and
not 'rsyslog', as its config file specifies that rsyslog should actually be used.
bash> service syslog restart
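Before restarting, it can be useful to confirm that the directives described above are actually present in the configuration file. This is a minimal sketch using grep; the function name is hypothetical and the check is purely textual, so it does not validate rsyslog syntax.

```shell
#!/bin/bash
# Hypothetical helper: verify that an rsyslog configuration file contains
# the MySQL output and UDP input directives.
# Returns non-zero if any of them is missing.
check_rsyslog_conf() {
    local file="$1" missing=0 pattern
    for pattern in \
        '^\$ModLoad ommysql' \
        '^\*\.\*[[:space:]]*:ommysql:' \
        '^\$ModLoad imudp' \
        '^\$UDPServerRun [0-9]+'
    do
        if ! grep -Eq "$pattern" "$file"; then
            echo "missing directive matching: $pattern" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# check_rsyslog_conf /etc/rsyslog.conf && service syslog restart
```

Each missing directive is reported on its own line, so a single run shows everything that still needs to be added.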
Appendix 2 – Risk Register 48
APPENDIX 2 RISK REGISTER
Risks have been identified and categorised in order of impact below, by the project manager.
R1: Project is completed late, impacting WAN Resiliency upgrade programme
Scenario
The new Managed DHCP service is late, being completed significantly after the
WAN Resiliency Upgrade project stage in which VLANs should be removed.
Impact
Very High
Any delay to the WAN Resiliency Upgrade project would prolong the
organisation's use of a less-reliable WAN architecture with consequent
possibilities for major WAN disruption, negatively impacting all University and
College operations.
Probability
Very high (without mitigation)
Moderate (with mitigation)
Mitigation
Implementation of industry-standard Linux-based DHCP service, dropping
problematic Novell OES solutions.
Personal takeover of the project by the (new) Head of Systems, Managed Desktop
Group, given only 5 months now remain until the required completion date.
Close liaison between the two project managers on stages of both projects and
technical implementation issues.
Rationale
Progress on the new Managed DHCP project has been very slow (~2 years) as the
sole developer struggled with hosting the new DHCP service on a Novell OES
platform. In particular, major difficulties were encountered in serving Apple
NBDP requests. Also on Novell OES, DHCP configuration is stored within eDir;
tools to view advanced aspects of this information such as site subnet
configurations, are non-existent.
Adoption of industry standard platforms (Linux) and server processes means core
service functionality can rapidly be accomplished via tried and tested means.
Additional functionality is also straightforward to realise, given the ease of
secure Linux-Linux communications.
Takeover by an experienced technical developer holding project management
certifications significantly enhances the probability of project completion,
particularly important given the long project duration (~2 years) without
significant progress to date and deadline of 5 months.
The deadline remains exceptionally tight, however, given the scope of development required.
R2: Serious service disruption via misconfiguration of network infrastructure
Scenario
Incorrect configuration of Ethernet switches during development and transition
adversely impacts other/all services of the Managed Desktop Group.
Impact
Very High
Loss of access to central file store and applications for users
Crashing of ~1,800 Managed Desktop PCs and printers with loss/corruption of
user data
Probability
High (without mitigation)
Moderate / Low (with mitigation)
Mitigation
Identification and configuration-backup of all Ethernet switches within the
Managed Desktop Group
Training of technical staff involved in switch configuration
Testing of switch-configuration change, backup and recovery mechanisms on
development Ethernet switches prior to making changes on live infrastructure
Close liaison with Network Services, granting approval of configuration changes
prior to implementation
Rationale
Within the Managed Desktop Group, Ethernet switches are historically managed
by an individual with network-management responsibilities who was tasked with
developing the (stalled) new Managed DHCP Service.
Documentation on the switches is not of a sufficient quality to make progress
with their reconfiguration; what configuration information exists is largely out of
date. Given the exceptional sensitivity of IT services to Ethernet switch VLAN
configuration, and the exceptionally high impact in the event of misconfiguration,
responsibility for switch configuration should temporarily be taken over by a
senior member of staff.
Given the Head of Systems is also the technical architect and developer for the
new service, it is logical that this responsibility be assigned likewise.
R3: Lack of functionality of new Managed DHCP Service
Scenario
The new DHCP service fails to meet basic / advanced functionality requirements
Impact
High
The service primarily supports the commercially-provided Managed Desktop
Service upon which 30,000 registered users rely throughout the University and
Colleges
Probability
Low
Mitigation
Formal identification of functionality requirements
Close liaison with technical experts in each area of functionality during service
design, development and deployment.
Rationale
Linux-based DHCP services are highly reliable, industry standard entities
capable of providing all functionality required of the DHCP Server component of
the Managed DHCP service.
Extensive DHCP configuration knowledge exists within the Network Services
group of the Computing Service; extensive Linux knowledge exists within the
UNIX Support group who provide the Linux image to the Managed Desktop
Service. Close liaison with both groups during development and transition should
resolve any issues encountered.
R4: Lack of warranty of new Managed DHCP Service
Scenario
The new DHCP service fails to meet basic / advanced warranty requirements
Impact
High
As with functionality, the service primarily supports the commercially-provided
Managed Desktop Service upon which 30,000 registered users rely throughout
the University and Colleges
Probability
Low
Mitigation
Formal identification of warranty requirements
Utilisation of high-resilience components such as vSphere
Creation of a simple service model which is built around warranty as much as
functionality, anticipating routine configuration file errors will be made
Extensive testing of each and every element of the service
Rationale
The service model anticipates common causes of service degradation and is
resilient against such.
Once in operation, components of the service – other than DHCP configuration
files – essentially do not change, limiting the scope for adverse impacts on the
service.
The simplicity of the service model, despite requiring extensive scripting
complexity to achieve in practice, coupled with extensive debugging features,
means service health is readily monitored. Rich diagnostic information should
enable rapid recovery from minor issues.