Texas Nodal Market Implementation Program Infrastructure Update October 22, 2007
Lead from the front | Texas Nodal | http://nodal.ercot.com
Texas Nodal Market Implementation Program Infrastructure Update
October 22, 2007
INF: Project Summary - Initial Charter with Approved Changes
Project area: Infrastructure
Description: Provision of development, testing, EDS and production environments across the Program
Vendor(s): IBM, EMC, Oracle
Project Manager: David Forfia
Key deliverables/short term deliverables:
– Hardware specifications
– Hardware procurement
– Data center capacity resolution
– IT Services Catalogue
– Service Level Agreements for all Nodal projects
– Project development & test (FAT) environments
– Integration testing (SAT) environments
– EDS environments
– Production environments
– Market Participant Identity Management (new)
– Release Management processes (from INT)
– Oracle 10g upgrade for EDW (new)
– High availability monitoring environment (new)
Key Assumptions:
– Infrastructure capacity can be incrementally added as the project progresses using IBM’s capacity upgrade on-demand model
– Data center capacity issues will be resolved in the next 90 days
Challenges/Risks:
– Existing Data Center capacity (power)
Comments:
– IT Operations will be the first ERCOT function to transition to Nodal operations, starting with setting up development environments
TPTF Nodal Update 10/22/07
INF: Project Summary - Initial Delivery Plan
[Figure: initial delivery plan timeline, Q3 2006 through Q3 2009.
Tracks: NMMS, Computing Infrastructure, Storage, Software, Data Center Capacity.
Phases shown: Requirements, Conceptual Design, Build, Pre-FAT, FAT, ITEST, MP SAT, EDS 3 Data Validation, EDS 3 Release, EDS 4 Release.
Milestones: 3/31/08 Single Entry Model GO LIVE; 6/30/08 LMP Market Readiness Criteria MET; 12/01/08 Real Time Operations GO LIVE; 12/08/08 Day Ahead Market/CRR GO LIVE; Zonal Shutdown GO/NO GO.
Computing Infrastructure: p5 systems for Dev, Prod #1, Prod #2, Disaster Site #1 and DR #2; Design to Min Requirements; Capacity Planning; Capacity Upgrade on Demand Enabled; Capacity Upgrade; decision point "Capacity needs met?".
Storage: Initial Storage Upgrade, Capacity Planning, Capacity Upgrade (Sun StorEdge 9910 / Hitachi Freedom Storage 9900 arrays pictured).
Software: Database Licensing Strategy, Portal/Integration Licensing Strategy, Decommissioning plan.
Data Center Capacity: Data center virtualization, Colocation, Existing Data Center Upgrade, New Data Center Facility; decision point "Sufficient Power Recovered?".]
INF: Project Summary – Actual Delivery Plan is slightly later than planned
[Figure: actual delivery plan timeline, Q3 2006 through Q3 2009.
Tracks: NMMS, Computing Infrastructure, Storage, Software, Data Center Capacity.
Phases shown: Requirements, Conceptual Design, Build, Pre-FAT, FAT, ITEST, MP SAT, EDS 3 Data Validation, EDS 3 Release, EDS 4 Release.
Milestones: 3/31/08 Single Entry Model GO LIVE; 6/30/08 LMP Market Readiness Criteria MET; 12/01/08 Real Time Operations GO LIVE; 12/08/08 Day Ahead Market/CRR GO LIVE; Zonal Shutdown GO/NO GO.
Computing Infrastructure: p5 systems for Dev, Prod #1, Prod #2, Disaster Site #1 and DR #2; Design to Min Requirements; Capacity Planning; Capacity Upgrade on Demand Enabled; Capacity Upgrade; decision point "Capacity needs met?".
Storage: Initial Storage Upgrade, Capacity Planning, Capacity Upgrade (Sun StorEdge 9910 / Hitachi Freedom Storage 9900 arrays pictured).
Software: Database Licensing Strategy, Portal/Integration Licensing Strategy, Decommissioning plan.
Data Center Capacity: Data center virtualization, Colocation, Existing Data Center Upgrade, New Data Center Facility; decision point "Sufficient Power Recovered?".]
Data Center virtualization was the only viable strategy to make the Nodal timeline
• Move to a Colocation Site
– RFP issued in October 2006
  • Insufficient capacity available to support ERCOT specialized needs
• Expand Existing Data Centers
– Taylor
  • Lead times for core equipment longer than Nodal program
– Austin
  • Existing facilities already expanded to maximum capacity
  • Long term viability of the facility not determined
• Acquire a new Data Center Facility
– Lead times for acquisition and relocation outside Nodal timelines
– Currently being explored with the viability of the Austin facility
Expanding existing data center capacity is an integrated process
• There are 3 components which are balanced to ensure a reliable data center
– Standby Generator Capacity
– Uninterruptible Power Supply (UPS) capacity
– Data Center Air Conditioning (DCAC) capacity
• The maximum capacity for equipment is determined by the minimum carrying capacity of any one of the components.
• We have taken all possible steps to maximize the current capacity of the data centers. Each site now runs at the available capacity of its UPS systems.
All possible near term facility upgrades have been completed.
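The "minimum carrying capacity" rule on this slide can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, the three capacities are assumed to be expressed in a common unit (kVA of supportable equipment load), and the sample figures are invented, not ERCOT data.

```python
def usable_capacity_kva(generator_kva, ups_kva, dcac_kva):
    """Equipment capacity is capped by the weakest of the three components."""
    return min(generator_kva, ups_kva, dcac_kva)


def limiting_component(generator_kva, ups_kva, dcac_kva):
    """Name the component that currently caps the data center."""
    capacities = {"generator": generator_kva, "ups": ups_kva, "dcac": dcac_kva}
    return min(capacities, key=capacities.get)


# Hypothetical site figures, for illustration only.
site = {"generator_kva": 400.0, "ups_kva": 225.0, "dcac_kva": 300.0}
print(usable_capacity_kva(**site))  # 225.0
print(limiting_component(**site))   # ups
```

In this made-up case, adding generator or cooling capacity would not raise usable capacity until the UPS is upgraded, which is why the three components have to be expanded together.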
The growth in Nodal server deployments and capacity was correctly forecast
[Chart: power reclamation and consumption by project, in kVA (5 to 80 kVA scale), plotted monthly from Feb 2006 to Jan 2007, quarterly through Oct 2007, then yearly from 2008 to 2011. Cost label on chart: $970K.]
Legend:
– Project Impact: Small (<200 Hrs), Med (200-1000 Hrs), Large (>1000 Hrs)
– Status: Initiated, Not Defined, Below Line
– Risk/Reward: Low 1 to 5 High
Projects (Risk/Reward):
– Dev Virtualization 2/5
– Fastrak 1/3
– Non-EMS Dell Refresh 5/5
– QA Move to ACC 2/5
– Retire HDS 1/5
– Domain Restructuring 1/5 (also labeled 5/3)
– Test/Prod Virtualization 4/5
– Market Redesign Dev Buildout 3/4
– Database Hosting Environment Refresh 3/5
– Market Redesign Test Buildout 2/4
– Market Redesign Prod Buildout 2/4
– EMS Dell Refresh 3/5
– HW Decommission 1/5
Current TCC Threshold: 202.5kVA
$2.8M
How Much is Too Much?
TCC PDUs are rated for 225 kVA with a not-to-exceed rating of 90%. Can we operate at 95%? Likely, but not smart.
What if we only focused on active projects?
Domain Restructuring will free up ~8 kVA and QA will move 37 kVA from TCC to ACC. This will push the threshold out to early fall.
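The threshold arithmetic above can be checked with a short sketch. The 225 kVA rating, the 90% limit, and the ~8 kVA and 37 kVA reclamation figures come from the text; the function names and the 200 kVA current load are hypothetical, chosen only to show how headroom would be computed.

```python
PDU_RATING_KVA = 225.0   # from the slide
NOT_TO_EXCEED = 0.90     # from the slide


def threshold_kva(rating_kva=PDU_RATING_KVA, fraction=NOT_TO_EXCEED):
    """Highest load the PDUs should carry under the not-to-exceed rule."""
    return rating_kva * fraction


def headroom_kva(current_load_kva, reclaimed_kva):
    """Remaining room under the threshold after reclaiming power."""
    load_after = current_load_kva - sum(reclaimed_kva)
    return threshold_kva() - load_after


print(threshold_kva())                # 202.5, the stated TCC threshold
print(threshold_kva(fraction=0.95))   # what running at 95% would allow

# Hypothetical current load of 200 kVA; reclamation figures from the slide.
print(headroom_kva(200.0, [8.0, 37.0]))
```

The first result reproduces the "Current TCC Threshold: 202.5kVA" figure, which suggests the threshold is simply the PDU rating times the 90% limit.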
Calculating Power:
Pulled from Aperture, these figures are nameplate values with a 70% adjustment for manufacturer conservatism.
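A minimal sketch of the nameplate adjustment follows. It assumes that the "70% adjustment" means multiplying the summed nameplate values by 0.70; the slide does not spell out the exact formula, so treat that reading, the function name, and the sample nameplate values as assumptions.

```python
DERATE_FACTOR = 0.70  # assumption: estimated load = 70% of nameplate


def estimated_load_kva(nameplate_kva_values, derate=DERATE_FACTOR):
    """Estimate the real draw of a set of devices from their nameplate ratings."""
    return sum(nameplate_kva_values) * derate


# Hypothetical nameplate ratings for three devices, in kVA.
rack = [1.0, 2.0, 0.5]
print(round(estimated_load_kva(rack), 2))  # 2.45
```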
When do we Buy?
The server market is going through dramatic change over the next few years, with virtualization, multi-cores, and reduced thermals driving demand for innovation. Short answer: tomorrow is always better.
X86 Failure Rates:
– Months 0-36: 6%
– Months 0-48: 50%
– Months 0-60: 100%
Includes failures that impact and do not impact service. Source: Gartner
Understanding the requirements in terms of service demands is critical in making technology decisions. Thinking in terms of 1:1 replacements will lead to overspend and undercommit.
1M TpM
Today: Intel IA-64 (Madison), 64 Processors, 82 RU / 42 kVA
2008: Intel IA-64 (Tukwila), 4 Processors, 4 RU / 4 kVA
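The comparison above can be turned into density ratios. This sketch uses only the figures on the slide (both configurations quoted for 1M TpM); the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    processors: int
    rack_units: int
    power_kva: float
    tpm: float = 1_000_000  # both configurations are quoted for 1M TpM

    def tpm_per_kva(self):
        return self.tpm / self.power_kva

    def tpm_per_rack_unit(self):
        return self.tpm / self.rack_units


madison = Platform("Intel IA-64 (Madison)", processors=64, rack_units=82, power_kva=42.0)
tukwila = Platform("Intel IA-64 (Tukwila)", processors=4, rack_units=4, power_kva=4.0)

power_gain = tukwila.tpm_per_kva() / madison.tpm_per_kva()
space_gain = tukwila.tpm_per_rack_unit() / madison.tpm_per_rack_unit()
print(f"throughput per kVA improves about {power_gain:.1f}x")
print(f"throughput per rack unit improves about {space_gain:.1f}x")
```

On these figures the same throughput needs roughly a tenth of the power and a twentieth of the rack space, which is the basis for not planning 1:1 replacements.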
2006 Xeon Processor Lineup
– Q1: Paxville DP
– Q2: Perf. Optimized Dempsey
– Q2: Rack Optimized Dempsey
– Q3: Woodcrest
– Q4: Ultra-Dense Woodcrest
There is light at the end of the tunnel!
Is Virtualization a Dream?
Intel, AMD, Microsoft, HP, and others are heavily invested in virtualization. The technology is production ready today and will be ubiquitous in 24 months.
Benefits of Decommissioning and Tech Refresh:
Aside from power reclamation, this will define technology lifecycle, create a process for technology refresh and validate IT as the owner of ERCOT technologies.
Why is tech refresh risky?
Technologically, it is fairly straightforward, easily packaged, and readily outsourced. Coordination and cooperation are the real challenge.
Density is the Real Problem:
In the datacenter, thermal demand is a product of power consumption. Space is an independent variable. ERCOT has usable space, but no usable power. Thus, we need to increase density.
EA Recommendations
Priority | Action | Benefit
1 | Initiate Decommission/Dell Refresh Project | Relieves TCC congestion, Preps business for IT ownership of tech lifecycles
1 | Accelerate QA Buildout PR-40070 | Relieves TCC Congestion, Prep for DR
2 | Initiate DC Virtualization PR-60011 | Relieves TCC Congestion, Defines utility computing for ERCOT
Source: ERCOT Enterprise Architecture, DC Capacity Plan Assessment, Brian A Cook, 02/28/2006
Majority of the roadmap is completed, but not all assumptions were right
Achievements to date
Power Recovery
Executed the Enterprise Architecture Power Recovery Plan:
– Development Storage Retired
– Development Virtualized
– Domain Restructured
– Quality Assurance Moved to ACC
– Retired unused equipment
– ½ Server refreshes EMS & Non-EMS
– Test/Prod Virtualization
– ¾ Database Hosting Refresh
Identified additional compression activities:
– Relocated Development database servers to Blue Building
– Austin SAN Refresh
– Taylor SAN Refresh
– Database Server Clustering
– ACC to dedicated UPS
– Remote access server farm redesign
o Application Server Refresh
o Self cooled equipment racks (ordered)
Key assumptions which were invalid
Power consumption per server would drop exponentially
• Power consumption per CPU has declined almost as much as assumed
• Server memory power consumption has offset the savings in CPU power consumption
Nodal redundancy requirements would mirror Zonal
• The nodal systems require more active/passive and active/active deployments than the current Zonal market.
• The required level of redundancy and recoverability for the nodal systems was not fully understood when the projections were made in February 2006.
Nodal environment requirements would mirror Zonal
• Integrating a large number of best-of-breed solutions requires more environments to successfully develop the integration points.
• Market participants required structured and unstructured testing environments to complete their development activities.
Server consolidation timeline was constrained to minimize risks
• We will do this with the minimum disruption
• We have a plan to:
– Minimize the risk
– Maximize the benefit
– Lower overall costs
– Safeguard Market Operations
– Improve service levels
– Not affect Texas Set 3.0
• However, there are always risks to be aware of in server migration
• We are working with all the project managers and ERCOT committees to reduce risk and to optimize the timing.
• During server consolidation, one production migration was deferred and two were successfully rolled back.
– The net effect was a 5-week delay in final migrations and power recovery.
Migration Metrics
Servers
Starting Total | Decom | Retired | % Remaining
74 | 16 | 5 | 72%

Databases
Starting Total | Migrated | Retired/Refresh | % Remaining
91 | 58 | 27 | 7%

Applications
Total Files | Migrated | Remaining | % Remaining
62 | 5 | 57 | 92%

Scripts
Scripts | Reviewed | Remaining | % Remaining
229 | 229 | 0 | 0%

Production Database Storage (in GB)
Pre-Migration Bytes Used | Post-Migration Bytes Used | Reclaimed | Storage Savings
12,080 | 8,480 | 3,240 | 36%
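The "% Remaining" column for the count-based metrics can be reproduced with a short sketch. The counts are taken from the tables above; the function name is hypothetical, and rounding to the nearest whole percent is an assumption that happens to match the reported values.

```python
def percent_remaining(starting_total, *completed_counts):
    """Share of items still to be handled, as a whole percent."""
    remaining = starting_total - sum(completed_counts)
    return round(100 * remaining / starting_total)


metrics = {
    "Servers": (74, 16, 5),        # starting total, decommissioned, retired
    "Databases": (91, 58, 27),     # starting total, migrated, retired/refreshed
    "Applications": (62, 5),       # total files, migrated
    "Scripts": (229, 229),         # scripts, reviewed
}

for name, counts in metrics.items():
    print(f"{name}: {percent_remaining(*counts)}% remaining")
```

This prints 72%, 7%, 92% and 0%, matching the slide for servers, databases, applications and scripts.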
Servers
– Annual maintenance contracts will be cancelled on the retired database and application servers
– Servers will be made available on the secondary market to recoup their residual values
Databases
– Unused or underused databases were eliminated, freeing additional licenses for use at Nodal vendor locations
Database Storage
– Properly sizing the databases and compressing the data files from which data has been removed is resulting in a 36% reduction in used storage in production.
Server consolidation will result in lower expenses during Nodal and after Nodal implementation
System requirements are driven by project detailed design documents
Each project team has an assigned architect who is responsible for the architecture of the project’s deliverables. IDA consolidates these architectures into a deployment diagram, which becomes individual work requests for operations to deploy the systems.
Market trials and Integration testing will drive changes to the environments
The assumption in the infrastructure plan is that the initial deployment specifications will have to be changed.
The infrastructure technologies selected as the core for the Nodal system were picked because they adapt well to change.
Critical Path Mitigation Strategies: Capacity issues – Scale Up Option
Scale up options are implemented on the largest computing systems in the data center. Extra capacity inside the system can be enabled when necessary, or capacity can be added without system downtime.
• Provides the ability to scale up to the maximum capacity of the system to meet usage demands
• System can recover from multiple component failures (CPU/Memory/Power Supply/IO) with spare capacity within the system.
• System availability above the 99.9% threshold.
• Provides the ability to balance the load across all applications running on the system.
• Minimum power usage configuration.
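To put the availability threshold on this slide in concrete terms, the sketch below converts an availability percentage into a yearly downtime allowance. The function name is hypothetical, and a 365-day year of continuous operation is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, continuous operation


def max_downtime_hours(availability):
    """Yearly downtime allowed by an availability target given as a fraction."""
    return (1.0 - availability) * HOURS_PER_YEAR


for target in (0.999, 0.9999):
    hours = max_downtime_hours(target)
    print(f"{target:.2%} availability allows about {hours:.2f} hours "
          f"({hours * 60:.0f} minutes) of downtime per year")
```

Under these assumptions, 99.9% allows roughly 8.76 hours per year, while 99.99% allows under one hour.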
Critical Path Mitigation Strategies: Capacity issues – Scale Out Option
Scale out options are implemented on the smaller computing systems in the data center. Processing load is split across multiple systems to meet the business requirements.
• Requires duplicate computing resources on two separate physical servers.
• Provides the ability to scale up to the maximum capacity of all systems in the cluster to meet usage demands.
• Classic Windows / Linux capacity expansion strategy typically with a load balancing appliance.
• Provides the ability to do maintenance on a server without impacting the system.
• System availability above 99.99%
ERCOT will utilize both a scale up and scale out strategy to meet the business requirements of Nodal
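One way to see why duplicating resources across separate physical servers raises availability is the standard parallel-redundancy formula. This is a textbook sketch, not ERCOT's sizing method: it assumes node failures are independent and that any single surviving node can carry the full load, and the 99% per-node figure is hypothetical.

```python
def cluster_availability(node_availability, nodes):
    """Availability of a cluster that stays up while at least one node is up.

    Assumes independent node failures and that one node can carry the load.
    """
    return 1.0 - (1.0 - node_availability) ** nodes


# Hypothetical per-node availability of 99%.
for n in (1, 2, 3):
    print(f"{n} node(s): {cluster_availability(0.99, n):.6f}")
```

With these assumptions, two 99% nodes reach about 99.99% together. Real clusters fall short of this when failures are correlated, for example through shared power or a shared load balancer.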
Critical Path Mitigation Strategies: Procurement and Architectural Delays
Risk: Long procurement lead times
Mitigation Strategy:
– Define system requirements as soon as possible and place orders.
– Utilize existing systems or virtual machines where appropriate
Risk: Late Technical Architecture design documents
Mitigation Strategy:
– Preorder unassembled equipment and have ERCOT staff build to specification
– Authorize overtime to build systems
– Pre-build standard server configurations based upon service catalogue and assign to projects as requirements become known
Risk: Server consolidation decommissions behind schedule
Mitigation Strategy:
– Run data center in the safe “buffer” zone of capacity
Questions