NGOP Status and Plans
Jim Fromm, Marc Mengel, Jack Schmidt
May 2, 2006
Today’s talk…
• Current Status
  • Farms/CMS/General Server split
• Recent Enhancements
  • Performance Tuning
  • Configuration File cleanup
  • CMS Enhancements
• Future Enhancements
Current Status: Farms/CMS/General Server Split
• Goals:
  • Relieve bottlenecks by splitting out the servers
  • Reduce configuration upgrade times
  • Provide groups with independence
  • Simplify the General server by consolidating the two machines into one.
Current Status: Farms/CMS/General Server Split
• Bottlenecks
  • Farms and CMS Server hangs have been non-existent since split.
  • General Server has experienced occasional hangs, but to a lesser degree (still two systems).
• This goal has been successfully met.
Current Status: Farms/CMS/General Server Split
• Reduction of configuration upgrade times
  • Prior to the split, it took 2+ hours to perform a system configuration upgrade when things went well.
  • Farms/CMS
    • Takes less than 20 minutes to perform a configuration upgrade
    • Fewer monitored elements per server
    • One status engine allowed for the removal of Warshall’s algorithm for finding the transitive closure of a graph.
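
For reference, Warshall’s algorithm works out which nodes of a graph can reach which others. A minimal Python sketch follows; the element names and the adjacency-matrix representation are illustrative assumptions, not NGOP’s actual data structures.

```python
def transitive_closure(adjacent):
    """Warshall's algorithm on a boolean adjacency matrix (list of lists)."""
    n = len(adjacent)
    reach = [row[:] for row in adjacent]  # copy so the input is not modified
    for k in range(n):                    # allow paths that pass through node k
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


# Hypothetical dependency chain: node -> cluster -> farm
names = ["node", "cluster", "farm"]
edges = [
    [False, True, False],
    [False, False, True],
    [False, False, False],
]
closure = transitive_closure(edges)
```

The triple loop costs O(n³) in the number of nodes, so it becomes expensive as the monitored hierarchy grows.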
Current Status: Farms/CMS/General Server Split
• General Server
  • Configuration upgrade time reduced to less than 30 minutes
  • Recent parser optimizations will likely cut configuration upgrade times to ¼.
• This goal has been successfully met.
Current Status: Farms/CMS/General Server Split
• Server Independence
  • Both CMS and Farms are up to speed with doing their own configurations.
  • Upgrades are performed only when they need them.
  • CMS (Gary Stiehr) has taken the initiative to add several features.
  • Both groups have taken advantage of the splitting of the cluster.
• This goal has been successfully met.
Current Status: Farms/CMS/General Server Split
• General Server Consolidation
  • Not complete: still using two servers.
  • Doesn’t have the urgency of the other items, and has been easy to put on the back burner.
  • Need to make this a priority.
Recent Enhancements
• Performance Tuning
  • Preprocessor speedup.
    • Marc Mengel implemented a change that improved performance of the XML preprocessor.
    • NGOP preprocessor expands If_xxx/For_xxx tags
    • Was using 90% CPU on startup.
    • This was a known Python performance issue.
    • Stunning improvements on configuration upgrade times!
Recent Enhancements
• Configuration File Cleanup
  • New "grand unified" XML Document Type Definition (DTD): http://www.fnal.gov/docs/products/ngop/ngop_unified.dtd
  • XML editor friendly
    • Works well with Merlin XML editor.
Recent Enhancements
• CMS
  • No Downtimes: Modified to allow multiple status engine roles to be defined for one set of definitions. This allows reconfiguration of one while the other remains active, eliminating downtimes due to configuration upgrades.
  • Used the SE API to create a GUI that only shows “bad” things.
  • Developed a generic plug-in agent that allows for a standard way of defining agents in the CMS system.
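
The "No Downtimes" idea above, reconfiguring one status engine while the other stays active, can be sketched roughly as follows. This is only an illustration of the swap pattern: the StatusEngine class, its fields, and rolling_upgrade are hypothetical names, not part of the NGOP code or the SE API.

```python
class StatusEngine:
    """Hypothetical stand-in for one NGOP status engine role."""

    def __init__(self, name, config_version):
        self.name = name
        self.config_version = config_version
        self.active = False

    def reconfigure(self, new_version):
        # In this sketch only an inactive engine may be reconfigured.
        if self.active:
            raise RuntimeError(f"{self.name} is active; reconfigure the standby")
        self.config_version = new_version


def rolling_upgrade(active, standby, new_version):
    """Upgrade the standby, then swap roles so monitoring never stops."""
    standby.reconfigure(new_version)
    standby.active = True    # upgraded engine takes over first
    active.active = False    # old engine is now free to be upgraded later
    return standby, active   # (new active, new standby)


primary = StatusEngine("se-a", config_version=1)
primary.active = True
secondary = StatusEngine("se-b", config_version=1)
new_active, new_standby = rolling_upgrade(primary, secondary, new_version=2)
```

At every point in rolling_upgrade at least one engine is marked active, which is the property that removes the downtime.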
Future Enhancements
• Dynamic Configuration Upgrades
  • By far the most difficult enhancement to implement.
  • CMS needs have been addressed with the multiple status engine solution.
  • With reduction of configuration upgrade times coupled with the CMS workaround, this requirement becomes a very low priority.
Future Enhancements (Cont)
• CMS specific requested enhancements:
  • Marking Monitored Elements down across clusters.
  • Accelerate alarms based on time (e.g. yellow becomes red after 8 hours)
  • Verify scalability to CMS planned growth.
  • Documentation upgrade
• General
  • Improvement of logging subsystem
  • Research UDP protocol issues
    • Dropped packet issue seems under control with recent network tunings
    • May need to do this anyway to address CMS requirements for scalability.
  • Web/Swatch agents need DELAY/GAP parameters
  • “Anti” rules for Swatch agent
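
The time-based alarm acceleration requested by CMS above could behave roughly like the following sketch. The function name, the severity strings, and the way timestamps are passed in are assumptions for illustration; only the 8-hour yellow-to-red example comes from the slide.

```python
from datetime import datetime, timedelta

# Assumed threshold, taken from the slide's example.
ESCALATE_AFTER = timedelta(hours=8)


def effective_severity(severity, raised_at, now):
    """Return "red" for a yellow alarm that has been open too long."""
    if severity == "yellow" and now - raised_at >= ESCALATE_AFTER:
        return "red"
    return severity


raised = datetime(2006, 5, 2, 8, 0)
early = effective_severity("yellow", raised, datetime(2006, 5, 2, 12, 0))
late = effective_severity("yellow", raised, datetime(2006, 5, 2, 17, 0))
```

Computing the severity at display time, rather than rewriting the stored alarm, keeps the original alarm record unchanged; that is a design choice of this sketch, not a stated NGOP requirement.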
Future Enhancements (Cont)
• Wish List
  • Real dynamic configuration
  • SNMP agent
  • Email watcher
Summary
• Split of farms and CMS has been successful:
  • Quicker reconfigs result in less downtime.
  • Splitting load has reduced NGOP hangs.
  • CMS and Farms groups are managing things on their own timetable.
  • Need to consolidate General server to one machine
• New release is needed:
  • New CMS requests
  • Investigate potential scalability issues.
  • Improved logging
  • New and improved agents.
  • Revamp documentation and website.
  • Develop maintainable metrics
Information
• Main Site: http://www-isd.fnal.gov/ngop/ngop.html
• Documentation:
  • Users Guide: http://www-isd.fnal.gov/ngop/current/ngop_ug.htm
  • Admin Guide: http://www-sd.fnal.gov/ngop/current/ngop_admin_guide.htm