Nick LeRoy, Computer Sciences Department, University of Wisconsin-Madison, nleroy@cs.wisc.edu: Hawkeye
Nick LeRoy
Computer Sciences Department
University of Wisconsin-Madison
nleroy@cs.wisc.edu
http://www.cs.wisc.edu/condor/hawkeye
Hawkeye
www.cs.wisc.edu/condor
What is Hawkeye?
› A monitoring and management tool for distributed systems
› That's great, but... What does that mean? What can Hawkeye do for me?
What does that mean?
› Hawkeye is a tool that can be used to monitor various aspects of your computers
› Examples:
    › System load monitoring
    › Watching for run-away processes
    › Monitoring the health of your Condor pool
What can Hawkeye do?
› Hawkeye can alert you when things go wrong. For example, Hawkeye can:
    › Alert you when virtually any condition is found
    › Alert you when various Condor problems are identified
    › Allow you to specify your own custom alerts
Why Hawkeye?
› Make system administration easier
› Make Condor pool maintenance easier
Hawkeye Architecture
[Figure: architecture diagram. Labels: Hawkeye Module (three instances), Hawkeye Monitoring Agent (two instances), Hawkeye Manager, Condor Pool, Grid]
Hawkeye Matchmaking
› Hawkeye alerts are generated through ClassAd matchmaking.
[Figure: matchmaking diagram. Labels: MachineAd, TriggerAd, Match, Alert]
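To make the matchmaking idea concrete, here is a minimal Python sketch. It is not the ClassAd library or Hawkeye's own code: machine ads are modeled as plain dictionaries, a trigger is modeled as a Python predicate, and the names `Trigger`, `match_alerts`, `LoadAvg`, `node1`, and `node2` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# A machine ad is modeled as a flat attribute dictionary (assumption).
Ad = Dict[str, Union[int, float, str]]


@dataclass
class Trigger:
    """Hypothetical stand-in for a trigger ad."""
    name: str
    condition: Callable[[Ad], bool]  # plays the role of the match expression


def match_alerts(machine_ads: List[Ad], triggers: List[Trigger]) -> List[str]:
    """Pair every machine ad with every trigger; a match yields an alert."""
    alerts = []
    for ad in machine_ads:
        for trig in triggers:
            if trig.condition(ad):
                alerts.append(f"{trig.name}: {ad.get('Name', '<unknown>')}")
    return alerts


# Illustrative data only; these attribute names are not from Hawkeye.
machines = [
    {"Name": "node1", "LoadAvg": 0.4},
    {"Name": "node2", "LoadAvg": 7.9},
]
high_load = Trigger("High load", lambda ad: ad.get("LoadAvg", 0) > 4.0)

print(match_alerts(machines, [high_load]))
```

Only `node2` satisfies the trigger here, so only one alert is produced.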
Hawkeye ClassAds
› Hawkeye uses ClassAds to represent collected data
    › Schema-free data representation
    › Provides matching mechanism
    › Represent whatever data you gather in a way that works best for you
Hawkeye ClassAds
› Example ClassAd "snippet":
RAM_MemFree = 841932800
RAM_MemShared = 0
RAM_MemTotal = 1055367168
RAM_SwapCached = 0
RAM_SwapFree = 2147483647
RAM_SwapTotal = 2147483647
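As an illustration of how such attributes might be consumed, the following Python sketch parses the `Name = value` lines above into a dictionary and derives the fraction of free RAM. The parser `parse_int_attrs` is hypothetical and handles only integer values; it is not a ClassAd parser.

```python
from typing import Dict

# The attribute lines from the slide, copied verbatim.
SNIPPET = """\
RAM_MemFree = 841932800
RAM_MemShared = 0
RAM_MemTotal = 1055367168
RAM_SwapCached = 0
RAM_SwapFree = 2147483647
RAM_SwapTotal = 2147483647
"""


def parse_int_attrs(text: str) -> Dict[str, int]:
    """Parse simple 'Name = integer' lines (assumption: no expressions)."""
    attrs = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        name, _, value = line.partition("=")
        attrs[name.strip()] = int(value.strip())
    return attrs


ad = parse_int_attrs(SNIPPET)
free_fraction = ad["RAM_MemFree"] / ad["RAM_MemTotal"]
print(f"{free_fraction:.1%} of RAM is free")
```

Because the representation is schema-free, the same parsing step works for any module's integer attributes without knowing their names in advance.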
Hawkeye ClassAds
› Example ClassAd "snippet" #2:
Condor_NumExecs = 2
Condor_NumMasters = 1
Condor_NumRunaway = 2
Condor_NumSchedds = 0
Condor_NumShadows = 0
Condor_NumStartds = 1
Condor_NumStarters = 2
Condor_RunawayPids = "3214,8753"
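A small Python sketch of how the string-valued `Condor_RunawayPids` attribute could be unpacked and cross-checked against `Condor_NumRunaway`. The helper `parse_pid_list` is hypothetical, and the comma-separated format is inferred only from the single example above.

```python
from typing import List


def parse_pid_list(value: str) -> List[int]:
    """Split a comma-separated string attribute into integer PIDs."""
    stripped = value.strip().strip('"')  # tolerate surrounding quotes
    return [int(p) for p in stripped.split(",") if p.strip()]


# Values taken from the snippet on the slide.
condor_ad = {"Condor_NumRunaway": 2, "Condor_RunawayPids": "3214,8753"}

pids = parse_pid_list(condor_ad["Condor_RunawayPids"])
consistent = len(pids) == condor_ad["Condor_NumRunaway"]
print(pids, consistent)
```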
Sample Alert Trigger
[
AlertTrigger = ( MyType == "Pool" && Absent.count > 5 );
AlertSeverity = ( Absent.count > 5 ) ? 1 : 0;
Name = "Absent Nodes";
AlertText = StrCat(Absent.count,
" machines are missing in ",
Name)
]
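The trigger above can be read as three expressions evaluated against a pool ad. The Python sketch below mirrors that logic under some assumptions: the pool ad is modeled as a nested dictionary so that `Absent.count` becomes `pool_ad["Absent"]["count"]`, and `Name` inside `AlertText` is taken to resolve to the trigger's own `Name` attribute. The function `evaluate_trigger` is hypothetical and is not how Hawkeye evaluates ClassAds.

```python
from typing import Any, Dict, Optional

TRIGGER_NAME = "Absent Nodes"  # the trigger's Name attribute


def evaluate_trigger(pool_ad: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return an alert dict if the trigger fires, otherwise None."""
    absent_count = pool_ad["Absent"]["count"]

    # AlertTrigger = ( MyType == "Pool" && Absent.count > 5 )
    if not (pool_ad.get("MyType") == "Pool" and absent_count > 5):
        return None

    # AlertSeverity = ( Absent.count > 5 ) ? 1 : 0
    severity = 1 if absent_count > 5 else 0

    # AlertText = StrCat(Absent.count, " machines are missing in ", Name)
    text = f"{absent_count} machines are missing in {TRIGGER_NAME}"
    return {"Name": TRIGGER_NAME, "AlertSeverity": severity, "AlertText": text}


print(evaluate_trigger({"MyType": "Pool", "Absent": {"count": 7}}))
print(evaluate_trigger({"MyType": "Pool", "Absent": {"count": 3}}))
```

Note that, as written on the slide, the severity expression uses the same threshold as the trigger condition, so the severity is always 1 whenever the alert fires.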
Hawkeye at UW
› Currently at UW, we're using Hawkeye:
    › To monitor our Condor cluster
    › To aid in detecting and correcting cluster problems
    › To monitor the US/CMS testbed health
Customizing Hawkeye
› Hawkeye allows you to run your own custom "modules" to gather data.
› Hawkeye allows you to set your own custom "alerts" on attributes generated by "standard" and "custom" modules.
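As a rough idea of what a custom data-gathering module could look like, here is a Python sketch that collects disk usage and prints it as `Name = value` lines in the style of the ClassAd snippets shown earlier. The slides do not describe the actual module interface, so the output format, the `Disk_` attribute prefix, and the function names `gather_disk_attrs` and `format_attrs` are all assumptions.

```python
import shutil
from typing import Dict


def gather_disk_attrs(path: str = "/") -> Dict[str, int]:
    """Collect disk usage (in bytes) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return {
        "Disk_Total": usage.total,  # hypothetical attribute names
        "Disk_Used": usage.used,
        "Disk_Free": usage.free,
    }


def format_attrs(attrs: Dict[str, int]) -> str:
    """Render attributes as 'Name = value' lines, one per attribute."""
    return "\n".join(f"{name} = {value}" for name, value in attrs.items())


if __name__ == "__main__":
    print(format_attrs(gather_disk_attrs()))
```

A custom alert could then be defined on an attribute such as `Disk_Free`, in the same way the sample trigger tests `Absent.count`.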
What is the status of Hawkeye?
› Hawkeye 1.0 Release Candidate 1 (RC1)
› Current module library includes modules to monitor system load, users, disk space, Condor, and more
› Available from http://cs.wisc.edu/condor/hawkeye