Security components of the CERN farm nodes
Vladimír Bahyl, CERN - IT/FIO
Presented by Thorsten Kleinwort
[email protected], Autumn HEPiX 2003, Triumf, Vancouver, Canada
Outline
Current state
– Our typical system
– Possible risks
Protection methods
– Security related
– Against denial of service
Conclusion
A typical farm node
2 CPUs / 1 GB RAM / 20 GB disk
100 Mbit network
CERN RedHat Linux 7.3.3
50-70 users
70 primary interactive nodes
Risks
Security related:
– Exploits to the system to get root
– Services started on unprivileged ports
– System can be used to
  scan other nodes
  originate spam
Denial of service:
– “Heavy” processes started
– Disk filled by “runaway” jobs
Our protection methods
Keep the system secure and up-to-date (with CDB & SPMA)
Log more verbosely than default
Collect the logs centrally
Scan for certain patterns in the logs
Keep the system accounting
Provide secure access methods only
Transfer sensitive information securely
– E.g. password files – but in general anything; use GPG for encryption
Monitor the current state
– Disk – quota is enabled
– CPU usage – beniced daemon
Incident reaction – as quickly as possible
– No later than the next working day – compromised account is blocked
Log more
It is always good to have more information to go back to in case of need
Daemons are configured to log as much data as is convenient
– portmap -v
netlog – the ultimate kernel module
– logs TCP activity
– outputs a line whenever a listening socket or an incoming or outgoing connection is established
– it logs the program concerned, the session id, process id and user id
– it also logs connection details (protocol, local/remote addresses and ports)
– provides extremely useful data in forensic investigation
Netlog example
Incoming connections:
Oct 14 21:49:43 node kernel: netlog: info: connect start TCP 172.17.35.98:44073 <- 192.168.161.90:52453 31440 31476 240 pmg_agent
Oct 14 21:49:43 node kernel: netlog: info: connect stop TCP 172.17.35.98:44073 <- 192.168.161.90:52453 31440 31476 240 pmg_agent
Outgoing connections:
Oct 14 18:18:49 node kernel: netlog: info: connect start TCP 172.17.35.98:34434 -> 192.168.11.10:80 22163 23183 1648 wget
Oct 14 18:18:52 node kernel: netlog: info: connect stop TCP 172.17.35.98:34434 -> 192.168.11.10:80 22163 23183 1648 wget
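The netlog lines above have a regular shape, so they can be split into fields for later analysis. The following is a minimal Python sketch, not the tooling used at CERN. It assumes the three numbers before the program name are the session id, process id and user id, in that order; the slides list these fields but do not confirm their order. The names `NetlogEvent` and `parse_netlog_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class NetlogEvent(NamedTuple):
    timestamp: str
    host: str
    event: str       # "start" or "stop"
    protocol: str
    local: str       # local address:port
    direction: str   # "in" for "<-", "out" for "->"
    remote: str      # remote address:port
    session_id: int  # ASSUMPTION: first numeric field
    pid: int         # ASSUMPTION: second numeric field
    uid: int         # ASSUMPTION: third numeric field
    program: str


NETLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"kernel:\s+netlog:\s+info:\s+connect\s+(?P<event>start|stop)\s+"
    r"(?P<protocol>\S+)\s+(?P<local>\S+)\s+(?P<arrow><-|->)\s+(?P<remote>\S+)\s+"
    r"(?P<sid>\d+)\s+(?P<pid>\d+)\s+(?P<uid>\d+)\s+(?P<program>\S+)\s*$"
)


def parse_netlog_line(line: str) -> Optional[NetlogEvent]:
    """Return a NetlogEvent for a netlog connect line, or None if it does not match."""
    m = NETLOG_RE.match(line.strip())
    if m is None:
        return None
    return NetlogEvent(
        timestamp=m.group("timestamp"),
        host=m.group("host"),
        event=m.group("event"),
        protocol=m.group("protocol"),
        local=m.group("local"),
        direction="in" if m.group("arrow") == "<-" else "out",
        remote=m.group("remote"),
        session_id=int(m.group("sid")),
        pid=int(m.group("pid")),
        uid=int(m.group("uid")),
        program=m.group("program"),
    )
```

Pairing a "start" event with its matching "stop" event (same addresses, ports and ids) would give the duration of a connection, which is the kind of detail that helps in a forensic investigation.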
Collect the logs
Make sure that all daemons log via the syslog facility
Combine the logs in a single file
– Option in /etc/syslog.conf
Process log files locally on each node
– Combine the connection data
– Remove uninteresting information
  E.g. node boot messages
– Compress
Forward to a central place = Oracle database
– Do it at regular intervals to prevent loss of data (~ every hour)
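The local processing step (remove uninteresting information, then compress) can be sketched as follows. This is an illustration under stated assumptions, not the actual CERN scripts: the drop patterns are hypothetical examples of boot-time messages, gzip is an assumed choice because the slide does not name the compression tool, and the function names are invented. Forwarding to the Oracle database is left out.

```python
import gzip
import re
from typing import Iterable, List, Sequence

# HYPOTHETICAL examples of "uninteresting" boot messages; a real list
# would be tuned to what each site considers noise.
DROP_PATTERNS = [
    re.compile(r"kernel: Linux version"),
    re.compile(r"kernel: BIOS-"),
]


def filter_lines(lines: Iterable[str],
                 drop_patterns: Sequence = DROP_PATTERNS) -> List[str]:
    """Keep only the lines that match none of the drop patterns."""
    return [
        line for line in lines
        if not any(p.search(line) for p in drop_patterns)
    ]


def pack_for_forwarding(lines: Iterable[str]) -> bytes:
    """Join lines (one per row) and gzip them into a single payload."""
    text = "".join(
        line if line.endswith("\n") else line + "\n" for line in lines
    )
    return gzip.compress(text.encode("utf-8"))
```

Running such a step from cron roughly every hour would match the interval mentioned on the slide and bound how much data is lost if a node goes down.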
Scan for certain patterns
For example:
– IRC activity
  IRC servers violate CERN’s computing rules
– SUID ptrace exploit attempts
  2003/07/23-13:10:38 Uu ? node.cern.ch[172.17.35.98] kernel request_module[net-pf-14]: waitpid(28284,...) failed, errno 512
  2002/01/31-01:46:23 Uu ? node.cern.ch[172.17.35.98] kernel ptrcchk: uid=19201 tried ptrace on suid/sgid file /usr/bin/passwd
  Generated by a proprietary kernel module
– network sniffer
  2003/10/03-17:44:54 Uu ? node.cern.ch[172.17.35.98] kernel device eth0 entered promiscuous mode
– repeated login failures
– etc.
This data can be used together with network IDS results
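Pattern scanning of this kind can be sketched with a small table of regular expressions. The sketch below covers only the two messages whose text appears in the examples above (the ptrace check and the promiscuous-mode notice); patterns for IRC activity or login failures would depend on log formats not shown here. The pattern names and the `scan` function are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Patterns derived from the example log lines on this slide.
SCAN_PATTERNS = {
    "ptrace_on_suid": re.compile(
        r"ptrcchk: uid=(\d+) tried ptrace on suid/sgid file (\S+)"
    ),
    "promiscuous_mode": re.compile(
        r"device (\S+) entered promiscuous mode"
    ),
}


def scan(lines: Iterable[str]) -> List[Tuple[str, Tuple[str, ...], str]]:
    """Return (pattern name, captured groups, original line) for every hit."""
    hits = []
    for line in lines:
        for name, pattern in SCAN_PATTERNS.items():
            m = pattern.search(line)
            if m:
                hits.append((name, m.groups(), line))
    return hits
```

Keeping the patterns in one table makes it cheap to add a new entry when a new intrusion pattern is observed.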
Keep system accounting
Also a very important element in forensic analysis
3 months of raw data on each node
– Will soon be centralized
Compressed with bzip2 -9
Parsed, summarized and stored centrally for statistical purposes
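For reference, the effect of `bzip2 -9` can be reproduced from Python with the standard `bz2` module, where `compresslevel=9` selects the same maximum block size. This is only a sketch; the function names are hypothetical and the slides do not say how the compression is invoked on the nodes.

```python
import bz2
import shutil


def compress_accounting(raw: bytes) -> bytes:
    """Compress raw accounting data at the highest bzip2 level (like `bzip2 -9`)."""
    return bz2.compress(raw, compresslevel=9)


def compress_accounting_file(src_path: str, dst_path: str) -> None:
    """Stream a file into a .bz2 archive without loading it fully into memory."""
    with open(src_path, "rb") as src, \
            bz2.open(dst_path, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
```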
Conclusion
When does it work well?
– Repeated intruder activity
  When there is a new intrusion pattern we quickly add a new scan pattern
– Intruder doesn’t know our infrastructure
What are the limits?
– The first time a new way to break in is used that we do not know about
– Intruder discovers our infrastructure (clusters)