Deployment and operation of a centralised system for log ... · Deployment and operation of a...


Deployment and operation of a centralised system for log management and analysis

Whitepaper - Case report with arguments for deploying a SEM/SIEM solution, including use case scenarios

There are three basic reasons why organisations should consider deploying a system for centralised management and analysis of logs. The weight of each reason depends on the type of organisation:

• Operations

• Security

• Compliance and audit

This article describes the options in each of these areas with practical real-life examples of the use of LOGmanager. The range of available options is much wider, and the presented use cases are only illustrative.

Operations

A critical IT incident is a pivotal term in today’s IT world; like death or taxes, it cannot be evaded, because sooner or later it is certain to come. What is a critical IT incident? It is a situation in which a business application, or the infrastructure linked to that application, becomes non-functional. Such a situation calls for an immediate response, and the organisation’s IT team needs to be able to collaborate on speedy removal of the defect according to the nature of the incident. In this context, two concepts are typically used: MTTR (Mean Time To Repair) and RCA (Root Cause Analysis). The role of the IT department is to find the underlying cause as quickly as possible, eliminate it, then analyse the reasons for the outage, including its context, and determine corrective actions to prevent an identical or similar issue from occurring in the future.

A practical example illustrating the use of LOGmanager to address a critical IT incident:

The approximate time when the outage occurred is known. Currently, nothing communicates except within individual broadcast domains. You go to LOGmanager and select the relevant time period, plus or minus 5 minutes, in the “All Event Overview” dashboard. You select major and critical events and start reviewing the logs in the selection, starting from the beginning of the selected time slot. When reviewing the logs, you find out that the central router lost connection with its OSPF peers.


www.logmanager.cz

This rendered the routing process non-functional. The links to the internal DNS servers were also lost. However, the physical links and the operation of the layer 2 network protocols (MSTP, LLDP) remained unaffected. You cancel the selection of major and critical events and focus on the information available from the time before the OSPF failure. In the time slot preceding the loss of connection, you find a large amount of information about automatic modification of security profile filters in the Intrusion Prevention System. This is a hint implying a potential issue: the IPS might be blocking the OSPF traffic. To verify this possibility, you switch the IPS into transparent mode, in which it does not block any traffic. Upon analysing the logs, you find out that the IPS erroneously distributed a profile for an IDS mode. The underlying cause was an error in the profile description. After switching off the IPS, the OSPF link is restored. Further analysis of the IPS logs reveals that a filter was activated after configuration changes were made to the IPS. This filter blocked the transmission of packets with TTL=1. The number of hits for that particular filter increased continuously over the reviewed period. It is clear to a network specialist that packets with TTL=1 also included routing protocol packets and that blocking them resulted in non-functional routing tables. By reviewing the audit logs, you find out that your colleague from the Security Department incorrectly renamed a security profile in the IPS three days ago, and the profile was subsequently propagated automatically. You have identified the cause and removed the defect. There is a lesson to be learned to prevent similar issues in the future: disable fully automated distribution of security profiles and allow only assisted distribution, i.e., the system should prompt for profile distribution, but the distribution must be manually approved by the administrator.
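The first triage steps of this scenario (a window of plus or minus 5 minutes around the outage, restricted to major and critical events, followed by a look at everything logged before the first hit) can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not LOGmanager's implementation; the event records, field names and timestamps are invented for the example.

```python
from datetime import datetime, timedelta

# Hypothetical, already-normalised events; field names and values are assumptions.
EVENTS = [
    {"time": datetime(2017, 5, 2, 9, 56), "severity": "info", "msg": "IPS filter profile updated"},
    {"time": datetime(2017, 5, 2, 9, 58), "severity": "critical", "msg": "OSPF neighbour down"},
    {"time": datetime(2017, 5, 2, 10, 3), "severity": "major", "msg": "DNS server unreachable"},
    {"time": datetime(2017, 5, 2, 10, 30), "severity": "critical", "msg": "unrelated event"},
]

def triage(events, outage_time, margin=timedelta(minutes=5), severities=("major", "critical")):
    """Return events within outage_time +/- margin at the given severities, oldest first."""
    start, end = outage_time - margin, outage_time + margin
    hits = [e for e in events if start <= e["time"] <= end and e["severity"] in severities]
    return sorted(hits, key=lambda e: e["time"])

# Step 1: major and critical events around the approximate outage time.
selected = triage(EVENTS, datetime(2017, 5, 2, 10, 0))

# Step 2: drop the severity filter and look at what happened before the first hit.
earlier = [e for e in EVENTS if e["time"] < selected[0]["time"]]

for event in selected + earlier:
    print(event["time"], event["severity"], event["msg"])
```

Reviewing the selection from its oldest entry first mirrors the walkthrough: the earliest critical event points to the symptom, and the unfiltered events preceding it point to the cause.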

LOGmanager screenshots:

The change in the number of blocked events over time suggests that something has changed. By selecting a time period starting before the critical IT incident occurred, we see that the number of TTL=1 events significantly increased.


Log format unification and centralisation of logs. Logs that are distributed across different systems and devices, kept for different retention times and written in different languages can cause issues that are best addressed by deploying a centralised system. Every device takes a different approach to log management, records logs in a different machine language (typically unique per vendor) and uses a different amount of local storage to keep logs; this results in different retention periods for which the logs are preserved. If you are looking for a particular record because you need to address an operational issue, you need to go through logs stored on different devices, understand where to look for the information you need and keep searching.

A centralised system such as LOGmanager conveniently collects logs from all devices and stores them in a single place. Furthermore, it uses parsers to translate the logs into a common language that is easy to understand, and it indexes the records to enable fast searching. When you later need to deal with an operational issue, you have a single source of information to turn to, where you can review logs generated by switches, firewalls, MS Active Directory and the application that became inaccessible, and you can identify the cause of the issue quickly and efficiently.

Using centralised logs in LOGmanager for analysing an operational issue. One of the employees is not able to log on from his laptop to a wireless network protected by 802.1X authentication. The available input information is the username from Active Directory and the MAC address of the wireless network card in the laptop. Go through the logs from the centralised wireless switch and look for the MAC address you are interested in: simply select any MAC address and change its value to the one you are looking for. You find information about repeated unsuccessful logon attempts for that MAC address. You check Active Directory, but the user’s account has not been blocked. Upon reviewing the RADIUS server logs (Network Policy Server in AD or FreeRADIUS), you find out that the certificate or password used by the laptop for the 802.1X authentication process has expired.

A screenshot of this action follows on the next page.
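To make the idea of parsers and a common language more concrete, the following minimal sketch normalises two differently formatted log lines into one record shape and then searches all sources by MAC address, as in the 802.1X example above. The raw log formats, the field names and the sample username are invented for illustration and do not reflect LOGmanager's actual parsers or schema.

```python
import re

# Two invented raw formats standing in for vendor-specific logs (assumptions).
RAW = [
    ("wlc", "2017-05-02T08:15:01 auth-fail client=AA-BB-CC-DD-EE-FF ssid=corp"),
    ("radius", "May  2 08:15:01 nps: Access-Reject user=jsmith mac=aa:bb:cc:dd:ee:ff reason=cert-expired"),
    ("wlc", "2017-05-02T08:16:40 auth-ok client=11-22-33-44-55-66 ssid=corp"),
]

# Matches either "client=" or "mac=" followed by a 17-character MAC address.
MAC_RE = re.compile(r"(?:client|mac)=([0-9A-Fa-f:-]{17})")

def normalise_mac(mac):
    """Bring MAC addresses into one notation: lower case, colon separated."""
    return mac.lower().replace("-", ":")

def parse(source, line):
    """Translate one raw line into a common record with a normalised MAC field."""
    match = MAC_RE.search(line)
    mac = normalise_mac(match.group(1)) if match else None
    return {"source": source, "mac": mac, "raw": line}

# A plain list stands in for the indexed central store.
INDEX = [parse(source, line) for source, line in RAW]

def find_by_mac(mac):
    """Return records from every source that mention the given MAC address."""
    wanted = normalise_mac(mac)
    return [record for record in INDEX if record["mac"] == wanted]

matches = find_by_mac("AA:BB:CC:DD:EE:FF")
for record in matches:
    print(record["source"], record["raw"])
```

Because both sources end up with the same normalised field, a single query returns the wireless controller entry and the RADIUS rejection side by side, which is the point of unifying the formats.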


Fast analysis thanks to centralised logs. Thanks to the centralisation of logs in LOGmanager, IT operators can quickly analyse multiple sources of information without having to obtain administrator access to each of the systems. Logs stored in LOGmanager cannot be deleted or modified. Technical staff without administrator privileges can therefore review logs from most of the operational systems without needing to access the systems themselves. As part of their job, they can analyse routine operational issues and communicate requests for their resolution upstream in the IT hierarchy.

An example of the use of LOGmanager by an IT technician without access rights to applications and/or the virtualisation system. Users complain that their critical application has been responding with significant delay for the past 30 minutes and that some tasks take up to 1 minute to complete. Normally, the application responds within seconds. Upon reviewing the logs from the corresponding database server in LOGmanager, you do not find any critical log entries. The log from VMware vCenter shows significant incidents identified as “IO Latency Increase”, from 2 ms to 428 ms. Upon reviewing the log from the NFS subsystem, you find out that disk 0:8 in the disk array failed and that the array initiated a repair of the defective disk. The delay in the disk subsystem caused the delayed response of the application accessing the NFS. The technician identifies the defect and passes the information to the NFS administrator for immediate action. To perform this analysis, the technician did not need any access rights to the virtualisation system or to NFS, yet the problem was still quickly resolved.

A screenshot of this action follows on the next page.


LOGmanager screenshots: Overview of events from the Virtualisation system for the past 6 hours:

IO Latency High event detail:

Security

The most significant benefits in the area of security are the protection of logs against tampering and the ability to proactively identify potential security risks, fine-tune configurations and track changes. Once a record is stored in LOGmanager, it cannot be deleted or modified. Organisations dealing with security issues often find out that the attacker deleted all records of malicious activity in the systems and devices to which they obtained unauthorised access. It can therefore be very difficult to get information about the attacker’s activities and make such information available for a thorough forensic analysis of the incident.

An example showing how to trace commands entered from the command line on a computer used by the hacker as the point of entry. A prerequisite is to ensure correct distribution of the LOGmanager component WES (Windows Event Sender) on individual workstations and servers in the organisation and to have the auditing rules set up correctly. By default, Windows systems do not provide logs with a sufficient amount of information, and they need to be configured appropriately. LOGmanager can store information about activities on each workstation obtained not only from the four basic Windows logs (Application, Security, Setup and System), but also from the Applications and Services logs. Here is an example of an extended log containing information about an attempt to access a resource requiring administrator access rights:

The SMB client failed to connect to the share. Error: {Access Denied} A process has requested access to an object, but has not been granted those access rights. Path: \\198.19.254.146\ADMIN$


Drill down in the events. The logs can be easily filtered to avoid being overloaded with logs that are not relevant for the given operational or security context. In this case, an IT security technician wants to know which commands triggering remote processes were run from the command line. A tool frequently used not only by administrators but also by hackers is PsExec, a light-weight tool that lets you execute processes on other systems, complete with full interactivity for console applications, without having to manually install client software. PsExec’s most powerful uses include launching interactive command prompts on remote systems and remote-enabling tools such as ipconfig that otherwise cannot show information about remote systems. In the example, a record of the psexec command being run from the command line was found, identifying the stations on which the command was used.

If you need to see the commands most frequently used in conjunction with psexec.exe, just select msg.commandline in the fields menu and LOGmanager will show a dynamic list of commands run in the given context. Of course, it is also possible to initiate an alert when a specific command is run on a workstation. This ensures that the security administrator is informed immediately if anyone tries to use a suspicious command, such as the commands shown in the table below.

Other commands called from the command line suggested for tracking: arp.exe; at.exe; bcdedit.exe; bcp.exe; chcp.exe; cscript.exe; csvde; dsquery.exe; ipconfig.exe; mimikatz.exe; nbtstat.exe; nc.exe; netcat.exe; netstat.exe; nmap; nslookup.exe; netsh; OSQL.exe; powershell.exe; powercat.ps1; psexecsvc.exe; psLoggedOn.exe; procdump.exe; qprocess.exe; query.exe; rar.exe; reg.exe; route.exe; runas.exe; rundll32; schtasks.exe; sethc.exe; sqlcmd.exe; sc.exe; ssh.exe; sysprep.exe; systeminfo.exe; net.exe; tasklist.exe; tracert.exe; vssadmin.exe; whoami.exe; wscript.exe; wmic.exe
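The watchlist approach can be illustrated with a short Python sketch that extracts the executable name from a recorded command line, compares it against a list of tracked commands and counts the hits. The records and host names are invented; "commandline" stands in for the msg.commandline field mentioned above, and the watchlist is a small subset of the commands listed here plus psexec.exe from the example. The first-token split is a simplification that does not handle quoted paths containing spaces.

```python
import ntpath
from collections import Counter

# Subset of the tracked commands, lower-cased for comparison.
WATCHLIST = {"psexec.exe", "mimikatz.exe", "whoami.exe", "vssadmin.exe", "procdump.exe"}

# Invented sample records; "host" and "commandline" are assumed field names.
RECORDS = [
    {"host": "ws-01", "commandline": r"C:\Tools\PsExec.exe \\srv-02 cmd.exe"},
    {"host": "ws-01", "commandline": r"C:\Windows\System32\whoami.exe /all"},
    {"host": "ws-07", "commandline": r"C:\Windows\System32\notepad.exe report.txt"},
    {"host": "ws-03", "commandline": r"psexec.exe \\srv-02 ipconfig /all"},
]

def executable(commandline):
    """Return the lower-cased file name of the first token of a command line."""
    first_token = commandline.split()[0]
    return ntpath.basename(first_token).lower()

def suspicious(records, watchlist=WATCHLIST):
    """Keep only records whose executable is on the watchlist."""
    return [r for r in records if executable(r["commandline"]) in watchlist]

alerts = suspicious(RECORDS)

# How often each tracked tool was seen, and on which stations psexec was used.
per_tool = Counter(executable(r["commandline"]) for r in alerts)
psexec_hosts = sorted({r["host"] for r in alerts if executable(r["commandline"]) == "psexec.exe"})

print(per_tool.most_common())
print(psexec_hosts)
```

Matching on the file name rather than the full path means a copy of the tool in an unusual directory is still caught, although a renamed binary would not be.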


Proactive approach – If you set up alerts in LOGmanager properly, the system will inform you automatically about security incidents detected on the basis of monitored parameters, and your organisation will be able to respond quickly to any issues. Furthermore, LOGmanager collects information about changes implemented in individual systems, which allows you to easily identify who made a change and what the result was. You can also monitor failed logon attempts to systems in which sensitive data are stored, attempts to test security rules within the network, and so on.

An example of using LOGmanager to issue an alert about an attempt to gain unauthorised access under a username with administrator or operator privileges. For systems storing sensitive data that should not normally be accessed by an administrator, or that should be accessed only exceptionally, it is typically necessary to monitor all attempts at unauthorised access. Alerts can be set up using a built-in template that can be modified to include specific administrator and operator usernames.

Alert settings can be easily tested, and the formatting of notifications can be customised to fit your needs.
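The logic behind such an alert can be sketched as a simple rule: raise a notification whenever a failed logon on a sensitive system uses one of the listed privileged usernames. The sketch below is illustrative only and is not the built-in LOGmanager template; the usernames, system names, field names and message format are all assumptions.

```python
# Hypothetical placeholders for the usernames and systems an organisation would list.
PRIVILEGED_USERS = {"administrator", "operator"}
SENSITIVE_SYSTEMS = {"hr-db"}

# Invented logon events with assumed field names.
EVENTS = [
    {"user": "Administrator", "target": "hr-db", "outcome": "failure", "source_ip": "192.0.2.15"},
    {"user": "jsmith", "target": "hr-db", "outcome": "failure", "source_ip": "192.0.2.20"},
    {"user": "operator", "target": "file-01", "outcome": "failure", "source_ip": "192.0.2.31"},
    {"user": "operator", "target": "hr-db", "outcome": "failure", "source_ip": "192.0.2.31"},
]

def build_alerts(events):
    """Return one notification per failed privileged logon on a sensitive system."""
    alerts = []
    for event in events:
        if (event["outcome"] == "failure"
                and event["user"].lower() in PRIVILEGED_USERS
                and event["target"] in SENSITIVE_SYSTEMS):
            alerts.append(
                "ALERT: failed logon as {user} on {target} from {source_ip}".format(**event)
            )
    return alerts

alerts = build_alerts(EVENTS)
for line in alerts:
    print(line)
```

Comparing usernames in lower case avoids missing "Administrator" versus "administrator"; the notification text is where the customised formatting mentioned above would apply.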


Compliance

There are also many challenges in the area of regulatory compliance. Organisations belonging to the state’s critical infrastructure are required to archive and analyse logs capturing activities in individual areas defined by law as well as logs from individual systems and resources. However, as there are not many organisations belonging to the state’s critical infrastructure, the most typical use cases relate to the need to ensure compliance with a new EU regulation, the GDPR, which applies from May 2018. Compliance with the General Data Protection Regulation is mandatory for organisations that process personal data of individuals in the EU, regardless of the number of records processed. The term “personal data” refers to any information relating to an identified or identifiable person, including data about customers and employees. In practice, personal data can be anything that can be used to identify a person: the name, address, e-mail address, social security number, ID card number, but also IP addresses which the person uses and which are recorded by the organisation. The GDPR requires organisations to implement functional processes that enable them to document a security incident in detail, and they must also be able to produce such information for the purposes of an audit or potential investigation. The requirements of the GDPR provisions are described in very general terms, and it is necessary to define new corporate policies, roles and responsibilities.

Audit/Reports – When a security audit is performed in an organisation, it is essential to have a system capable of generating reports based on the auditor’s requirements. LOGmanager allows you to generate reports not only in graphic form but also in the CSV format, with a structure according to the auditor’s requirements. You can select any of the parsed fields and include them in the report, and you can export a file with tens of thousands of lines if needed. LOGmanager further supports the option to access the log database directly via a REST API and process database queries directly from your own reporting tool.

A real-life example: LOGmanager is a solution that monitors and provides robust evidence documenting access to systems containing personal data and generates alerts when the systems detect an attempt to obtain unauthorised access. In the case of an attack, it provides a full set of logs from the relevant systems that can be provided to the investigators and/or to the national audit authority. The investigator will want to obtain complete information about all attempts to access systems containing critical data in a text format, including fields identifying the exact time of the incident, the username, whether access was granted or denied, and the source address of the remote system attempting to gain access.
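The kind of export the investigator asks for (time, username, result and source address, one line per access attempt) can be sketched with Python's standard csv module. The records and field names below are invented, and the sketch writes from an in-memory list rather than from LOGmanager's export function or REST API, whose interfaces are not described in this document.

```python
import csv
import io

# Fields requested by the auditor; names are assumptions for this example.
FIELDS = ["time", "username", "result", "source_ip"]

# Invented parsed records; extra fields such as "raw" are left out of the report.
RECORDS = [
    {"time": "2017-05-02T08:15:01Z", "username": "jsmith", "result": "denied",
     "source_ip": "192.0.2.10", "raw": "original log line 1"},
    {"time": "2017-05-02T08:17:44Z", "username": "jsmith", "result": "granted",
     "source_ip": "192.0.2.10", "raw": "original log line 2"},
]

def export_csv(records, fields=FIELDS):
    """Return a CSV document containing only the selected fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

report = export_csv(RECORDS)
print(report)
```

Selecting the columns explicitly keeps the report structure stable even when the underlying records carry additional parsed fields.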


A screenshot of the export of selected data in the CSV format on the basis of an auditor’s request:

Screenshot of a source report with information about successful and unsuccessful login attempts to a system containing sensitive data:


FAQ:

What logs and events do I need to collect and monitor?

The answer lies in understanding the main purpose of collecting machine data. As a general rule, everything that has some value for the purpose of the collection should be collected. There are three main purposes: operational, security and legal. The settings of the source systems need to be modified accordingly. This is a continuous activity, because new systems are added and existing ones modified or removed on an ongoing basis. It is necessary to constantly monitor changes in the IT structures and to incorporate logging and audit items in the change management system. As far as the volume of data is concerned, it is always better to collect as much information as possible and to collect the data at the highest level of detail. Filtering out irrelevant data is easy with LOGmanager. A comparison can be used: when you have a proper tool available, it is easy to find a needle in a haystack; however, you will not be able to find it if it was not put in the haystack in the first place. Therefore, the detailed LOGmanager documentation contains guides on how to properly configure log source devices. This includes a 25-page guide on how to configure Microsoft Advanced Audit policies through GPO settings in AD.

For how long should I keep machine data?

Here, the answer is easy. LOGmanager provides more than sufficient and quickly accessible storage capacity for its database. For example, when collecting 250 GB of data per day, the retention period is at least one year on the highest model of LOGmanager. This goes beyond the majority of regulatory and operational requirements. Once the given disk space is exhausted, LOGmanager simply notifies the administrator and then deletes the oldest day of logs stored in the system. The storage of new data and the consistency of older data are not affected during this operation.

How much will it cost?

LOGmanager is a system with no hidden costs. The system does not use a licensing model based on the number of devices or the total volume of logs. The quoted price covers the solution as such plus optimised hardware, which the vendor replaces after 3 or 5 years depending on the type. Software upgrades and technical support for the first year are included in the price. The price for an extended support period and for software upgrades is set at 15% of the product’s purchase price.
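The retention behaviour described in the FAQ (notify the administrator, then delete the oldest day once the disk space is exhausted) can be modelled with a small sketch. It is a simplified illustration and not LOGmanager's storage engine: the class, its names and the capacity figures are assumptions, and the capacity estimate ignores compression and index overhead, which the document does not specify.

```python
from collections import OrderedDict

def raw_volume_gb(daily_gb, retention_days):
    """Raw volume collected over the retention period, before any compression."""
    return daily_gb * retention_days

class DayStore:
    """Toy store that keeps one entry per day and evicts the oldest day when full."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.days = OrderedDict()   # day label -> size in GB, oldest first
        self.notifications = []     # messages that would go to the administrator

    def used_gb(self):
        return sum(self.days.values())

    def add_day(self, day, size_gb):
        # Notify and delete the oldest day(s) until the new day fits.
        while self.days and self.used_gb() + size_gb > self.capacity_gb:
            oldest, _ = self.days.popitem(last=False)
            self.notifications.append("capacity reached, deleted " + oldest)
        self.days[day] = size_gb

# 250 GB per day over 365 days, as in the FAQ example.
print(raw_volume_gb(250, 365))

# Small capacity chosen so that eviction is visible after three days.
store = DayStore(capacity_gb=750)
for day in ["2017-05-01", "2017-05-02", "2017-05-03", "2017-05-04"]:
    store.add_day(day, 250)

print(list(store.days))
print(store.notifications)
```

Evicting whole days keeps the remaining data contiguous, which matches the statement that older data stay consistent while new data continue to be stored.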