Malware detection in SIMARGL project

ZCB-ZSUT Seminar, 20 January 2021

dr hab. inż. Artur Janicki, prof. uczelni

email: [email protected]

Agenda

• General facts about the SIMARGL Project

• SIMARGL tools and assets

• WUT contribution to SIMARGL

• State-of-the-art review

• Network anomaly detection

• Steganalysis in graphical files

• Project metrics

• Future work

Motivation

• Malware – growing and important problem

• Damage to the world economy caused by cybercrime expected to reach 6 trillion US dollars per year in 2021

• Viruses, worms, trojan horses, spyware, ransomware, cryptojacking, Advanced Persistent Threats (APTs), stegomalware, fileless malware, banking trojans, mobile malware…

• Emerging techniques: information hiding, evasion, obfuscation, masquerade

SIMARGL – Secure Intelligent Methods for Advanced RecoGnition of malware and stegomalware

Main strategic objectives:

• To provide effective methods to counter attacks, cyber crimes and a broad range of malware including stegomalware

• To propose, implement and validate innovative machine and deep learning methods to detect malware (including stegomalware), ransomware and network anomalies

✓ Acronym: SIMARGL (Horizon 2020, topic SU-ICT-01-2018)

✓ Coordinator: FernUniversität in Hagen

✓ Scientific coordinator: Prof. Joerg Keller

✓ Project coordinator: Prof. Michał Choraś

✓ 14 Consortium members

✓ 7 countries

✓ Duration: 36 months (1 May 2019 – 30 Apr 2022)

✓ Total cost of the project: 6.076 M€

SIMARGL logo ;)

Simargl or Semargl or Si(e)margł is a deity or mythical creature in East Slavic mythology, depicted as a winged lion or dog. His wife is Kupalnitsa, goddess of night. He is also a father of Kupalo and Kostroma. Zoryas, solar goddesses who are servants or daughters of the deity Dazhbog, keep Simargl chained to the star Polaris in the constellation Ursa Minor, to prevent him from breaking free and destroying the constellation, and causing the world to end. Simargl is also the father of Skif and the founder of Scythia.

An idol of Semargl was present in the pantheon of Great Prince Vladimir I of Kiev.

It may be the equivalent of Simurgh in Persian mythology, which is also represented as a griffin with a dog body.

[Wikipedia]

[Akkera-S, www.slawoslaw.pl]

The Consortium (well balanced)

• FernUniversität in Hagen (the coordinator) from Germany

• Netzfactor GmbH from Germany

• Airbus CyberSecurity SAS from France

• Thales SIX GTS France from France

• Consiglio Nazionale delle Ricerche from Italy

• NUMERA S.p.a. from Italy

• Pluribus-One from Italy

• Institute of International Relations from Czechia

• ITTI Sp. z o.o. from Poland

• Warsaw University of Technology from Poland

• CERT Orange Polska from Poland

• Software Imagination & Vision (SIMAVI) from Romania

• RoEduNet (ARNIEC Agency) from Romania

• Stichting CUIng Foundation from the Netherlands

SIMARGL work breakdown structure

• WP1 Management (FUH)

• WP2 Architecture Specification of the SIMARGL Toolkit to Detect and Counter Malware and Stegomalware (ITTI)

• WP3 Legal, Social Sciences and Humanities Aspects of the SIMARGL Toolkit to Detect and Counter Malware and Stegomalware (IIR)

• WP4 Design and Development of Innovative Solutions to Detect and Counter Malware and Stegomalware (TCS)

• WP5 Design and Development of intelligent and holistic SIMARGL toolkit (services) (Pluribus)

• WP6 Integration, Validation and Demonstration (SIMAVI)

• WP7 Communication, Dissemination and Training for selected LEAs (CUIng)

• WP8 Exploitation and Impact (Pluribus)

SIMARGL tools and assets

SIMARGL toolkit architecture

[Figure: overall dashboard / graphical view]

Visualization and UI/UX concept

Exploitation Directions

[Figure: exploitation directions. Labels: Technological Enhancements; Procedural & Methodological Enhancements; Academic Research; Product-Specific Improvements; Integrated SIMARGL Solution; Training for LEAs; Policies. Impact areas: academic exploitation and impact; industry exploitation and impact; end-users and LEAs exploitation and impact.]

WUT team in SIMARGL

• dr hab. inż. Artur Janicki, prof. uczelni – project manager at WUT

• dr hab. inż. Mariusz Rawski, prof. uczelni

• dr hab. inż. Krzysztof Szczypiorski, prof. uczelni

• dr inż. Katarzyna Wasielewska

• mgr inż. Mikołaj Płachta – PhD student

• mgr inż. Paweł Szumełda

• inż. Mikołaj Kowalczyk

• supporting students

WUT contribution

• Co-ordinating efforts on preparing and publishing a state-of-the-art review in malware development and detection

• Research on anomaly detection in network traffic analysis

• Research on steganalysis in JPEG files

• Other research, mostly related to information hiding and applications of machine learning

State-of-the-art review

• Meta-survey part (28 other surveys reviewed), with gaps identified

• Evolution of malware, incl. hiding-related threats

• Evolution of malware detection

• Evolution of machine learning applied to malware detection

• Attack trends and research directions

• Published: IEEE Access, vol. 9, 2021, pp. 5371-5396

Creating a dataset with network traffic data

• Problem with network traffic datasets:

  • very fragmented, usually contain traffic related to selected threats only;

  • detection results reported for such fragmented data are very good, but not realistic.

• We created our own dataset:

  • fused on the network flow level;

  • flows originating from multiple sources: CIC IDS 2017, ISOT BOTNET, CTU 13 DATASET, Booters DDoS;

  • data from www.malware-traffic-analysis.net included.

• Dataset contains approximately 22 million flows:

  • approximately 8 million malicious and 15 million benign.
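Fusing on the flow level can be pictured as renaming each source's columns to one shared schema and concatenating the records. The sketch below is only an illustration under assumptions: the source names (`source_a`, `source_b`), the column names and the common schema are hypothetical, since the slides do not give the actual fields used.

```python
from collections import Counter

# Hypothetical per-source column names mapped onto a hypothetical common schema.
FIELD_MAPS = {
    "source_a": {"SrcIP": "src_ip", "DstIP": "dst_ip", "DstPort": "dst_port",
                 "Proto": "protocol", "Bytes": "bytes"},
    "source_b": {"saddr": "src_ip", "daddr": "dst_ip", "dport": "dst_port",
                 "proto": "protocol", "tot_bytes": "bytes"},
}


def normalise(record, source):
    """Rename one source-specific flow record to the common schema."""
    mapping = FIELD_MAPS[source]
    flow = {common: record[original] for original, common in mapping.items()}
    flow["source"] = source                      # keep provenance of each flow
    flow["malicious"] = bool(record["malicious"])  # assumed ground-truth label
    return flow


def fuse(datasets):
    """Concatenate flows from all sources into one list of common-schema records."""
    return [normalise(rec, name) for name, recs in datasets.items() for rec in recs]


def class_balance(flows):
    """Count malicious and benign flows in the fused set."""
    return Counter("malicious" if f["malicious"] else "benign" for f in flows)
```

Keeping the `source` field makes it possible to check later whether a classifier has learned the attack or merely the dataset a flow came from.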

Research on malware-related anomaly detection

• Researching detection of anomalous flows

• Experiments with various feature spaces and classifiers, uni- and bidirectional flows.

• For the CICIDS 2017 dataset, compared to the reference study (I. Sharafaldin, A. Lashkari and A. Ghorbani, “Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization”):

  • precision increased from 98% to 99.9%,

  • recall increased from 97% to 99.7%,

  • F1 increased from 97% to 99.8%.

• Initial detection results for the fused dataset (22 million flows):

  • accuracy: 79%, F1 score: 61%;

  • much worse, but more realistic.

• Research on feature selection and the use of ensemble classifiers.
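The metrics quoted above follow from the confusion-matrix counts. The sketch below uses the standard definitions; the counts in the usage note are made up for illustration and are not the project's results. It also shows that a precision of 99.9% and a recall of 99.7% are consistent with an F1 score of about 99.8%.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Made-up counts: accuracy is 0.75 while F1 is only about 0.615,
# because F1 ignores the many correctly classified benign flows (tn).
example = binary_metrics(tp=40, fp=20, fn=30, tn=110)
```

When benign flows dominate, accuracy is inflated by true negatives, which is one reason to report F1 alongside it.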

Plan: to use an ensemble classifier

[Amini et al.]

Research on anomaly detection for subsets of data

• Subsets related to attacks

• Experiments with feature selection

• Finding an optimal classifier for a given subset

• Finding an optimal operating point for each classifier
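One common way to pick an operating point is to sweep the decision threshold over the classifier's scores and keep the value that maximises a chosen metric. The sketch below assumes F1 as that metric and scores where higher means "more likely malicious"; the slides do not state which criterion the project uses, and the function names are hypothetical.

```python
def f1_at(scores, labels, threshold):
    """F1 score when flows with score >= threshold are flagged as malicious."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_operating_point(scores, labels):
    """Return (threshold, f1) for the candidate threshold with the highest F1.

    Candidates are the distinct observed scores; on ties the lowest
    threshold wins because max() keeps the first maximum it sees.
    """
    candidates = sorted(set(scores))
    return max(((t, f1_at(scores, labels, t)) for t in candidates),
               key=lambda pair: pair[1])


# Toy scores and labels (1 = malicious), for illustration only.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
threshold, f1 = best_operating_point(scores, labels)  # threshold 0.35, F1 = 6/7
```

In practice the threshold would be chosen on a validation set, separately for each classifier and data subset, and then checked on held-out data.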

Research on detecting steganography in JPEG images

• Developing JPEG StegoChecker module for detecting JPEG files with suspicious content (within regular JPEG file structure).

• Databases: BOSS (10k images) and iStego100k (100k image pairs: cover + stego).

• Steganographic algorithms: J-UNIWARD, nsF5, UERD.

• Researching various feature sets, e.g., Discrete Cosine Transform Residual (DCTR), Gabor Filters Residual (GFR), Phase-Aware Projection Model (PHARM).

• Researching various ML classifiers: shallow (e.g., decision trees, SVMs) and deep (e.g., RNN, CNN in various configurations).

• Very challenging…

[Figure: example cover image and stego image]

Research on detecting steganography in JPEG images

• Idea: using an ensemble classifier [Kodovsky, Fridrich, 2011]

• Database: BOSS, steganographic algorithm: nsF5, feature set: PHARM

• Base learner: Fisher Linear Discriminant (FLD)

• Payload: 0.4 bpnzAC (bits per non-zero AC DCT coefficient); feature dimensionality d = 12600; random subspace dimensionality dsub = 2000; number of base learners L = 69 => accuracy = 98.9%

[Kodovsky, Fridrich, 2011]
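The idea of the ensemble can be sketched as follows: each of the L base learners is an FLD trained on a random subset of dsub out of d features, and the final decision is a majority vote. This is a simplified pure-Python illustration, not the referenced implementation: it omits bootstrap sampling, places each learner's threshold halfway between the projected class means, and adds a small ridge term (an assumption) to keep the scatter matrix invertible.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def train_fld(cover, stego, ridge=1e-6):
    """Fisher Linear Discriminant: w = Sw^-1 (mu_stego - mu_cover)."""
    mu_c, mu_s = mean(cover), mean(stego)
    d = len(mu_c)
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((cover, mu_c), (stego, mu_s)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = solve(sw, [s - c for s, c in zip(mu_s, mu_c)])
    # Decision threshold halfway between the projected class means.
    bias = -0.5 * sum(wi * (c + s) for wi, c, s in zip(w, mu_c, mu_s))
    return w, bias


def train_ensemble(cover, stego, d_sub, n_learners, seed=0):
    """Train n_learners FLDs, each on its own random d_sub-dimensional subspace."""
    rng = random.Random(seed)
    d = len(cover[0])
    ensemble = []
    for _ in range(n_learners):
        idx = rng.sample(range(d), d_sub)
        sub_cover = [[x[i] for i in idx] for x in cover]
        sub_stego = [[x[i] for i in idx] for x in stego]
        w, bias = train_fld(sub_cover, sub_stego)
        ensemble.append((idx, w, bias))
    return ensemble


def predict(ensemble, x):
    """Majority vote of the base learners: 1 = stego, 0 = cover."""
    votes = sum(1 for idx, w, bias in ensemble
                if sum(wi * x[i] for wi, i in zip(w, idx)) + bias > 0)
    return 1 if 2 * votes > len(ensemble) else 0
```

With the slide's settings this would mean `d_sub=2000` and `n_learners=69` on 12600-dimensional PHARM features; at that scale a numerical library would be needed instead of the plain-Python linear solver used here.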

Making an intrusion detection system (IDS) stego-aware

• Adapting an open-source IDS (Zeek) to be able to detect steganographic transmission.

• Investigating various hidden channels based on ICMP, IP, TCP, MQTT and SIP.

• Enhancing Zeek with stego-aware detection scripts.

• 23 steganographic scenarios tested, 21 successfully detected (91% success rate).

• Integrating Zeek with Kibana and Elasticsearch and an alerting system.
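As an illustration of what a stego-aware check might look like, consider ICMP: an echo reply is required to return the data it received in the echo request, so a reply whose payload differs from the matching request is suspicious. The sketch below expresses that single heuristic in Python rather than in Zeek's own scripting language. The packet dictionaries and field names are hypothetical, and this is not claimed to be one of the project's detection scripts.

```python
def find_echo_mismatches(packets):
    """Return (ident, seq) keys of echo replies whose payload differs from the request.

    Each packet is a dict with hypothetical fields:
    kind ("request" or "reply"), ident, seq, payload (bytes).
    A real detector would also key on source and destination addresses.
    """
    pending = {}
    alerts = []
    for pkt in packets:
        key = (pkt["ident"], pkt["seq"])
        if pkt["kind"] == "request":
            pending[key] = pkt["payload"]
        elif pkt["kind"] == "reply" and key in pending:
            if pending.pop(key) != pkt["payload"]:
                alerts.append(key)
    return alerts


# Toy traffic: the second reply carries data that was not in the request.
packets = [
    {"kind": "request", "ident": 7, "seq": 1, "payload": b"abcdefgh"},
    {"kind": "reply", "ident": 7, "seq": 1, "payload": b"abcdefgh"},
    {"kind": "request", "ident": 7, "seq": 2, "payload": b"abcdefgh"},
    {"kind": "reply", "ident": 7, "seq": 2, "payload": b"secret!!"},
]
alerts = find_echo_mismatches(packets)  # [(7, 2)]
```

Alerts of this kind could then be forwarded to a log pipeline and dashboard for review.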

Other WUT contributions

• 2 SIMARGL-related BSc theses written and defended.

• 1 PhD candidate conducting research for SIMARGL.

• Contribution to deliverables: D1.1, D1.2, D2.1, D2.4, D4.1, D4.2, D4.3, D5.1, D6.1, D8.1.

• A publication on ML for IoT security: Skowron M., Janicki A., Mazurczyk W.: “Traffic Fingerprinting Attacks on Internet of Things Using Machine Learning”, in: IEEE Access, vol. 8, 2020, pp. 20386-20400.

• A publication on hidden channels in VoIP telephony: Radej A., Janicki A., “Modification of Pitch Parameters in Speech Coding for Information Hiding”, in Proc. 23rd International Conference on Text, Speech, and Dialogue (TSD 2020), Brno, Czechia, Sept. 8-11, 2020, ser. LNCS vol. 12284. Springer, 2020, pp. 513-523.

• 2 publications in progress, 2 BSc theses in progress.

SIMARGL publications

1. Szary P., Mazurczyk W., Wendzel S., Caviglione L., Design and performance evaluation of reversible network covert channels, in Proceedings of the 15th International Conference on Availability, Reliability and Security (pp. 1-8), August 2020.

2. Puchalski D., Caviglione L., Kozik R., Marzecki A., Krawczyk S., Choraś M., Stegomalware detection through structural analysis of media files, in Proceedings of the 15th International Conference on Availability, Reliability and Security (pp. 1-6), August 2020.

3. Komisarek M., Choraś M., Kozik R., Pawlicki M., Real-time stream processing tool for detecting suspicious network patterns using machine learning, ARES'20: Proceedings of the 15th International Conference on Availability, Reliability and Security, Dublin, August 2020 (CORE B).

4. Pawlicka A., Choraś M., Pawlicki M., Cyberspace threats: not only hackers and criminals. Raising the awareness of selected unusual cyberspace actors - cybersecurity researchers' perspective, ARES'20: Proceedings of the 15th International Conference on Availability, Reliability and Security, Dublin, August 2020 (CORE B).

5. Carrega A., Caviglione L., Repetto M., Zuppelli M., Programmable Data Gathering for Detecting Stegomalware, 2nd International Workshop on Cyber-Security Threats, Trust and Privacy Management in Software-defined and Virtualized Infrastructures (SecSoft), Ghent, Belgium, July 2020.

6. Radej A., Janicki A., Modification of Pitch Parameters in Speech Coding for Information Hiding, in Proc. 23rd International Conference on Text, Speech, and Dialogue (TSD 2020), Brno, Czechia, Sept. 8-11, 2020, ser. LNCS vol. 12284. Springer, 2020, pp. 513-523.

7. Saenger J., Mazurczyk W., Keller J., Caviglione L., VoIP network covert channels to enhance privacy and information sharing, Future Generation Computer Systems, 2020.

8. Mazurczyk W., Powojski K., Caviglione L., IPv6 Covert Channels in the Wild, Central European Cybersecurity Conference, pp. 10:1 - 10:6, Munich, Germany, November 2019.

9. Mazurczyk W., Szary P., Wendzel S., Caviglione L., Towards Reversible Storage Network Covert Channels, Criminal Use of Information Hiding Workshop, 14th International Conference on Availability, Reliability and Security, pp. 69:1 - 69:8, Canterbury, UK, August 2019.

10. Choraś M., Pawlicki M., Puchalski D., Kozik R., Machine Learning – the results are not the only thing that matters! What about security, explainability and fairness?, in Proc. of ICCS 2020, Computational Science 2020, LNCS 12140, Springer, June 2020 (CORE A).

11. Spiekermann D., Keller J., Impact of Virtual Networks on Anomaly Detection with Machine Learning, in Proc. 2nd Int. Workshop on Cyber-Security Threats, Trust and Privacy Management in Software-defined and Virtualized Infrastructures (SecSoft 2020 at NetSoft 2020), July 2020.

12. Skowron M., Janicki A., Mazurczyk W., Traffic Fingerprinting Attacks on Internet of Things Using Machine Learning, in IEEE Access, vol. 8, pp. 20386-20400, 2020, doi: 10.1109/ACCESS.2020.2969015.

13. Heinz C., Mazurczyk W., Caviglione L., Covert Channels in Transport Layer Security, in Proc. of European Interdisciplinary Cybersecurity Conference (EICC 2020), Rennes, France, November 2020.

14. Keller J., Wendzel S., Covert Channels in One-Time Passwords Based on Hash Chains, in Proc. European Interdisciplinary Cybersecurity Conference (EICC 2020), Nov. 2020.

15. Caviglione L., Choraś M., Corona I., Janicki A., Mazurczyk W., Pawlicki M., Wasielewska K., Tight Arms Race: Overview of Current Malware Threats and Trends in Their Detection, IEEE Access, vol. 9, pp. 5371-5396, 2021.

SIMARGL success criteria

| Measure | Key Performance Indicator | Expected numbers | Current numbers |
|---|---|---|---|
| End-user involvement | Organisation of end-user workshops and demonstration | 3 | 3 |
| End-user involvement | End-users invited to the project events | 30 | ? |
| End-user involvement | Training sessions conducted | 2 | 0 * |
| End-user involvement | End-user staff trained on SIMARGL solutions | 15 | 0 * |
| Technical tools | SIMARGL toolkit | 1, in 2-3 releases | 0 ** |
| Technical tools (network level) | Technical tools (at network level) developed and integrated into the SIMARGL framework | 5 | 5 |
| Technical tools | Innovative Algorithms, Methods and Solutions | 5 | 5 |
| Dissemination (traditional publicity) | Publications submitted and published | 9 | 14 + 2 accepted + 6 submitted |
| Dissemination (traditional publicity) | Conference presentations | 9 | 13 + 2 upcoming |
| Dissemination (traditional publicity) | Dissemination material prepared (leaflets, posters, newsletters, etc.) | 9 | 11 |
| Dissemination and exploitation activities | Participation in industry-oriented events | 9 | 2 |
| Dissemination and exploitation activities | Consultation actions with industry representatives | 6 | 0 |
| Impact on academia / research | Undergraduate assignments and MSc/PhD thesis on SIMARGL topic | 6 | 7 |
| Impact on academia / research | Seminars at universities | 9 | 4 |
| Online presence | Project website popularity (number of visitors) | 1000/year | 4283 in first 18 months |
| Online presence | Social media channels used to share project progress and results | 2 | 4 |

* Training sessions are planned for the second half of the SIMARGL project.

** The first release of the SIMARGL toolkit is expected at M24.

Future work

• Further research work on anomaly detection in network traffic analysis

• Further research work on steganalysis in JPEG files

• Working on publications

• Integrating our solutions with other components/SIMARGL toolkit

Project website: https://simargl.eu
