Malware detection in SIMARGL project
Seminar ZCB-ZSUT, 20.01.2021
dr hab. inż. Artur Janicki, prof. uczelni, email: [email protected]
Agenda
• General facts about the SIMARGL Project
• SIMARGL tools and assets
• WUT contribution to SIMARGL
• State-of-the-art review
• Network anomaly detection
• Steganalysis in graphical files
• Project metrics
• Future work
Motivation
• Malware: a growing and important problem
• Damage to the world economy caused by cybercrime is expected to reach 6 trillion US dollars per year in 2021
• Viruses, worms, trojan horses, spyware, ransomware, cryptojacking, Advanced Persistent
Threats (APTs), stegomalware, fileless malware, banking trojans, mobile malware…
• Emerging techniques: information hiding, evasion, obfuscation, masquerade
SIMARGL – Secure Intelligent Methods for Advanced RecoGnition of malware and stegomalware
Main strategic objectives:
• To provide effective methods to counter attacks, cybercrimes and a broad range of malware, including stegomalware
• To propose, implement and validate innovative machine and deep learning methods to detect malware (including stegomalware), ransomware and network anomalies
✓ Acronym SIMARGL
(Horizon 2020, topic SU-ICT-01-2018)
✓ Coordinator: FernUniversität in Hagen
✓ Scientific coordinator: Prof. Joerg Keller
✓ Project coordinator: Prof. Michał Choraś
✓ 14 Consortium members
✓ 7 countries
✓ Duration: 36 months (1 May 2019 – 30 Apr 2022)
✓ Total cost of the project: 6.076 M€
SIMARGL logo ;)
Simargl or Semargl or Si(e)margł is a deity or mythical creature in East
Slavic mythology, depicted as a winged lion or dog. His wife is Kupalnitsa,
goddess of night. He is also the father of Kupalo and Kostroma. Zoryas, solar
goddesses who are servants or daughters of the deity Dazhbog, keep Simargl
chained to the star Polaris in the constellation Ursa Minor, to prevent him
from breaking free and destroying the constellation, and causing the world to
end. Simargl is also the father of Skif and the founder of Scythia.
An idol of Semargl was present in the pantheon of Grand Prince Vladimir I of
Kiev.
It may be the equivalent of Simurgh in Persian mythology, which is also
represented as a griffin with a dog body.
[Wikipedia]
[Akkera-S, www.slawoslaw.pl]
The Consortium (well balanced)
• FernUniversität in Hagen (the coordinator) from Germany
• Netzfactor GmbH from Germany
• Airbus CyberSecurity SAS from France
• Thales SIX GTS France from France
• Consiglio Nazionale delle Ricerche from Italy
• NUMERA S.p.a. from Italy
• Pluribus-One from Italy
• Institute of International Relations from Czechia
• ITTI Sp. z o.o. from Poland
• Warsaw University of Technology from Poland
• CERT Orange Polska from Poland
• Software Imagination & Vision (SIMAVI) from Romania
• RoEduNet (ARNIEC Agency) from Romania
• Stichting CUIng Foundation from the Netherlands
SIMARGL work breakdown structure
• WP1 Management (FUH)
• WP2 Architecture Specification of the SIMARGL Toolkit to Detect and Counter Malware and Stegomalware (ITTI)
• WP3 Legal, Social Sciences and Humanities Aspects of the SIMARGL Toolkit to Detect and Counter Malware and Stegomalware (IIR)
• WP4 Design and Development of Innovative Solutions to Detect and Counter Malware and Stegomalware (TCS)
• WP5 Design and Development of intelligent and holistic SIMARGL toolkit (services) (Pluribus)
• WP6 Integration, Validation and Demonstration (SIMAVI)
• WP7 Communication, Dissemination and Training for selected LEAs (CUIng)
• WP8 Exploitation and Impact (Pluribus)
Exploitation Directions
[Diagram: Technological Enhancements; Procedural & Methodological Enhancements; Academic Research; Product-Specific Improvements; Integrated SIMARGL Solution; Training for LEAs; Policies]
• Academic exploitation and impact
• Industry exploitation and impact
• End-users and LEAs exploitation and impact
WUT team in SIMARGL
• dr hab. inż. Artur Janicki, prof. uczelni – project manager at WUT
• dr hab. inż. Mariusz Rawski, prof. uczelni
• dr hab. inż. Krzysztof Szczypiorski, prof. uczelni
• dr inż. Katarzyna Wasielewska
• mgr inż. Mikołaj Płachta – PhD student
• mgr inż. Paweł Szumełda
• inż. Mikołaj Kowalczyk
• supporting students
WUT contribution
• Co-ordinating efforts on preparing and publishing a state-of-the-art review of malware development and detection
• Research on anomaly detection in network traffic analysis
• Research on steganalysis in JPEG files
• Other research, mostly related to information hiding and applications of machine learning
State-of-the-art review
• Meta-survey part (28 other surveys reviewed), with gaps identified
• Evolution of malware, incl. hiding-related threats
• Evolution of malware detection
• Evolution of machine learning applied to malware detection
• Attack trends and research directions
• Published: IEEE Access, vol. 9, 2021, pp. 5371-5396
Creating a dataset with network traffic data
• Problem with existing network traffic datasets:
  • very fragmented, usually containing traffic related to selected threats only;
  • detection results reported for such fragmented data are very good, but not realistic.
• We created our own dataset:
  • fused at the network flow level;
  • flows originating from multiple sources: CIC IDS 2017, ISOT Botnet, CTU-13, Booters DDoS;
  • data from www.malware-traffic-analysis.net included.
• The dataset contains 22 million flows:
  • 8 million malicious, 15 million benign.
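The slides do not describe how the sources were fused. The following is a minimal sketch, assuming that each source is already exported as flow records (dicts) and is mapped onto a hypothetical common schema. The `Flow` fields, the column mappings and the per-source sets of "benign" label tags are illustrative assumptions, not the actual SIMARGL feature set.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical common flow schema shared by all source datasets."""
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    duration_s: float
    total_bytes: int
    label: str   # "benign" or "malicious"
    source: str  # name of the dataset the flow came from


def normalize(record, source, mapping, benign_tags):
    """Map one source-specific record (a dict) onto the common Flow schema."""
    raw_label = str(record[mapping["label"]]).strip().lower()
    return Flow(
        src_ip=record[mapping["src_ip"]],
        dst_ip=record[mapping["dst_ip"]],
        dst_port=int(record[mapping["dst_port"]]),
        protocol=str(record[mapping["protocol"]]).lower(),
        duration_s=float(record[mapping["duration_s"]]),
        total_bytes=int(record[mapping["total_bytes"]]),
        # Any label not listed as benign for this source is treated as malicious.
        label="benign" if raw_label in benign_tags else "malicious",
        source=source,
    )


def fuse(sources):
    """Concatenate normalized flows from several datasets.

    `sources` is an iterable of (name, records, mapping, benign_tags) tuples.
    """
    return [
        normalize(rec, name, mapping, benign_tags)
        for name, records, mapping, benign_tags in sources
        for rec in records
    ]


def class_counts(flows):
    """Count benign vs. malicious flows in the fused dataset."""
    return Counter(f.label for f in flows)
```

Keeping the per-source column mapping and label vocabulary outside the code lets each new source be added without touching `normalize`.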
Research on malware-related anomaly detection
• Researching detection of anomalous flows
• Experiments with various feature spaces and classifiers, with uni- and bidirectional flows
• For the CICIDS 2017 dataset, compared to the reference study (I. Sharafaldin, A. Lashkari and A. Ghorbani, “Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization”):
  • precision increased from 98% to 99.9%,
  • recall increased from 97% to 99.7%,
  • F1 increased from 97% to 99.8%.
• Initial detection results for the fused dataset (22 million flows):
  • accuracy: 79%, F1 score: 61%;
  • much worse, but more realistic.
• Research on feature selection and the use of ensemble classifiers
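The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The helpers below are a minimal sketch of the standard definitions computed from confusion-matrix counts. They also offer a quick consistency check of the reported figures: the harmonic mean of 99.9% and 99.7% is about 99.8%, and that of 98% and 97% is about 97.5%, which is reported as 97%.

```python
def f1_from(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


def accuracy(tp, tn, fp, fn):
    """Fraction of all flows (malicious and benign) classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0
```

Because accuracy also counts true negatives, it can remain much higher than F1 on a dataset where benign flows dominate. This is one reason both metrics are reported for the fused dataset.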
Research on anomaly detection for subsets of data
• Subsets related to attacks
• Experiments with feature selection
• Finding an optimal classifier for a given subset
• Finding an optimal operating point for each classifier
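The slides do not state which criterion defines the "optimal" operating point. The sketch below assumes one common choice: sweep the decision threshold over the classifier scores and keep the threshold that maximizes F1. Other criteria, such as a fixed false-positive rate, would only change the scoring line. The quadratic loop is chosen for clarity, not speed.

```python
def best_operating_point(scores, labels):
    """Return (threshold, f1) that maximizes F1 over candidate thresholds.

    scores: classifier scores, higher means "more likely malicious"
    labels: 1 for malicious, 0 for benign
    A flow is predicted malicious when its score >= threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = sum(labels)
    best_threshold, best_f1 = None, -1.0
    # Every distinct score is a candidate threshold (ascending order).
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = positives - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1
```

For example, with scores `[0.1, 0.4, 0.35, 0.8]` and labels `[0, 0, 1, 1]`, the threshold 0.35 gives TP = 2, FP = 1 and FN = 0, which yields F1 = 0.8.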
Research on detecting steganography in JPEG images
• Developing JPEG StegoChecker module for detecting JPEG files with suspicious content (within regular JPEG file structure).
• Databases: BOSS (10k images) and iStego100k (100k image pairs: cover + stego).
• Steganographic algorithms: J-UNIWARD, nsF5, UERD.
• Researching various feature sets, e.g., Discrete Cosine Transform Residual (DCTR), Gabor Filters Residual (GFR), Phase-Aware Projection Model (PHARM).
• Researching various ML classifiers: shallow (e.g., decision trees, SVMs) and deep (e.g., RNN, CNN in various configurations).
• Very challenging…
[Figure: example cover and stego images]
Research on detecting steganography in JPEG images
• Idea: using an ensemble classifier [Kodovský, Fridrich, 2011]
• Database: BOSS, steganographic algorithm: nsF5, features: PHARM
• Base learner: Fisher Linear Discriminant (FLD)
• Payload 0.4 bpnzAC (bits per non-zero AC DCT coefficient); feature dimensionality d = 12,600; subspace dimensionality d_sub = 2,000; L = 69 base learners ⇒ accuracy = 98.9%
[Kodovský, Fridrich, 2011]
Making an intrusion detection system (IDS) stego-aware
• Adapting an open-source IDS (Zeek) to detect steganographic transmissions.
• Investigating various hidden channels based on ICMP, IP, TCP, MQTT and SIP.
• Enhancing Zeek with stego-aware detection scripts.
• 23 steganographic scenarios tested, 21 successfully detected (91% success rate).
• Integrating Zeek with Elasticsearch, Kibana and an alerting system.
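The Zeek scripts themselves are not shown in the slides. The following Python sketch only illustrates the kind of per-flow heuristic that a stego-aware rule might implement, here for a hypothetical covert channel that modulates the IP TTL field. Packets of a legitimate flow over a stable path usually carry very few distinct TTL values, so a flow with many distinct TTLs is flagged. The function name and both thresholds are illustrative assumptions.

```python
from collections import defaultdict


def suspicious_ttl_flows(packets, min_packets=10, max_distinct_ttl=2):
    """Flag (src, dst) pairs whose IP TTL varies more than expected.

    packets: iterable of (src_ip, dst_ip, ttl) tuples
    min_packets: ignore pairs with too few packets to judge
    max_distinct_ttl: more distinct TTL values than this is suspicious
    """
    ttls = defaultdict(list)
    for src, dst, ttl in packets:
        ttls[(src, dst)].append(ttl)
    flagged = [
        pair for pair, values in ttls.items()
        if len(values) >= min_packets and len(set(values)) > max_distinct_ttl
    ]
    return sorted(flagged)
```

A real deployment would have to tune these thresholds against route changes and load balancing, which can also alter TTLs in benign traffic.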
Other WUT contributions
• 2 SIMARGL-related BSc theses written and defended.
• 1 PhD candidate doing research for SIMARGL.
• Contribution to deliverables: D1.1, D1.2, D2.1, D2.4, D4.1, D4.2, D4.3, D5.1, D6.1, D8.1.
• A publication on ML for IoT security: Skowron M., Janicki A., Mazurczyk W.: “Traffic Fingerprinting Attacks on Internet of Things using Machine Learning”, in: IEEE Access, vol. 8, 2020, pp. 20386-20400.
• A publication on hidden channels in VoIP telephony: Radej A., Janicki A., “Modification of Pitch Parameters in Speech Coding for Information Hiding”, in Proc. 23rd International Conference on Text, Speech, and Dialogue (TSD 2020), Brno, Czechia, Sept. 8-11, 2020, ser. LNCS vol. 12284. Springer, 2020, pp. 513-523.
• 2 publications in progress, 2 BSc theses in progress.
SIMARGL publications (1/2)
1. Szary P., Mazurczyk W., Wendzel S., Caviglione L., Design and performance evaluation of reversible network covert channels, in Proc. 15th International Conference on Availability, Reliability and Security (ARES 2020), pp. 1-8, August 2020.
2. Puchalski D., Caviglione L., Kozik R., Marzecki A., Krawczyk S., Choraś M., Stegomalware detection through structural analysis of media files, in Proc. 15th International Conference on Availability, Reliability and Security (ARES 2020), pp. 1-6, August 2020.
3. Komisarek M., Choraś M., Kozik R., Pawlicki M., Real-time stream processing tool for detecting suspicious network patterns using machine learning, in Proc. 15th International Conference on Availability, Reliability and Security (ARES 2020), Dublin, August 2020 (CORE B).
4. Pawlicka A., Choraś M., Pawlicki M., Cyberspace threats: not only hackers and criminals. Raising the awareness of selected unusual cyberspace actors - cybersecurity researchers' perspective, in Proc. 15th International Conference on Availability, Reliability and Security (ARES 2020), Dublin, August 2020 (CORE B).
5. Carrega A., Caviglione L., Repetto M., Zuppelli M., Programmable Data Gathering for Detecting Stegomalware, in Proc. 2nd International Workshop on Cyber-Security Threats, Trust and Privacy Management in Software-defined and Virtualized Infrastructures (SecSoft 2020), Ghent, Belgium, July 2020.
6. Radej A., Janicki A., Modification of Pitch Parameters in Speech Coding for Information Hiding, in Proc. 23rd International Conference on Text, Speech, and Dialogue (TSD 2020), Brno, Czechia, Sept. 8-11, 2020, ser. LNCS vol. 12284, Springer, 2020, pp. 513-523.
7. Saenger J., Mazurczyk W., Keller J., Caviglione L., VoIP network covert channels to enhance privacy and information sharing, Future Generation Computer Systems, 2020.
8. Mazurczyk W., Powojski K., Caviglione L., IPv6 Covert Channels in the Wild, in Proc. Central European Cybersecurity Conference, pp. 10:1-10:6, Munich, Germany, November 2019.
9. Mazurczyk W., Szary P., Wendzel S., Caviglione L., Towards Reversible Storage Network Covert Channels, Criminal Use of Information Hiding Workshop, 14th International Conference on Availability, Reliability and Security (ARES 2019), pp. 69:1-69:8, Canterbury, UK, August 2019.
10. Choraś M., Pawlicki M., Puchalski D., Kozik R., Machine Learning – the results are not the only thing that matters! What about security, explainability and fairness?, in Proc. ICCS 2020, Computational Science 2020, LNCS 12140, Springer, June 2020 (CORE A).
11. Spiekermann D., Keller J., Impact of Virtual Networks on Anomaly Detection with Machine Learning, in Proc. 2nd International Workshop on Cyber-Security Threats, Trust and Privacy Management in Software-defined and Virtualized Infrastructures (SecSoft 2020 at NetSoft 2020), July 2020.
12. Skowron M., Janicki A., Mazurczyk W., Traffic Fingerprinting Attacks on Internet of Things Using Machine Learning, IEEE Access, vol. 8, pp. 20386-20400, 2020, doi: 10.1109/ACCESS.2020.2969015.
13. Heinz C., Mazurczyk W., Caviglione L., Covert Channels in Transport Layer Security, in Proc. European Interdisciplinary Cybersecurity Conference (EICC 2020), Rennes, France, November 2020.
14. Keller J., Wendzel S., Covert Channels in One-Time Passwords Based on Hash Chains, in Proc. European Interdisciplinary Cybersecurity Conference (EICC 2020), November 2020.
15. Caviglione L., Choraś M., Corona I., Janicki A., Mazurczyk W., Pawlicki M., Wasielewska K., Tight Arms Race: Overview of Current Malware Threats and Trends in Their Detection, IEEE Access, vol. 9, pp. 5371-5396, 2021.
SIMARGL success criteria
Measure | Key Performance Indicator | Expected | Current
End-user involvement | Organisation of end-user workshops and demonstrations | 3 | 3
End-user involvement | End-users invited to the project events | 30 | ?
End-user involvement | Training sessions conducted | 2 | 0 *
End-user involvement | End-user staff trained on SIMARGL solutions | 15 | 0 *
Technical tools | SIMARGL toolkit | 1, in 2-3 releases | 0 **
Technical tools (network level) | Technical tools (at network level) developed and integrated into the SIMARGL framework | 5 | 5
Technical tools | Innovative algorithms, methods and solutions | 5 | 5
Dissemination (traditional publicity) | Publications submitted and published | 9 | 14 + 2 accepted + 6 submitted
Dissemination (traditional publicity) | Conference presentations | 9 | 13 + 2 upcoming
Dissemination (traditional publicity) | Dissemination material prepared (leaflets, posters, newsletters, etc.) | 9 | 11
Dissemination and exploitation activities | Participation in industry-oriented events | 9 | 2
Dissemination and exploitation activities | Consultation actions with industry representatives | 6 | 0
Impact on academia / research | Undergraduate assignments and MSc/PhD theses on SIMARGL topics | 6 | 7
Impact on academia / research | Seminars at universities | 9 | 4
Online presence | Project website popularity (number of visitors) | 1000/year | 4283 in the first 18 months
Online presence | Social media channels used to share project progress and results | 2 | 4
* Training sessions are planned for the second half of the SIMARGL project.
** The first release of the SIMARGL toolkit is expected at M24.
Future work
• Further research work on anomaly detection in network traffic analysis
• Further research work on steganalysis in JPEG files
• Working on publications
• Integrating our solutions with other components of the SIMARGL toolkit
SIMARGL project website: https://simargl.eu