RETHINKING GRAPH-BASED APPROACHES: AN EMPIRICAL STUDY OF FEATURE ENGINEERING IN NETWORK INTRUSION DETECTION
DOI:
https://doi.org/10.21063/jtif.2026.V14.1.%25pKeywords:
Intrusion Detection, Cybersecurity, Graph Theory, Machine Learning, Feature EngineeringAbstract
Although graph-based feature engineering has become widely used in network intrusion detection systems (NIDS), there is a severe lack of empirical research on determining whether the addition of network topology features results in a real positive improvement over the operation of the system or it simply adds complexity to the system. Our paper gets into this gap by critically assessing the performance of graph-based methods as compared to conventional statistical features via systematic comparative analysis across several machine learning paradigms. Using the UNSW-NB15 dataset, we employed a graph-theoretic characteristics that included measures of centrality, the community structure identification and the topological analysis, which were compared to traditional traffic-based characteristics. Results revealed a counterintuitive finding where incorporating graph features consistently degraded detection performance across all algorithms, with statistically significant accuracy reductions observed in multiple classifiers. Random Forest experienced modest performance decline, while Support Vector Machines and RBF Networks showed more substantial degradation. Computational analysis also demonstrated that graph feature extraction imposed substantial overhead compared to traditional feature computation, representing approximately nineteen-fold increase in processing time.
References
[1] D. E. Denning, "An intrusion-detection model," IEEE Transactions on Software Engineering, vol. SE-13, no. 2, pp. 222-232, Feb. 1987. http://dx.doi.org/10.1109/TSE.1987.232894
[2] T. F. Lunt, "A survey of intrusion detection techniques," Computers & Security, vol. 12, no. 4, pp. 405-418, 1993. http://dx.doi.org/10.1016/0167-4048(93)90029-5
[3] M. Iliofotou, P. Pappu, M. Faloutsos, M. Mitzenmacher, S. Singh, and G. Varghese, "Network monitoring using traffic dispersion graphs (TDGs)," in Proc. ACM SIGCOMM Internet Measurement Conference, 2007, pp. 315-320. http://dx.doi.org/10.1145/1298306.1298349
[4] H. Zhao, M. Xu, N. Zheng, J. Yao, and Q. Ho, "Malicious executables classification based on behavioral factor analysis," in Proc. IEEE Int. Conf. Machine Learning and Applications, 2010, pp. 502-507. http://dx.doi.org/10.1109/IC4E.2010.78
[5] W. Lo, S. Layeghy, M. Sarhan, M. Gallagher, and M. Portmann, "E-GraphSAINT: Edge-sampling-based efficient graph neural network for intrusion detection," IEEE Trans. Network and Service Management, vol. 19, no. 4, pp. 4493-4505, 2022. http://dx.doi.org/10.48550/arXiv.2103.16329
[6] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Toward developing a systematic approach to generate benchmark datasets for intrusion detection," Computers & Security, vol. 31, no. 3, pp. 357-374, 2012. http://dx.doi.org/10.1016/j.cose.2011.12.012
[7] C. Kruegel, D. Mutz, W. Robertson, and F. Valeur, "Bayesian event classification for intrusion detection," in Proc. 19th Annual Computer Security Applications Conference, 2003, pp. 14-23. http://dx.doi.org/10.1109/CSAC.2003.1254306
[8] S. Axelsson, "Intrusion detection systems: A survey and taxonomy," Department of Computer Engineering, Chalmers University of Technology, Tech. Rep. 99-15, 2000. Available at: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=7a15948bdcb530e2c1deedd8d22dd9b54788a634
[9] H. Debar, M. Dacier, and A. Wespi, "Towards a taxonomy of intrusion-detection systems," Computer Networks, vol. 31, no. 8, pp. 805-822, 1999. http://dx.doi.org/10.1016/S1389-1286(98)00017-6
[10] S. Hettich and S. D. Bay, "The UCI KDD archive," University of California, Irvine, 1999. [Online]. Available: http://kdd.ics.uci.edu
[11] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proc. IEEE Symp. Computational Intelligence for Security and Defense Applications, 2009, pp. 1-6. http://dx.doi.org/10.1109/CISDA.2009.5356528
[12] N. Moustafa and J. Slay, "UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in Proc. Military Communications and Information Systems Conference, 2015, pp. 1-6.
[13] R. Pastor-Satorras and A. Vespignani, "Evolution and structure of the Internet: A statistical physics approach," Cambridge University Press, 2004. http://dx.doi.org/10.1017/CBO9780511610905
[14] M. Iliofotou, M. Faloutsos, and M. Mitzenmacher, "Exploiting dynamicity in graph-based traffic analysis: Techniques and applications," in Proc. ACM CoNEXT, 2009, pp. 241-252. http://dx.doi.org/10.1145/1658939.1658967
[15] H. Zhao, M. Xu, N. Zheng, J. Yao, and Q. Ho, "Behavioral analysis of network traffic for malware detection," Computer Communications, vol. 33, no. 10, pp. 1162-1170, 2010.
[16] N. Moustafa and J. Slay, "The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set," Information Security Journal: A Global Perspective, vol. 25, no. 1-3, pp. 18-31, 2016. http://dx.doi.org/10.1080/19393555.2015.1125974
[17] W. Lee and S. J. Stolfo, "Data mining approaches for intrusion detection," in Proc. USENIX Security Symposium, 1998, pp. 79-94.
[18] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001. http://dx.doi.org/10.1023/A:1010933404324
[19] J. H. Friedman, "Greedy function approximation: A gradient boosting machine," Annals of Statistics, vol. 29, no. 5, pp. 1189-1232, 2001. http://dx.doi.org/10.1214/aos/1013203451
[20] B. Schölkopf and A. J. Smola, "Learning with kernels: Support vector machines, regularization, optimization, and beyond," MIT Press, 2002. http://dx.doi.org/10.7551/mitpress/4175.001.0001
[21] S. Haykin, "Neural networks: A comprehensive foundation," 2nd ed., Prentice Hall, 1999.
[22] M. J. D. Powell, "Radial basis functions for multivariable interpolation: A review," in Algorithms for Approximation, Oxford University Press, 1987, pp. 143-167.
[23] V. D. Blondel, J. L. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008. http://dx.doi.org/10.1088/1742-5468/2008/10/P10008
Published
Issue
Section
License
Copyright (c) 2026 Richard Nwachukwu

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
Authors retain copyright and grant the journal the right of first publication.
The work may be shared and adapted, even for commercial purposes, as long as appropriate credit is given and any new creations are licensed under the identical terms.









