Posted on 2024-07-12, 16:39, authored by Thuy T. T. Nguyen and Grenville Armitage
The literature on the use of Machine Learning (ML) algorithms for classifying IP traffic has relied on bi-directional full-flow statistics, assuming that each flow's direction is implied by the first packet captured or by the Client-to-Server direction. In contrast, many real-world classifiers may miss an arbitrary number of packets from the start of a flow and be unsure in which direction the flow started. This can degrade classification performance for applications with asymmetric traffic characteristics. We propose a novel approach that trains the ML classifier on statistical features calculated over multiple short sub-flows extracted from full flows generated by the target application, together with mirror-imaged replicas of those sub-flows that present each one as if it were travelling in the reverse direction. We demonstrate our optimisation applied to the Naive Bayes and Decision Tree algorithms. Our approach achieves excellent performance even when classification starts mid-way through a flow, without prior knowledge of the flow's direction, and with windows as small as 25 packets.
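The training-set construction described above (short sub-flows plus their mirror-imaged replicas) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Packet` record, the feature set (min/max/mean/standard deviation of packet lengths per direction and of inter-arrival times, plus per-direction packet counts), the non-overlapping `step`, and the helper names `mirror`, `sub_flows`, `features` and `training_instances` are all hypothetical. Only the 25-packet window size comes from the abstract.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    length: int       # IP packet length in bytes
    forward: bool     # True if in the (assumed) forward direction


def mirror(packets: Sequence[Packet]) -> List[Packet]:
    """Mirror-imaged replica: same packets, every direction flag flipped."""
    return [Packet(p.timestamp, p.length, not p.forward) for p in packets]


def sub_flows(packets: Sequence[Packet], window: int, step: int) -> List[Sequence[Packet]]:
    """Sub-flows of `window` consecutive packets, one starting every `step` packets.
    (Hypothetical placement rule; the abstract does not say where sub-flows start.)"""
    return [packets[i:i + window] for i in range(0, len(packets) - window + 1, step)]


def _stats(values: List[float]) -> Tuple[float, float, float, float]:
    """min, max, mean, population std; zeros if a direction has no packets."""
    if not values:
        return 0.0, 0.0, 0.0, 0.0
    return float(min(values)), float(max(values)), float(mean(values)), float(pstdev(values))


def features(sub_flow: Sequence[Packet]) -> Dict[str, float]:
    """Hypothetical per-sub-flow statistical features."""
    fwd = [p.length for p in sub_flow if p.forward]
    bwd = [p.length for p in sub_flow if not p.forward]
    iat = [b.timestamp - a.timestamp for a, b in zip(sub_flow, sub_flow[1:])]
    feats: Dict[str, float] = {}
    for name, vals in (("fwd_len", fwd), ("bwd_len", bwd), ("iat", iat)):
        lo, hi, mu, sd = _stats(vals)
        feats.update({f"{name}_min": lo, f"{name}_max": hi,
                      f"{name}_mean": mu, f"{name}_std": sd})
    feats["fwd_count"] = float(len(fwd))
    feats["bwd_count"] = float(len(bwd))
    return feats


def training_instances(flow: Sequence[Packet], label: str,
                       window: int = 25, step: int = 25) -> List[Tuple[Dict[str, float], str]]:
    """One instance per sub-flow plus one per mirrored sub-flow, same class label,
    so the classifier sees both possible directions of the application's traffic."""
    out = []
    for sf in sub_flows(flow, window, step):
        out.append((features(sf), label))
        out.append((features(mirror(sf)), label))
    return out
```

Because mirroring only flips each packet's direction flag, the mirrored instance simply swaps the forward and backward features while leaving direction-independent ones (such as inter-arrival times) unchanged. The resulting feature dictionaries could then be vectorised and passed to any Naive Bayes or decision tree learner; that step is omitted here.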