posted on 2024-07-13, 01:34 authored by Sebastian Zander, Nigel Williams, Grenville Armitage
Public traffic traces are often obfuscated for privacy reasons, leaving network historians with only port numbers from which to identify past application traffic trends. However, default port numbers are a misleading basis for identifying many applications (such as peer-to-peer file sharing or online games). Traffic classification based on machine learning could provide a solution. By training a classifier using representative traffic samples, we can classify (and differentiate between) distinct, but possibly similar, applications of interest in previously anonymised trace files. Using popular peer-to-peer and online game applications as examples, we show that their traffic flows can be separated after the fact without using port numbers or packet payload. We also address how to obtain negative training examples, propose an approach that works with any existing supervised machine-learning algorithm, and present a preliminary evaluation based on real traffic data.
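As a rough illustration of the idea, the following minimal sketch trains a classifier on per-flow statistics only, with no port numbers or payload, and includes an "other" class standing in for negative training examples. Everything in it is an assumption made for illustration: the two features (mean packet length and mean inter-arrival time), the synthetic values, the class labels, and the nearest-centroid rule, which is simply a convenient stand-in for any supervised learner and is not claimed to be the algorithm or feature set the authors evaluated.

```python
import math

# Hypothetical per-flow features: (mean packet length in bytes,
# mean inter-arrival time in ms). All values are synthetic and chosen
# only for illustration; they are not measurements from the paper.
TRAINING = {
    "game": [(80.0, 40.0), (95.0, 50.0), (70.0, 45.0)],
    "p2p": [(900.0, 5.0), (1100.0, 8.0), (1000.0, 6.0)],
    # Negative examples: flows from applications we are not interested in.
    "other": [(400.0, 200.0), (500.0, 300.0), (450.0, 250.0)],
}


def fit(training):
    """Return a feature-scaling function and one scaled centroid per class."""
    rows = [flow for flows in training.values() for flow in flows]
    dims = range(len(rows[0]))
    # Standardize each feature so that bytes and milliseconds are comparable.
    means = [sum(r[i] for r in rows) / len(rows) for i in dims]
    stds = [
        math.sqrt(sum((r[i] - means[i]) ** 2 for r in rows) / len(rows)) or 1.0
        for i in dims
    ]

    def scale(flow):
        return tuple((flow[i] - means[i]) / stds[i] for i in dims)

    centroids = {}
    for label, flows in training.items():
        scaled = [scale(f) for f in flows]
        centroids[label] = tuple(
            sum(s[i] for s in scaled) / len(scaled) for i in dims
        )
    return scale, centroids


def classify(flow, scale, centroids):
    """Assign the flow to the class whose centroid is nearest after scaling."""
    point = scale(flow)
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


SCALE, CENTROIDS = fit(TRAINING)
print(classify((85.0, 42.0), SCALE, CENTROIDS))
print(classify((480.0, 260.0), SCALE, CENTROIDS))
```

Because the classifier sees only flow statistics, it could in principle be applied to an anonymised trace after the fact. The explicit "other" class matters in this sketch: without it, every unrelated flow would be forced into one of the application classes.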