The literature on machine learning (ML) algorithms for classifying IP traffic has demonstrated their potential for deployment in real-world IP networks. The key challenges of timely and continuous classification are addressed in [1], where the classifier is trained on multiple short sub-flows taken at different points within the lifetime of the original application flow. The classification decision is then repeated continuously over a sliding window of the flow's most recent N packets. That work left open a critical question: how to automate the identification of appropriate sub-flows for training. In this paper we propose a novel approach to sub-flow identification and selection using ML clustering algorithms. We evaluate our approach in terms of accuracy, model build time, classification speed and physical resource consumption.
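To make the sliding-window idea concrete, the following is a minimal sketch of how a window over a flow's most recent N packets might be maintained and summarised before each classification decision. It is not the implementation from [1]: the names `SlidingWindow` and `window_features`, the `(timestamp, length)` packet representation, and the two features computed (mean packet length and mean inter-arrival time) are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean


class SlidingWindow:
    """Holds the most recent N packets of one flow (hypothetical helper)."""

    def __init__(self, n):
        # A bounded deque drops the oldest packet automatically when full.
        self.packets = deque(maxlen=n)

    def add(self, timestamp, length):
        self.packets.append((timestamp, length))

    def is_full(self):
        return len(self.packets) == self.packets.maxlen

    def features(self):
        """Illustrative features only; a real classifier would likely use more."""
        times = [t for t, _ in self.packets]
        lengths = [size for _, size in self.packets]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        return {
            "mean_length": mean(lengths),
            "mean_gap": mean(gaps) if gaps else 0.0,
        }


def window_features(packets, n):
    """Yield one feature dict per arriving packet once N packets are held."""
    window = SlidingWindow(n)
    for timestamp, length in packets:
        window.add(timestamp, length)
        if window.is_full():
            yield window.features()
```

Each yielded feature dict would be passed to the trained classifier, so a decision is produced again every time a new packet arrives and the window advances by one packet.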