Efficient management of supercomputing facilities requires estimates of future workload based on past user behaviour. For supercomputers with large numbers of users, aggregate user behaviour is commonly assumed to be the best predictor of future workload. For systems with fewer users, however, the question arises whether this assumption still holds or whether monitoring individual user behaviour yields better workload predictions. We compare three approaches: using individual user behaviour, using aggregate user behaviour, and a hybrid approach in which we track heavy users individually and aggregate light users into a small number of clusters. We find that the hybrid approach produces the best results in both mean absolute error and mean squared error, although treating all users separately provides slightly worse predictions. We also introduce a new approach to prediction based on the hazard function, which is a significant improvement on previously used schemes based on autoregressive models. The schemes are investigated numerically using a two-year workload trace from a supercomputer with a population of 136 users.
Funding
Increasing internet energy and cost efficiency by improving higher-layer protocols