posted on 2024-07-11, 07:21authored byDong Yuan, Xiao Liu, Lizhen Cui, Tiantian Zhang, Wenhao Li, Dahai Cao, Yun YangYun Yang
The proliferation of cloud computing allows scientists to deploy computation and data intensive applications without infrastructure investment, where large generated datasets can be flexibly stored with multiple cloud service providers. Due to the pay-as-you-go model, the total application cost largely depends on the usage of computation, storage and bandwidth resources, and cutting the cost of cloud-based data storage becomes a big concern for deploying scientific applications in the cloud. In this paper, we propose a novel algorithm that can automatically decide whether a generated dataset should be 1) stored in the current cloud, 2) deleted and re-generated whenever reused or 3) transferred to cheaper cloud service for storage. The algorithm finds the trade-off among computation, storage and bandwidth costs in the cloud, which are three key factors for the cost of storing generated application datasets with multiple cloud service providers. Simulations conducted with popular cloud service providers' pricing models show that the proposed algorithm is highly cost-effective to be utilised in the cloud.