A Novel Workflow-Level Data Placement Strategy for Data-Sharing Scientific Cloud Workflows

journal contribution
posted on 2024-07-11, 13:38 authored by Xuejun Li, Lei Zhang, Yang Wu, Xiao Liu, Erzhou Zhu, Huikang Yi, Futian Wang, Cheng Zhang, Yun Yang
Cloud computing can provide a more cost-effective way to deploy scientific workflows than traditional distributed computing environments such as clusters and grids. Because scientific datasets are large, data placement plays an important role in scientific cloud workflow systems for improving system performance and reducing data transfer cost. Traditional task-level data placement strategies consider only the datasets shared within individual workflows when reducing data transfer cost. However, a task-level strategy is not necessarily sufficient when multiple workflows must be handled together at the workflow level. In this paper, a novel workflow-level data placement model is constructed, which regards multiple workflows as a whole. Then, a two-stage data placement strategy is proposed, which first pre-allocates initial datasets to proper datacenters during the workflow build-time stage, and then dynamically distributes newly generated datasets to appropriate datacenters during the runtime stage. Both stages use an efficient discrete particle swarm optimization algorithm to place flexible-location datasets. Comprehensive experiments demonstrate that our workflow-level data placement strategy can be more cost-effective than its task-level counterpart for data-sharing scientific cloud workflows.
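
The idea of placing flexible-location datasets with a discrete particle swarm search, as summarized in the abstract, can be illustrated with a minimal Python sketch. It is not the authors' algorithm or cost model. The cost function (each task runs in the datacenter that already holds most of its input, and the rest is transferred), the per-dataset update rule, the parameter names (`p_personal`, `p_global`, `p_random`), and the example data are all assumptions made for illustration. Datacenter capacity limits are ignored.

```python
import random

def transfer_cost(placement, sizes, tasks, n_dc):
    """Assumed cost model: total data moved when each task runs in the
    datacenter that already holds the largest share of its input."""
    total = 0.0
    for inputs in tasks:
        held = [0.0] * n_dc  # input volume per datacenter for this task
        for d in inputs:
            held[placement[d]] += sizes[d]
        total += sum(held) - max(held)  # everything not already local is moved
    return total

def discrete_pso(sizes, tasks, n_dc, fixed, particles=20, iters=100,
                 p_personal=0.3, p_global=0.3, p_random=0.1, seed=0):
    """One simple discrete PSO variant (an assumption, not the paper's):
    each flexible dataset copies its datacenter from the particle's personal
    best, from the global best, or from a random draw, else stays put."""
    rng = random.Random(seed)
    n = len(sizes)
    flexible = [d for d in range(n) if d not in fixed]

    def random_placement():
        p = [fixed.get(d, 0) for d in range(n)]  # fixed datasets stay pinned
        for d in flexible:
            p[d] = rng.randrange(n_dc)
        return p

    swarm = [random_placement() for _ in range(particles)]
    best = [p[:] for p in swarm]
    best_cost = [transfer_cost(p, sizes, tasks, n_dc) for p in swarm]
    g = min(range(particles), key=best_cost.__getitem__)
    g_best, g_cost = best[g][:], best_cost[g]

    for _ in range(iters):
        for i, p in enumerate(swarm):
            for d in flexible:
                r = rng.random()
                if r < p_personal:
                    p[d] = best[i][d]
                elif r < p_personal + p_global:
                    p[d] = g_best[d]
                elif r < p_personal + p_global + p_random:
                    p[d] = rng.randrange(n_dc)
            c = transfer_cost(p, sizes, tasks, n_dc)
            if c < best_cost[i]:
                best[i], best_cost[i] = p[:], c
                if c < g_cost:
                    g_best, g_cost = p[:], c
    return g_best, g_cost

# Hypothetical example: five datasets, two datacenters, tasks drawn from
# several workflows and pooled together (the workflow-level view).
sizes = [10, 4, 6, 8, 2]
tasks = [[0, 1], [1, 2], [2, 3], [3, 4], [0, 4]]
fixed = {0: 0, 3: 1}  # fixed-location datasets: dataset index -> datacenter
placement, cost = discrete_pso(sizes, tasks, n_dc=2, fixed=fixed)
print(placement, cost)
```

Under the same assumptions, a runtime stage could reuse `discrete_pso` by adding the already-placed datasets to `fixed` and leaving only the newly generated datasets flexible.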

Funding

ARC | LP0990393

ARC | LP130100324

History

Available versions

PDF (Accepted manuscript)

ISSN

1939-1374

Journal title

IEEE Transactions on Services Computing

Volume

12

Issue

3

Pagination

370-383

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Copyright statement

Copyright © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Language

eng
