A cost-effective strategy for intermediate data storage in scientific cloud workflow systems

conference contribution
posted on 2024-07-09, 17:00 authored by Dong Yuan, Yun Yang, Xiao Liu, Jinjun Chen
Many scientific workflows are data intensive: a large volume of intermediate data is generated during their execution. Some valuable intermediate data need to be stored for sharing or reuse. Traditionally, these data are stored selectively according to the system's storage capacity, with the selection made manually. As doing science in the cloud has become popular, more intermediate data can be stored in scientific cloud workflows under a pay-for-use model. In this paper, we build an Intermediate data Dependency Graph (IDG) from the data provenance in scientific workflows. Based on the IDG, we develop a novel intermediate data storage strategy that can reduce the cost of the scientific cloud workflow system by automatically storing the most appropriate intermediate datasets in cloud storage. We utilise Amazon's cost model and apply the strategy to an astrophysics pulsar searching scientific workflow for evaluation. The results show that our strategy can significantly reduce the overall cost of scientific cloud workflow execution.
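
The trade-off described in the abstract (pay to keep an intermediate dataset, or pay to regenerate it when it is next needed) can be illustrated with a minimal sketch. This is not the paper's algorithm: the greedy rule, the `Dataset` fields, the dataset names `d1` to `d3`, and both prices are assumptions made up for illustration only. The sketch models the IDG as a dictionary of datasets with parent links and stores a dataset when its expected monthly regeneration cost exceeds its monthly storage cost.

```python
from dataclasses import dataclass, field

# Hypothetical prices for illustration only (not taken from the paper
# or from any provider's published price list).
STORAGE_PRICE_PER_GB_MONTH = 0.15  # dollars per GB per month
COMPUTE_PRICE_PER_HOUR = 0.10      # dollars per compute hour


@dataclass
class Dataset:
    name: str
    size_gb: float
    gen_hours: float       # compute hours to derive it from its direct parents
    uses_per_month: float  # assumed reuse frequency
    parents: list = field(default_factory=list)  # direct predecessors in the IDG


def regeneration_hours(name, graph, stored):
    """Compute hours needed to rebuild `name`, including every deleted
    ancestor that must be rebuilt first. Stored ancestors stop the walk."""
    needed, stack = set(), [name]
    while stack:
        current = stack.pop()
        if current in needed:
            continue
        needed.add(current)
        stack.extend(p for p in graph[current].parents if p not in stored)
    return sum(graph[n].gen_hours for n in needed)


def choose_stored(graph, order):
    """Greedy pass over datasets in topological order: keep a dataset when
    regenerating it on demand would cost more per month than storing it."""
    stored = set()
    for name in order:
        ds = graph[name]
        storage_cost = ds.size_gb * STORAGE_PRICE_PER_GB_MONTH
        regen_cost = (regeneration_hours(name, graph, stored)
                      * COMPUTE_PRICE_PER_HOUR * ds.uses_per_month)
        if regen_cost > storage_cost:
            stored.add(name)
    return stored


# A hypothetical three-step chain: d1 -> d2 -> d3.
graph = {
    "d1": Dataset("d1", size_gb=20.0, gen_hours=1.0, uses_per_month=0.1),
    "d2": Dataset("d2", size_gb=90.0, gen_hours=10.0, uses_per_month=1.0,
                  parents=["d1"]),
    "d3": Dataset("d3", size_gb=0.01, gen_hours=5.0, uses_per_month=2.0,
                  parents=["d2"]),
}
stored = choose_stored(graph, ["d1", "d2", "d3"])
print(sorted(stored))  # ['d3']
```

In this toy chain only the small, frequently used `d3` is kept: the large datasets `d1` and `d2` are cheaper to regenerate than to store, and because they are deleted, rebuilding `d3` would require rerunning all three steps.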

Funding

Management of Large-Scale Models

Directorate for Computer & Information Science & Engineering

History

Available versions

PDF (Published version)

ISBN

9781424464425

Journal title

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Conference name

2010 IEEE International Symposium on Parallel & Distributed Processing IPDPS

Volume

24

Issue

9

Pagination

11 pp

Publisher

IEEE

Copyright statement

Copyright © 2010 IEEE. The published version is reproduced in accordance with the copyright policy of the publisher. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Language

eng
