
A data placement strategy in scientific cloud workflows

journal contribution
posted on 2024-07-09, 17:01 authored by Dong Yuan, Yun Yang, Xiao Liu, Jinjun Chen
In scientific cloud workflows, large amounts of application data need to be stored in distributed data centres. To store these data effectively, a data manager must intelligently select the data centres in which they will reside. Some data, however, must stay at a fixed location and cannot be placed freely. When one task needs several datasets located in different data centres, the movement of large volumes of data becomes a challenge. In this paper, we propose a matrix-based k-means clustering strategy for data placement in scientific cloud workflows. The strategy contains two algorithms that group the existing datasets into k data centres during the workflow build-time stage and dynamically cluster newly generated datasets into the most appropriate data centres, based on dependencies, during the runtime stage. Simulations show that our algorithm can effectively reduce data movement during the workflow's execution.
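The abstract does not spell out how dependencies are measured or how clustering proceeds, so the following is only an illustrative sketch, not the authors' algorithm. It assumes that the dependency between two datasets is the number of tasks that use both, and that a newly generated dataset goes to the data centre whose datasets share the most tasks with it. The function names, the `tasks` and `placement` structures, and the sample data are all hypothetical.

```python
from itertools import combinations


def dependency_matrix(datasets, tasks):
    """Build a symmetric matrix of dataset dependencies.

    Assumption: entry [i][j] counts the tasks that use both dataset i
    and dataset j; the diagonal counts the tasks that use dataset i.
    """
    index = {name: pos for pos, name in enumerate(datasets)}
    size = len(datasets)
    matrix = [[0] * size for _ in range(size)]
    for used in tasks.values():
        for name in used:
            matrix[index[name]][index[name]] += 1
        for a, b in combinations(sorted(used), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix


def place_new_dataset(new_tasks, tasks, placement):
    """Choose a data centre for a dataset generated at runtime.

    Assumption: the score of a centre is the summed dependency between
    the new dataset (described by the set of tasks that use it) and
    every dataset already stored in that centre.
    """
    scores = {}
    for centre, members in placement.items():
        scores[centre] = sum(
            len(new_tasks & {t for t, used in tasks.items() if d in used})
            for d in members
        )
    # Ties are broken by centre name so the result is deterministic.
    return max(sorted(scores), key=scores.get)


# Hypothetical workflow: task name -> datasets the task reads.
datasets = ["d1", "d2", "d3", "d4"]
tasks = {
    "t1": {"d1", "d2"},
    "t2": {"d1", "d2"},
    "t3": {"d3", "d4"},
    "t4": {"d2", "d3"},
}
# Hypothetical build-time result: data centre -> datasets stored there.
placement = {"dc1": {"d1", "d2"}, "dc2": {"d3", "d4"}}

matrix = dependency_matrix(datasets, tasks)
chosen = place_new_dataset({"t1", "t4"}, tasks, placement)
```

In this toy data, d1 and d2 are used together by two tasks, so keeping them in the same data centre avoids moving either one when those tasks run. A build-time clustering step would aim to group such strongly dependent datasets, and the runtime step extends the same idea to datasets that appear during execution.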

Funding

ARC | LP0990393

Available versions

PDF (Accepted manuscript)

ISSN

0167-739X

Journal title

Future Generation Computer Systems

Volume

26

Issue

8

Pagination

1200-1214

Publisher

Elsevier

Copyright statement

Copyright © 2010 Elsevier B.V. The accepted manuscript is reproduced in accordance with the copyright policy of the publisher.

Language

eng
