Ontology-based constrained anonymization for domain-driven data mining outsourcing

thesis
posted on 2024-07-13, 04:12 authored by Brian Loh
Introduction. This thesis focuses on the data mining outsourcing scenario in which a data owner publishes data to an application service provider, which returns mining results. Protecting the data against this untrusted party requires privacy protection techniques. Anonymization is a widely used method that preserves true attribute values and can support a variety of data mining algorithms. Nevertheless, several issues emerge when anonymization is applied in a real-world outsourcing scenario. Most methods target the traditional data mining paradigm, so they neither incorporate domain knowledge nor optimize data for domain-driven purposes. Furthermore, existing techniques limit users' control while assuming that users can readily produce Domain Generalization Hierarchies (DGHs). Moreover, previous utility metrics have not considered attribute correlations during generalization.

Objective. The research objective is to create an ontology-based constrained anonymization framework that preserves meaningful and actionable models for domain-driven data mining while protecting privacy.

Framework. In contrast with existing work, this framework integrates the Unified Medical Language System (UMLS) as domain ontology knowledge during DGH creation in order to preserve the meanings of values. It also allows user constraints based on attribute semantic types and relations, to suit physicians' mining tasks. In addition, attribute correlations are determined from external domain knowledge, in the form of MEDLINE literature, to improve attribute selection during anonymization.

Results. Experiments show that ontology-based DGHs preserve semantic meaning after attribute generalization. Additionally, setting constraints allows attributes that are important for specific mining tasks to be preserved. Finally, a correlation-based measure can improve attribute selection during anonymization for domain-driven purposes.

Conclusion. There is an urgent need for privacy-preserving methods capable of anonymizing data for domain-driven usage. The proposed framework demonstrates the benefit of integrating domain ontology knowledge and external literature to improve utility for domain-driven purposes. It is therefore expected that, by using such a framework, data owners can protect their data while maintaining utility for real-world requirements.
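To make the abstract's terminology concrete, here is a minimal Python sketch of generalization over a Domain Generalization Hierarchy (DGH), using k-anonymity as the privacy check. It is not the thesis's algorithm. The hand-written `PARENT` hierarchy stands in for one derived from UMLS, and its concept names are illustrative placeholders rather than UMLS entries. The `max_level` cap is a hypothetical simplification of a user constraint. The abstract does not name k-anonymity as the privacy model; it is assumed here.

```python
from collections import Counter

# Hypothetical DGH for a diagnosis attribute: child concept -> parent concept.
# In the thesis the hierarchy is built from UMLS; these names are illustrative
# placeholders, not UMLS concepts.
PARENT = {
    "Type 1 diabetes": "Diabetes mellitus",
    "Type 2 diabetes": "Diabetes mellitus",
    "Hyperthyroidism": "Thyroid disease",
    "Hypothyroidism": "Thyroid disease",
    "Diabetes mellitus": "Endocrine disease",
    "Thyroid disease": "Endocrine disease",
    "Endocrine disease": "Disease",
}


def generalize(value: str, levels: int) -> str:
    """Climb `levels` steps up the DGH; the root maps to itself."""
    for _ in range(levels):
        value = PARENT.get(value, value)
    return value


def is_k_anonymous(rows: list[tuple[str, str]], k: int) -> bool:
    """True if every quasi-identifier combination occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())


def anonymize(records, k, max_level):
    """Raise the diagnosis generalization level until k-anonymity holds.

    `max_level` is a hypothetical user constraint capping how far the
    attribute may be generalized. Returns (level, rows), or None if the
    constraint makes k-anonymity unreachable.
    """
    for level in range(max_level + 1):
        rows = [(age, generalize(dx, level)) for age, dx in records]
        if is_k_anonymous(rows, k):
            return level, rows
    return None


# Toy quasi-identifiers: (age band, diagnosis). Not data from the thesis.
records = [
    ("40-49", "Type 1 diabetes"),
    ("40-49", "Type 2 diabetes"),
    ("50-59", "Hyperthyroidism"),
    ("50-59", "Hypothyroidism"),
]

# Level 1 suffices: both 40-49 rows become "Diabetes mellitus",
# and both 50-59 rows become "Thyroid disease".
print(anonymize(records, k=2, max_level=3))
# A constraint forbidding any generalization makes k=2 unreachable.
print(anonymize(records, k=2, max_level=0))  # -> None
```

Stopping at the lowest level that satisfies k keeps values as specific as possible. This loosely mirrors the goal of preserving meaning. The framework described above additionally draws on semantic types, relations, and MEDLINE-derived correlations, none of which this sketch models.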

History

Thesis type

  • Thesis (Masters by research)

Thesis note

Submitted for the degree of Master of Science, Swinburne University of Technology, 2012.

Copyright statement

Copyright © 2012 Brian Loh Chung Shiong.

Supervisors

Patrick Then

Language

eng
