Mining functionally novel association rules: a domain-driven data mining approach using biomedical literature and ontology

thesis
posted on 2024-07-13, 01:49 authored by Yakub Sebastian
Introduction. Good medical hypotheses are the key to most medical breakthroughs. Association rules produced by medical data mining systems may lead to the formulation of valid medical hypotheses. An important requirement is that these rules must be novel. However, evaluating rule novelty with the traditional pairwise approach has always been challenging. This study introduces a new non-pairwise novelty criterion termed functional novelty.

Objective. To develop a new knowledge discovery framework and techniques for discovering functionally novel association rules from cardiovascular data sets.

Methods. Association rules were mined from two cardiovascular data sets using the FP-Growth algorithm with sufficiently low minimum support and confidence thresholds. The rules were semantically filtered against semantic relations defined in the Unified Medical Language System (UMLS) ontology. Using a modified χ²-based correlation measure called χ²_lit, the filtered rules were then validated against the PubMed literature to determine their compliance with existing medical domain knowledge. Only the domain knowledge-compliant rules were selected for the final rule post-processing. In the experiments, a cardiologist prescribed four pairs of medical hypotheses. The functional novelty of each association rule was determined from its likelihood of mediating these hypotheses, measured by the Minχ² score.

Results. KELAM, a novel domain-driven knowledge discovery framework, was constructed. Two experiments were conducted with the cardiologist's evaluation results as the gold standard. In Experiment I, χ²_lit exhibited a high recall rate for domain knowledge-compliant rules, although it suffered from low precision. In Experiment II, Minχ² proved useful for ranking rules by functional novelty: all candidate functionally novel rules were found among the top 10 rules. One interesting result was that KELAM suggested a potential relationship between von Willebrand factor and intracardiac thrombus via the association rule diabetes mellitus⇔coronary arteriosclerosis.

Conclusion. This thesis produced an effective domain-driven knowledge discovery framework for discovering functionally novel rules by combining domain knowledge from the medical literature, the ontology, and user-defined hypotheses. The proposed post-mining evaluation technique and measures proved useful in predicting candidate functionally novel rules, as shown in experiments validated by a cardiologist. The outcome of this work is expected to be a first step towards medical knowledge discovery systems that can effectively aid medical researchers in rapidly testing and validating potential medical hypotheses at the initial stage of a medical discovery endeavour.
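The abstract does not define the modified χ² measures, so the following is only an illustrative sketch of the general idea, not the thesis's method. It computes the standard Pearson χ² statistic for a 2x2 literature co-occurrence table, then scores a rule by the weaker of its two links to a hypothesis pair. The function names, the count-based inputs, and the reading of the Min χ² score as "minimum over the two links" are all assumptions made for illustration.

```python
def chi_square_2x2(n_both, n_a, n_b, n_total):
    """Standard Pearson chi-square for a 2x2 co-occurrence table.

    n_both  : documents mentioning both terms
    n_a     : documents mentioning term A
    n_b     : documents mentioning term B
    n_total : documents in the collection
    (These inputs are hypothetical; the thesis's modified measure may differ.)
    """
    a = n_both                         # A and B
    b = n_a - n_both                   # A without B
    c = n_b - n_both                   # B without A
    d = n_total - n_a - n_b + n_both   # neither
    if min(a, b, c, d) < 0:
        raise ValueError("inconsistent document counts")
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n_total * (a * d - b * c) ** 2 / denom


def min_chi_square(link_to_first, link_to_second):
    """Assumed reading of a 'min chi-square' score: a rule mediates a
    hypothesis pair only as strongly as its weaker link to either side."""
    return min(chi_square_2x2(*link_to_first), chi_square_2x2(*link_to_second))
```

Under this reading, a rule that is strongly associated with one hypothesis term but unrelated to the other receives a low score, which matches the intuition of a mediating rule needing support on both sides.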

History

Thesis type

  • Thesis (Masters by research)

Thesis note

A thesis submitted for the degree of Masters of Science by Research, Swinburne University of Technology, 2012.

Copyright statement

Copyright © 2012 Yakub Sebastian.

Supervisors

Patrick Then Hang Hui

Language

eng
