Instance knowledge network and its application to word sense disambiguation

thesis
posted on 2024-07-13, 03:32 authored by Shangfeng Hu
Natural language processing has been a classical topic of computer science since the early days of computing. In 1950, Alan Turing proposed the Turing test as a criterion of intelligence: a test of a computer program's ability to impersonate a human in a real-time written conversation, in which a human judge must tell the program apart from a real human on the basis of the conversational content alone. In the past 60 years, many researchers have worked in this area and have made many important contributions. Unfortunately, this classical problem is still far from solved. This thesis addresses some fundamental issues of natural language processing. After careful analysis of the current state of the art, we find that several fundamental yet challenging problems still hinder the advancement of natural language processing. In this thesis, we make several initial attempts towards solving these problems. One of the bottlenecks of current natural language processing research is that different knowledge sources are used for different tasks. To provide an integrated knowledge source for multiple natural language processing tasks and to capture more knowledge, especially context-sensitive semantic relationships between each pair of concepts, we propose a new knowledge representation model called the instance knowledge network. In an instance knowledge network, both type-level features and instance-level context-sensitive features can be modeled in a single knowledge representation and processed using a single learning algorithm. An instance graph matching algorithm for the instance knowledge network is also developed. Word sense disambiguation is an important task in natural language processing. A serious problem with existing word sense disambiguation approaches is the limited precision they can achieve, which causes the 'garbage in, garbage out' phenomenon.
To deal with this issue, we propose a probabilistic word sense disambiguation approach based on the instance knowledge network model. We propose a probabilistic training algorithm for instance knowledge networks that associates a context-sensitive conditional probability value with each pair of concepts, and we develop an iterative probabilistic reasoning framework for word sense disambiguation. The probabilistic result indicates not only which sense is the proper sense of an ambiguous word, but also the confidence, or self-evaluation value, for the precision of the disambiguation. This self-evaluation functionality is important because it lets the system judge which parts of a raw text corpus can be understood with high precision. We believe that the self-evaluation functionality can be used for semi-supervised learning in the future. Based on the Senseval-3 all-words task, we run extensive experiments to show the performance enhancements of our word sense disambiguation algorithm in different precision ranges. We also combine our algorithm with the five best word sense disambiguation algorithms in the Senseval-3 all-words task. The results show that the combined algorithms all outperform the corresponding original algorithms. To demonstrate that an instance knowledge network can serve different natural language processing tasks, and to further improve the performance of word sense disambiguation, we incorporate coreference resolution results into word sense disambiguation. Our work shows that the results of coreference resolution can be used to enlarge the context in an instance knowledge network, and that the performance of word sense disambiguation improves accordingly. This work is the first attempt to integrate two main natural language processing tasks and to make use of coreference resolution to help word sense disambiguation.
In the future, we plan to incorporate more natural language processing tasks into a coherent process with the help of our instance knowledge network.

History

Thesis type

  • Thesis (PhD)

Thesis note

A thesis submitted for the degree of Doctor of Philosophy, Swinburne University of Technology, 2011.

Copyright statement

Copyright © 2011 Shangfeng Hu.

Supervisors

Chengfei Liu

Language

eng
