A query system for XML data streams and its buffer reduction based on semantics

thesis
posted on 2024-07-12, 23:58 authored by Chi Yang
XML data sources are remarkably diverse. Available XML data ranges from small Web pages to ever-growing applications, such as biological, astronomical, and commercial data, and even to rapidly changing and possibly unbounded streams, which are often encountered in Web data integration and online publish-subscribe systems. The ubiquity of XML data streams makes it important to develop XML data stream query systems, both theoretically and practically. Querying XML data streams is still an active research area. Recent years have witnessed contributions from both practitioners and theoreticians towards effective query evaluation against XML data streams. Due to the unique features of the XML data stream model, several aspects of data management need reconsideration, such as buffer management, query optimization, and block operation processing. With current methods for query evaluation over XML data streams, adopting some form of buffering is inevitable. Under many circumstances, the buffer size may increase exponentially, which can cause a memory bottleneck. Theoretical research has established a concurrency lower bound that describes this type of bottleneck, and some optimization techniques have been proposed to address the problem. However, research on semantic query optimization (SQO) for XML data streams is still at an early stage. In particular, the use of semantic information to optimize buffer usage during query evaluation leaves considerable room for further work. The work reported in this thesis focuses on query processing for XML data streams and effective buffer management by exploiting semantic information. The first contribution is a SAX-based XML stream query evaluation system, which explores query optimization opportunities based on semantics to reduce unnecessary buffering to a level below the theoretical lower bounds.
The second contribution is a set of effective semantic rules, derived according to our criterion for rule exploration. Algorithms before and after the application of the semantic query optimization rules are presented and compared. The architecture of our system, which deploys semantic optimization techniques according to predefined stream optimization rules, is shown. The third contribution is a further discussion of the application of SQO rules to block operators and aggregation functions. Experiments are conducted to demonstrate the performance gains of the system after the deployment of SQO techniques. The experimental results show that the algorithms deploying semantic rules, whether individually or collectively, all significantly outperform the lower bound algorithm that does not consider semantic information.
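To make the buffering problem concrete, the following is a minimal sketch and not the thesis's actual system or rule set. It assumes a hypothetical query, //book[price]/title, over non-nested book elements, and a hypothetical semantic rule stating that the schema guarantees price appears before any title inside a book. The class PredicateQueryHandler, the function run_query, and the flag price_precedes_title are illustrative names only. Without the rule, each title has to be buffered until a price is seen or the book ends; with the rule, a title that arrives before any price can be discarded immediately.

```python
import xml.sax


class PredicateQueryHandler(xml.sax.ContentHandler):
    """Evaluates the hypothetical query //book[price]/title on a SAX stream.

    Assumption: book elements are not nested. The flag price_precedes_title
    stands in for a hypothetical schema-derived semantic rule.
    """

    def __init__(self, price_precedes_title=False):
        super().__init__()
        self.price_precedes_title = price_precedes_title
        self.results = []        # titles that satisfy the query
        self.peak_buffered = 0   # largest number of titles held at once
        self._buffer = []        # titles waiting for the predicate to resolve
        self._has_price = False
        self._in_title = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "book":
            self._buffer = []
            self._has_price = False
        elif name == "price":
            # Predicate satisfied: release every title buffered so far.
            self._has_price = True
            self.results.extend(self._buffer)
            self._buffer = []
        elif name == "title":
            self._in_title = True
            self._text = []

    def characters(self, content):
        if self._in_title:
            self._text.append(content)

    def endElement(self, name):
        if name == "title":
            self._in_title = False
            title = "".join(self._text)
            if self._has_price:
                self.results.append(title)
            elif not self.price_precedes_title:
                # No ordering knowledge: the title must wait in the buffer.
                self._buffer.append(title)
                self.peak_buffered = max(self.peak_buffered, len(self._buffer))
            # Otherwise the rule says no price can follow, so drop the title.
        elif name == "book":
            # Book closed without a price: buffered titles are discarded.
            self._buffer = []


def run_query(xml_text, price_precedes_title):
    """Parse xml_text and return (matching titles, peak buffer size)."""
    handler = PredicateQueryHandler(price_precedes_title)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.results, handler.peak_buffered


SAMPLE = (
    "<lib>"
    "<book><price>1</price><title>A</title><title>A2</title></book>"
    "<book><title>B</title></book>"
    "</lib>"
)

if __name__ == "__main__":
    print(run_query(SAMPLE, price_precedes_title=False))
    print(run_query(SAMPLE, price_precedes_title=True))
```

Both runs return the same titles for a document that respects the assumed ordering; only the peak buffer size differs. If the document violated the ordering, the rule-based variant would silently drop valid results, which is why such a rule is only safe when the schema actually guarantees it.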

Thesis type

  • Thesis (Masters by research)

Thesis note

A thesis submitted for the degree of Master by Research, Swinburne University of Technology, 2007.

Copyright statement

Copyright © 2007 Chi Yang.

Supervisors

Chengfei Liu

Language

eng
