Swinburne
Browse

Matching top-k answers of twig patterns in probabilistic XML

Download (403.42 kB)
conference contribution
posted on 2024-07-11, 07:14 authored by Bo Ning, Chengfei LiuChengfei Liu, Jeffrey Xu Yu, Guoren Wang, Jianxin Li
The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. The top-k matching of a twig pattern against probabilistic XML data is essential. Some classical twig pattern algorithms can be adjusted to process the probabilistic XML. However, as far as finding answers of the top-k probabilities is concerned, the existing algorithms suffer in performance, because many unnecessary intermediate path results, with small probabilities, need to be processed. To cope with this problem, we propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms.

Funding

XML Views of Relational Databases: Semantics and Update Problems

Australian Research Council

Find out more...

History

Available versions

PDF (Accepted manuscript)

ISBN

9783642120251

ISSN

0302-9743

Journal title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume

5981 LNCS

Issue

PART 1

Pagination

14 pp

Publisher

Springer

Copyright statement

Copyright © 2010 Springer-Verlag Berlin Heidelberg. The accepted manuscript is reproduced in accordance with the copyright policy of the publisher. The definitive version of the publication is available at www.springer.com.

Language

eng

Usage metrics

    Publications

    Categories

    No categories selected

    Keywords

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC