Swinburne
Browse

A hybrid algorithm for finding top-k twig answers in probabilistic XML

Download (466.35 kB)
conference contribution
posted on 2024-07-11, 07:14 authored by Bo Ning, Chengfei LiuChengfei Liu
Uncertainty is inherently ubiquitous in data of real applications, and those uncertain data can be naturally represented by the XML. Matching twig pattern against XML data is a core problem, and on the background of probabilistic XML, each twig answer has a probabilistic value because of the uncertainty of data. The twig answers that have small probabilistic values are useless to the users, and the users only want to get the answers with the largest k probabilistic values. In this paper, we address the problem of finding twig answers with top-k probabilistic values against probabilistic XML documents directly. To cope with this problem, we propose a hybrid algorithm which takes both the probability value constraint and structural relationship constraint into account. The main idea of the algorithm is that the element with larger path probability value will more likely contribute to the twig answers with larger twig probability values, and at the same time lots of useless answers that do not satisfy the structural constraint can be filtered. Therefore the proposed algorithm can avoid lots of intermediate results, and find the top-k answers quickly. Experiments have been conducted to study the performance of the algorithm.

Funding

DP878405:ARC

Effective and efficient keyword search for relevant entities over Extensible Markup Language (XML) data

Australian Research Council

Find out more...

History

Available versions

PDF (Accepted manuscript)

ISBN

9783642201486

ISSN

0302-9743

Journal title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Conference name

Database Systems for Advanced Applications (DASFAA 2011)

Location

Hong Kong

Start date

2011-04-22

End date

2011-04-25

Volume

6587 LNCS

Issue

PART 1

Pagination

14 pp

Publisher

Springer

Copyright statement

Copyright © 2011 Springer-Verlag Berlin Heidelberg. The accepted manuscript is reproduced in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com.

Language

eng

Usage metrics

    Publications

    Keywords

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC