Swinburne

Top-k keyword search over probabilistic XML data

conference contribution
posted on 2024-07-11, 07:14 authored by Jianxin Li, Chengfei Liu, Rui Zhou, Wei Wang
Despite the proliferation of work on XML keyword queries, supporting keyword queries over probabilistic XML data remains an open problem. Compared with traditional keyword search, answering a keyword query over probabilistic XML data is far more expensive because possible world semantics must be taken into account. In this paper, we first define the new problem of top-k keyword search over probabilistic XML data, which is to retrieve the k smallest lowest common ancestor (SLCA) results with the highest probabilities of existence. We then propose two efficient algorithms. The first algorithm, PrStack, finds the k SLCA results with the highest probabilities by scanning the relevant keyword nodes only once. To further improve efficiency, we propose a second algorithm, EagerTopK, based on a set of pruning properties that quickly prune unsatisfied SLCA candidates. Finally, we implement the two algorithms and compare their performance through an analysis of extensive experimental results.
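
To make the problem statement concrete, the following is a minimal brute-force sketch of top-k SLCA search under possible world semantics. It is not PrStack or EagerTopK; it enumerates every possible world, which is exactly the exponential cost the abstract says those algorithms avoid. The toy document, the node names, the functions `slcas` and `topk_slca`, and the assumption that each node exists independently with a fixed probability given its parent are all hypothetical and chosen only for illustration.

```python
from itertools import product
from collections import defaultdict

# Hypothetical toy document (not from the paper):
# node id -> (parent id, probability of existing given the parent exists,
#             keywords in the node's own text)
NODES = {
    "r":  (None, 1.0,  set()),
    "a":  ("r",  0.5,  set()),
    "a1": ("a",  0.75, {"xml"}),
    "a2": ("a",  1.0,  {"search"}),
    "b":  ("r",  1.0,  {"xml"}),
}

def ancestors_or_self(node):
    """Yield the node, then its parent, and so on up to the root."""
    while node is not None:
        yield node
        node = NODES[node][0]

def slcas(present, query):
    """Return the SLCA nodes of one deterministic world (a set of node ids)."""
    covered = defaultdict(set)  # node -> query keywords found in its subtree
    for n in present:
        for anc in ancestors_or_self(n):
            covered[anc] |= NODES[n][2] & query
    full = {n for n in present if covered[n] == query}
    # Drop any node that has a proper descendant also covering the query.
    has_full_descendant = {anc for n in full
                           for anc in list(ancestors_or_self(n))[1:]}
    return full - has_full_descendant

def topk_slca(query, k):
    """Enumerate all worlds and rank nodes by probability of being an SLCA."""
    optional = [n for n, (parent, _, _) in NODES.items() if parent is not None]
    score = defaultdict(float)
    # One independent keep/drop decision per non-root node. Decisions below a
    # dropped node do not change the world, so summing over them marginalizes
    # them out correctly.
    for coins in product([True, False], repeat=len(optional)):
        chosen = dict(zip(optional, coins))
        prob = 1.0
        for n, keep in chosen.items():
            p = NODES[n][1]
            prob *= p if keep else 1.0 - p
        if prob == 0.0:
            continue
        # A node is present only if it and all of its ancestors were kept.
        present = {n for n in NODES
                   if all(chosen.get(a, True) for a in ancestors_or_self(n))}
        for n in slcas(present, query):
            score[n] += prob
    return sorted(score.items(), key=lambda kv: -kv[1])[:k]

if __name__ == "__main__":
    print(topk_slca({"xml", "search"}, k=2))
```

In this toy document, node `a` is the SLCA for the query whenever both `a` and `a1` exist, while the root `r` becomes the SLCA only in worlds where `a` exists but `a1` does not, so `a` ranks first. The enumeration visits 2^n decision vectors for n optional nodes, which is why a single-pass or pruning-based approach is needed on real data.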

Funding

DP878405:ARC

Effective and efficient keyword search for relevant entities over Extensible Markup Language (XML) data

Australian Research Council

Effective and Efficient Video Search

Australian Research Council

Effective and Efficient Keyword Search in Relational Databases

Australian Research Council

History

Available versions

PDF (Accepted manuscript)

ISBN

9781424489596

ISSN

1084-4627

Conference name

IEEE International Conference on Data Engineering

Location

Hannover

Start date

2011-04-11

End date

2011-04-16

Pagination

11 pp

Publisher

IEEE

Copyright statement

Copyright © 2011 IEEE. The accepted manuscript is reproduced in accordance with the copyright policy of the publisher. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Language

eng
