
Quasi-SLCA based keyword query processing over probabilistic XML data

journal contribution
posted on 2024-07-26, 13:48 authored by Jianxin Li, Chengfei Liu, Rui Zhou, Jeffrey Xu Yu
The probabilistic threshold query is one of the most common queries in uncertain databases, where a result satisfying the query must also have a probability that meets the threshold requirement. In this paper, we investigate probabilistic threshold keyword queries (PrTKQ) over XML data, which have not been studied before. We first introduce the notion of quasi-SLCA and use it to represent results for a PrTKQ under possible world semantics. Then we design a probabilistic inverted (PI) index that can be used to quickly return the qualified answers and filter out the unqualified ones based on our proposed lower/upper bounds. After that, we propose two efficient and comparable algorithms: the Baseline Algorithm and the PI index-based Algorithm. To accelerate the algorithms, we also utilize a probability density function. An empirical study using real and synthetic data sets has verified the effectiveness and efficiency of our approaches.
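
The bound-based filtering mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's PI index or either of its algorithms; it only shows the general pruning idea, assuming each candidate node already carries a lower and an upper bound on its result probability. The names `Candidate`, `threshold_filter`, and `exact_probability` are hypothetical, and "meeting the threshold" is assumed to mean a probability greater than or equal to the threshold.

```python
from typing import Callable, Iterable, List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical candidate result node with precomputed probability bounds."""
    node_id: str
    lower: float  # assumed lower bound on the node's result probability
    upper: float  # assumed upper bound on the node's result probability


def threshold_filter(
    candidates: Iterable[Candidate],
    threshold: float,
    exact_probability: Callable[[str], float],
) -> List[str]:
    """Return ids of candidates whose probability meets the threshold.

    The bounds are used to avoid the exact (expensive) computation
    whenever they already decide the outcome.
    """
    results = []
    for c in candidates:
        if c.upper < threshold:
            # Even the best case falls short: discard without exact evaluation.
            continue
        if c.lower >= threshold:
            # Even the worst case qualifies: accept without exact evaluation.
            results.append(c.node_id)
        elif exact_probability(c.node_id) >= threshold:
            # Bounds are inconclusive: fall back to the exact probability.
            results.append(c.node_id)
    return results


# Illustrative data only; the probabilities below are made up.
candidates = [
    Candidate("a", 0.7, 0.9),  # accepted by its lower bound
    Candidate("b", 0.1, 0.3),  # pruned by its upper bound
    Candidate("c", 0.4, 0.8),  # needs the exact value
]
exact_values = {"c": 0.55}
qualified = threshold_filter(candidates, 0.5, lambda node_id: exact_values[node_id])
```

In this sketch only candidate `c` triggers the exact computation, which is the point of maintaining bounds: the tighter they are, the fewer candidates need full evaluation.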

Funding

Effective and efficient keyword search for relevant entities over Extensible Markup Language (XML) data

Australian Research Council


On effectively modelling and efficiently discovering communities from large networks

Australian Research Council


History

Available versions

PDF (Accepted manuscript)

ISSN

1041-4347

Journal title

IEEE Transactions on Knowledge and Data Engineering

Volume

26

Issue

4

Pagination

12 pp

Publisher

Institute of Electrical and Electronics Engineers

Copyright statement

Copyright © 2013 IEEE. The accepted manuscript is reproduced in accordance with the copyright policy of the publisher. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Language

eng
