Swinburne

Efficient top-k search across heterogeneous XML data sources

conference contribution
posted on 2024-07-09, 16:07 authored by Jianxin Li, Chengfei Liu, Jeffrey Xu Yu, Rui Zhou
An important issue arising from XML query relaxation is how to efficiently search for the top-k best answers across a large number of XML data sources while minimizing the search cost, i.e., finding the k matches with the highest computed scores by traversing only part of the documents. This paper addresses this issue by proposing a bound-threshold based scheduling strategy, which answers a top-k XML query as early as possible by dynamically scheduling the query over the XML documents. With this strategy, the total number of documents that need to be visited can be greatly reduced by skipping those documents that will not produce the desired results. Furthermore, most of the candidates in each visited document can be pruned based on the intermediate results. Most importantly, partial results can be output immediately during query execution, rather than waiting until all results have been determined. Our experimental results show that our query scheduling and processing strategies are both practical and efficient.
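
The general idea of skipping documents by comparing a bound against a running threshold can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes, purely for illustration, that each source comes with a precomputed upper bound on the score of any match it contains and that match scores are already available as plain numbers. The function name `top_k_across_sources` and the `(name, upper_bound, scores)` tuple layout are hypothetical.

```python
import heapq
from typing import List, Tuple

Source = Tuple[str, float, List[float]]  # (name, upper_bound, match scores): assumed layout


def top_k_across_sources(sources: List[Source], k: int):
    """Return the k best (score, source) pairs and the list of sources visited.

    Assumption: upper_bound >= every score in that source.
    """
    if k <= 0:
        return [], []
    # Schedule the most promising sources first.
    ordered = sorted(sources, key=lambda s: s[1], reverse=True)
    heap: List[Tuple[float, str]] = []  # min-heap holding the current best k
    visited: List[str] = []
    for name, bound, scores in ordered:
        # Threshold = k-th best score so far (none until k results are held).
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        if bound <= threshold:
            # Sources are sorted by bound, so no later source can
            # improve the result either: stop without visiting them.
            break
        visited.append(name)
        for score in scores:
            if len(heap) < k:
                heapq.heappush(heap, (score, name))
            elif score > heap[0][0]:
                # Candidate beats the current k-th best; replace it.
                heapq.heapreplace(heap, (score, name))
            # Otherwise the candidate is pruned.
    return sorted(heap, reverse=True), visited
```

For example, with sources `a` (bound 0.9), `b` (bound 0.7) and `c` (bound 0.5) and k = 2, the sketch stops after `a` if `a` alone yields two matches scoring above 0.7, so `b` and `c` are never traversed.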

Funding

ARC | DP0559202

History

Available versions

PDF (Accepted manuscript)

ISBN

3540785671

ISSN

1611-3349

Journal title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume

4947 LNCS

Issue

PART 1

Pagination

314-329

Publisher

Springer

Copyright statement

Copyright © 2008 Springer-Verlag Berlin Heidelberg. The accepted manuscript is reproduced in accordance with the copyright policy of the publisher. The definitive version is available at www.springer.com.

Language

eng
