Computing structural similarity of source XML schemas against domain XML schema

conference contribution
posted on 2024-07-09, 22:06, authored by Jianxin Li, Chengfei Liu, Jeffrey Xu Yu, Jixue Liu, Guoren Wang, Chi Yang
In this paper, we study the problem of measuring the structural similarity of a large number of source schemas against a single domain schema, which is useful for improving the quality of searching and ranking large volumes of source documents on the Web with the help of structural information. After analyzing why existing edit-distance-based methods are unsuitable, we propose a new similarity measure model that caters for the requirements of the problem. Given the asymmetric nature of comparing source schemas with a domain schema, similarity-preserving rules and an algorithm are designed to filter out uninteresting elements in source schemas in order to optimize the similarity computation. Based on the model, a basic algorithm and an improved algorithm are developed for structural similarity computation. The improved algorithm makes full use of a new coding scheme devised to reduce the number of comparisons. The complexities of both algorithms are analyzed, and extensive experiments show the significant performance gain achieved by the improved algorithm.
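
The abstract does not spell out the similarity model, so the following is only an illustrative sketch of the two ideas it names: filtering source-schema elements that are irrelevant to the domain schema, and scoring the source asymmetrically against the domain. Everything here is an assumption made for illustration and not the paper's method: schemas are modelled as `(label, children)` tuples, the score is the fraction of the domain schema's ancestor-descendant label pairs that the source preserves, and the names `prune`, `labels`, `ancestor_pairs`, and `similarity` are hypothetical.

```python
def labels(node):
    """Collect every element label in a schema tree."""
    label, children = node
    found = {label}
    for child in children:
        found |= labels(child)
    return found


def prune(node, keep):
    """Drop elements whose label is not in `keep`, promoting their children.

    Returns a forest (list of trees), because the root itself may be dropped.
    """
    label, children = node
    kept_children = [tree for child in children for tree in prune(child, keep)]
    if label in keep:
        return [(label, kept_children)]
    return kept_children


def ancestor_pairs(node, ancestors=()):
    """Return the set of (ancestor label, descendant label) pairs in a tree."""
    label, children = node
    pairs = {(a, label) for a in ancestors}
    for child in children:
        pairs |= ancestor_pairs(child, ancestors + (label,))
    return pairs


def similarity(source, domain):
    """Assumed asymmetric score: share of domain pairs preserved by the source."""
    domain_pairs = ancestor_pairs(domain)
    if not domain_pairs:
        # Single-element domain schema: fall back to comparing root labels.
        return 1.0 if source[0] == domain[0] else 0.0
    source_pairs = set()
    for tree in prune(source, labels(domain)):
        source_pairs |= ancestor_pairs(tree)
    return len(domain_pairs & source_pairs) / len(domain_pairs)


# Hypothetical schemas, written as (label, children) tuples.
domain = ("book", [("title", []), ("author", [("name", [])])])
source = ("library", [("book", [
    ("info", [("title", [])]),
    ("author", [("name", []), ("email", [])]),
    ("price", []),
])])

print(similarity(source, domain))  # source scored against the domain
print(similarity(domain, source))  # roles swapped: the score differs
```

In this sketch the pruning step removes `library`, `info`, `email`, and `price` without changing the score, since none of those labels can take part in a pair drawn from the domain schema. Swapping the arguments gives a different value, which is the asymmetry the abstract refers to.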

Funding

The Evolution of Structure in the Universe

Office of the Director

Available versions

PDF (Published version)

ISSN

1445-1336

Conference name

Conferences in Research and Practice in Information Technology Series

Volume

75

Issue

10

Pagination

9 pp

Publisher

Australian Computer Society

Copyright statement

Copyright © 2008 Australian Computer Society, Inc. This paper appeared at the Nineteenth Australasian Database Conference (ADC2008), Wollongong, Australia, January 2008. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 75, Alan Fekete and Xuemin Lin, Ed. Reproduction for academic, not-for profit purposes permitted provided this text is included. The published version is reproduced in accordance with this policy.

Language

eng
