Swinburne
Browse

An adaptive approach of approximate substring matching

Download (597.11 kB)
conference contribution
posted on 2024-07-09, 16:21 authored by Jiaying Wang, Xiaochun Yang, Bin Wang, Chengfei LiuChengfei Liu
Approximate substring matching is a common problem in many applications. In this paper, we study approximate substring matching with edit distance constraints. Existing methods are very sensitive to query strings or query parameters like query length and edit distance. To address the problem, we propose a new approach using partition scheme. It first partitions a query into several segments, and finds matching substrings of these segments as candidates, then performs a bidirectional verification on these candidates to get final results. We devise an even partition scheme to efficiently find candidates, and a best partition scheme to find high quality candidates. Furthermore, through theoretical analysis, we find that the best partition scheme cannot always outperform the even partition scheme. Thus we propose an adaptive approach for selectively choosing scheme using statistic knowledge. We conduct comprehensive experiments to demonstrate the efficiency and quality of our proposed method.

Funding

ARC | DP140103499

History

Available versions

PDF (Accepted manuscript)

ISBN

9783319320243

ISSN

1611-3349

Journal title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Conference name

21st International Conference on Database Systems for Advanced Applications (DASFAA 2016)

Location

Dallas

Start date

2016-04-16

End date

2016-04-19

Volume

9642

Issue

3

Pagination

501-516

Publisher

Springer Nature

Copyright statement

Copyright © Springer International Publishing Switzerland 2016. The accepted manuscript is reproduced in accordance with the copyright policy of the publisher. The final version of the publication is available at Springer via https://doi.org/10.1007/978-3-319-32025-0_31.

Language

eng

Usage metrics

    Publications

    Keywords

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC