Swinburne

CITPM: A cluster-based iterative topical phrase mining framework

conference contribution
posted on 2024-07-11, 07:19 authored by Bing Li, Bin Wang, Rui Zhou, Xiaochun Yang, Chengfei Liu
A phrase is a natural, meaningful, essential semantic unit. In topic modeling, visualizing phrases for individual topics is an effective way to explore and understand unstructured text corpora. Unfortunately, existing approaches predominantly rely on the general distributional features between topics and phrases across an entire corpus, while ignoring the impact of domain-level topical distribution. This often leads to the loss of domain-specific terminology and, as a consequence, weakens the coherence of topical phrases. In this paper, we present CITPM, a novel framework for topical phrase mining. Our framework views a corpus as a mixture of clusters (domains), where each cluster is characterized by documents sharing similar topical distributions. The CITPM framework iteratively performs phrase mining, topic inference, and cluster updating until a satisfactory final result is obtained. Empirical verification demonstrates that our framework outperforms state-of-the-art approaches in both interpretability and efficiency.

Funding

ARC | DP140103499

ARC | DP160102412

History

Available versions

PDF (Accepted manuscript)

ISBN

9783319320243

ISSN

1611-3349

Journal title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Conference name

21st International Conference on Database Systems for Advanced Applications (DASFAA 2016)

Location

Dallas

Start date

2016-04-16

End date

2016-04-19

Volume

9642

Pagination

197-213

Publisher

Springer Nature

Copyright statement

Copyright © Springer International Publishing Switzerland 2016. The accepted manuscript is reproduced in accordance with the copyright policy of the publisher. The final version of the publication is available at Springer via https://doi.org/10.1007/978-3-319-32025-0_13.

Language

eng
