Swinburne

A positional keyword-based approach to inferring fine-grained message formats

journal contribution
posted on 2024-07-26, 14:52 authored by Jiaojiao Jiang, Steve Versteeg, Jun Han, M. D. Arafat Hossain, Jean-Guy Schneider
Message format extraction, the process of revealing the message syntax without access to the protocol specification, is important for a variety of applications such as service virtualization and network security. In this paper, we propose P-token, which mines fine-grained message formats from network traces. The novelty of our approach is twofold: a ‘positional keyword’ identification technique and a two-level hierarchical clustering strategy. Positional keywords are based on the insight that keywords or reserved words usually occur at relatively fixed positions in the messages. By associating positions as meta-information with keywords, we can more accurately distinguish keywords from message payload data. After identification, the positional keywords are used as features to cluster the messages using density peaks clustering. We then perform another level of clustering to refine the clusters with low homogeneity. Finally, the message format of each cluster is extracted based on the observed ordering of keywords. P-token improves on the current state-of-the-art techniques by successfully addressing two challenges that commonly afflict existing keyword-based format extraction methods: message keyword misidentification and message format over-generalization. We have conducted experiments on services and applications using various protocols, including SOAP, LDAP, IMS and a RESTful service. Our experimental results show that P-token outperforms existing methods in extracting message formats.
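
The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of the three stages it names (positional keyword identification, density peaks clustering on keyword features, and per-cluster format extraction); it is an illustration under stated assumptions, not P-token itself. None of the following assumptions come from the paper: messages are tokenized on whitespace, a keyword's position is its token index, a (token, position) pair counts as a keyword when it recurs in at least `min_count` messages, messages are compared by Euclidean distance over binary keyword features, and local density uses a simple cut-off kernel. All function names (`positional_keywords`, `feature_vector`, `density_peaks`, `cluster_format`) are hypothetical, and the second-level refinement of low-homogeneity clusters is omitted.

```python
from collections import Counter
from math import dist


def tokenize(msg):
    # Assumption: whitespace tokenization; the paper's tokenizer is not given here.
    return msg.split()


def pairs(tokens):
    # (token, position) pairs; position = token index (an assumption).
    return {(tok, i) for i, tok in enumerate(tokens)}


def positional_keywords(messages, min_count=2):
    """Keep (token, position) pairs that recur in at least min_count messages."""
    counts = Counter()
    for msg in messages:
        counts.update(pairs(tokenize(msg)))  # each pair counted once per message
    return sorted(p for p, c in counts.items() if c >= min_count)


def feature_vector(msg, keywords):
    # Binary feature: 1 if the keyword occurs at its recorded position.
    present = pairs(tokenize(msg))
    return [1 if kw in present else 0 for kw in keywords]


def density_peaks(points, dc, n_clusters):
    """Density peaks clustering with a cut-off kernel for local density."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cut-off distance dc
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])  # densest point: distance to the farthest point
            continue
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i], nearest_denser[i] = d[i][j], j
    # Centres: the densest point plus the largest rho * delta among the rest.
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centres = [order[0]] + rest[: n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    # Every other point joins the cluster of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels


def cluster_format(cluster_msgs, keywords):
    """Template: keywords shared by every message, '<*>' at other positions."""
    token_lists = [tokenize(m) for m in cluster_msgs]
    shared = set(keywords)
    for toks in token_lists:
        shared &= pairs(toks)
    by_pos = {pos: tok for tok, pos in shared}
    width = max(len(t) for t in token_lists)
    return " ".join(by_pos.get(i, "<*>") for i in range(width))


if __name__ == "__main__":
    messages = [
        "GET /index.html HTTP/1.1",
        "GET /about.html HTTP/1.1",
        "GET /img/logo.png HTTP/1.1",
        "USER alice PASS s3cret",
        "USER bob PASS hunter2",
        "USER carol PASS pw",
    ]
    keywords = positional_keywords(messages)
    features = [feature_vector(m, keywords) for m in messages]
    labels = density_peaks(features, dc=1.0, n_clusters=2)
    for c in sorted(set(labels)):
        members = [m for m, lab in zip(messages, labels) if lab == c]
        print(cluster_format(members, keywords))
    # GET <*> HTTP/1.1
    # USER <*> PASS <*>
```

On the toy traces in the `__main__` block, each payload value (path, user name, password) occurs only once, so only the command words and protocol markers survive as positional keywords, and each cluster's template marks the payload positions with `<*>`.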

Funding

Virtual Environments for Improved Enterprise Software Deployment

Australian Research Council

Available versions

PDF (Accepted manuscript)

ISSN

0167-739X

Journal title

Future Generation Computer Systems

Volume

102

Pagination

12 pp

Publisher

Elsevier BV

Copyright statement

Copyright © 2019 Elsevier B.V. All rights reserved. Per publisher policy, the author's final accepted manuscript is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. https://creativecommons.org/licenses/by-nc-nd/4.0/

Language

eng