Web network information plays an important role in people's daily lives. Visualizing this kind of information is difficult because of the enormous amount of interrelated data distributed across the whole network. Graph visualization is a suitable approach for representing web network relationships, but it raises practical challenges such as laying out large graphs and removing 'noisy' information. Filtering and clustering hold promise as perceptually rich and efficient ways to remove unrelated information and reduce the size of the graph, but little is known about their contribution to web network visualization, especially when the contents of web information are taken into account. This research investigates how to analyze both the contents of web information and the web network structure, and how to use these analyses to form an effective graph layout for user interaction and navigation. An approach for improving web network information visualization is proposed, grounded in structure and content-based clustering, with a detailed explanation of each stage, from web content crawling through clustering analysis to graph layout. To remove 'noisy' information, a web content crawler with integrated filtering is developed. Interactive filtering is introduced to help users locate information from the point of view of relationships by utilizing edge weights. Three clustering analyzers are developed to reduce the complexity of the generated graph. The structure and content-based analyzer considers information from both the structure aspect and the content aspect; it gives a general view of network information in which the nodes represent URLs. The FCA (Formal Concept Analysis) analyzer tries to find all natural clusters by using the concept lattice, while the ontology-based analyzer makes use of the created ontology network to organize information from a semantic view. These two analyzers give a keyword-based view of blog/social network information by assigning keywords to the nodes. To demonstrate the proposed methods, experiments and case studies have been carried out. The results show that these approaches can improve performance in placing similar pages in the same group and make graph visualization of web network information more effective.
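As a rough illustration of the idea behind the FCA analyzer, the sketch below enumerates the formal concepts of a tiny page-to-keyword context. It is not the thesis implementation: the page names, the keywords, and the function name `formal_concepts` are hypothetical, and the brute-force enumeration over all page subsets is only practical for very small inputs.

```python
from itertools import combinations


def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a small context.

    `context` maps each object (here: a page) to its set of attributes
    (here: keywords). Brute force over all object subsets, so the cost
    is exponential in the number of objects.
    """
    objects = list(context)
    all_attrs = frozenset().union(*context.values()) if context else frozenset()

    def intent(objs):
        # Attributes shared by every object in `objs`
        # (all attributes when `objs` is empty).
        shared = set(all_attrs)
        for obj in objs:
            shared &= context[obj]
        return frozenset(shared)

    def extent(attrs):
        # Objects that carry every attribute in `attrs`.
        return frozenset(obj for obj in objects if attrs <= context[obj])

    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            shared = intent(subset)
            # Closing the subset yields a pair where each side determines the other.
            concepts.add((extent(shared), shared))
    return concepts


# Hypothetical example context: three blog pages and their keywords.
pages = {
    "blog_a": {"python", "graph"},
    "blog_b": {"python", "layout"},
    "blog_c": {"graph", "layout"},
}
concepts = formal_concepts(pages)
```

Each resulting pair groups a set of pages with the keywords they all share, which is the kind of keyword-labelled cluster the abstract describes. Ordering these pairs by extent inclusion would give the concept lattice.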
Thesis type
Thesis (PhD)
Thesis note
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Swinburne University of Technology, 2011.