The RDF (Resource Description Framework) data model has been developed over the course of a decade. It is designed as a flexible representation of schema-relaxable or even schema-free information for the Semantic Web. RDF is now receiving increasing attention: Semantic Web style ontologies and knowledge bases with millions of facts from Wikipedia and other sources have been created and are available online. At the time of writing, there are more than 19 billion RDF triples on the web. With the rapid growth of RDF data, query processing on RDF data has become an important issue in realizing the Semantic Web vision. Recently, the W3C RDF data access group has emphasized the importance of enhancing RDF query abilities to meet real-world requirements. This thesis concentrates on three such issues: selectivity estimation, query relaxation, and query evaluation on probabilistic RDF data.
Firstly, we study the problem of estimating the selectivity of SPARQL graph queries, which is crucial to query optimization. For an arbitrary SPARQL query represented as a composite graph pattern, we propose algorithms that maximally combine precomputed statistics of chain paths and star paths to estimate the overall selectivity of the graph pattern. The experiments validate the effectiveness of our methods.
Secondly, users are often frustrated when a query posed on an RDF database returns no answers. To address this, we investigate how to relax a SPARQL query to obtain approximate answers. We address two problems in efficient query relaxation. To ensure the quality of answers, we compute the similarities of relaxed queries with regard to the original query and use them to score the potentially relevant answers. To obtain top-k approximate answers, we develop two efficient algorithms and conduct experiments to evaluate their efficiency.
Thirdly, since a large number of RDF triples carry probabilities, we study the problem of evaluating SPARQL queries on probabilistic RDF data. We present a general framework for supporting SPARQL queries on a probabilistic RDF database. To enable query answering with some basic reasoning capabilities, we also consider transitive inference for RDF queries and propose an approximate algorithm for accelerating query evaluation. The experimental results show that our method is effective and efficient.
Thesis type
Thesis (PhD)
Thesis note
A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Swinburne University of Technology, 2011.