Approximation of Conjunctive Query Evaluation
Approximation of Conjunctive Query Evaluation
Disciplines
Computer Sciences (80%); Mathematics (20%)
Keywords
-
Approximate Evaluation of Intractable Conjunctive,
Static Over-Approximation of Conjunctive Queries w,
Dynamic Approximation of Conjunctive Queries with,
Development of New Approximation Techniques for Co
Problems that cannot be solved by classical computers in reasonable time due to their high computational cost arise in many research areas. In general, the evaluation of conjunctive queries over relational databases belongs to those problems. Conjunctive queries form the core of the Structured Query Language (SQL) which became a de facto standard for querying and maintaining relational databases. This work is about developing new approximation techniques for conjunctive queries which cannot be evaluated in reasonable time. Our new approximation techniques should lead to significant improvements for data aided decision making, e.g., for early warning system which are based on the analysis of big data or to make business- critical decisions by analyzing big data. In the last decades, a very good understanding of the classes of conjunctive queries which can be evaluated in reasonable time has been gained and it has been proven that an under-approximation of a query always exists within each of those classes. However this approach is rather strict and some of the under-approximations can be rather uninformative, i.e., the under-approximation might return the empty result set while the original query would not. over-approximations might be helpful when this happens, as they return all answers to a query. One of our goals is to study the foundational aspects of over-approximations, including the existence problem and the problem of computing an approximation. Unfortunately, over-approximations do not always exist (within a class of queries which can be evaluated in reasonable time), and it is not even known to be decidable whether a conjunctive query admits an over- approximation. Therefore, another goal of the proposed work is the development of more liberal approximation techniques that yield some kind of quantitative guarantees. This means that they should guarantee that the result of the approximation is not too far from the result of the original query over a set of databases of interest. Therefore we need to define a measure of disagreement between queries and/or results. For conjunctive query evaluation, such measures do not exist up until now. Based on that measure, we study approximations whose disagreement with the result of the query they approximate is below a certain threshold. Furthermore, we investigate how the underlying data of a database can help us to find better approximations. It has been shown that there are close relations between the approximation of conjunctive queries over relational databases and some classes of Semantic Web queries over semi-structured data. We also study possible connections between our approximation techniques and approximating Semantic Web queries.
Supervised learning is a machine-learning technique that aims at learning a general model from input-output examples. The learned model can then be used to make decisions (in situations that never occurred before) based on some input data. A classifier is a function that partitions the input data, i.e., it makes the decisions. We investigated the case of classifying a set of constants, called entities, of a relational database into positive and negative cases. The learned classifier is based on a set of automatically generated database queries, called features. A training database is a databases with a partition of the entities into positive and negative examples. Using database queries allows us to utilize the entire knowledge of the raw data in the training database (i.e., all its relations). This knowledge is crucial to automatically generate the features. We consider various query languages, study the complexity of classifying the input examples and generating the feature queries, and give explicit algorithms for classification and feature generation. Moreover, we handle the case of approximate classification where the training database might contain a small amount of noise that makes an exact classification impossible. We identify tractable cases where classification and feature generation can be done efficiently. We show that, even though the feature generation problem is intractable for certain query lan- guages, the classification problem might be tractable (e.g., for conjunctive queries of a bounded generalized hypertree width), and we give an explicit algorithm for that case. This is possible because we do not need to materialize the feature queries in order to evaluate them. For all the identified tractable cases, we show that approximate classification is also tractable and give explicit algorithms.