CGStream: Continuous correlated graph query for data streams

Abstract

In this paper, we propose to query correlated graph in a data stream scenario, where given a query graph q an algorithm is required to retrieve all the subgraphs whose Pearson’s correlation coefficients with q are greater than a threshold θ over some graph data flowing in a stream fashion. Due to the dynamic changing nature of the stream data and the inherent complexity of the graph query process, treating graph streams as static datasets is computationally infeasible or ineffective. In the paper, we propose a novel algorithm, CGStream, to identify correlated graphs from data stream, by using a sliding window which covers a number of consecutive batches of stream data records. Our theme is to regard stream query as the traversing along a data stream and the query is achieved at a number of outlooks over the data stream. For each outlook, we derive a lower frequency bound to mine a set of frequent subgraph candidates, where the lower bound guarantees that no pattern is missing from the current outlook to the next outlook. On top of that, we derive an upper correlation bound and a heuristic rule to prune the candidate size, which helps reduce the computation cost at each outlook. Experimental results demonstrate that the proposed algorithm is several times, or even an order of magnitude, more efficient than the straightforward algorithm. Meanwhile, our algorithm achieves good performance in terms of query precision.

Publication
CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Shirui Pan
Shirui Pan
Professor | ARC Future Fellow

My research interests include data mining, machine learning, and graph analysis.