Limited random walk algorithm for big graph data clustering

Honglei Zhang, Jenni Raitoharju, Serkan Kiranyaz, Moncef Gabbouj

    Research output: Contribution to journalArticleScientificpeer-review

    18 Citations (Scopus)
    119 Downloads (Pure)

    Abstract

    Graph clustering is an important technique to understand the relationships between the vertices in a big graph. In this paper, we propose a novel random-walk-based graph clustering method. The proposed method restricts the reach of the walking agent using an inflation function and a normalization function. We analyze the behavior of the limited random walk procedure and propose a novel algorithm for both global and local graph clustering problems. Previous random-walk-based algorithms depend on the chosen fitness function to find the clusters around a seed vertex. The proposed algorithm tackles the problem in an entirely different manner. We use the limited random walk procedure to find attractor vertices in a graph and use them as features to cluster the vertices. According to the experimental results on the simulated graph data and the real-world big graph data, the proposed method is superior to the state-of-the-art methods in solving graph clustering problems. Since the proposed method uses the embarrassingly parallel paradigm, it can be efficiently implemented and embedded in any parallel computing environment such as a MapReduce framework. Given enough computing resources, we are capable of clustering graphs with millions of vertices and hundreds millions of edges in a reasonable time.
    Original languageEnglish
    Pages (from-to)26
    Number of pages1
    JournalJournal of Big Data
    Volume3
    Issue number1
    DOIs
    Publication statusPublished - 2016
    Publication typeA1 Journal article-refereed

    Publication forum classification

    • Publication forum level 1

    Fingerprint

    Dive into the research topics of 'Limited random walk algorithm for big graph data clustering'. Together they form a unique fingerprint.

    Cite this