Comparative evaluation of gene set analysis approaches for RNA-Seq data

Yasir Rahmatallah, Frank Emmert-Streib, Galina Glazko

    Research output: Contribution to journal › Article › Scientific › peer-review

    17 Citations (Scopus)


    Background: Over the last few years, transcriptome sequencing (RNA-Seq) has almost completely replaced microarrays for high-throughput studies of gene expression. Currently, the most popular use of RNA-Seq is to identify genes that are differentially expressed between two or more conditions. Despite the importance of Gene Set Analysis (GSA) in interpreting the results of RNA-Seq experiments, the limitations of GSA methods developed for microarrays are not well understood in the context of RNA-Seq data.

    Results: We provide a thorough evaluation of popular multivariate and gene-level self-contained GSA approaches on simulated and real RNA-Seq data. The multivariate approach employs multivariate non-parametric tests combined with popular normalizations for RNA-Seq data. The gene-level approach uses univariate tests designed for the analysis of RNA-Seq data to find gene-specific p-values and combines them into a pathway p-value using classical statistical techniques. Our results demonstrate that the Type I error rate and the power of multivariate tests depend only on the test statistics and are insensitive to the different normalizations. In general, standard multivariate GSA tests detect pathways without any bias in terms of pathway size, percentage of differentially expressed genes, or average gene length in a pathway. In contrast, the Type I error rate and the power of gene-level GSA tests are heavily affected by the methods for combining p-values, and all of the aforementioned biases are present in the detected pathways.

    Conclusions: Our results emphasize the importance of using self-contained non-parametric multivariate tests for detecting differentially expressed pathways in RNA-Seq data and warn against applying gene-level GSA tests, especially because of their high Type I error rates for both simulated and real data.
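The gene-level approach described in the abstract combines gene-specific p-values into a single pathway p-value. The abstract does not name the combination methods that were evaluated, so the sketch below uses Fisher's method purely as an illustration of one classical technique; the function name `fisher_combine` and the example p-values are hypothetical. Fisher's method assumes the gene-level p-values are independent, which is unlikely to hold for co-expressed genes in a pathway.

```python
import math


def fisher_combine(p_values):
    """Combine p-values with Fisher's method (illustrative sketch only).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, ASSUMING the k p-values
    are independent. That assumption is part of this sketch, not a claim
    about the evaluated methods.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # For an even number of degrees of freedom (2k), the chi-square
    # survival function has the closed form
    #   exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical gene-level p-values for one pathway.
gene_p_values = [0.05, 0.05]
pathway_p = fisher_combine(gene_p_values)
print(f"pathway p-value: {pathway_p:.4f}")
```

With a single gene, the combined p-value equals the input p-value, which is a quick sanity check on the closed-form survival function.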

    Original language: English
    Article number: 397
    Journal: BMC Bioinformatics
    Issue number: 1
    Publication status: Published - 5 Dec 2014
    Publication type: A1 Journal article-refereed

    ASJC Scopus subject areas

    • Applied Mathematics
    • Structural Biology
    • Biochemistry
    • Molecular Biology
    • Computer Science Applications

