On the distribution of isometric log-ratio coordinates under extra-multinomial count data

  • Noora Kartiosuo*
  • Joni Virta
  • Jaakko Nevalainen
  • Olli Raitakari
  • Kari Auranen

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Compositional data can be mapped from the simplex to the Euclidean space through the isometric log-ratio (ilr) transformation. When the underlying counts follow a multinomial distribution, the distribution of the ensuing ilr coordinates has been shown to be asymptotically multivariate normal. We derive conditions under which the asymptotic normality of the ilr coordinates holds under a compound multinomial distribution inducing overdispersion in the counts. We derive a normal approximation and investigate its practical applicability under extra-multinomial variation using a simulation study under the Dirichlet-multinomial distribution. The approximation works well, except with a small total count or high amount of overdispersion. Our work is motivated by microbiome data, which exhibit extra-multinomial variation and are increasingly treated as compositions. We conclude that if empirical data analysis relies on the normality of ilr coordinates, it may be advisable to choose a taxonomic level with less sparsity so that the distribution of taxon-specific class probabilities remains unimodal.
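As a rough illustration of the setting described in the abstract, the following Python sketch draws Dirichlet-multinomial counts and maps them to ilr coordinates. It is not the authors' code. The function names (`helmert_basis`, `ilr`, `simulate_dm_ilr`), the Helmert-type orthonormal basis, the parameter values, and the pseudo-count of 0.5 used to keep the logarithm defined for zero counts are all assumptions made here for illustration.

```python
import numpy as np


def helmert_basis(d):
    """Return a d x (d-1) matrix whose columns are orthonormal and sum to zero.

    This is one common choice of ilr basis (an assumption of this sketch).
    """
    v = np.zeros((d, d - 1))
    for j in range(1, d):
        v[:j, j - 1] = 1.0 / j          # first j parts share equal weight
        v[j, j - 1] = -1.0              # contrasted against part j + 1
        v[:, j - 1] *= np.sqrt(j / (j + 1.0))  # scale the column to unit length
    return v


def ilr(x, basis=None):
    """Isometric log-ratio coordinates of strictly positive parts x.

    x may be a single composition or an array with one composition per row.
    The result does not depend on the row totals, so counts can be passed
    directly without normalising them to proportions first.
    """
    x = np.asarray(x, dtype=float)
    log_x = np.log(x)
    clr = log_x - log_x.mean(axis=-1, keepdims=True)  # centred log-ratio
    if basis is None:
        basis = helmert_basis(x.shape[-1])
    return clr @ basis


def simulate_dm_ilr(alpha, n_total, n_samples, pseudo=0.5, seed=0):
    """Simulate Dirichlet-multinomial counts and return their ilr coordinates.

    pseudo is added to every count so that zeros do not break the logarithm;
    this zero handling is an illustrative assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    probs = rng.dirichlet(alpha, size=n_samples)  # class probabilities per sample
    counts = np.array([rng.multinomial(n_total, p) for p in probs])
    return ilr(counts + pseudo)


if __name__ == "__main__":
    # Hypothetical parameters: three classes, total count 1000 per sample.
    z = simulate_dm_ilr(alpha=np.array([20.0, 30.0, 50.0]),
                        n_total=1000, n_samples=2000)
    print("mean of ilr coordinates:", z.mean(axis=0))
    print("covariance of ilr coordinates:")
    print(np.cov(z, rowvar=False))
```

Scaling `alpha` down while keeping its proportions fixed increases the overdispersion, and lowering `n_total` reduces the total count, so both regimes mentioned in the abstract can be explored by comparing histograms or Q-Q plots of the simulated coordinates against a normal reference.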

Original language: English
Article number: 113
Journal: Statistical Papers
Volume: 66
Issue number: 5
DOIs
Publication status: Published - Aug 2025
Publication type: A1 Journal article-refereed

Keywords

  • 62E20
  • Asymptotic approximation
  • Compositional data analysis
  • Dirichlet-multinomial
  • Isometric log-ratio transformation
  • Sequencing count data

Publication forum classification

  • Publication forum level 1

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
