Long-range correlations and burstiness in written texts: Universal and language-specific aspects

Vassilios Constantoudis, Maria Kalimeri, Fotis Diakonos, Konstantinos Karamanos, Constantinos Papadimitriou, Manolis Chatzigeorgiou, Harris Papageorgiou

    Research output: Contribution to journal > Article > Scientific > peer-review

    3 Citations (Scopus)

    Abstract

    Recently, methods from the statistical physics of complex systems have been applied successfully to identify universal features in the long-range correlations (LRCs) of written texts. In real texts, however, these universal features are intermingled with language-specific influences. This paper aims to characterize and better understand the interplay between universal and language-specific effects on the LRCs in texts. To this end, we apply the language-sensitive mapping of written texts to word-length series (wls) and analyse large parallel (same-content) corpora from 10 languages classified into four families (Romance, Germanic, Greek and Uralic). The autocorrelation functions of the wls reveal tiny but persistent LRCs that decay at large scales following a power law with a language-independent exponent ∼0.60–0.65. The impact of language shows in the amplitude of the correlations, where a relative standard deviation >40% among the analysed languages is observed. The classification into language families seems to play a significant role, since the Finnish and Germanic languages exhibit stronger correlations than the Greek and Romance families. To reveal the origins of the LRCs, we focus on the long words and perform burst and correlation analysis of their positions along the corpora. We find that the universal features are linked more to the correlations of the inter-long-word distances, while the language-specific aspects are related more to their distributions.
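
The pipeline described in the abstract (text to word-length series, autocorrelation of that series, then distances between long words) might be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the regex tokenizer, the long-word `threshold`, and the use of the burstiness parameter B = (σ − μ)/(σ + μ) as the burst measure are all assumptions that the abstract does not specify.

```python
import re
import numpy as np

def word_length_series(text):
    """Map a text to its word-length series (wls).
    Assumption: a word is a maximal run of Unicode letters."""
    words = re.findall(r"[^\W\d_]+", text)
    return np.array([len(w) for w in words], dtype=float)

def autocorrelation(series, max_lag):
    """Normalized autocorrelation of the series for lags 1..max_lag."""
    x = series - series.mean()
    var = np.mean(x * x)
    n = len(x)
    return np.array([np.mean(x[:n - k] * x[k:]) / var
                     for k in range(1, max_lag + 1)])

def inter_long_word_distances(series, threshold):
    """Distances (in words) between consecutive long words.
    Assumption: 'long' means length >= threshold (hypothetical cutoff)."""
    positions = np.flatnonzero(series >= threshold)
    return np.diff(positions)

def burstiness(distances):
    """Burstiness parameter B = (sigma - mu) / (sigma + mu).
    Assumption: one common burst measure, not necessarily the paper's."""
    mu, sigma = distances.mean(), distances.std()
    return (sigma - mu) / (sigma + mu)
```

With these helpers, a power-law decay of the autocorrelation could be estimated by fitting a straight line to log-lag against log-autocorrelation over the large-lag range, for instance with `np.polyfit`. The fitting range would have to be chosen by inspection.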
    Original language: English
    Journal: International Journal of Modern Physics B
    Volume: 30
    Issue number: 15
    Early online date: 20 Aug 2015
    Publication status: Published - 2016
    Publication type: A1 Journal article-refereed

    Publication forum classification

    • Publication forum level 1
