EpiSmokEr2: a robust epigenetic classifier for smoking status inference using Illumina EPIC methylation data

  • Tianyu Zhu*
  • Teodóra Faragó
  • Sailalitha Bollepalli
  • Aino Heikkinen
  • Mikaela Hukkanen
  • Olli Raitakari
  • Terho Lehtimäki
  • Tellervo Korhonen
  • Jaakko Kaprio
  • Fang Fang
  • Kaitlyn G. Lawrence
  • Dale P. Sandler
  • Mari Roberts Spildrejorde
  • Kristina Gervin
  • Yanyu Pan
  • Ricardo Costeira
  • Jordana T. Bell
  • Miina Ollikainen*

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Aim: Tobacco smoking induces persistent DNA methylation (DNAm) changes in blood that can serve as long-term biomarkers for smoking exposure. We aimed to develop and validate a DNAm classifier of smoking status using Illumina EPIC array data.

Methods: We built Epigenetic Smoking status Estimator2 (EpiSmokEr2), a Least Absolute Shrinkage and Selection Operator (LASSO) regression-based DNAm classifier using 511 CpGs from Illumina Infinium MethylationEPIC array (EPIC) data. The model was trained on 1343 samples from the Young Finns Study cohort and validated across six independent datasets from four cohorts and two array platforms (EPIC and EPICv2).

Results: EpiSmokEr2 achieved an average sensitivity of 0.87 and specificity of 0.86 in distinguishing current from never smokers. Predicted smoking status correlated strongly with established DNAm smoking scores and GrimAge, indicating its ability to capture biologically relevant smoking effects. Simulation analysis showed EpiSmokEr2 was robust for up to 10% missing CpGs.

Conclusion: EpiSmokEr2 provides a reliable DNAm-based estimator of smoking status. It is available as an open-source R package on GitHub, facilitating broad use in epidemiological and clinical research.
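
For readers less familiar with the two metrics reported in the Results, the following is a minimal sketch of how sensitivity and specificity can be computed for a current-versus-never smoker comparison. It is written in Python for illustration only and is not taken from the EpiSmokEr2 R package. The function name `sensitivity_specificity`, the label strings, and the toy labels are all assumptions made for this example, not study data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "current",
    negative: str = "never",
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a positive-vs-negative comparison.

    Samples whose true label is neither `positive` nor `negative` are ignored.
    Hypothetical helper for illustration; not part of any published package.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            # A true current smoker counts as detected only if predicted as such.
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif truth == negative:
            # A true never smoker counts as correct only if predicted as never.
            if pred == negative:
                tn += 1
            else:
                fp += 1

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")

    sensitivity = tp / (tp + fn)  # share of true current smokers recovered
    specificity = tn / (tn + fp)  # share of true never smokers recovered
    return sensitivity, specificity


# Invented toy labels (not data from the study): 4 current and 5 never smokers.
reported = ["current"] * 4 + ["never"] * 5
predicted = ["current", "current", "current", "never",
             "never", "never", "never", "never", "current"]

sens, spec = sensitivity_specificity(reported, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case, three of four current smokers and four of five never smokers are classified correctly. The averages reported in the abstract would correspond to computing such values per validation dataset and averaging them, although the exact averaging procedure is not stated here.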

Original language: English
Pages (from-to): 205-215
Number of pages: 11
Journal: Epigenomics
Volume: 18
Issue number: 2
DOIs
Publication status: Published - 2026
Publication type: A1 Journal article-refereed

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • biomarkers
  • classifier
  • DNA methylation
  • Illumina EPIC array
  • LASSO regression
  • smoking status

Publication forum classification

  • Publication forum level 1

ASJC Scopus subject areas

  • Genetics
  • Cancer Research
