Impact of deep learning-determined smoking status on mortality of cancer patients: never too late to quit

A. Karlsson, A. Ellonen, H. Irjala, V. Väliaho, K. Mattila, L. Nissi, E. Kytö, S. Kurki, R. Ristamäki, P. Vihinen, T. Laitinen, A. Ålgars, S. Jyrkkiö, H. Minn, E. Heervä

Research output: Article, Scientific, peer-reviewed


BACKGROUND: Persistent smoking after cancer diagnosis is associated with increased overall mortality (OM) and cancer mortality (CM). According to the 2020 Surgeon General's report, smoking cessation may reduce CM, but the supporting evidence is limited. Deep learning-based modeling that enables universal natural language processing of medical narratives can provide population-based real-life smoking data and may help overcome this challenge. We assessed the effect of smoking status and within-1-year smoking cessation on CM using an in-house adaptation of a freely available language processing algorithm.

MATERIALS AND METHODS: This cross-sectional real-world study included 29 823 patients diagnosed with cancer in 2009-2018 in Southwest Finland. The medical narrative, International Classification of Diseases, 10th edition codes, histology, cancer treatment records, and death certificates were combined. Over 162 000 sentences describing tobacco smoking behavior were analyzed with the ULMFiT and BERT algorithms.

RESULTS: The language model classified the smoking status of 23 031 patients. Recent quitters had reduced CM [hazard ratio (HR) 0.80 (0.74-0.87)] and OM [HR 0.78 (0.72-0.84)] compared to persistent smokers. Compared to never smokers, persistent smokers had increased CM in head and neck, gastro-esophageal, pancreatic, lung, prostate, and breast cancer and Hodgkin's lymphoma, irrespective of age, comorbidities, performance status, or presence of metastatic disease. Increased CM was also observed in smokers with colorectal cancer, in men with melanoma or bladder cancer, and in lymphoid and myeloid leukemia, but no longer independently of the abovementioned covariates. Specificity and sensitivity were 96%/96%, 98%/68%, and 88%/99% for never, former, and current smokers, respectively, and were essentially the same with both models.

CONCLUSIONS: Deep learning can be used to classify large amounts of smoking data from the medical narrative with good accuracy. The results highlight the detrimental effects of persistent smoking in oncologic patients and emphasize that smoking cessation should always be an essential element of patient counseling.
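The per-class specificity and sensitivity reported in the results are, for a three-class classifier, presumably one-vs-rest quantities: each smoking status is treated in turn as the positive class and the other two as negative. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `one_vs_rest_metrics` and the confusion matrix `TOY_CONFUSION` are hypothetical, and the counts are invented for illustration; they are not data from the study.

```python
from typing import Dict, List

LABELS = ["never", "former", "current"]

# Hypothetical toy counts, NOT taken from the study.
# Rows are the true class, columns the predicted class, in LABELS order.
TOY_CONFUSION = [
    [90, 5, 5],
    [10, 70, 20],
    [2, 3, 95],
]


def one_vs_rest_metrics(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Compute per-class sensitivity and specificity (one-vs-rest)."""
    total = sum(sum(row) for row in confusion)
    metrics: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]                        # class i predicted as i
        fn = sum(confusion[i]) - tp                 # class i predicted as other
        fp = sum(row[i] for row in confusion) - tp  # other predicted as i
        tn = total - tp - fn - fp                   # everything else
        metrics[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return metrics


for name, values in one_vs_rest_metrics(TOY_CONFUSION, LABELS).items():
    print(f"{name}: sensitivity={values['sensitivity']:.3f}, "
          f"specificity={values['specificity']:.3f}")
```

In this reading, a class with high specificity but lower sensitivity (as reported for former smokers) is one the classifier rarely assigns wrongly but often misses.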

Journal: ESMO Open
Status: Published - Jun 2021
Ministry of Education and Culture (OKM) publication type: A1 Original article in a scientific journal


  • Publication Forum (Jufo) level 1

ASJC Scopus subject areas

  • Oncology
  • Cancer Research

