Abstract
Background: Rare disease diagnoses are often delayed by years, involving multiple doctor visits and potentially
imprecise or incorrect diagnoses before the correct one is received. Machine learning could help address this problem
by flagging patients whom doctors should examine more closely.
Methods: To make the prediction setting as close as possible to a real-world situation, we tested different masking
sizes. In the masking phase, data were removed from all data points following the first rare disease diagnosis,
including the day the diagnosis was received, and additionally from a selected number of days before the initial
diagnosis. The performance of the machine learning models was compared using positive predictive value (PPV),
negative predictive value (NPV), prevalence-adjusted PPV (pPPV), prevalence-adjusted NPV (pNPV), accuracy (ACC)
and area under the receiver operating characteristic curve (AUC).
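A minimal sketch of the masking step described above, assuming each patient's history is a list of dated events and that "a selected number of days before initial diagnosis" means the m days immediately preceding the diagnosis day (this boundary convention is an assumption). The names `mask_records`, `history`, `first_dx` and the event labels are hypothetical; the masking sizes 30, 2160 and 4320 days are those mentioned in the Results. How records of patients without a rare disease diagnosis were masked is not stated here, so the sketch covers diagnosed patients only.

```python
from datetime import date, timedelta

# Hypothetical record format: (event_date, event_label) for one patient.
Record = tuple[date, str]


def mask_records(records: list[Record], first_diagnosis: date, masking_days: int) -> list[Record]:
    """Remove the first-diagnosis day, everything after it, and the
    `masking_days` days immediately before it (assumed boundary convention)."""
    cutoff = first_diagnosis - timedelta(days=masking_days)
    # Only events dated strictly before the cutoff stay visible to the model.
    return [r for r in records if r[0] < cutoff]


# Usage with a hypothetical patient history.
history = [
    (date(2019, 3, 14), "code_1"),
    (date(2020, 5, 20), "code_2"),
    (date(2020, 6, 1), "code_3"),  # day of the first rare disease diagnosis
    (date(2020, 7, 1), "code_4"),
]
first_dx = date(2020, 6, 1)

for days in (30, 2160, 4320):
    print(days, mask_records(history, first_dx, days))
```

Larger masking sizes remove more of the pre-diagnosis history, so the model must rely on increasingly early signals; with 2160 or 4320 days this hypothetical patient has no visible records left at all.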
Results: XGBoost achieved PPVs above 90 % in all masking settings, and InceptionVasGloMyotides exceeded 90 % in
most settings, though less consistently. When disease prevalence was taken into account, XGBoost achieved its highest
value, 8.8 %, in binary classification with 30-day masking, and InceptionVasGloMyotides achieved its best value, 6 %,
also in binary classification but with 2160-day and 4320-day masking. ACC varied between 89 % and 98 % for XGBoost
and between 79 % and 94 % for InceptionVasGloMyotides. AUC, on the other hand, varied between 72.6 % and 94.5 % for
InceptionVasGloMyotides and between 69.9 % and 96.4 % for XGBoost.
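The gap between PPVs above 90 % and prevalence-adjusted values below 10 % is consistent with Bayes' theorem: a classifier evaluated on a test set with a relatively high share of positives yields a much lower PPV when re-weighted to the true, low prevalence of a rare disease. The sketch below assumes pPPV and pNPV are PPV and NPV recomputed at a target prevalence from sensitivity and specificity; the paper's exact definition may differ. AUC is computed in its pairwise (Mann-Whitney) form. All function names, counts and scores are hypothetical.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """PPV, NPV, accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "acc": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def prevalence_adjusted(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV re-weighted to a target prevalence via Bayes' theorem (assumed pPPV/pNPV definition)."""
    p = prevalence
    pppv = sensitivity * p / (sensitivity * p + (1 - specificity) * (1 - p))
    pnpv = specificity * (1 - p) / (specificity * (1 - p) + (1 - sensitivity) * p)
    return pppv, pnpv


def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """Probability that a random positive outscores a random negative; ties count 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Usage with hypothetical counts: high PPV on the test set, far lower at rare-disease prevalence.
m = confusion_metrics(tp=90, fp=10, tn=80, fn=20)
for prev in (0.5, 0.01, 0.001):
    print(prev, prevalence_adjusted(m["sensitivity"], m["specificity"], prev))
print(auc([0.9, 0.8, 0.4], [0.1, 0.8, 0.3]))
```

The pairwise AUC loop is quadratic in the number of samples and is meant only to make the definition explicit; for real evaluations a rank-based or library implementation is preferable.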
Conclusions: XGBoost and InceptionVasGloMyotides could successfully predict rare diseases for patients at least
30 days before the initial rare disease diagnosis. In addition, we managed to build a well-performing custom deep
learning model.
Original language | English |
---|---|
Article number | 107917 |
Number of pages | 7 |
Journal | Computer Methods and Programs in Biomedicine |
Volume | 243 |
Early online date | 8 Nov 2023 |
DOIs - permanent links | |
Publication status | Published - Jan 2024 |
Ministry of Education and Culture (OKM) publication type | A1 Original article in a scientific journal |
Publication Forum (Jufo) level
- Jufo level 1