Abstract
Text data in the form of natural language is a valuable resource that contains domain-specific information useful for a variety of applications. One example is electronic health records (eHR), which offer comprehensive insights into patients’ health histories and enable knowledge extraction for clinical diagnosis and treatment. In this paper, we study multi-label text classification (MLTC) of eHR data by introducing two novel MLTC methods based on a threshold-learned convolutional neural network (CNN). We conduct comprehensive comparisons with other multi-label models and with binary relevance (BR). Importantly, we optimize not only the architecture of the multi-label classifiers but also that of the baseline BR model. Our findings indicate that the adaptive-threshold CNN (AT-CNN) and the implicit-threshold CNN (IT-CNN) provide a favorable approximation of a binary CNN (B-CNN) with the added benefit of improved runtime efficiency. This efficiency is crucial as the number of classes grows, because the runtime of classifiers based on one-vs-rest mappings becomes increasingly prohibitive.
Original language | English |
---|---|
Pages | 93402 - 93419 |
Journal | IEEE Access |
Volume | 11 |
DOI - permanent links | |
Status | Published - 28 Aug 2023 |
OKM publication type | A1 Original article in a scientific journal |
Publication Forum level
- Jufo level 1
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering