Abstract
Retrospective data harmonization remains an open issue in healthcare, driven by the emerging need to interlink data from multiple clinical centers in the absence of standardized data collection protocols. In this work, we present an automated data harmonization workflow that uses lexical and semantic analysis, based on word embeddings and relational modeling, to detect terminologies with a common lexical and conceptual basis. The method is built on top of a knowledge base to enable the interlinking of heterogeneous cross-domain data. A case study was conducted in two clinical domains, namely cardiovascular disease (CVD) and mental disorders, where the proposed method yielded matched terminologies with 85% precision and in less execution time than lexical analysis combined with manual mapping, which yielded 10% lower precision.
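The idea of combining lexical and embedding-based matching can be illustrated with a minimal sketch. This is not the authors' implementation: the toy three-dimensional vectors in `TOY_EMBEDDINGS`, the function names, the `max`-based combination of scores, and the `threshold` value are all assumptions made for illustration, and the relational modeling and knowledge base mentioned in the abstract are not represented.

```python
import math
from difflib import SequenceMatcher

# Toy word vectors, invented for illustration only. A real workflow would
# load embeddings trained on a (bio)medical corpus instead.
TOY_EMBEDDINGS = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.85, 0.15, 0.05],
    "rate": [0.1, 0.9, 0.0],
    "frequency": [0.15, 0.85, 0.1],
    "depression": [0.0, 0.1, 0.95],
}


def lexical_score(a, b):
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def term_vector(term):
    """Average the vectors of the known words in a term; None if none are known."""
    vecs = [TOY_EMBEDDINGS[w] for w in term.lower().split() if w in TOY_EMBEDDINGS]
    if not vecs:
        return None
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def semantic_score(a, b):
    """Embedding-based similarity; 0.0 when either term has no known words."""
    u, v = term_vector(a), term_vector(b)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


def best_match(term, reference_terms, threshold=0.8):
    """Return (reference term or None, score), taking the better of the two scores."""
    best, best_score = None, 0.0
    for ref in reference_terms:
        score = max(lexical_score(term, ref), semantic_score(term, ref))
        if score > best_score:
            best, best_score = ref, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


if __name__ == "__main__":
    reference = ["heart rate", "depression"]
    for source_term in ["Heart Rate", "cardiac frequency", "xyz"]:
        print(source_term, "->", best_match(source_term, reference))
```

In this sketch, "cardiac frequency" shares few characters with "heart rate", so lexical similarity alone would miss it, while the averaged toy vectors place the two terms close together. That is the kind of case where a semantic step can add matches beyond a purely lexical one.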
| Original language | English |
|---|---|
| Title of host publication | 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI) |
| Publisher | IEEE |
| Number of pages | 4 |
| ISBN (Electronic) | 978-1-6654-0358-0 |
| Publication status | Published - 2021 |
| Publication type | A4 Article in conference proceedings |
| Event | IEEE EMBS International Conference on Biomedical and Health Informatics (BHI) - Athens, Greece. Duration: 27 Jul 2021 → 30 Jul 2021 |
Publication series
| Name | |
|---|---|
| ISSN (Electronic) | 2641-3604 |
Conference
| Conference | IEEE EMBS International Conference on Biomedical and Health Informatics (BHI) |
|---|---|
| Country/Territory | Greece |
| City | Athens |
| Period | 27/07/21 → 30/07/21 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- Protocols
- Terminology
- Mental disorders
- Semantics
- Knowledge based systems
- Medical services
- Manuals
- data harmonization
- lexical matching
- semantic matching
- cardiovascular diseases
- mental disorders
Publication forum classification
- Publication forum level 1
Fingerprint
Dive into the research topics of 'A hybrid data harmonization workflow using word embeddings for the interlinking of heterogeneous cross-domain clinical data structures'. Together they form a unique fingerprint.