Retrospective data harmonization remains an open issue in healthcare, driven by the emerging need to interlink data from multiple clinical centers in the absence of standardized data collection protocols. In this work, we present an automated data harmonization workflow that uses lexical and semantic analysis, based on word embeddings and relational modeling, to detect terminologies with a common lexical and conceptual basis. The method is built on top of a knowledge base to enable the interlinking of heterogeneous cross-domain data. A case study was conducted in two clinical domains, cardiovascular disease (CVD) and mental disorders, where the proposed method matched terminologies with 85% precision and in less execution time than the application of lexical analysis and manual mapping, which yielded 10% lower precision.
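
The idea of combining lexical and embedding-based matching can be illustrated with a minimal sketch. It is not the workflow described in the abstract: the toy three-dimensional vectors in `TOY_EMBEDDINGS`, the function names, the `max` score combination, and the 0.8 threshold are all assumptions made for illustration, and the knowledge base and relational modeling are not modeled here. A real pipeline would presumably load pretrained word embeddings instead of hand-written vectors.

```python
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical toy embeddings, hand-written for illustration only.
TOY_EMBEDDINGS = {
    "hypertension": (0.9, 0.1, 0.0),
    "high blood pressure": (0.85, 0.15, 0.05),
    "depression": (0.0, 0.2, 0.9),
    "age": (0.1, 0.9, 0.1),
}


def lexical_similarity(a, b):
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def semantic_similarity(a, b):
    """Embedding similarity; 0.0 when either term has no known vector."""
    u = TOY_EMBEDDINGS.get(a.lower())
    v = TOY_EMBEDDINGS.get(b.lower())
    if u is None or v is None:
        return 0.0
    return cosine_similarity(u, v)


def match_term(term, reference_terms, threshold=0.8):
    """Return (best reference term, score), or (None, score) below threshold.

    The score is the larger of the lexical and semantic similarity
    (an assumed combination rule, chosen for simplicity).
    """
    best_term, best_score = None, 0.0
    for ref in reference_terms:
        score = max(lexical_similarity(term, ref), semantic_similarity(term, ref))
        if score > best_score:
            best_term, best_score = ref, score
    if best_score >= threshold:
        return best_term, best_score
    return None, best_score


reference = ["hypertension", "depression", "age"]
for source_term in ["High Blood Pressure", "Hypertention", "smoking"]:
    print(source_term, "->", match_term(source_term, reference))
```

In this sketch, a misspelled term can still be matched through the lexical score, while a synonym with little character overlap relies on the embedding score; terms below the threshold are left unmatched and would need manual review.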
|Conference||IEEE EMBS International Conference on Biomedical and Health Informatics (BHI)|
|Period||27/07/21 → 30/07/21|