A hybrid data harmonization workflow using word embeddings for the interlinking of heterogeneous cross-domain clinical data structures

Vasileios C. Pezoulas, Antonis Sakellarios, Marcus Kleber, Jos A. Bosch, Sander W. van der Laan, Femke Lamers, Terho Lehtimäki, Winfried März, Dimitrios I. Fotiadis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

Abstract

Retrospective data harmonization remains an open issue in healthcare because of the growing need to interlink data from multiple clinical centers in the absence of standardized data collection protocols. In this work, we present an automated data harmonization workflow that uses lexical and semantic analysis, based on word embeddings and relational modeling, to detect terminologies with a common lexical and conceptual basis. The method is built on top of a knowledge base to enable the interlinking of heterogeneous cross-domain data. A case study was conducted in two clinical domains, namely cardiovascular disease (CVD) and mental disorders, where the proposed method yielded matched terminologies with 85% precision and in less execution time than the combination of lexical analysis and manual mapping, which yielded 10% lower precision.
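
The combination of lexical and semantic matching described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy three-dimensional vectors in `TOY_EMBEDDINGS`, the function names, the word-averaging step, and the 0.8 threshold are all hypothetical choices made for illustration. A realistic setup would use pretrained word embeddings and a curated reference knowledge base, and the relational modeling step mentioned in the abstract is not covered here.

```python
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical toy "embeddings" (3 dimensions) for illustration only;
# real word embeddings are learned from text and have far more dimensions.
TOY_EMBEDDINGS = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.85, 0.15, 0.05],
    "attack": [0.1, 0.9, 0.0],
    "infarction": [0.2, 0.8, 0.1],
    "depression": [0.0, 0.1, 0.9],
}


def lexical_score(a, b):
    """Character-level similarity in [0, 1] between two terms."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def embed(term):
    """Average the vectors of the known words in a term, or None if none are known."""
    vectors = [TOY_EMBEDDINGS[w] for w in term.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def semantic_score(a, b):
    """Embedding-based similarity; 0.0 when either term has no known words."""
    u, v = embed(a), embed(b)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


def combined_score(a, b):
    """Take the stronger of the lexical and semantic signals (an assumed rule)."""
    return max(lexical_score(a, b), semantic_score(a, b))


def match_terms(source_terms, reference_terms, threshold=0.8):
    """Map each source term to its best reference term, or None below the threshold."""
    matches = {}
    for s in source_terms:
        best = max(reference_terms, key=lambda r: combined_score(s, r))
        matches[s] = best if combined_score(s, best) >= threshold else None
    return matches


if __name__ == "__main__":
    reference = ["heart attack", "depression"]
    source = ["cardiac infarction", "Depression", "smoking status"]
    for term, match in match_terms(source, reference).items():
        print(f"{term!r} -> {match!r}")
```

In this sketch, a term such as "cardiac infarction" shares few characters with "heart attack" and is linked only through the embedding similarity, whereas "Depression" is linked lexically. Terms with no sufficiently similar reference entry are left unmatched rather than forced onto a reference term.
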
Original language: English
Title of host publication: 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI)
Publisher: IEEE
Number of pages: 4
ISBN (Electronic): 978-1-6654-0358-0
Publication status: Published - 2021
Publication type: A4 Article in a conference publication
Event: IEEE EMBS International Conference on Biomedical and Health Informatics (BHI) - Athens, Greece
Duration: 27 Jul 2021 → 30 Jul 2021

Publication series

ISSN (Electronic): 2641-3604

Conference

Conference: IEEE EMBS International Conference on Biomedical and Health Informatics (BHI)
Country/Territory: Greece
City: Athens
Period: 27/07/21 → 30/07/21

Keywords

  • Protocols
  • Terminology
  • Mental disorders
  • Semantics
  • Knowledge based systems
  • Medical services
  • Manuals
  • data harmonization
  • lexical matching
  • semantic matching
  • cardiovascular diseases
  • mental disorders

Publication forum classification

  • Publication forum level 1
