
BIOMAT-NER: A Domain-Specific Corpus for Named Entity Recognition of Chemical Substances and Biomaterials

  • Judith Rosell (Barcelona Supercomputing Center) (Creator)
  • Minna Veiranto (Creator)
  • Maiju Juusela (Creator)
  • Agnieszka Piegat (West Pomeranian University of Technology in Szczecin) (Creator)
  • Juan Uribe-Gomez (Creator)
  • Carles Mas (Creator)
  • Marta Pegueroles (Universitat Politècnica Catalunya-Barcelona Tech) (Creator)
  • Jan Rodríguez Miret (Barcelona Supercomputing Center) (Creator)
  • Miguel Rodríguez Ortega (Barcelona Supercomputing Center) (Creator)
  • Martin Krallinger (Barcelona Supercomputing Center) (Creator)

Dataset

Description

BIOMAT-NER Corpus

BIOMAT-NER is a corpus developed within the Horizon Europe BIOMATDB project to support the extraction and classification of biomaterials-related concepts from the scientific literature. It focuses on the annotation of chemical substances, compounds, and material types (including trade names) relevant to the field of biomaterials. Domain experts collaborated to establish comprehensive annotation guidelines for the manual annotation of the final gold standard corpus. PubMed abstracts were selected by MeSH (Medical Subject Headings) categories associated with biomaterials and related disciplines, so that the corpus reflects the terminology commonly used in biomaterials research, and were then manually annotated according to those guidelines.

The BIOMAT-NER corpus is one of four corpora developed within the project. It is divided into three subsets: a training set (4,553 documents), a test set (911 documents), and a validation set (607 documents), for 6,071 documents in total. It is available in multiple formats, including brat, CSV and CoNLL.
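As an illustration of the brat format mentioned above, the following is a minimal sketch of reading text-bound annotations from brat standoff content. It assumes the standard brat layout for `T` lines (ID, tab, label and offsets, tab, surface text). The `Entity` type, the `parse_brat_entities` helper, and the `MATERIAL` label are hypothetical and are not taken from the corpus or its guidelines.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_brat_entities(ann_text: str) -> list[Entity]:
    """Parse text-bound annotations (T lines) from brat standoff content."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, attributes, notes and blank lines
        ann_id, meta, surface = line.split("\t", 2)
        label, offsets = meta.split(" ", 1)
        # Discontinuous spans are written "s1 e1;s2 e2"; keep the outer bounds.
        fragments = [frag.split() for frag in offsets.split(";")]
        start = int(fragments[0][0])
        end = int(fragments[-1][1])
        entities.append(Entity(ann_id, label, start, end, surface))
    return entities


# Hypothetical sample for "chitosan and polycaprolactone";
# the label "MATERIAL" is illustrative only.
sample = "T1\tMATERIAL 0 8\tchitosan\nT2\tMATERIAL 13 29\tpolycaprolactone\n"
entities = parse_brat_entities(sample)
```

Each `Entity` carries character offsets into the matching `.txt` file, so the surface string can be checked against the document text before further processing.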

This corpus is part of a broader initiative to support the development of an advanced, searchable biomaterials database with integrated analytical tools and digital advisors. It is also intended for use in training Named Entity Recognition (NER) models, enabling the automatic identification and extraction of biomaterials-related concepts from scientific texts.
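To prepare character-offset annotations for NER training, a common step is converting them to token-level BIO tags. The sketch below shows one simple way to do this, assuming whitespace tokenization and non-overlapping spans. The `spans_to_bio` helper, the example sentence, and the `MATERIAL` label are hypothetical; the corpus's own CoNLL files may use a different tokenizer and tag scheme.

```python
import re


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans to per-token BIO tags.

    Tokens are whitespace-delimited. A token is tagged only when it lies
    fully inside a span, so a token with trailing punctuation outside the
    span stays "O" under this simple scheme.
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            if match.start() >= start and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Hypothetical example sentence and spans (illustrative only).
text = "Scaffolds of poly(lactic acid) were coated with chitosan"
spans = [(13, 30, "MATERIAL"), (48, 56, "MATERIAL")]
tokens, tags = spans_to_bio(text, spans)
```

A production pipeline would normally use the tokenizer of the target model and align subword tokens to these tags; whitespace splitting is used here only to keep the example short.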

Resources

  • Project Website
  • Biomaterials Marketplace
  • Biomaterials Database

Date made available: 24 Apr 2025
Publisher: Zenodo

Field of science, Statistics Finland

  • 318 Medical biotechnology
