A Multi-Stain Breast Cancer Histological Whole-Slide-Image Data Set from Routine Diagnostics

Philippe Weitz, Masi Valkonen, Leslie Solorzano, Circe Carr, Kimmo Kartasalo, Constance Boissin, Sonja Koivukoski, Aino Kuusela, Dusan Rasic, Yanbo Feng, Sandra Sinius Pouplier, Abhinav Sharma, Kajsa Ledesma Eriksson, Leena Latonen, Anne Vibeke Laenkholm, Johan Hartman, Pekka Ruusuvuori, Mattias Rantalainen

    Research output: Contribution to journal › Data article › peer-review


    The analysis of formalin-fixed paraffin-embedded (FFPE) tissue sections stained with haematoxylin and eosin (H&E) or immunohistochemistry (IHC) is essential for the pathologic assessment of surgically resected breast cancer specimens. IHC staining has been broadly adopted into diagnostic guidelines and routine workflows to assess the status of several established biomarkers, including ER, PGR, HER2 and KI67. Biomarker assessment can also be facilitated by computational pathology image analysis methods, which have made substantial recent advances, often based on publicly available whole slide image (WSI) data sets. However, the field is still considerably limited by the sparsity of public data sets. In particular, there are no large, high-quality, publicly available data sets with WSIs of matching IHC and H&E-stained tissue sections from the same tumour. Here, we publish what is currently the largest publicly available data set of WSIs of tissue sections from surgical resection specimens from female primary breast cancer patients with matched WSIs of corresponding H&E and IHC-stained tissue, consisting of 4,212 WSIs from 1,153 patients.

    Original language: English
    Article number: 562
    Number of pages: 6
    Journal: Scientific Data
    Publication status: Published - Aug 2023
    Publication type: A1 Journal article-refereed

    Publication forum classification

    • Publication forum level 1

    ASJC Scopus subject areas

    • Statistics and Probability
    • Information Systems
    • Education
    • Computer Science Applications
    • Statistics, Probability and Uncertainty
    • Library and Information Sciences

