TY - JOUR
T1 - Permutation-based significance analysis reduces the type 1 error rate in bisulfite sequencing data analysis of human umbilical cord blood samples
AU - Laajala, Essi
AU - Halla-aho, Viivi
AU - Grönroos, Toni
AU - Kalim, Ubaid Ullah
AU - Vähä-Mäkilä, Mari
AU - Nurmio, Mirja
AU - Kallionpää, Henna
AU - Lietzén, Niina
AU - Mykkänen, Juha
AU - Rasool, Omid
AU - Toppari, Jorma
AU - Orešič, Matej
AU - Knip, Mikael
AU - Lund, Riikka
AU - Lahesmaa, Riitta
AU - Lähdesmäki, Harri
N1 - Funding Information:
We are grateful to the personnel of Turku University Hospital. We thank Riitta Veijola, Jorma Ilonen, and Heikki Hyöty for providing the data from the Diabetes Prediction and Prevention (DIPP) study. We thank Mikko Konki and Roosa Kattelus for assistance in the Pyrosequencing. We are grateful to Bishwa R. Ghimire, Asta Laiho, and Laura L. Elo for their insight into the RRBS data analysis. We acknowledge the Turku Bioscience Centre’s core facility, the Finnish Functional Genomics Centre (FFGC) supported by Biocenter Finland, for their assistance. We acknowledge the Finnish Centre for Scientific Computing (CSC) and the computational resources provided by the Aalto Science-IT project.
Publisher Copyright:
© 2022 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
PY - 2022/3
Y1 - 2022/3
N2 - DNA methylation patterns are largely established in utero and might mediate the impacts of in utero conditions on later health outcomes. Associations between perinatal DNA methylation marks and pregnancy-related variables, such as maternal age and gestational weight gain, have previously been studied with methylation microarrays, which typically cover less than 2% of human CpG sites. To detect such associations outside these regions, we chose the bisulphite sequencing approach. We collected and curated clinical data on 200 newborn infants, whose umbilical cord blood samples were analysed with the reduced representation bisulphite sequencing (RRBS) method. A generalized linear mixed-effects model was fit for each high-coverage CpG site, followed by spatial and multiple testing adjustment of P values to identify differentially methylated cytosines (DMCs) and regions (DMRs) associated with clinical variables, such as maternal age, mode of delivery, and birth weight. The type 1 error rate was then evaluated with a permutation analysis. The permutation analysis revealed a strong inflation of spatially adjusted P values, and we then applied it for empirical type 1 error control. The inflation of P values was caused by a common method for spatial adjustment and DMR detection, implemented in the tools comb-p and RADMeth. Based on empirically estimated significance thresholds, very little differential methylation was associated with any of the studied clinical variables other than sex. With this analysis workflow, the sex-associated differentially methylated regions were highly reproducible across studies, technologies, and statistical models.
AB - DNA methylation patterns are largely established in utero and might mediate the impacts of in utero conditions on later health outcomes. Associations between perinatal DNA methylation marks and pregnancy-related variables, such as maternal age and gestational weight gain, have previously been studied with methylation microarrays, which typically cover less than 2% of human CpG sites. To detect such associations outside these regions, we chose the bisulphite sequencing approach. We collected and curated clinical data on 200 newborn infants, whose umbilical cord blood samples were analysed with the reduced representation bisulphite sequencing (RRBS) method. A generalized linear mixed-effects model was fit for each high-coverage CpG site, followed by spatial and multiple testing adjustment of P values to identify differentially methylated cytosines (DMCs) and regions (DMRs) associated with clinical variables, such as maternal age, mode of delivery, and birth weight. The type 1 error rate was then evaluated with a permutation analysis. The permutation analysis revealed a strong inflation of spatially adjusted P values, and we then applied it for empirical type 1 error control. The inflation of P values was caused by a common method for spatial adjustment and DMR detection, implemented in the tools comb-p and RADMeth. Based on empirically estimated significance thresholds, very little differential methylation was associated with any of the studied clinical variables other than sex. With this analysis workflow, the sex-associated differentially methylated regions were highly reproducible across studies, technologies, and statistical models.
KW - analysis workflow
KW - bisulphite sequencing
KW - differential methylation
KW - DNA methylation
KW - pregnancy
KW - RRBS
KW - sex
KW - spatial correlation
KW - type 1 error
KW - umbilical cord blood
U2 - 10.1080/15592294.2022.2044127
DO - 10.1080/15592294.2022.2044127
M3 - Article
C2 - 35246015
AN - SCOPUS:85126064394
SN - 1559-2294
JO - EPIGENETICS
JF - EPIGENETICS
ER -