ChatGPT-4o in Risk-of-Bias Assessments in Neonatology: A Validity Analysis

  • Ilari Kuitunen*
  • Lauri Nyrhi
  • Daniele De Luca

*Corresponding author for this work

Research output: Article, Scientific, Peer-reviewed

Abstract

Introduction: Only a few studies have addressed the potential of large language models (LLMs) in risk-of-bias assessments, and their results have varied. The aim of this study was to analyze how well ChatGPT performs in risk-of-bias assessments of neonatal studies.

Methods: We searched for all Cochrane neonatal intervention reviews published in 2024 and extracted all of their risk-of-bias assessments. The full reports of the included studies were then retrieved and uploaded to ChatGPT-4o, together with guidance on performing a risk-of-bias assessment with the original Cochrane tool. Concordance between the original assessments and those provided by ChatGPT-4o was evaluated with intraclass correlation coefficients and Cohen's kappa statistics (with 95% confidence intervals) for each risk-of-bias domain and for the overall assessment.

Results: From 9 reviews, a total of 61 randomized studies were analyzed, and 427 judgments were compared. The overall κ was 0.43 (95% CI: 0.35 to 0.51), and the overall intraclass correlation coefficient was 0.65 (95% CI: 0.59 to 0.70). Among the individual domains, the best agreement was observed for allocation concealment (κ = 0.73, 95% CI: 0.55 to 0.90), whereas the poorest agreement was found for incomplete outcome data (κ = -0.03, 95% CI: -0.07 to 0.02).

Conclusion: ChatGPT-4o failed to achieve sufficient agreement in risk-of-bias assessments. Future studies should examine whether other LLMs perform better or whether agreement with ChatGPT-4o could be improved through better prompting. Currently, the use of ChatGPT-4o in risk-of-bias assessments should not be promoted.
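Cohen's κ, the main agreement statistic reported above, compares the observed agreement p_o between two raters with the agreement p_e expected by chance from each rater's marginal frequencies: κ = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of unweighted Cohen's κ, assuming judgments coded with the three categories of the original Cochrane tool (low, high, unclear). The abstract does not say whether a weighted variant was used, and the function name `cohens_kappa` and the example ratings are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters judging the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    # p_o: share of items on which both raters gave the same judgment
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected by chance, from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical paired judgments for six domain assessments (not study data)
cochrane = ["low", "low", "high", "unclear", "low", "high"]
chatgpt = ["low", "unclear", "high", "low", "low", "high"]

kappa = cohens_kappa(cochrane, chatgpt)
print(round(kappa, 3))  # prints 0.455
```

With these six invented pairs, the raters agree on 4 of 6 items (p_o = 4/6), chance agreement is 14/36, and κ = 5/11 ≈ 0.455. Counting agreement directly keeps the sketch dependency-free; in practice a library routine such as scikit-learn's `cohen_kappa_score` would more typically be used, and the confidence intervals reported in the abstract require an additional variance estimate not shown here.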

Original language: English
Pages: 360–365
Number of pages: 6
Journal: Neonatology
Volume: 122
Issue: 3
Status: Published, 2025
Publication type (Finnish Ministry of Education and Culture, OKM): A1 Original article in a scientific journal

Publication Forum (Jufo) level

  • Jufo level 2

ASJC Scopus subject areas

  • Pediatrics, Perinatology, and Child Health
  • Developmental Biology
