Abstract
In recent years, there has been growing interest in analyzing text data from different scientific fields. Significant advances in Artificial Intelligence for Natural Language Processing enable the systematic categorization of the wealth of scientific papers into fundamental thematic clusters, and topic modeling plays a crucial role in this context. However, comparative analyses of traditional and advanced topic modeling methods, such as the well-established Latent Dirichlet Allocation (LDA) and newer approaches like BERTopic, remain underexplored. This study addresses this gap by conducting a comprehensive analysis of extensive text data on sustainable energy research. To this end, we compile a unique dataset of thousands of abstracts sourced from PubMed, Scopus, and Web of Science, and compare LDA with the transformer-based model BERTopic. Importantly, we introduce a novel approach for determining the optimal number of topics by maximizing combined semantic scores, and show that it yields a considerably lower number of topics than previous approaches. Overall, our study not only contributes methodologically but also enhances our understanding of the principal topics in sustainable energy research.
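The abstract does not state which semantic scores are combined or how. The sketch below only illustrates the general idea of picking the number of topics k that maximizes a combined score. It assumes, purely for illustration, that the scores are topic coherence and topic diversity, each min-max normalized over the candidate k values and then summed with equal weight. The function names and the coherence values are hypothetical placeholders; in practice coherence could be computed with a library such as gensim's `CoherenceModel`, which this sketch does not call.

```python
from typing import Dict, Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_n: int = 10) -> float:
    """Fraction of unique words among the top_n words of all topics (1.0 = no overlap)."""
    words = [word for topic in topics for word in topic[:top_n]]
    return len(set(words)) / len(words) if words else 0.0


def min_max(scores: Dict[int, float]) -> Dict[int, float]:
    """Rescale scores to [0, 1] so that scores on different scales can be added."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates score the same: this score cannot discriminate between them.
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def select_num_topics(coherence: Dict[int, float], diversity: Dict[int, float]) -> int:
    """Return the k maximizing normalized coherence + normalized diversity.

    ASSUMPTION: an unweighted sum of min-max normalized scores stands in for the
    paper's "combined semantic scores", whose exact definition is not given here.
    """
    norm_c, norm_d = min_max(coherence), min_max(diversity)
    combined = {k: norm_c[k] + norm_d[k] for k in coherence}
    return max(combined, key=combined.get)


if __name__ == "__main__":
    # Hypothetical per-k scores, e.g. from LDA or BERTopic runs with k topics.
    coherence = {5: 0.42, 10: 0.48, 15: 0.47, 20: 0.44}
    diversity = {5: 0.95, 10: 0.90, 15: 0.78, 20: 0.70}
    print(select_num_topics(coherence, diversity))  # → 10
```

With these made-up inputs, k = 10 wins because it has the best coherence while losing little diversity. Normalizing first keeps a score with a wider numeric range from dominating the sum.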
Original language | English |
---|---|
Article number | 108877 |
Journal | Engineering Applications of Artificial Intelligence |
Volume | 136 |
Publication status | Published - Oct 2024 |
Publication type | A1 Journal article-refereed |
Keywords
- Latent Dirichlet Allocation
- Natural language processing
- Sustainable energy
- Topic modeling
- Transformer model
Publication forum classification
- Publication forum level 2
ASJC Scopus subject areas
- Control and Systems Engineering
- Artificial Intelligence
- Electrical and Electronic Engineering