A cluster-assisted differential evolution-based hybrid oversampling method for imbalanced datasets

  • Muhammed Abdulhamid Karabiyik
  • Bahaeddin Turkoglu
  • Tunc Asuroglu*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

Class imbalance remains a significant challenge in machine learning: models trained on imbalanced data tend to favor the majority class and misclassify minority instances. Traditional oversampling methods, such as the Synthetic Minority Over-sampling Technique (SMOTE) and its variants, often struggle with class overlap, poor decision boundary representation, and noise accumulation. To address these limitations, this study introduces ClusterDEBO, a novel hybrid oversampling method that integrates K-Means clustering with differential evolution (DE) to generate synthetic samples in a more structured and adaptive manner. The proposed method first partitions the minority class into clusters, using the silhouette score to determine the optimal number of clusters. Within each cluster, DE-based mutation and crossover operations generate diverse, well-distributed synthetic samples while preserving the underlying data distribution. Additionally, a selective sampling and noise reduction mechanism filters out low-impact synthetic samples based on their contribution to classification performance. ClusterDEBO is evaluated on 44 benchmark datasets with k-Nearest Neighbors (kNN), decision tree (DT), and support vector machine (SVM) classifiers. The results show that ClusterDEBO consistently outperforms existing oversampling techniques, improving class separability and classifier robustness. Statistical validation with the Friedman test confirms that the improvements are significant, indicating that the observed gains are unlikely to be due to random variation. The findings highlight the potential of cluster-assisted differential evolution as a powerful strategy for handling imbalanced datasets.
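The abstract describes the pipeline only at a high level, so the following is a minimal, stdlib-only Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes Lloyd's k-means with a silhouette search over k = 2..5, DE/rand/1 mutation with binomial crossover inside each cluster (scale factor F = 0.5, crossover rate CR = 0.7, both illustrative), and clusters chosen with probability proportional to their size. The filter `keep` is a hypothetical stand-in (reject a synthetic point whose nearest neighbour is a majority sample); the paper instead selects samples by their contribution to classification performance. All function names and parameter values are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; points alone in their cluster score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / len(own)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, rng, k_max=5):
    """Return the k-means labelling with the highest silhouette for k in 2..k_max."""
    best_score, best_labels = -2.0, None
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k, rng)
        score = silhouette(points, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels


def de_sample(cluster, rng, F=0.5, CR=0.7):
    """DE/rand/1 mutation plus binomial crossover within one cluster."""
    if len(cluster) < 4:
        # Too few members for DE/rand/1 (assumption): interpolate two members instead.
        a, b = rng.choice(cluster), rng.choice(cluster)
        lam = rng.random()
        return [x + lam * (y - x) for x, y in zip(a, b)]
    i_t, i1, i2, i3 = rng.sample(range(len(cluster)), 4)  # four distinct members
    target, r1, r2, r3 = cluster[i_t], cluster[i1], cluster[i2], cluster[i3]
    mutant = [x1 + F * (x2 - x3) for x1, x2, x3 in zip(r1, r2, r3)]
    j_rand = rng.randrange(len(target))  # guarantees at least one mutant gene
    return [
        m if (rng.random() < CR or j == j_rand) else t
        for j, (m, t) in enumerate(zip(mutant, target))
    ]


def keep(cand, minority, majority):
    """Hypothetical noise filter: reject a sample closer to the majority class."""
    d_min = min(dist(cand, p) for p in minority)
    d_maj = min(dist(cand, q) for q in majority) if majority else math.inf
    return d_min <= d_maj


def cluster_de_oversample(minority, majority, n_new, seed=0):
    """Generate up to n_new synthetic minority points (hypothetical pipeline)."""
    rng = random.Random(seed)
    labels = best_k(minority, rng) or [0] * len(minority)
    clusters = {}
    for p, lab in zip(minority, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    weights = [len(g) for g in groups]  # larger clusters receive more samples
    synthetic, tries = [], 0
    while len(synthetic) < n_new and tries < 20 * n_new:
        tries += 1
        cand = de_sample(rng.choices(groups, weights=weights)[0], rng)
        if keep(cand, minority, majority):
            synthetic.append(cand)
    return synthetic
```

Calling `cluster_de_oversample(minority, majority, n_new)` with lists of feature vectors returns at most `n_new` new minority vectors; the retry cap stops the loop if the filter rejects too many candidates. Restricting mutation to members of a single cluster keeps the difference vector `r2 - r3` local, which is the intuition behind pairing clustering with DE: samples spread within dense minority regions rather than across gaps between them.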

Original language: English
Article number: e3177
Journal: PeerJ Computer Science
Volume: 11
Publication status: Published - 2025
Publication type: A1 Journal article-refereed

Keywords

  • Differential evolution
  • Imbalanced datasets
  • K-Means clustering
  • Oversampling
  • Synthetic sample generation

Publication forum classification

  • Publication forum level 1

ASJC Scopus subject areas

  • General Computer Science
