RARE: a labeled dataset for cloud-native memory anomalies

Francesco Lomio, Diego Martínez Baselga, Sergio Moreschini, Heikki Huttunen, Davide Taibi

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review


Anomaly detection has attracted interest from both industry and the research community for many years, and the number of published papers and adopted services has grown exponentially over the last decade. One reason for this is the wide adoption of cloud systems by most players across multiple industries, such as online shopping, advertising and remote computing. In this work we propose a Dataset foR cloud-nAtive memoRy anomaliEs: RARE. It includes labelled anomaly time-series data comprising over 900 unique metrics. The dataset was generated using a microservice that injects an artificial byte stream to overload the nodes, provoking memory anomalies that in some cases resulted in a crash. The system was built using a Kafka server deployed on Kubernetes, and we used Prometheus to access and download the metrics related to the server. The dataset can be coupled with machine learning algorithms to detect anomalies in cloud-based systems, and it will be available as a CSV file through an online repository. We also include an example application that uses a Random Forest algorithm to classify the data as anomalous or not. The goal of the RARE dataset is to support the development of more accurate and reliable machine learning methods for anomaly detection in cloud-based systems.
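The Random Forest baseline mentioned above can be sketched as follows. This is a minimal, dependency-free illustration of the general technique (bootstrap samples, a random subset of features at each split, Gini impurity, majority vote), not the authors' implementation. It assumes a hypothetical CSV layout with one numeric column per metric plus a binary `label` column; the column names `mem_used_mb` and `cpu_load`, the toy rows, and the hyperparameters (`n_trees`, `max_depth`, `seed`) are illustrative assumptions, not the RARE schema or the paper's configuration.

```python
import csv
import io
import random
from collections import Counter

# Hypothetical CSV layout: numeric metric columns plus a binary "label" column
# (1 = anomalous, 0 = normal). The column names and rows are illustrative only.
SAMPLE_CSV = """mem_used_mb,cpu_load,label
200,0.10,0
215,0.25,0
230,0.40,0
245,0.15,0
260,0.30,0
270,0.20,0
900,0.80,1
915,0.95,1
930,0.85,1
945,0.90,1
960,0.82,1
975,0.88,1
"""


def load_rows(text, label_col="label"):
    """Split CSV rows into feature vectors X and integer labels y."""
    X, y = [], []
    for row in csv.DictReader(io.StringIO(text)):
        y.append(int(row.pop(label_col)))
        X.append([float(v) for v in row.values()])
    return X, y


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a CART-style tree; each split looks at a random subset of features."""
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)  # leaf node: the predicted class
    best = None  # (weighted Gini impurity, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [lbl for row, lbl in zip(X, y) if row[f] <= t]
            right = [lbl for row, lbl in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no usable split among the sampled features
        return majority(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left_node = build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_feats, rng)
    right_node = build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_feats, rng)
    return (f, t, left_node, right_node)


def predict_tree(node, row):
    # Internal nodes are (feature, threshold, left, right) tuples; leaves are labels.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=15, max_depth=4, seed=0):
    """Train n_trees trees on bootstrap samples with sqrt(n_features) features per split."""
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, n_feats, rng))
    return forest


def predict(forest, row):
    """Majority vote over all trees (an odd n_trees avoids ties for two classes)."""
    return majority([predict_tree(tree, row) for tree in forest])


if __name__ == "__main__":
    X, y = load_rows(SAMPLE_CSV)
    forest = fit_forest(X, y)
    print(predict(forest, [150.0, 0.05]))  # 0: below every "normal" training value
    print(predict(forest, [999.0, 0.99]))  # 1: above every "anomalous" training value
```

The pure-Python split search keeps the sketch self-contained but scales poorly; with more than 900 metric columns, a library implementation such as scikit-learn's `RandomForestClassifier` is the more practical choice.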
Original language: English
Title of host publication: MaLTeSQuE 2020
Subtitle of host publication: Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation
Editors: Foutse Khomh, Pasquale Salza, Gemma Catolino
Number of pages: 6
ISBN (Print): 978-1-4503-8124-6
Publication status: Published - 2020
Publication type: A4 Article in conference proceedings
Event: ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation
Duration: 1 Jan 2000 → …


Keywords

  • Dataset
  • anomaly detection
  • kubernetes
  • self healing
  • machine learning

Publication forum classification

  • Publication forum level 1

