Demystifying Data Science Projects: a Look on the People and Process of Data Science Today

Timo Aho, Outi Sievi-Korte, Terhi Kilamo, Sezin Gizem Yaman, Tommi Mikkonen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


Processes and practices used in data science projects have
been reshaping especially over the last decade. These are different from
their software engineering counterparts. However, to a large extent, data
science relies on software, and, once taken to use, the results of a data
science project are often embedded in software context. Hence, seeking
synergy between software engineering and data science might open
promising avenues. However, while there are various studies on data science
workflows and data science project teams, there have been no attempts
to combine these two very interlinked aspects. Furthermore, existing
studies usually focus on practices within one company. Our study
will fill these gaps with a multi-company case study, concentrating both
on the roles found in data science project teams as well as the process.
In this paper, we have studied a number of practicing data scientists to
understand a typical process flow for a data science project. In addition,
we studied the involved roles and the teamwork that would take place
in the data context. Our analysis revealed three main elements of data
science projects: Experimentation, Development Approach, and Multidisciplinary
team(work). These key concepts are further broken down to
13 different sub-themes in total. The found themes pinpoint critical elements
and challenges found in data science projects, which are still often
done in an ad-hoc fashion. Finally, we compare the results with modern
software development to analyse how good a match there is.
Original language: English
Title of host publication: Product-Focused Software Process Improvement
Subtitle of host publication: 21st International Conference, PROFES 2020, Turin, Italy, November 25–27, 2020, Proceedings
ISBN (Electronic): 978-3-030-64148-1
Publication status: Published - Nov 2020
Publication type: A4 Article in conference proceedings
Event: International Conference on Product-Focused Software Process Improvement -
Duration: 1 Jan 2000 → …

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer, Cham
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: International Conference on Product-Focused Software Process Improvement
Period: 1/01/00 → …


  • data science
  • data engineering
  • software process
  • prototyping
  • case study

Publication forum classification

  • Publication forum level 1

