Abstract
Processes and practices used in data science projects have
been reshaped, especially over the last decade. These are different from
their software engineering counterparts. However, to a large extent, data
science relies on software, and, once taken into use, the results of a data
science project are often embedded in a software context. Hence, seeking
synergy between software engineering and data science might open
promising avenues. However, while there are various studies on data science
workflows and data science project teams, there have been no attempts
to combine these two very interlinked aspects. Furthermore, existing
studies usually focus on practices within one company. Our study
fills these gaps with a multi-company case study, concentrating both
on the roles found in data science project teams and on the process.
In this paper, we have studied a number of practicing data scientists to
understand a typical process flow for a data science project. In addition,
we studied the involved roles and the teamwork that would take place
in the data context. Our analysis revealed three main elements of data
science projects: Experimentation, Development Approach, and Multidisciplinary
team(work). These key concepts are further broken down into
13 sub-themes in total. The identified themes pinpoint critical elements
and challenges found in data science projects, which are still often
done in an ad-hoc fashion. Finally, we compare the results with modern
software development to analyse how good a match there is.
Original language | English |
---|---|
Title of host publication | Product-Focused Software Process Improvement |
Subtitle of host publication | 21st International Conference, PROFES 2020, Turin, Italy, November 25–27, 2020, Proceedings |
Pages | 153-164 |
ISBN (Electronic) | 978-3-030-64148-1 |
Publication status | Published - Nov 2020 |
Publication type | A4 Article in conference proceedings |
Event | International Conference on Product-Focused Software Process Improvement - Duration: 1 Jan 2000 → … |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer, Cham |
Volume | 12562 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | International Conference on Product-Focused Software Process Improvement |
---|---|
Period | 1/01/00 → … |
Keywords
- data science
- data engineering
- software process
- prototyping
- case study
Publication forum classification
- Publication forum level 1