## Abstract

Rank, the minimal number of generators, is a natural invariant attached to any n-dimensional persistent vector space. However, rank is highly unstable. The main objectives of this thesis are to build an algorithmic framework for stabilizing the rank in one-dimensional persistence and to prove its usefulness in concrete data analysis. The stabilization process studied here relies on choosing a pseudometric between tame persistent vector spaces. This allows one to minimize the rank of a persistent vector space in larger and larger neighbourhoods around it with respect to the chosen pseudometric. The result is the stable rank invariant, a simple non-increasing function from the non-negative reals to the non-negative reals.
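
As a rough illustration of what such a non-increasing function can look like, the following Python sketch assumes that a barcode is already available and that the pseudometric is the simplest shift-type one, in which case the stable rank at `t` is commonly described as the number of bars strictly longer than `t`. Both the formula and the helper name `stable_rank_standard` are assumptions made for this example; the abstract itself does not state them.

```python
from typing import Iterable, List, Sequence, Tuple

Bar = Tuple[float, float]  # (birth, death); death may be float("inf")


def stable_rank_standard(bars: Iterable[Bar], ts: Sequence[float]) -> List[int]:
    """Count, for each threshold t, the bars whose length exceeds t.

    Assumption: under a shift-type pseudometric, the minimal rank within
    distance t is obtained by discarding every bar of length at most t.
    """
    bar_list = list(bars)
    return [sum(1 for birth, death in bar_list if death - birth > t) for t in ts]


example_bars = [(0.0, 1.0), (0.0, 3.0), (1.0, float("inf"))]
values = stable_rank_standard(example_bars, [0.0, 1.0, 2.0, 3.0, 4.0])
print(values)
```

At `t = 0` the sketch returns the ordinary rank (the number of bars), and the values can only decrease as `t` grows.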

We show how the required pseudometrics arise from so-called persistence contours. A contour is a certain function system that can be generated efficiently, and in an implementable way, by integrating a so-called density function from the non-negative reals to the strictly positive reals. We establish an algorithm for computing the stable rank invariant with respect to a chosen contour. The theoretical development culminates in an embedding theorem showing that persistent vector spaces embed into Lebesgue measurable functions through the stable rank.
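
To make the role of the density concrete, here is a minimal Python sketch under stated assumptions: the barcode is given with finite bars, and the stable rank at `t` with respect to a density `f` is taken to be the number of bars `[b, d)` whose integral of `f` over `[b, d]` exceeds `t`. This counting rule, the midpoint-rule integrator, and the names `integrate` and `stable_rank` are illustrative choices, not the thesis's own implementation.

```python
from typing import Callable, List, Sequence, Tuple

Bar = Tuple[float, float]  # (birth, death), both finite in this sketch


def integrate(f: Callable[[float], float], a: float, b: float, steps: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def stable_rank(bars: Sequence[Bar], f: Callable[[float], float],
                ts: Sequence[float]) -> List[int]:
    """For each t, count bars whose density-weighted length exceeds t.

    Assumption: f is strictly positive on the non-negative reals, so every
    bar of positive length gets a positive weighted length.
    """
    weighted = [integrate(f, birth, death) for birth, death in bars]
    return [sum(1 for w in weighted if w > t) for t in ts]


bars = [(0.0, 1.0), (0.0, 3.0), (2.0, 3.0)]

# Constant density: weighted length equals ordinary bar length.
print(stable_rank(bars, lambda x: 1.0, [0.5, 1.5, 3.5]))

# Increasing density 1 + x: bars born later weigh more.
print(stable_rank(bars, lambda x: 1.0 + x, [1.0, 2.0, 5.0, 8.0]))
```

With the increasing density, the bars `(0, 1)` and `(2, 3)` have the same ordinary length but different weighted lengths (about 1.5 and 3.5), so the choice of density changes which bars survive at a given threshold.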

The success of persistent homology in data analysis has been largely due to the barcode decomposition and its efficient computation. One result of this thesis is that the barcode decomposition can be proved using the monotonicity of the rank with respect to taking a subspace of a persistent vector space. This property of the rank holds only in the one-dimensional case. We claim that the rank is more fundamental for persistence and that the barcode is but a technical artifact of its properties. Even though the barcode is a powerful tool, progress in persistence theory requires invariants that generalize to multi-dimensional persistence and do not rely on decomposition theorems.

Recent years have seen active research on mapping barcodes to representations that enable statistical analysis of persistent homology results and connect naturally to machine learning algorithms. Our embedding theorem shows that the stable rank provides such a connection to machine learning. One of our main results is the full applicability of our pipeline in practical data analysis. We demonstrate how choosing an appropriate contour can enhance the results of supervised learning. A contour can also be seen as a form of feature selection on the bar decomposition.

Original language | English |
---|---|
Publisher | Tampere University |
Number of pages | 120 |
Volume | 88 |
ISBN (Electronic) | 978-952-03-1153-7 |
ISBN (Print) | 978-952-03-1152-0 |
Publication status | Published - 2 Aug 2019 |
Publication type | G4 Doctoral dissertation (monograph) |

### Publication series

Name | Tampere University Dissertations |
---|---|
Volume | 88 |
ISSN (Print) | 2489-9860 |
ISSN (Electronic) | 2490-0028 |