Analysis of Chromatin and Proteins in Cancer

Research output: Book/Report › Doctoral thesis › Collection of Articles


Gene expression is a tightly regulated process. Cooperation between proximal and/or distal regulatory genomic elements positions the transcription machinery precisely on a gene’s promoter and modulates the synthesis of transcripts. Transcription factors (TFs) are proteins that bind these regulatory loci, and the availability of their binding sites is in turn governed by chromatin structure. In cancer, the delicate equilibrium between accessible and inaccessible TF binding sites is altered. In prostate cancer (PCa), androgen stimulation plays a central role in sustaining cancer growth. After treatment, primary PCa recurs in about a third of cases with a more aggressive, androgen-insensitive phenotype. Specific genetic alterations have been reported to drive primary cancer development and the transition to castration-resistant prostate cancer (CRPC). Taken together, these observations suggest a connection between chromatin state, gene expression and PCa development. The assay for transposase-accessible chromatin coupled with sequencing (ATAC-seq) was used to study the chromatin organization of samples representing different stages of PCa progression, collected at the Tampere University Hospital. This dataset was analyzed together with previously generated transcriptomic data and publicly available chromatin immunoprecipitation followed by sequencing (ChIP-seq) data. From the ATAC-seq data, peaks and differentially accessible regions (DARs) were detected. Correlation between ATAC-seq features and gene expression was calculated to assign each gene to a proximal or distal regulatory region. At a global level, this analysis showed only weak correlation between the two measurements. Nevertheless, the expression of differentially expressed genes (DEGs) correlated more strongly with accessible features. This observation supports the idea that alternative binding patterns are utilized across PCa progression.
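The gene-to-region assignment described above can be illustrated with a minimal sketch. The abstract does not state which correlation measure, distance window or thresholds were used, so the Pearson coefficient, the function names (`pearson`, `best_region`) and the toy values below are assumptions made purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


def best_region(expression, candidate_accessibility):
    """Return (region_id, r) for the candidate region with the largest |r|.

    expression: per-sample expression values of one gene.
    candidate_accessibility: dict mapping a region id to its per-sample
    accessibility signal, in the same sample order as `expression`.
    """
    scored = {
        region_id: pearson(signal, expression)
        for region_id, signal in candidate_accessibility.items()
    }
    region_id = max(scored, key=lambda k: abs(scored[k]))
    return region_id, scored[region_id]


# Hypothetical toy values across five samples (not data from the thesis).
gene_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "promoter_peak": [2.0, 4.1, 5.9, 8.0, 10.1],
    "distal_peak": [5.0, 1.0, 4.0, 2.0, 3.0],
}
region, r = best_region(gene_expression, candidates)
print(region, round(r, 3))
```

In this toy case the promoter peak tracks expression almost linearly, so it is selected over the distal peak. A real analysis would also need a rule for which regions count as candidates for a gene and some correction for multiple testing, neither of which is specified here.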
To understand which transcriptional programs are involved in this process, TF binding sites were searched for in candidate regulatory regions using ChIP-seq peaks. The transcription factor with the highest number of binding sites across all ATAC-seq features was the androgen receptor (AR). Moreover, FOXA1 and HOXB13 were observed to co-localize with AR in two distinct sets of DARs, with increased accessibility in PC or reduced accessibility in CRPC. This observation supports the central role of AR in driving PCa and raises the question of which TFs co-modulate its activity in CRPC. To investigate this aspect and identify clusters of TFs sharing target genes, a regulatory network was built. Hierarchical clustering yielded two components: first, a core, heavily connected module composed of AR, ERG, FOXA1 and ESR1; second, a group of 43 TFs sharing fewer target genes. This result confirms the central role of AR and highlights other TFs, e.g. SP1, FLI1 and TP63, as its co-modulators.
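As a rough illustration of grouping TFs by shared target genes, the sketch below clusters TFs with average-linkage agglomerative clustering on Jaccard distance between their target sets. The abstract does not say which similarity measure or linkage was used, so both are assumptions, and the TF names, gene names and helper functions (`jaccard`, `cluster_tfs`) are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tfs(targets, n_clusters):
    """Average-linkage agglomerative clustering on Jaccard distance.

    targets: dict mapping a TF name to the set of its target genes.
    Returns a sorted list of clusters, each a sorted list of TF names.
    """
    clusters = [[tf] for tf in sorted(targets)]

    def dist(c1, c2):
        # Mean pairwise Jaccard distance between members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1.0 - jaccard(targets[a], targets[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy network: placeholder TFs and genes, not thesis data.
toy_targets = {
    "TF_A": {"g1", "g2", "g3", "g4"},
    "TF_B": {"g1", "g2", "g3", "g5"},
    "TF_C": {"g2", "g3", "g4", "g5"},
    "TF_D": {"g8", "g9"},
    "TF_E": {"g9", "g10"},
}
print(cluster_tfs(toy_targets, 2))
```

With these toy sets, the three TFs that share most of their targets end up in one densely connected group and the remaining two in another, which mirrors the idea of a core module versus a more loosely connected set.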

All the identified TFs share a fundamental structural organization: each has a DNA-binding domain and at least one regulatory domain. Moreover, the molecular structure of each of these proteins shows at least one intrinsically disordered region (IDR). These regions are flexible and display reduced hydrophobicity and net charge along their surface. In solution, intrinsically disordered proteins (IDPs) exist as a continuum of conformers with a structure that fluctuates from random coil to folded. To collect and organize literature-derived evidence of this phenomenon, the DisProt database was developed in 2006. Unfortunately, its updates were discontinued in 2013. To lead its manual annotation process, a dedicated web service was created together with a completely redesigned web application. While DisProt data is of the highest quality, the database size is limited. To extend intrinsic protein disorder annotation to the whole protein universe, MobiDB was created. This database collects data from eleven specialized external data sources and fifteen different tools for predicting intrinsic disorder (ID), secondary structure and low-complexity regions. Using data from these resources, the structure of the above-mentioned TFs was characterized, and the emergent pattern of DNA-binding domains and IDRs was detected.

Altogether, these results demonstrate how integrated analysis of multiple high-throughput sequencing (HTS) measurements can help dissect the regulatory complexity of PCa by identifying sets of TFs involved in cancer progression. Moreover, using these computational resources, structural features of the identified proteins can be inferred. In general, these results provide an overview of the complexity of cellular phenomena and showcase a data-driven workflow for detecting TFs involved in a disease and characterizing their structure.
Original language: English
Place of Publication: Tampere
Publisher: Tampere University
ISBN (Electronic): 978-952-03-1795-9
ISBN (Print): 978-952-03-1794-2
Publication status: Published - 2020
Publication type: G5 Doctoral dissertation (article)

Publication series

Name: Tampere University Dissertations - Tampereen yliopiston väitöskirjat
ISSN (Print): 2489-9860
ISSN (Electronic): 2490-0028

