Characterisation of the CRISPR/Cas system of the hyperthermophilic archaeon Thermoproteus tenax
Clustered regularly interspaced short palindromic repeats (CRISPR) found in prokaryotes are non-contiguous direct repeats with a length of 24-48 nt. The repeats are weakly palindromic at their 5'- and 3'-termini and are separated by variable spacer sequences of similar size (Jansen et al., 2002). CRISPR loci are flanked on one side by an AT-rich leader sequence of 200-350 bp. CRISPR sequences are widespread in both prokaryotic domains; they have been identified in most archaeal genomes, in 40 % of bacterial genomes and on some plasmids (Grissa et al., 2007). Similarity searches with CRISPR spacer sequences showed that some spacers match phages and other extrachromosomal elements, such as plasmids or transposons, but also chromosomal DNA (Mojica et al., 2005; Bolotin et al., 2005). A group of cas genes is always located near a CRISPR locus, and the encoded proteins are assumed to be the essential actors in the function and assembly of CRISPR (Haft et al., 2005). Recently, it was demonstrated that, in response to phage infection, Bacteria integrate new spacers into their CRISPR arrays, which results in CRISPR-mediated phage resistance. The newly integrated spacers were derived from the genome of the challenging phage, as shown by the 100 % identity of spacer and phage sequences (Barrangou et al., 2007). However, the role of CRISPR arrays in microbial genomes and the mechanisms that underlie CRISPR function are mostly uncharacterised, especially in Archaea. Whether different CRISPR systems have different functions also remains unclear (Sorek et al., 2008). Therefore, the function of the archaeal CRISPR/Cas system was studied in more detail using the hyperthermophilic organism Thermoproteus tenax as an example. In the course of this work, seven CRISPR loci were identified in the genome of T. tenax. The spacer sequences showed significant similarity not only to archaeal phages, but also to genes of the T. tenax genome.
Northern blot analyses of small RNA species prepared from T. tenax cells, using spacer probes from all seven CRISPR arrays, revealed transcript lengths of ~130 nt, ~110 nt, ~70 nt and ~50 nt, suggesting that large CRISPR RNA transcripts are processed stepwise by endo- and exonucleolytic cleavage. The repeat sequence within the transcript probably has two functions: i) stabilising the CRISPR transcript through the formation of hairpin structures and ii) serving as a binding motif for the nuclease reaction. The transcription start site is located in front of the first repeat, within the leader sequence, and is marked by typical BRE-sites and TATA-boxes. Over 20 cas genes were identified in the genome of T. tenax, and their organisation is similar to that in other crenarchaeal organisms. The core cas genes are organised in two operon structures, named casa1 (cas4, cas1/2 and csa1) and casa2 (csa5, csa2, cas5a, cas3, cas3hd and csa4). TATA-boxes and BRE-sites were identified in front of both operons, and the functional genes within the operons showed overlapping start and stop codons. Furthermore, the polycistronic transcripts of both operons were detected by RT-PCR analyses. The polycistronic transcripts of casa1 and casa2 are clearly leaderless, whereas consensus SD motifs could be identified for all genes within the operons. Abiotic stress parameters, such as UV-light or high ionic strength, modulate the transcription of cas genes. In cells treated with UV-light for 2 min, the transcript level of the cas3 gene was increased more than threefold and that of the cas4 gene twofold in comparison to control cells. Furthermore, cas3 mRNA levels in cells treated with 100 mM NaCl were increased more than tenfold in comparison to the control. The protein Csa3, whose gene is located between the two cas operons, is a good candidate for a transcriptional regulator. This protein is structurally characterised by a typical HTH-motif and has a high affinity for DNA.
However, the conditions required for specific binding to the promoter regions of casa1 and casa2 (e.g. addition of trehalose, metal ions or cAMP) have yet to be found. The functional characterisation of single Cas proteins encoded by the operons casa1 and casa2 was complicated by the formation of inclusion bodies of the recombinant proteins in E. coli. For casa1, reconstitution experiments revealed that the three proteins encoded by the operon interacted strongly during the refolding process. The resulting tripartite CasA1 protein complex showed ribonuclease activity with ssRNA CRISPR transcripts as a target. Likewise, four of the six proteins encoded by the casa2 operon could only be expressed in insoluble form. For the other two proteins (Csa5 and Csa2), standard purification protocols could be established. In this case too, the reconstitution of the multipartite complex (“CasA2”) is favoured in the presence of all proteins encoded by the cas gene operon. Remarkably, the addition of T. tenax RNA significantly supports the reconstitution of the CasA2 complex. The RNA binding capacity was also documented in EMSA studies. The studies of the CRISPR/Cas system of T. tenax clearly showed that the processing of CRISPR transcripts is an enzymatically catalysed reaction. Owing to its ribonuclease activity, the CasA1 complex is certainly a relevant component in T. tenax. The CasA2 complex may be a candidate for carrying out interference reactions, owing to its strong interaction with RNA, which was detected in the reconstitution assays and is supported by the presence of defined RNA and DNA binding domains in its components. The present data support the common hypothesis that the CRISPR/Cas system of prokaryotic cells represents a defence system against mobile genetic elements, given that some spacer sequences show clear sequence similarity to known viruses. However, there are also hints of additional functions: i) Regulatory functions may also be possible, since spacers of the T. tenax CRISPR are similar not only to foreign genetic sequences, but also to the organism’s own chromosomal sequences. ii) Moreover, functions of the CRISPR/Cas systems in stabilising the genome structure or nucleic acids are also likely, as the number of CRISPR loci and cas genes correlates with an increased optimal growth temperature and the transcription of cas genes is modulated by environmental factors.
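
The repeat-spacer architecture described at the start of this abstract (direct repeats of 24-48 nt separated by variable spacers) can be illustrated with a minimal Python sketch. It is not the method used in this work: the function name `extract_spacers` is hypothetical, and the sketch assumes that the repeat sequence is already known and recurs as exact, non-overlapping copies, which is a simplification because repeats in real arrays may vary slightly.

```python
from typing import List


def extract_spacers(array_seq: str, repeat: str) -> List[str]:
    """Return the spacers lying between consecutive exact copies of `repeat`.

    Assumption (illustrative only): the repeat is known in advance and
    occurs as exact, non-overlapping copies within `array_seq`.
    """
    # Repeat length range as quoted in the abstract (24-48 nt).
    if not 24 <= len(repeat) <= 48:
        raise ValueError("repeat length outside the 24-48 nt range")

    # Collect the start position of every non-overlapping repeat copy.
    positions = []
    start = array_seq.find(repeat)
    while start != -1:
        positions.append(start)
        start = array_seq.find(repeat, start + len(repeat))

    # A spacer is the sequence between the end of one repeat copy
    # and the start of the next one.
    spacers = []
    for left, right in zip(positions, positions[1:]):
        spacers.append(array_seq[left + len(repeat):right])
    return spacers
```

Given an array with n repeat copies, the function returns the n-1 spacers in between; these could then be passed to a similarity search against phage or chromosomal sequences, as described above.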