Phylogenetic Analysis of the SecA gene in Phytoplasmas
Dissertation submitted to The University of Nottingham in partial fulfilment of the requirements for the degree of Bachelor of Science with Hons Biology
figures available on request
Phytoplasmas are small bacteria that lack a cell wall and cannot be cultured in vitro. They have been identified as the causative agents of a number of plant diseases with a wide range of symptoms. These pathogens infect a large number of plant hosts across much of the tropics, including some economically important crop plants. This investigation aims to further research into the phylogenetic position of this group of mollicutes, to assess the current methods of classification, and potentially to lead to more accurate and practical methods for the detection and diagnosis of phytoplasma infections in the field. Using nested PCR, the SecA gene was amplified from phytoplasma DNA samples and cloned into the pGEM-T Easy vector system using E. coli. The SecA gene was sequenced and a phylogenetic analysis conducted using DNASTAR. Eleven initial phytoplasma SecA gene sequences are presented, 9 of which are from the group I phytoplasmas. Analysis of the phylogenetic relationships between these sequences concurs with the subgroupings in the existing classification system. This is the first time a phylogenetic analysis of phytoplasma classification has been based on a non-ribosomal gene sequence. The results confirm the classification system currently based on 16S rRNA sequence data. The low guanine and cytosine content of the phytoplasma genome was confirmed. Additional primers for the SecA gene were designed, giving the potential to sequence the SecA gene in many more phytoplasmas in all the groups.
Figure 3.17 illustrates the phylogenetic relationships between the SecA gene sequences of the phytoplasmas in Table 3.2. The tree includes some published sequences so that the grouping obtained from the SecA gene sequences can be compared with the groups and subgroups in the classification based on the 16S rRNA gene. The Lactobacillus sequence acts as an outgroup. PWB and BVK (shown in grey in Table 3.2) were not included because their sequences were of poor quality. The AYA, AYC and DIV sequences were also omitted from this phylogenetic tree as they presented problems during alignment. The ‘onion publish’ sequence in the tree is a group I-B phytoplasma. The ‘aster publish’ sequence is a group I-A phytoplasma. The top four sequences (red) are grouped closely together in the tree and are all from group I, subgroup B phytoplasmas. RIV and KVE (blue) are both group I, subgroup C phytoplasmas and are also grouped closely together. CHRYM and the published aster sequence (green) are grouped together, in accordance with their classification as I-A phytoplasmas based on the 16S rRNA gene. However, LWB (group II) and STOL-IT (group XII) are grouped together in this tree, which is not in accordance with the classification based on the 16S rRNA gene.
Figure 3.17; Phylogenetic tree showing the relationships between the SecA gene sequences obtained during this project (Appendix B, Table 0.3)
The group I phytoplasmas are shown in colour: subgroup B in red, subgroup C in blue and subgroup A in green.
Between Group Analysis
The primers used enabled the sequencing of many group I phytoplasmas but yielded only two good sequences from other groups (Table 3.2), so a meaningful comparison of the phylogenetic analysis based on the SecA gene sequence with the existing phytoplasma classification system based on 16S rRNA genes was not possible. The phylogenetic tree in Figure 3.17 shows that the group I phytoplasmas are more closely related to one another than to the other phytoplasma SecA sequences analysed here. In this analysis LWB and STOL-IT have been grouped together, which is unexpected since they belong to groups II and XII respectively. It is possible that LWB should be classified as a group XII phytoplasma. Alternatively, this grouping could have arisen from errors in the sequences, as discussed further in Section 4.1.2.
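The notion of some sequences being "more closely related" than others can be made concrete with a pairwise distance between aligned sequences. The sketch below computes the uncorrected p-distance (the proportion of differing sites). It is offered only as an illustration: the distance measure and tree-building method used by DNASTAR in this project are not described here, and the function names and the toy alignment are hypothetical, not data from this study.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gap columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict mapping names to aligned sequences."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


# Made-up toy alignment for illustration only (NOT sequences from this study).
toy_alignment = {
    "seq1": "ATGAAATTTA",
    "seq2": "ATGAAATTTT",
    "seq3": "ATGCCATT-A",
}

for pair, dist in distance_matrix(toy_alignment).items():
    print(pair, round(dist, 3))
```

A smaller p-distance indicates a closer relationship. Tree-building programs normally apply a substitution model correction on top of a raw measure like this one, so the values should be read as a rough guide only.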
Within Group Analysis
It was possible to sequence the SecA gene from 9 phytoplasma samples in group I, which enabled a within-group analysis of the phylogeny and classification of phytoplasmas, visualised in Figure 3.17. This phylogenetic tree groups the SecA gene sequences from the group I phytoplasmas according to their subgroups as defined by the 16S rRNA gene classification (Lee et al., 2000). Some problems were encountered when attempting to align the sequences for phylogenetic analysis (Appendix B, Table 0.3). Some of the alignment difficulties may have been due to misreads in the sequencing process. Figure 4.1 shows how discerning the correct sequence can be difficult and how mistakes can be introduced. For example, between bases 590 and 600 it is hard to determine exactly how many thymine bases there are, and it is difficult to say whether base 609 should be guanine or adenine. The sequences displayed in Appendix A are the raw data from the initial sequencing. To improve the accuracy of the sequences, the entire gene must be sequenced in both directions and the two reads compared. The sequences for AYA, AYC and DIV were relatively unclear and were therefore excluded from the phylogenetic analysis.
Figure 4.1; Chromas 2.31 visualisation of part of the STOL-IT sequence in which the bases are difficult to discern
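The comparison of reads from both directions recommended above can be sketched as follows. The reverse read is reverse-complemented and compared base by base with the forward read, and any position where the two disagree is flagged for manual inspection of the chromatogram. This is a minimal sketch under strong assumptions: both reads are taken to cover exactly the same region with no insertions or deletions, so an uncertain run of thymines would still need a proper alignment. The function names and example reads are hypothetical.

```python
# Translation table for complementing unambiguous bases and N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def compare_reads(forward, reverse):
    """Compare a forward read with a reverse-strand read of the same region.

    Returns (consensus, disagreements). Positions where the reads disagree
    are written as 'N' in the consensus and listed as
    (1-based position, forward base, reverse-complemented base).
    Assumes equal-length reads with no indels.
    """
    fwd = forward.upper()
    rev = reverse_complement(reverse)
    if len(fwd) != len(rev):
        raise ValueError("reads must cover the same region (equal length)")
    consensus = []
    disagreements = []
    for pos, (f, r) in enumerate(zip(fwd, rev), start=1):
        if f == r:
            consensus.append(f)
        else:
            consensus.append("N")
            disagreements.append((pos, f, r))
    return "".join(consensus), disagreements


# Made-up reads for illustration only (NOT data from this study).
consensus, flagged = compare_reads("ATTTTGCA", "TGTAAAAT")
print(consensus)  # ATTTTNCA
print(flagged)    # [(6, 'G', 'A')]
```

In this invented example the two reads disagree at position 6 (guanine in one direction, adenine in the other), which is the kind of ambiguity that would be resolved by returning to the chromatograms.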
Comparison of Phylogenetic Trees
Figure 3.17 shows that the SecA gene sequences separate cleanly into the three subgroups (A, B and C) of the group I phytoplasmas. This grouping concurs with the classification of the phytoplasmas based on the 16S rRNA gene (Seemüller et al., 1998). This is the first time a phylogenetic analysis of phytoplasma classification has been based on a non-ribosomal gene sequence, and these results confirm that the 16S rRNA gene is suitable for phylogenetic classification. These findings also confirm the high degree of relatedness within the aster yellows group of phytoplasmas (group I) and may add to the debate over whether there is scope for further divisions in the classification system for these phytoplasmas (Firrao et al., 2005).
Low Guanine and Cytosine Levels
Analysis of the sequences obtained in this study (in full in Appendix A) confirms that the G+C content of the phytoplasma genome is extremely low (Christensen et al., 2005). The data gathered from these sequences (Table 3.2) show that the average G+C content of the SecA gene is 35.1% (excluding the PWB and BVK sequences, which were questionable; Table 3.2). This is probably the lowest level of G+C in any known functioning genome and could be close to the lower threshold for viability. It is interesting to note that the Aster Yellows phytoplasmas (group I-B) tend to have around 5% higher levels of G+C in their genome. This may be linked to the fact that this group contains many phytoplasmas with very closely related genomes. Perhaps this higher G+C content allows for the more rapid evolution of successful variants through random mutations. The increasingly global and ever faster transport of plant species around the world, together with the fact that phytoplasmas are hosted by both plants and insects, puts ever-changing selection pressures on phytoplasmas in geographically isolated areas (Lee et al., 2000). These factors may account for the large number of similar phytoplasmas, which has made subgroups within the classification system necessary.
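The G+C content referred to above is the percentage of guanine and cytosine among the bases of a sequence. The sketch below shows one way such a figure could be calculated and averaged over a set of sequences while excluding questionable ones. It is an illustration only: whether the average in Table 3.2 was taken per sequence or over pooled bases is not stated here, the per-sequence mean is an assumption, and the function names and example sequences are hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T).

    Ambiguous calls such as N and gap characters are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def mean_gc(sequences, exclude=()):
    """Mean of per-sequence G+C percentages, skipping names in `exclude`.

    Assumption: a simple mean of per-sequence values, not a pooled count.
    """
    values = [
        gc_content(seq)
        for name, seq in sequences.items()
        if name not in exclude
    ]
    if not values:
        raise ValueError("no sequences left after exclusion")
    return sum(values) / len(values)


# Made-up sequences for illustration only (NOT data from this study).
toy = {"good1": "ATGCATAT", "good2": "ATATATGC", "poor": "GGGGCCCC"}
print(mean_gc(toy, exclude={"poor"}))  # 25.0
```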
PCR and Primers
Polymerase Chain Reaction
Some of the problems encountered in getting the primers to amplify the desired DNA fragment could probably be overcome given more time to adjust and optimise the PCR conditions. Such problems may account for the sequencing of a plant heat shock protein gene rather than the PWB SecA gene as intended (Table 2).
The new primers designed during this project showed promise, but there was not sufficient time to obtain positive results. It is recommended that further combinations of primers are used in nested PCR to try to optimise the results (Section 3.2.4). Judging by the initial success of the results here (Table 3.2), obtained with the first primers, which were based on only three phytoplasma sequences (Appendix A, Table 0.1), it should be possible to sequence the SecA gene of phytoplasmas from all the groups.
Conclusions and Future Prospects
These data confirm the validity of the classification system currently based on the 16S rRNA gene. The phylogenetic subgroupings resulting from analysis of the less well conserved SecA gene sequences obtained in this investigation agree with the subgroupings resulting from analysis of the 16S rRNA gene. The results will be more conclusive when SecA gene sequences are obtained from all the different Candidatus Phytoplasma groups. Further work with the new primers designed in this study (Section 3.2) will play a key role in this process. Additional sequencing of the SecA gene in both directions will provide more precise sequences and allow a more accurate phylogenetic analysis of the SecA gene sequences of the phytoplasmas. Further understanding of the SecA gene in phytoplasmas may well lead to the development of more efficient and accurate diagnostic techniques for phytoplasma infections of plants.