Sequence comparison and environmental adaptation of a bacterial endonuclease

Computational Biology and Chemistry 31 (2007) 163–172

Bjørn Altermark a, Steinar Thorvaldsen b, Elin Moe a, Arne O. Smalås a, Nils P. Willassen a,c,*

a Norwegian Structural Biology Centre, Faculty of Science, University of Tromsø, N-9037 Tromsø, Norway
b Department of Mathematics and Statistics, Faculty of Science, University of Tromsø, N-9037 Tromsø, Norway
c Department of Molecular Biotechnology, Faculty of Medicine, University of Tromsø, N-9037 Tromsø, Norway

Received 13 March 2007; accepted 18 March 2007

Abstract

The periplasmic/extracellular bacterial enzyme endonuclease I was chosen as a model system to identify features that might be responsible for temperature and salt adaptation. A statistical study has been conducted of the amino acid sequence properties of endonuclease I enzymes from three mesophilic habitats (non-marine, brackish water and marine) and from three marine temperature groups (psychrophile, intermediate and mesophile). Ten new endonuclease I genes were sequenced to increase the sample size. A bioinformatics method of property-dependent statistical analysis of alignments was applied. To our knowledge, this is the first time these methods have been used to investigate the environmental adaptation of enzymes. Adaptation to low temperature seems to involve an increased surface isoelectric point and hydrophobicity, in contrast to salt adaptation, in which the isoelectric point and hydrophobicity at the surface decrease. Redistribution of charge and hydrophobicity might be the most important signature of cold adaptation and salt adaptation in this enzyme class. The results indicate that general trends of adaptation can be elucidated from the amino acid sequences. This paper also presents a new scale of stratified B-factors, derived from the Protein Data Bank.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Cold adaptation; Halophilic enzymes; Sequence analysis; Endonuclease I; Amino acid property

* Corresponding author at: Department of Molecular Biotechnology, Faculty of Medicine, University of Tromsø, N-9037 Tromsø, Norway. Tel.: +47 77644651; fax: +47 77645350.

ISSN 1476-9271. doi:10.1016/j.compbiolchem.2007.03.003

1. Introduction

Prokaryotes are found almost everywhere on earth. The hottest place where actively growing cells have been found is around hydrothermal vents on the ocean floor, where prokaryotes can proliferate at temperatures up to 121 °C (Kashefi and Lovley, 2003). Cell growth has also been detected in Arctic sea ice at temperatures down to −20 °C (Junge et al., 2004). These two extremes mark the current temperature limits within which cells are known to divide. To survive and thrive at extreme temperatures, organisms face different challenges, the most important being to maintain adequate enzymatic activity. At high temperatures chemical reaction rates are very fast, and the cells' main challenge is to adapt their enzymes, membranes and other molecules so that they can cope with the heat. At low temperatures chemical reaction rates slow down; to maintain adequate metabolic fluxes, selection therefore favors enzymes that are more efficient than their high-temperature counterparts. Many enzymes from cold sources have been shown to be more thermolabile and to have a higher catalytic efficiency than orthologous enzymes from warmer sources (Feller and Gerday, 1997). Marine periplasmic or extracellular enzymes experience the full salinity of the environment, unlike intracellular enzymes. Many studies of cold adaptation have been conducted on secreted marine enzymes; however, putative cold adaptation features may in some cases be confounded with salt adaptation, as indicated by Smalås et al. (2000) and Siddiqui and Cavicchioli (2006). To study the mechanisms involved in both temperature and salt adaptation at the molecular level, the enzyme endonuclease I has been chosen as a model system. The crystal structures of Vibrio vulnificus (Li et al., 2003) and Vibrio cholerae (Altermark
et al., 2006) endonuclease I have been published, and both enzymes are well characterized. In addition, endonuclease I from the psychrophilic bacterium Vibrio salmonicida has been studied extensively (Altermark et al., 2007). Endonuclease I is a ~25 kDa periplasmic or extracellular monomeric enzyme that cleaves both RNA and DNA at internal sites in a sequence-independent manner. Cleavage takes place at the 3′ side of the phosphodiester bond. The structure has a mixed alpha/beta topology containing nine beta strands, five short helices and five long ones. A magnesium ion is located in the active site (Li et al., 2003; Altermark et al., 2006). In addition, a buried chloride ion is present in the structure of V. cholerae endonuclease I and most probably also in the V. vulnificus structure (Altermark et al., 2006). Endonuclease I from V. salmonicida has been shown to be both cold adapted and salt adapted when compared with its mesophilic brackish-water orthologue from V. cholerae (Altermark et al., 2007).

1.1. Cold adaptation

Several articles aiming to find common structural determinants of cold adaptation have been published (Russell, 2000; Smalås et al., 2000; Feller and Gerday, 2003). Cold-adapted enzymes have been reported to share some general features: a reduction in the number of Arg, Pro and Glu residues and an increase in the number of Asn, Gln, Ser and Met residues; a low Ala/Leu ratio; a lower fraction of the larger aliphatic residues, expressed by the (Ile + Leu)/(Ile + Leu + Val) ratio; a lowered Arg/(Arg + Lys) ratio; reduced hydrophobicity, indicated by reduced core packing; an increase in negative charge, which favors interaction with the solvent; more polar and fewer hydrophobic residues; fewer hydrogen bonds, aromatic interactions and ion pairs; and additional surface loops with more polar residues and a lower Pro content.
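Several of the indicators listed above are plain ratios of residue counts. The following is a minimal Python sketch, for illustration only, that computes the Ala/Leu, (Ile + Leu)/(Ile + Leu + Val) and Arg/(Arg + Lys) ratios for a one-letter amino acid sequence; the function name `composition_ratios` is hypothetical and is not taken from the authors' DeltaProt toolbox.

```python
from collections import Counter


def composition_ratios(seq: str) -> dict:
    """Return three composition ratios reported as cold-adaptation indicators.

    seq is a one-letter amino acid sequence. Characters other than the
    residues named in the ratios (including alignment gaps) do not affect
    the result. A ratio whose denominator is zero is returned as NaN.
    """
    counts = Counter(seq.upper())

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else float("nan")

    ile, leu, val = counts["I"], counts["L"], counts["V"]
    arg, lys = counts["R"], counts["K"]
    return {
        "Ala/Leu": ratio(counts["A"], leu),
        "(Ile+Leu)/(Ile+Leu+Val)": ratio(ile + leu, ile + leu + val),
        "Arg/(Arg+Lys)": ratio(arg, arg + lys),
    }


print(composition_ratios("AALLIVRK"))
# {'Ala/Leu': 1.0, '(Ile+Leu)/(Ile+Leu+Val)': 0.75, 'Arg/(Arg+Lys)': 0.5}
```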
Cold-active enzymes are characterized by increased catalytic efficiency at low temperatures, and it has been suggested that this increased efficiency is caused by a more flexible structure at low temperature (Hochachka and Somero, 1984), which in turn is responsible for the lower thermal stability. Increased molecular flexibility would facilitate conformational changes and substrate turnover, preventing the enzymes from becoming too rigid to function at low temperatures.

1.2. Salt adaptation

Adaptation to salt has mostly been studied in enzymes from extremely halophilic Archaea; one exception is the periplasmic proteins of the moderately halophilic bacterium Chromohalobacter salexigens (Oren et al., 2005). The periplasmic binding components of its ABC transport systems showed a decreased isoelectric point, brought about by an enrichment of both Asp and Glu and a decrease in Lys. The accumulation of acidic amino acids on the surface is the most prominent feature found among enzymes adapted to high salinity (Madern et al., 2000). The increase in acidic residues is thought to facilitate solubility, and the repulsive forces between the negative charges to provide flexibility for protein function at high salt concentration
(Mevarech et al., 2000). An increase in Ser and Thr residues has also been reported for halophilic enzymes (Lanyi, 1974). Here, a bioinformatics approach is used to analyze general trends in both temperature and NaCl adaptation in the endonuclease I enzyme class.

2. Materials and methods

2.1. Sequence data and alignments

Inspection of genome sequences from organisms belonging to the gamma subdivision of the proteobacteria showed that the DNA region around the gene for endonuclease I (endA) is very well conserved. Degenerate primers (Table 1) were therefore designed in the genes neighbouring endA, and a 3–5 kbp region from various bacterial species was amplified by PCR using genomic DNA as template. The PCR products were purified from gel with the QIAEX II gel purification kit from Qiagen (Germany), following the manufacturer's protocol. All DNA fragments were sequenced by primer walking, using the PE Biosystems BigDye Terminator Cycle Sequencing kit, an ABI 377 Genetic Analyzer and ABI Sequence Analysis software according to the protocol supplied by Applied Biosystems (USA). The genomic DNA used as PCR template was purified from overnight cultures using the Wizard kit from Promega (USA), following the manufacturer's protocol for Gram-negative bacteria. The marine bacteria were grown in marine broth at their respective temperature optima (see Table 2 in Section 3 for species names and temperature optima). V. cholerae was grown in nutrient broth.

Table 1
List of degenerate and specific primers

#     Degenerate primer    Sequence
dp1   AdoMet F             CCrAAAGCrCGCGTTGCyTG
dp2   AdoMet R             CAGTACCrAAAGTTTCCACCATGA
dp3   Gen metK F           CGTCACGGTGGyGGwGCwTTCTC
dp4   Div vibrio metK F    GGTAAAGATCCwTCAAAAGTTGA
dp5   VC0469 R1            GrTAAATkCGrGGGATrCGCAT
dp6   VC0469 R2            GATyGCrATTTTyTGCCACTG
dp7   VC0469 R3            GATTTyTGrATrGTrAAyTCCAT
dp8   FumA F               ACTTCrATTTTCCAGATsGCTTCCAT
dp9   Nuc F                TGGCAAGGCAArAAAGGCAT
dp10  Nuc R                TTTGAAGTTkACyTGCATTTCACA

#     Specific primer         Sequence
sp1   V. salmonicida R        GAATTGCTTGTCGTTTCTAGTGC
sp2   V. diazotrophicus R     GATAGGTTCTGATAGTTATCG
sp3   V. coralliilyticus R    CATATCGCTACAATTCATAAGG
sp4   V. diabolicus R         TGAAACGCTTCAAGGTAGAGG
sp5   V. scophthalmi R        GGGTGATAAATTCGTGGGATTC
sp6   V. lentus R             CTCTTGTTTGAGATGGAGGC
sp7   V. wodanis R            GCTTTCTAAGTCAGGAATGCC
sp8   V. sp strain 26 R1      GTGAAATCACTTGGCCTAAATGT
sp9   V. sp strain 26 R2      AATAGGCGAGAGATCTTATCCT
sp10  V. sp strain 26 R3      GAGTGTTCAAAAGCATTTTTAGT
sp11  V. sp strain 26 R4      CTTATGGGTATATCACTAACT
sp12  V. sp strain 26 R5      CAGAAATTTATGGATGTAAACC

The endA sequences from all of the Vibrio species except V. cholerae were obtained by conducting an initial PCR with
the degenerate primers (dp1, dp2), yielding a fragment of the upstream gene, S-adenosylmethionine synthetase (metK). PCR condition A was: 2 min initial denaturation at 94 °C, followed by 30 cycles of 30 s denaturation at 94 °C, 30 s annealing at 50 °C and 2 min extension at 72 °C; the reaction was terminated by a 5 min extension step at 72 °C. The total reaction volume of 50 µl contained 1 unit of Taq polymerase, 5 µl of 10× polymerase buffer, 1 µl of 10 mM dNTP, 3 µl of 25 mM MgCl2, 1 µl each of forward and reverse primer at a concentration of 10 µM, and 1 µl of genomic DNA template. These fragments were sequenced using the same degenerate primers. A new set of less degenerate forward primers (dp3, dp4) was designed from the partial metK sequence and used in a second PCR in combination with degenerate reverse primers (dp5–dp7) designed in the downstream gene, which codes for a putative RNA methyltransferase. PCR condition B was: 30 s initial denaturation at 98 °C, followed by 30 cycles of 10 s denaturation at 98 °C, 30 s annealing at 50 °C and 1.5 min extension at 72 °C; the reaction was terminated by a 7 min extension step at 72 °C and cooled to 4 °C. The total reaction volume of 50 µl contained 1 unit of Phusion polymerase, 10 µl of 5× Phusion polymerase buffer, 1 µl of 10 mM dNTP, 1 µl each of forward and reverse primer at a concentration of 10 µM, and 1 µl of genomic DNA as template. The resulting PCR products were sequenced with the degenerate reverse primers (dp5–dp7). New specific reverse primers (sp1–sp12, Table 1) were then designed to sequence the rest of the nuclease genes by primer walking. The endA gene from Serratia sp. was obtained in a similar way. Here the upstream degenerate primer (dp8) was designed in the fumA gene, and the downstream primer (dp5) was, as for the Vibrio strains, located in the gene for a putative RNA methyltransferase. PCR was conducted using condition B above. The V.
cholerae ATCC 14035 endA gene was obtained by first performing PCRs with the degenerate primer pairs (dp9, dp6) and (dp1, dp10) under PCR condition B. Primers dp9 and dp10 were designed to bind within the endA gene itself. The PCR products were subsequently sequenced. In addition to the sequences above, orthologous endonuclease I protein sequences were gathered from various genome sequencing projects worldwide. All sequences were imported into ClustalX (Thompson et al., 1997) and aligned, and the alignment was then corrected manually using the crystal structure of V. vulnificus endonuclease I as a guide. The optimum growth temperature, Topt, of each organism was obtained from the literature and from The Prokaryotic Growth Temperature Database, PGTdb (Huang et al., 2004). Whether each bacterium lives in a marine or a non-marine habitat was also noted. Sequences suspected to have been horizontally transferred (xenologues) or to originate from plasmids or transposons were discarded from further analysis. Sequences from organisms with an extraordinarily low GC content were also discarded, because a low GC content also affects codon usage and hence amino acid preferences (Lobry, 1997). Sequences with more than 97% identity to another member of the alignment, and sequences with less than 35% identity to the V. vulnificus endonuclease, were also excluded from the dataset. Two structure-based alignments of the remaining
sequences were created using the crystal structure of V. vulnificus nuclease (Vvn) as a guide. To investigate temperature adaptation, only the marine sequences were aligned; in the analysis they were divided into three populations defined by the organisms' optimum growth temperature: mesophilic (Topt ≥ 30 °C, 5 sequences), intermediate (30 °C > Topt > 20 °C, 10 sequences) and psychrophilic (Topt ≤ 20 °C, 8 sequences). To investigate salt adaptation, all the mesophilic sequences were aligned and, in the analysis, divided into three populations defined by the salinity of the organisms' habitats: non-marine (12 sequences), brackish water (5 sequences) and marine (5 sequences). The five mesophilic marine sequences are the same in both alignments. For a more specific analysis, the amino acid data were decomposed into structural elements in three different ways: secondary structure elements (helix, β-strand and coil), water accessibility (core, twilight zone and surface) and proximity to the bound DNA. For this purpose, the secondary structure was downloaded from the DSSP server, and the solvent accessible surface area (ASA) was calculated with the program GETAREA (Fraczkiewicz and Braun, 1998) using default settings and the crystal structure of Vvn (PDB id: 1OUO) as template. Each residue's spatial location was assigned according to the solvent accessible surface area of its side-chain: core is defined as 0–9% exposed side-chain, twilight zone as 9–36% and surface as 36–100%. With these cut-offs, the amino acids are distributed approximately equally between the three bins. Proximity to substrate was assigned by computing the distance between each residue and the bound DNA (PDB id: 1OUP); residues at a distance ≤10 Å were defined as binding and all others as non-binding. This approach makes it possible to analyze the data relative to both the secondary structure and aspects of the three-dimensional structure.
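The stratification rules above reduce to two threshold functions per residue. A minimal Python sketch, assuming the relative side-chain exposure (in percent, for example from a GETAREA run) and a residue-to-DNA distance (in Å) have already been computed for each residue; how that distance is measured (for example, closest atom pair) is an assumption left to the caller, and all function names are hypothetical.

```python
from collections import Counter


def exposure_stratum(sidechain_exposure_pct: float) -> str:
    """Bin a residue by relative side-chain exposure (percent).

    Cut-offs from the text: core 0-9 %, twilight zone 9-36 %, surface 36-100 %.
    Values exactly at 9 or 36 go to the more exposed bin (an assumption;
    the text does not say which side the boundaries belong to).
    """
    if sidechain_exposure_pct < 9.0:
        return "core"
    if sidechain_exposure_pct < 36.0:
        return "twilight"
    return "surface"


def dna_proximity(distance_to_dna: float, cutoff: float = 10.0) -> str:
    """'binding' if the residue-DNA distance (in Å) is at most the cut-off."""
    return "binding" if distance_to_dna <= cutoff else "non-binding"


def stratify(residues):
    """Count residues per (exposure stratum, DNA proximity) class.

    residues: iterable of (sidechain_exposure_pct, distance_to_dna) pairs.
    """
    return Counter(
        (exposure_stratum(exposure), dna_proximity(distance))
        for exposure, distance in residues
    )


# Toy input values for illustration, not measurements from 1OUO/1OUP.
print(stratify([(5.0, 4.2), (20.0, 15.0), (50.0, 8.0)]))
```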
Amino acid composition differences were analyzed using a data set from which only the N-terminal periplasmic signal peptide had been removed, according to the prediction from the SignalP web server (Bendtsen et al., 2004).

2.2. Comparing sequence properties by statistical methods

Amino acids have many different physicochemical, steric and other numerical properties, ranging from molecular mass to helix formation parameters (Gromiha et al., 1999). The properties are assumed to be linearly additive in the protein structure, so that values can be calculated directly. To examine how different physicochemical properties may contribute to salt and cold adaptation, we analyzed 16 different properties. This comprehensive set of properties covers many of the functional and structural aspects of proteins. The propensities related to secondary structure are alpha helical propensity, beta sheet propensity and coil propensity (Chou and Fasman, 1978). The amino acids were divided into three main groups: charged, polar and hydrophobic. For the property charge, Asp, Glu, Arg and Lys were assigned a charge magnitude of 1, His was assigned a charge of 0.3, and all other amino acids were assigned a charge of 0. The amino acids Ala, Val, Phe, Pro, Met, Ile and Leu were classified as hydrophobic, with Trp assigned a hydrophobic magnitude of 0.3. The
rest of the amino acids were classified as polar, with magnitude 0.7 for His and Trp. Other properties investigated were isoelectric point and polarity (Zimmerman et al., 1968), hydrophobicity (Ponnuswamy, 1993), heat capacity (Hutchens, 1970), molecular weight (Fasman, 1976), denatured accessible surface area (Oobatake and Ooi, 1993), bulkiness (Zimmerman et al., 1968), shape as position of the branch point in the side-chain (van Gunsteren and Mark, 1992), compressibility (Iqbal and Verrall, 1988) and flexibility. The bulkiness of an amino acid is defined as the ratio of the side-chain volume to its length, which provides a measure of the average cross-section of the amino acid and is thus relevant to packing considerations. The flexibility of the amino acids is commonly measured by the temperature factors (B-factors) of the Cα atoms found in X-ray structures in the Protein Data Bank. The amino acids are very unevenly distributed within a protein structure, and the B-factors of surface and core residues will differ depending on the cut-off that determines whether a residue is exposed. To use B-factors in a comparative structural analysis, the mean values therefore have to be determined separately in each stratum of the molecule. We are not aware of any such index in the literature, so new stratified B-factors were calculated from a non-redundant set of PDB structures with resolution below 2.5 Å, following the method described in Schlessinger and Rost (2005), with these cut-off values: core (0–9% exposed side-chain), twilight zone (9–36%) and surface (36–100%). We also computed B-factors stratified by secondary structure. Secondary structure stratification is based on the DSSP assignment (Kabsch and Sander, 1983), with three groups: helix = {G, I, H}, beta = {B, E} and coil = {T, S, (blank)}.
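The three-class charge, hydrophobic and polar scales defined above can be written down directly, and a linearly additive property can then be averaged over a sequence. A minimal Python sketch, assuming one-letter codes and alignment gaps written as '-'; the dictionary and function names are hypothetical, and the remaining property scales are not reproduced here.

```python
# Three-class property scales as stated in the text. For every standard amino
# acid the charged + hydrophobic + polar magnitudes sum to 1
# (His: 0.3 charged + 0.7 polar; Trp: 0.3 hydrophobic + 0.7 polar).
CHARGED = {**{aa: 1.0 for aa in "DERK"}, "H": 0.3}
HYDROPHOBIC = {**{aa: 1.0 for aa in "AVFPMIL"}, "W": 0.3}
POLAR = {**{aa: 1.0 for aa in "GSTCNQY"}, "H": 0.7, "W": 0.7}


def mean_property(seq: str, scale: dict) -> float:
    """Average a linearly additive property over the residues of a sequence.

    Alignment gaps ('-') are skipped; residues missing from the scale count as 0.
    """
    residues = [aa for aa in seq.upper() if aa != "-"]
    if not residues:
        raise ValueError("sequence contains no residues")
    return sum(scale.get(aa, 0.0) for aa in residues) / len(residues)


seq = "MKDH-WLS"  # toy sequence for illustration
for name, scale in [("charged", CHARGED), ("hydrophobic", HYDROPHOBIC), ("polar", POLAR)]:
    print(name, round(mean_property(seq, scale), 3))
```

Because the three magnitudes of every residue sum to 1, the three per-sequence means also sum to 1, which is a convenient sanity check on the scales.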
A protein sequence can, at least as a first approximation, be considered a random sample from a pool of amino acids with a specified probability distribution. A structural alignment is a set of matched pairs or blocks in which there is a meaningful and highly correlated correspondence between the data points in one population and those in the other. It is therefore plausible to investigate the physicochemical properties of the sequences by statistical methods. A non-parametric cumulative Mann–Kendall trend test was carried out for systematic comparison of the various property traits, as described by Thorvaldsen et al. (2006). Conserved sequence sites were excluded from the test to ensure that any significant differences found (or not found) between the populations reflect the different conditions of the populations, and not the organization and conservation of this particular enzyme. With this comparative statistical method it is possible to test the significance of property alterations, e.g. an increased surface hydrophobicity in cold-adapted sequence populations. This provides a systematic comparison and makes changes easier to detect. To present the differences between the sequence populations graphically, a smoothing technique was developed to recover and visualize underlying structure in the data set. The smoothing uses a box filter window whose vertical extent covers all amino acids of the population at an aligned sequence position, while its horizontal size (the number of adjacent alignment positions) can be varied. The data were also visualized as customary box-plots
to represent variance within and between the three populations defined above. Ordinary regression analysis was also applied in particular cases: the data were first reduced to the mean value for each sequence, and these means were then used as response variables in a linear regression model. The substitution patterns between the populations were also calculated to study the replacement of residues. To avoid bias in this comparison, the method presented by Jones et al. (1992) was followed: the most similar pairs of sequences from each population (e.g. mesophile/psychrophile) were aligned, and the observed amino acid exchanges were tallied in a matrix. The intermediate temperature and brackish water populations were not included in these analyses. All analyses reported in this work were implemented in Matlab (MathWorks-Inc., 1994/2005) using the toolbox DeltaProt, which can be downloaded at http://www.math.uit.no/bi/deltaprot/.

3. Results

3.1. Distribution of endonuclease I gene family

A total of 40 different sequences were obtained by extensive database searches and in-house sequencing, as shown in Table 2. Many of the initially identified sequences were discarded because of an extraordinarily low GC content, or because of a longer C-terminus and suspected horizontal transfer. Sequences with more than 97% identity to another member of the alignments, or with less than 35% identity to the structural reference (V. vulnificus endonuclease I), were also discarded. The endonuclease I sequence from V. sp. strain 26 (acc. no. ABB72358) was 98% identical to the V. lentus DSM 13757 orthologue and is therefore not included in our analysis. To date, the phylogenetic distribution of the genes encoding endonuclease I is limited to the proteobacteria and Fibrobacter. The gene is most abundant among members of the gamma subdivision of the proteobacteria. In the beta subdivision, the genera Bordetella and Chromobacterium harbor the endonuclease I gene.
In the epsilon subdivision, Campylobacter jejuni RM1221 contains an orthologue of the gene. Desulfotalea psychrophila and Desulfobacterium autotrophicum, in the delta subdivision, also have a copy of the gene, as does Magnetococcus MC-1, a probable member of the alpha subdivision of the proteobacteria. In addition, the ruminal bacterium Fibrobacter succinogenes harbors the gene. Seven species were found to have more than one version of the gene. These are (with copy number in parentheses): Proteus mirabilis (2), Vibrio cholerae (2), Photobacterium profundum (2), Azotobacter vinelandii (2), Magnetococcus MC-1 (3), Pseudomonas putida (2).

3.2. Stratified temperature factor index

A new index of stratified B-factors was calculated based on 1699 unique X-ray structures with resolution ≤2.5 Å. The data set included about 380,000 amino acid residues, and the final results are shown in Table 3.
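Conceptually, a stratified index of this kind pools Cα B-factors by amino acid and stratum and takes the mean of each pool. A minimal Python sketch of that pooling step, assuming each residue has already been assigned a stratum (an exposure bin, or one of the DSSP groups helix = {G, I, H}, beta = {B, E}, coil = {T, S, blank}); it is not the authors' implementation, it does not reproduce any normalization that the method of Schlessinger and Rost (2005) may apply, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# DSSP secondary-structure codes grouped as in Section 2.2; " " is DSSP's blank (coil).
DSSP_GROUP = {"G": "helix", "I": "helix", "H": "helix",
              "B": "beta", "E": "beta",
              "T": "coil", "S": "coil", " ": "coil"}


def stratified_b_factors(records):
    """Mean C-alpha B-factor per (amino acid, stratum).

    records: iterable of (amino_acid, stratum, b_factor) tuples, where stratum is
    an exposure bin ('core', 'twilight', 'surface') or a DSSP_GROUP value.
    Raw B-factors are averaged; no per-structure normalization is applied.
    """
    pooled = defaultdict(list)
    for amino_acid, stratum, b_factor in records:
        pooled[(amino_acid, stratum)].append(b_factor)
    return {key: mean(values) for key, values in pooled.items()}


# Made-up toy values for illustration; these are not entries of Table 3.
records = [("G", "surface", 32.0), ("G", "surface", 40.0),
           ("G", DSSP_GROUP["E"], 18.0), ("L", "core", 14.5)]
print(stratified_b_factors(records))
# {('G', 'surface'): 36.0, ('G', 'beta'): 18.0, ('L', 'core'): 14.5}
```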

Table 2
Species name and strain, sequence accession numbers, temperature optimum (Topt) and habitat of the studied bacteria and the corresponding nuclease sequences

Columns: Species name and strain | Accession number | Topt (°C) | Habitat (a)

Species name and strain: V. diabolicus HE800 CNCM I-1629; V. salmonicida LFI1238; V. wodanis NCIMB 13584; V. lentus DSM 13757; V. diazotrophicus DSM 2604; V. coralliilyticus LMG 20984; V. scophthalmi LMG 19158; V. cholerae ATCC14035; V. fischeri ES114; V. vulnificus YJ016; V. parahaemolyticus RIMD 2210633; V. alginolyticus; Pseu. tunicata D2; Pseu. haloplanctis TAC125; Sh. denitrificans OS217; Sh. amazonensis SB2B; Sh. baltica OS1155; Sh. putrefaciens CN-32; Sh. oneidensis MR-1; Sh. sp PV-4; Sh. frigidimarina NCIMB400; Sh. violacea DSS12 JCM10179; Sh. sp ANA-3; Oceanospirillum sp. MED92; Se. sp Strain96; Ci. rodentium; E. coli K12; K. pneumoniae subsp. pneumoniae MGH 7857; Shi. flexneri 2a str. 2457T; Sa. enterica subsp. enterica serovar Typhi Ty2; Sa. bongori 12419; Bo. pertussis Tohama I; Bo. parapertussis 12822; Ch. violaceum ATCC 12472; Az. vinelandii AvOP; Az. vinelandii AvOP; Ps. aeruginosa PAO1; Ph. profundum SS9; Ph. sp SKA34; De. psychrophila LSv54

Accession number: ABB72357; ABB72352; ABB72353; ABB72359; ABB72354; ABB72355; ABB72356; ABB72360; YP_203820; NP_935659; NP_798988; ZP_01262593; ZP_01136063; YP_341357; YP_563974; ZP_00585704; ZP_00584314; ZP_00814204; NP_716464; ZP_00836619; ZP_00637755 (A); ZP_00849897; ZP_01165898; ABB72361 (B); NP_417420 (C); NP_838432; NP_806697 (B); NP_881193; NP_884837; NP_901658; ZP_00416711; ZP_00418108; NP_251439; CAG21451; EAR55562; YP_064027

Topt (°C): 37; 15; 15; 26; 25; 28; 25; 37; 28; 37; 37; >30; 28; 20; 23; 35; 23; 30; 30; 18; 21; 8; 30; >30; 28; 37; 37; 37; 37; 37; 37; 37; 37; 33; 37; 37; 37; 18
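The Topt entries in Table 2 map onto the three marine temperature populations of Section 2.1 by simple thresholds. A minimal Python sketch, assuming Topt is given as text such as '15' or '>30' and that '>30' is a lower bound; the grouping applies only to marine sequences, so habitat has to be checked separately, and the function name `temperature_group` is hypothetical.

```python
def temperature_group(topt: str) -> str:
    """Assign a marine sequence to a temperature population from its Topt entry.

    Thresholds from Section 2.1: mesophilic Topt >= 30 °C,
    intermediate 20 °C < Topt < 30 °C, psychrophilic Topt <= 20 °C.
    Entries such as '>30' are read as lower bounds (an assumption).
    """
    text = topt.strip()
    if text.startswith(">"):
        lower_bound = float(text[1:])
        if lower_bound >= 30:
            return "mesophilic"
        # A bound below 30 °C could fall in more than one population.
        raise ValueError(f"open-ended value {topt!r} spans more than one group")
    value = float(text)
    if value >= 30:
        return "mesophilic"
    if value > 20:
        return "intermediate"
    return "psychrophilic"


for entry in ["15", "26", "37", ">30"]:
    print(entry, temperature_group(entry))
```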